Switch from NVD json feeds to API #328
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #328 +/- ##
==========================================
+ Coverage 76.61% 77.44% +0.84%
==========================================
Files 51 52 +1
Lines 6372 6572 +200
==========================================
+ Hits 4881 5089 +208
+ Misses 1491 1483 -8
... and 6 files with indirect coverage changes
@J08nY Could you please expose … ? The URLs are in the settings: sec-certs/src/sec_certs/configuration.py, lines 72 to 80 in fc638a8.
The … Basically, now we just have to decide on the URLs. Can you do that and change the settings keys accordingly?
Where do I get the CPE match feed? What processing do I need to do to obtain it?
If you have a processed dataset available, the JSON should sit at the path referenced in sec-certs/src/sec_certs/dataset/dataset.py, line 400 in d3d470e.
You can either copy the contents of the method, or just create a new dataset at some path and call the method right away. E.g.,

    import gzip
    import json

    from sec_certs.dataset import CCDataset

    cc_dset = CCDataset(root_dir="/whatever/path")
    cpe_match_dict = cc_dset._prepare_cpe_match_dict()
    # gzip.open in "w" mode is binary, so the JSON string is encoded to bytes before writing.
    with gzip.open("/path/to/store/cpe_match_dict.json", "w") as handle:
        json_str = json.dumps(cpe_match_dict, indent=4)
        handle.write(json_str.encode("utf-8"))

To get the datasets from NVD, you need to obtain an NVD API key and set the following two keys in your YAML settings:

    nvd_api_key: <actual-api-key>
    preferred_source_nvd_datasets: "api"
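Before uploading a stored file, it may be worth checking that it reads back cleanly. The following is a minimal sketch using only the standard library. The helper names `store_json_gz` and `load_json_gz` and the sample dictionary are hypothetical and not part of sec-certs; the real input would be the dictionary returned by `_prepare_cpe_match_dict()`.

```python
from __future__ import annotations

import gzip
import json
import tempfile
from pathlib import Path
from typing import Any


def store_json_gz(data: dict[str, Any], path: Path) -> None:
    """Write `data` as gzip-compressed, UTF-8 encoded JSON (hypothetical helper)."""
    # Text mode ("wt") lets json.dump write str directly, so no manual encode is needed.
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(data, handle, indent=4)


def load_json_gz(path: Path) -> dict[str, Any]:
    """Read a gzip-compressed JSON file back into a dictionary (hypothetical helper)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Made-up stand-in for the real CPE match dictionary.
sample = {"match_strings": {"example-key": ["value-1", "value-2"]}}

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "cpe_match_dict.json.gz"
    store_json_gz(sample, target)
    restored = load_json_gz(target)
    # A gzip stream always starts with the two magic bytes 0x1f 0x8b.
    is_gzip = target.read_bytes()[:2] == b"\x1f\x8b"
```

If `restored` equals the original dictionary and the magic bytes match, the file is a valid gzip archive containing the expected JSON.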
Regarding import time optimization, this post has a nice summary of different approaches that you can use to address this: https://adamj.eu/tech/2023/03/02/django-profile-and-improve-import-time/

I did some profiling. As of now:

    (venv) ~/phd/projects/certificates/sec-certs $ time python -c 'import sec_certs.dataset'
    python -c 'import sec_certs.dataset' 3.28s user 0.54s system 111% cpu 3.413 total
    (venv) ~/phd/projects/certificates/sec-certs $ time python -c 'import sec_certs.sample'
    python -c 'import sec_certs.sample' 1.79s user 0.34s system 125% cpu 1.700 total
    (venv) ~/phd/projects/certificates/sec-certs $ time python -c 'import sec_certs.model'
    python -c 'import sec_certs.model' 3.38s user 0.53s system 111% cpu 3.493 total
    (venv) ~/phd/projects/certificates/sec-certs $ time python -c 'import sec_certs.utils'
    python -c 'import sec_certs.utils' 0.03s user 0.01s system 93% cpu 0.041 total
    (venv) ~/phd/projects/certificates/sec-certs $ time python -c 'import sec_certs'
    python -c 'import sec_certs' 0.03s user 0.01s system 93% cpu 0.043 total

I deferred a few imports, see: 88f4630

Profiling after:

    (venv) ~/phd/projects/certificates/sec-certs $ time python -c 'import sec_certs.dataset'
    python -c 'import sec_certs.dataset' 1.48s user 0.28s system 131% cpu 1.343 total
    (venv) ~/phd/projects/certificates/sec-certs $ time python -c 'import sec_certs.sample'
    python -c 'import sec_certs.sample' 1.47s user 0.29s system 131% cpu 1.336 total
    (venv) ~/phd/projects/certificates/sec-certs $ time python -c 'import sec_certs.model'
    python -c 'import sec_certs.model' 1.50s user 0.29s system 131% cpu 1.365 total
    (venv) ~/phd/projects/certificates/sec-certs $ time python -c 'import sec_certs.utils'
    python -c 'import sec_certs.utils' 0.03s user 0.01s system 92% cpu 0.044 total
    (venv) ~/phd/projects/certificates/sec-certs $ time python -c 'import sec_certs'
    python -c 'import sec_certs' 0.03s user 0.01s system 93% cpu 0.041 total

So, importing sec_certs.dataset goes from 3.3 seconds to 1.5. Any further reduction would require: …

I did the profiling with … I consider this to be an OK result and I will invest no more effort into this unless you promote the issue.

Edit: Also note that imports placed inside functions should be executed only once, AFAIK.
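To illustrate the deferred-import pattern and the "only once" remark, here is a minimal sketch. It uses the standard-library `fractions` module as a stand-in for a heavy dependency; the function names are hypothetical and this is not the code from 88f4630.

```python
import sys


def parse_fraction(text: str):
    """Parse a string such as '1/2', importing the dependency on first use."""
    # Deferred import: this runs when the function is called, not when the
    # enclosing module is imported, so the module import itself stays cheap.
    from fractions import Fraction

    return Fraction(text)


def fractions_module():
    """Return the module object obtained by a function-level import."""
    import fractions

    return fractions


half = parse_fraction("1/2")

# After the first call the module is cached in sys.modules. Later import
# statements only look it up there instead of executing the module again.
cached = sys.modules["fractions"]
same_object = fractions_module() is cached
```

The trade-off is that the first call to such a function pays the import cost, and every later call only pays for a cheap lookup in `sys.modules`.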
fd241ea to 0c181f6
Closes #324
TODO
Start using cached CPEs again (makes no sense since CVEDataset no longer works with them)
Endpoints to use:
New tests
Notes: