feat: Begin populating new database fields #3708
Conversation
The reputting is after merging right?

another-rex left a comment:
Looking great!
I'm not sure how to review models_test.py though...
```python
bucket = gcs.get_osv_bucket()
pb_blob = bucket.get_blob(os.path.join(gcs.VULN_PB_PATH, vuln_id + '.pb'))
if pb_blob is None:
  if osv.Vulnerability.get_by_id(vuln_id) is not None:
    logging.error('vulnerability not in GCS - %s', vuln_id)
    # TODO(michaelkedar): send pub/sub message to reimport
  return
try:
  vuln_proto = osv.vulnerability_pb2.Vulnerability.FromString(
      pb_blob.download_as_bytes())
except Exception:
  logging.exception('failed to download %s protobuf from GCS', vuln_id)
```
Can all of this be abstracted to a single get by ID function? I feel like this would be used in quite a few different places, and if we ever decide to put the .pb directly in datastore or another db we can just change the implementation.
Maybe in the new gcs module
created functions in the gcs module for reading and writing
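A rough sketch of what such gcs-module helpers could look like. The names here (`get_by_id`, `upload_vulnerability`, `VULN_PB_PATH`) are assumptions for illustration, not the actual osv.dev API; the bucket is duck-typed so the storage backend could change without touching callers, which is the point of the abstraction discussed above.

```python
import os

# Assumed object-path prefix for the serialized .pb files in the bucket.
VULN_PB_PATH = 'pb'


def _vuln_blob_path(vuln_id: str) -> str:
  """Object path for a vulnerability's serialized protobuf."""
  return os.path.join(VULN_PB_PATH, vuln_id + '.pb')


def get_by_id(bucket, vuln_id: str):
  """Read the serialized Vulnerability proto bytes, or None if absent.

  `bucket` is any object with a get_blob() method (e.g. a GCS Bucket).
  Centralising the lookup here means callers never hard-code the path.
  """
  blob = bucket.get_blob(_vuln_blob_path(vuln_id))
  if blob is None:
    return None
  return blob.download_as_bytes()


def upload_vulnerability(bucket, vuln_id: str, serialized: bytes) -> None:
  """Write the serialized proto back to the bucket."""
  bucket.blob(_vuln_blob_path(vuln_id)).upload_from_string(serialized)
```

Callers would then deserialize with `Vulnerability.FromString()` on the returned bytes, and a missing record is a simple `None` check.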
```python
    # TODO: Raise exception
    return
  if alias_group is None:
    modified = datetime.datetime.now(datetime.UTC)
```
Should `modified` be unconditionally set to now? If the aliases don't change, we probably don't want to update the modified date, right?
I guess the function doc does assume that something was just deleted, but if it's easy and there are no additional side effects, we should only update when there was an alias group in the first place.
I'm pretty sure the only way `_update_vuln_with_group` will be called in `main` with `alias_group = None` is if the vuln was in an alias group already.
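To make the "only touch `modified` on a real change" idea from this thread concrete, here is a minimal self-contained sketch. Plain dicts stand in for the real entities, and `update_vuln_with_group` is a hypothetical name, not the actual osv.dev function:

```python
import datetime


def update_vuln_with_group(vuln: dict, alias_group) -> dict:
  """Apply the alias group to vuln, bumping `modified` only on a change.

  alias_group is an iterable of alias IDs, or None when the vuln was
  removed from its group; either way, only a genuine difference from the
  stored aliases updates the modified timestamp.
  """
  new_aliases = sorted(alias_group) if alias_group is not None else []
  if vuln.get('aliases', []) != new_aliases:
    vuln['aliases'] = new_aliases
    vuln['modified'] = datetime.datetime.now(datetime.timezone.utc)
  return vuln
```

With this shape, a no-op recompute of the same alias group leaves the modified date untouched, while clearing a previously-set group does update it.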
osv/models.py (Outdated)
```python
class AffectedVersions(ndb.Model):
  """AffectedVersions entry."""
```
Same here, please expand this a bit more, like the ListedVulnerability model
added what it is to be used for to the docstring
```python
      (vulnerability_pb2.Severity.Type.Name(sev.type), sev.score))

search_indices = set()
search_indices.update(_tokenize(vulnerability.id))
```
Hmm... Since we are doing this big reput anyway, we could just .lower() all of this and make our search case insensitive which we should have done.
But probably a good idea to not change behavior too much for now.
It is mostly lower()ed - _tokenize() does lower-case the tokens. The only thing not made lowercase is the repo URLs, but they probably are already lowercase anyway.
In #3708, writing both JSON and protos is proving to be fairly expensive. Just write the `.pb`s - we can make other jobs make the JSON files later.
This PR begins our database table migrations!
This adds a few new entities into Datastore:
- `Vulnerability`: contains a small amount of information about a vulnerability, with the (overall) modified date, as well as some raw fields that are overwritten by our enrichment.
- `ListedVulnerability`: minimal vulnerability information that is needed for our website's `/list` page.
- `AffectedVersions`: an index for API matching, which should help simplify our API logic and improve performance.

Additionally, full vulnerabilities are stored in the GCS bucket, which will eventually become the new source of truth for the vulnerability records (deprecating the Bug entity).
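As a rough illustration of the entity shapes described above, here is a sketch using plain dataclasses in place of the actual `ndb.Model` definitions; every field name here is a guess from the description, not the real schema:

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class Vulnerability:
  """Small summary record with the overall modified date and raw fields."""
  id: str
  modified: datetime.datetime
  # Raw fields like aliases are later overwritten by enrichment.
  aliases: list = field(default_factory=list)


@dataclass
class ListedVulnerability:
  """Minimal info the website's /list page needs to render a row."""
  id: str
  summary: str = ''


@dataclass
class AffectedVersions:
  """Index entry used for API version matching."""
  vuln_id: str
  package: str
  versions: list = field(default_factory=list)
```

The split mirrors the access patterns: the list page reads only `ListedVulnerability`, API matching scans `AffectedVersions`, and the full record lives in GCS.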
To populate these I've added a new `_post_put_hook` on the Bug entity, and added some logic to the alias & upstream cron jobs to update the entities/GCS as well.

I've only set this to run on the test instance (and when running tests with the emulator) because I don't want this to block releases on prod if it turns out this has significant performance impacts (especially with the blocking writes to the bucket in the alias & upstream crons).
Some other misc things to note:
- I'm writing both `.pb` and `.json` files to the bucket, because the proto files are a bit easier to work with in code, but the JSON is more usable externally. It's probably better to just write the `.pb` and have the exporter create the `.json` files.
- […] `google.cloud.storage` operations, and added that to the datastore emulator starter code.