-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling of cherry picks in the specification #216
Comments
Hi! Thanks for the interest and question. The spec makes no mention of cherrypicks, because in the context of the evaluation algorithm, all that matters are the events that are there in the data source. Adding cherrypicks explicitly into the algorithm is also difficult, because it's hard if not impossible to guarantee that all cherrypicks can be found algorithmically. Re the OSV.dev implementation -- OSV.dev does have a best effort cherry pick detection as an additional validation /augmentation step separate from the evaluation. That is, OSV.dev infrastructrure will perform best effect detection of cherrypicks, and add them to the source OSV advisory. There are other similar things that OSV.dev performs here to augment the source OSV entry. Then, evaluation happens after this, as per the OSV evaluation rules. Does this answer your question? |
Hello, and thanks a lot for your quick answer. It is unfortunate that, due to cherry picks (and maybe other features?), the OSV.dev implementation cannot be fully compared with the specification. As far as I understand, when API users receive an answer about whether a commit is vulnerable or not, they don't know if cherry picks were involved in that answer. So there is currently no way of detecting if the answer is "wrong" due to a bug or if it is "good" due to OSV.dev going the extra mail of taking cherry picks into account. Maybe that's just the way it could be. Or maybe we can make things a bit better on this front by including in API answers whether cherry picks were involved in the decision (arguably this should be discussed in an issue related to the OSV.dev implementation, not here). Still, I'd like to offer a counter-argument to this:
The specification could assume there exists a general notion of "equivalent commits" (equivalence classes if you want), that could be implemented in practice by cherry pick detection or any other means. And then, based on that, specify how equivalent commits should be handled in the evaluation algorithm. This way you can decouple how one does cherry pick detection from how they are used by the evaluation. Is this something you could consider? (Yes, I realize this would not completely solve the problem of the OSV.dev/specification divergence, but it would reduce the uncertainty margin for any specification implementer.) |
Aside from what to say about cherry picks in the spec or not, I have a followup question about the augmentation mentioned above: does it result in a combinatorial explosion of ranges to be considered? Consider this case:
In terms of commit ranges to be traversed for evaluation, does one need to consider all the 4 combinations: If commit equivalence ends up being discussed in the spec, this is one of the tricky cases that would need to be clarified. |
Thanks for pointing this out. I believe we can potentially resolve this by adding flags/settings to get our implementation to fully conform. Would you also be able to help us understand the motivation behind this comparison you're looking into so we can better understand how to make this fit your needs?
Our goal with the OSV spec is to provide an unambiguous evaluation algorithm, given a fixed set of inputs (i.e. events). The fact that there is no definition for "equivalent" commits would complicate this algorithm. I absolutely agree cherrypick detection is crucial for git and something that needs to be solved for producers of OSV entries. Perhaps this would be a better fit for a documented "guide" of sorts for DB publishers where we talk about this issue and hte importance of including all cherrypicked/equivalent commits in the data? |
> It is unfortunate that, due to cherry picks (and maybe other features?), the OSV.dev implementation cannot be fully compared with the specification.
Thanks for pointing this out. I believe we can potentially resolve this by adding flags/settings to get our implementation to fully conform. Would you also be able to help us understand the motivation behind this comparison you're looking into so we can better understand how to make this fit your needs?
Good point about flags/settings. As with any good spec, having a "reference implementation" would indeed be great. If the OSV.dev implementation can be turned, either by default or with optional flags, into such a reference implementation, awesome!
Regarding my motivation for this. I'm co-founder and tech lead of the Software Heritage project; we are considering applying the OSV.dev data and evaluation logic to our archive. Our data model is Git-compatible, but not Git itself (because the Software Heritage archive is VCS-agnostic). Hence we can't just run the OSV.dev evaluation code on the archive. We are considering providing a separate open source implementation of the OSV spec that works on our data model, and would like to use the OSV.dev data and code base as a comparison point. Uncertainty/under-specification of how to handle cherry picks is problematic in this respect.
> The specification could assume there exists a general notion of "equivalent commits" (equivalence classes if you want),
Our goal with the OSV spec is to provide an unambiguous evaluation algorithm, given a fixed set of inputs (i.e. events). The fact that there is no definition for "equivalent" commits would complicate this algorithm.
I absolutely agree cherrypick detection is crucial for git and something that needs to be solved for producers of OSV entries. Perhaps this would be a better fit for a documented "guide" of sorts for DB publishers where we talk about this issue and hte importance of including all cherrypicked/equivalent commits in the data?
That would be welcome, yes. (But then, if the guide becomes a set of SHOULD recommendations, they could also go in the spec, no? ;-) A clear MUST/SHOULD split in requirements is pretty common in specs.)
Happy to review something such or contribute in other ways you consider useful.
|
Thank you for the background!
We already have text in the spec to prefer Would you mind contributing a PR for that? |
Rationale: reduce the amount of false negatives in commits belonging to branches forked in-between `introduced` and `fixed`/`limit`. Closes ossf#216.
Rationale: reduce the amount of false negatives in commits belonging to branches forked in-between `introduced` and `fixed`/`limit`. Closes ossf#216. Signed-off-by: Stefano Zacchiroli <zack@upsilon.cc>
Rationale: reduce the amount of false negatives in commits belonging to branches forked in-between `introduced` and `fixed`/`limit`. Closes ossf#216. Signed-off-by: Stefano Zacchiroli <zack@upsilon.cc>
Rationale: reduce the amount of false negatives in commits belonging to branches forked in-between `introduced` and `fixed`/`limit`. Closes ossf#216. Signed-off-by: Stefano Zacchiroli <zack@upsilon.cc>
Currently, the specification (I'm looking at version 1.6.1, at the time of writing) does not mention how to handle cherry pick commits at all. However, as I understand it, the osv.dev implementation of the evaluation logic does handle them, to some extent.
It would be nice to specify properly how cherry picks should be handled, so that independent implementations of the evaluation logic can be created, compared, etc.
I have a few specific examples of cherry picks cases that the semantics non-trivial, which I can post here.
But before doing so I'd like to understand if there is an interest in making the OSV format specification evolve to describe how cherry picks should be handled.
The text was updated successfully, but these errors were encountered: