Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSV data model compatibility #3

Closed
frasertweedale opened this issue May 16, 2023 · 16 comments
Closed

OSV data model compatibility #3

frasertweedale opened this issue May 16, 2023 · 16 comments
Assignees

Comments

@frasertweedale
Copy link
Collaborator

Ensure that our advisory data model is compatible with the OSV schema.

We can implement export / conversion later. For now, let's just make sure that our data type can be correctly converted to the OSV schema.

OSV schema: https://ossf.github.io/osv-schema/

@mihaimaruseac mihaimaruseac self-assigned this May 17, 2023
@TristanCacqueray
Copy link
Collaborator

Using yajsv along with the schema.json, we can validate the data model, for example, this data:

id: HSEC-0000-0000
modified: 2021-01-31T00:00:00Z

... is valid according to:

$ yajsv -s schema.json test.yaml
test.yaml: pass

Here is an example of how we can fit the example advisory for this data model:

id: HSEC-0000-0000
modified: 2021-01-31T00:00:00Z
aliases: ["CVE-XXX"]
affected:
   - package:
       ecosystem: hackage
       name: package-name
       purl: https://example.com
     severity:
       - type: CVSS_V3
         score: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
     ranges:
       - type: SEMVER
         events:
           - introduced: "1.1.0"
           - fixed: "1.2.0.5"

@akacase
Copy link
Collaborator

akacase commented May 31, 2023

I'll review this tonight and tomorrow; feel free to assist, as I might overlook something @mihaimaruseac. I'll comment here on issues or if I don't find any.

@TristanCacqueray
Copy link
Collaborator

@mihaimaruseac
Copy link
Collaborator

Oh, I verified this, was planning to report on it on today's meeting but I guess that got cancelled?

I think we should be good here.

@TristanCacqueray
Copy link
Collaborator

@mihaimaruseac we had a chat about the affected version string, which might be complicated to convert into the OSV ranges. For example:

  • Bug introduced in 1.8 (stable release)
  • Affect 2.0 and 2.1
  • Bug discovered and fixed in 1.9 and 2.2

Should we parse (>= 1.8 && < 1.9) || (>= 2.0 && < 2.2)? Or should we change the schema to something closer to the desired OSV range, e.g.:

affects:
  - introduced: 1.8
    fixed: 1.9
  - introduced: 2.0
    fixed: 2.2

@frasertweedale
Copy link
Collaborator Author

Should we parse (>= 1.8 && < 1.9) || (>= 2.0 && < 2.2)? Or should we change the schema to something closer to the desired OSV range, e.g.:

affects:
  - introduced: 1.8
    fixed: 1.9
  - introduced: 2.0
    fixed: 2.2

I prefer the second approach. There will be some redundancy in some cases. For example, when a bug is introduced in a common ancestor and fixed in multiple release branches:

affects:
  - introduced: "1.6"
    fixed: "1.6.4"
  - introduced: "1.6"
    fixed: "1.7.1"

The same could be represented in "cabal constraints" form as: >= 1.6 && (< 1.6.4 || < 1.7.1). But then we have to parse that, and calculate the resulting affects array, applying the distributive property, etc etc. Better to just avoid it, for what (I think) will be a less common scenario.

One complication: OSV spec says:

Ranges listed with type SEMVER should not overlap: since SEMVER is a strict linear ordering, it is always possible to simplify to non-overlapping ranges.

I do not know what the consequences of violating this restriction are. I don't see any reason why it would be required in practice, and (as we see) there are plausible scenarios where an overlap would be the most natural way of expressing the affected range. Nevertheless, we should probably follow suit and implement the same restriction. That means that the above scenario would properly be expressed as:

affects:
  - introduced: "1.6"
    fixed: "1.6.4"
  - introduced: "1.7"
    fixed: "1.7.1"

For vulnerabilities that are not (yet) fixed in the older branch, the OSV "limit" event closes the range. So in the above scenario, if it were fixed in 1.7.1 but not backported to the 1.6 release branch, we could say:

affects:
  - introduced: "1.6"
    limit: "1.7"
  - introduced: "1.7"
    fixed: "1.7.1"

Regarding projects living in Git, OSV schema defines a GIT range type. So there is no blocker on that end. If/when we decide to accept vulnerabilities for such projects, we can update our data model at that time.

@mihaimaruseac
Copy link
Collaborator

I am also +1 on the second approach.

@akacase
Copy link
Collaborator

akacase commented May 31, 2023

I agree with @frasertweedale; this seems more straight-forward to implement.

@frasertweedale
Copy link
Collaborator Author

frasertweedale commented Jun 27, 2023

I think we're nearly there and I have a WIP for an OSV export command (PR in the next day or so).

Separately, I have engaged with osv.dev about getting them to ingest our data, and what
export format works best for them. It seems that the best approach is to export our OSV data
to a dedicated branch in our git repo. I will work on the CI for that in the next week or so.

@oliverchang
Copy link

Hey there! Chiming in here from the OSV side. Re "limit" in the OSV schema, this is essentially the same as "fixed" for linear version ranges. It was introduced primarily to handle git ranges (see the git example).

One more comment on the last example in #3 (comment). OSV uses a flat event approach rather than individual pair ranges, so you don't actually need to close the range for the branch that doesn't have a fixed range.

i.e. in OSV we could encode the final example like:

events:
- introduced: 1.6
- introduced: 1.7
- fixed: 1.7.4

Without needing to close the 1.6.x range, because these "events" describe a version timeline that "colors" the timeline with what's affected and what's not affected. You could also drop the "introduced: 1.7" part completely and have the same meaning algorithmically, but it may be helpful to keep that for informational purposes.

This way, ranges also can't ever technically overlap according to our evaluation algorithm as well. This part of the spec talking about overlapping ranges is outdated and we need to remove that.

@frasertweedale
Copy link
Collaborator Author

frasertweedale commented Jun 27, 2023

@oliverchang what if the issue was introduced in v1.6.5 and v1.7.1, but v1.7 was not affected, and the issue was not yet fixed in the 1.6.x series. In this case...

events:
- introduced: 1.6.5
- introduced: 1.7.1
- fixed: 1.7.4

... does not seem to be correct. It seems necessary to explicitly close the "first" range so that v1.7 is not included.

Unless I am missing something, limit seems necessary in more cases than just the Git scenario.

@frasertweedale
Copy link
Collaborator Author

@oliverchang follow-up question: does the OSV system handle multiple ranges (of the same type), e.g.:

"ranges": [ {
    "type": "ECOSYSTEM",
    "events": [ { "introduced": "1.6.5" }, { "limit": "1.7" } ]
  },
  {
    "type": "ECOSYSTEM"
    "events": [ { "introduced": "1.7.1" }, { "fixed": "1.7.4" } ]
} ]

Or is it required to "flatten" them into a single array of events? (The osv-schema certainly admits the above structure).

@oliverchang
Copy link

That case would have to be expressed as:

events:
- introduced: 1.6.5
- fixed: 1.7
- introduced: 1.7.1
- fixed: 1.7.4

My comment around not requiring limit is more that it's equivalent to fixed for linear version ranges. It's only for more complex version trees (i.e. git) where it makes a difference, and we typically recommend just sticking with "fixed". See the examples for "limit" here.

@oliverchang
Copy link

oliverchang commented Jun 27, 2023

@oliverchang follow-up question: does the OSV system handle multiple ranges (of the same type), e.g.:

"ranges": [ {
    "type": "ECOSYSTEM",
    "events": [ { "introduced": "1.6.5" }, { "limit": "1.7" } ]
  },
  {
    "type": "ECOSYSTEM"
    "events": [ { "introduced": "1.7.1" }, { "fixed": "1.7.4" } ]
} ]

Or is it required to "flatten" them into a single array of events? (The osv-schema certainly admits the above structure).

Flattening is certainly preferred (but not required) as it results in a more concise entry, and completely avoids potentially overlapping ranges. It can also simplify the encoding for this example:

affects:
  - introduced: "1.6"
    limit: "1.7"
  - introduced: "1.7"
    fixed: "1.7.1"

Because you can just omit the limit: "1.7" and still have it work as intended.

We actually started the OSV schema with something similar to what's being described here (separate range pairs), but ended up simplifying this to a flat array of events as that fixed a bunch of gotchas and problems while still being just as expressive.

@frasertweedale
Copy link
Collaborator Author

@oliverchang OK, yes I see now that operationally "limit" and "fixed" are equivalent for this scenario. Semantically (linguistically?) it is quite confusing! I'll consider if we should update our "native" schema, as well as how we express the ranges in OSV.

@oliverchang
Copy link

@oliverchang OK, yes I see now that operationally "limit" and "fixed" are equivalent for this scenario. Semantically (linguistically?) it is quite confusing! I'll consider if we should update our "native" schema, as well as how we express the ranges in OSV.

Indeed, we'll look at how we can clarify the difference in the spec.

Also one more note re flattening as a follow up to my last comment: It should be a very trivial transformation, because you can just concatenate the individual ECOSYSTEM ranges into a flat list and still have it mean the same thing (assuming the ranges don't overlap).

frasertweedale added a commit to frasertweedale/security-advisories that referenced this issue Jun 27, 2023
This reverts commit 6b95156.

Per discussion at [1], the OSV project prefers to (ab?)use the
"fixed" event type to limit a range, even when there is no fix.
Revert the commit.  We will add better documentation in a subsequent
commit.

[1]: haskell#3 (comment)
frasertweedale added a commit to frasertweedale/security-advisories that referenced this issue Jun 27, 2023
Add the `Security.Advisories.Convert.OSV` module, which defines the
conversion from our `Advisory` data type to the OSV `Model`.
Currently, no database-specific or ecosystem-specific fields are
set.  Whether or how to use those fields is a matter for future
discussion.

Re-export the *aeson* encode and decode functions from
`Security.OSV`, for convenience.

Add the `osv` subcommand to `hsec-tools`.  It works in the same way
as `check`, but emits the encoded OSV JSON data.

Later commits will add the CI workflows to generate and publish the
OSV data.

Fixes: haskell#3
frasertweedale added a commit to frasertweedale/security-advisories that referenced this issue Jun 28, 2023
Add the `Security.Advisories.Convert.OSV` module, which defines the
conversion from our `Advisory` data type to the OSV `Model`.
Currently, no database-specific or ecosystem-specific fields are
set.  Whether or how to use those fields is a matter for future
discussion.

Add the `osv` subcommand to `hsec-tools`.  It works in the same way
as `check`, but emits the encoded OSV JSON data.

Later commits will add the CI workflows to generate and publish the
OSV data.

Fixes: haskell#3
mihaimaruseac pushed a commit that referenced this issue Jun 28, 2023
This reverts commit 6b95156.

Per discussion at [1], the OSV project prefers to (ab?)use the
"fixed" event type to limit a range, even when there is no fix.
Revert the commit.  We will add better documentation in a subsequent
commit.

[1]: #3 (comment)
akacase pushed a commit to akacase/security-advisories that referenced this issue Jun 28, 2023
This reverts commit 6b95156.

Per discussion at [1], the OSV project prefers to (ab?)use the
"fixed" event type to limit a range, even when there is no fix.
Revert the commit.  We will add better documentation in a subsequent
commit.

[1]: haskell#3 (comment)
akacase pushed a commit to akacase/security-advisories that referenced this issue Jun 28, 2023
Add the `Security.Advisories.Convert.OSV` module, which defines the
conversion from our `Advisory` data type to the OSV `Model`.
Currently, no database-specific or ecosystem-specific fields are
set.  Whether or how to use those fields is a matter for future
discussion.

Add the `osv` subcommand to `hsec-tools`.  It works in the same way
as `check`, but emits the encoded OSV JSON data.

Later commits will add the CI workflows to generate and publish the
OSV data.

Fixes: haskell#3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants