OSV data model compatibility #3

frasertweedale · 2023-05-16T10:08:56Z

Ensure that our advisory data model is compatible with the OSV schema.

We can implement export / conversion later. For now, let's just make sure that our data type can be correctly converted to the OSV schema.

OSV schema: https://ossf.github.io/osv-schema/

TristanCacqueray · 2023-05-31T20:43:56Z

Using yajsv along with the schema.json, we can validate the data model, for example, this data:

id: HSEC-0000-0000
modified: 2021-01-31T00:00:00Z

... is valid according to:

$ yajsv -s schema.json test.yaml
test.yaml: pass

Here is an example of how we can fit the example advisory for this data model:

id: HSEC-0000-0000
modified: 2021-01-31T00:00:00Z
aliases: ["CVE-XXX"]
affected:
   - package:
       ecosystem: hackage
       name: package-name
       purl: https://example.com
     severity:
       - type: CVSS_V3
         score: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
     ranges:
       - type: SEMVER
         events:
           - introduced: "1.1.0"
           - fixed: "1.2.0.5"

akacase · 2023-05-31T20:43:59Z

I'll review this tonight and tomorrow; feel free to assist, as I might overlook something @mihaimaruseac. I'll comment here on issues or if I don't find any.

TristanCacqueray · 2023-05-31T21:02:48Z

Here is another example from the rustsec/advisory-db:

toml: https://github.com/rustsec/advisory-db/blob/main/crates/enumflags2/RUSTSEC-2023-0035.md
osv: https://github.com/rustsec/advisory-db/blob/osv/crates/RUSTSEC-2023-0035.json

mihaimaruseac · 2023-05-31T21:06:55Z

Oh, I verified this, was planning to report on it on today's meeting but I guess that got cancelled?

I think we should be good here.

TristanCacqueray · 2023-05-31T21:18:13Z

@mihaimaruseac we had a chat about the affected version string, which might be complicated to convert into the OSV ranges. For example:

Bug introduced in 1.8 (stable release)
Affect 2.0 and 2.1
Bug discovered and fixed in 1.9 and 2.2

Should we parse (>= 1.8 && < 1.9) || (>= 2.0 && < 2.2)? Or should we change the schema to something closer to the desired OSV range, e.g.:

affects:
  - introduced: 1.8
    fixed: 1.9
  - introduced: 2.0
    fixed: 2.2

frasertweedale · 2023-05-31T23:22:42Z

Should we parse (>= 1.8 && < 1.9) || (>= 2.0 && < 2.2)? Or should we change the schema to something closer to the desired OSV range, e.g.:
affects:
  - introduced: 1.8
    fixed: 1.9
  - introduced: 2.0
    fixed: 2.2

I prefer the second approach. There will be some redundancy in some cases. For example, when a bug is introduced in a common ancestor and fixed in multiple release branches:

affects:
  - introduced: "1.6"
    fixed: "1.6.4"
  - introduced: "1.6"
    fixed: "1.7.1"

The same could be represented in "cabal constraints" form as: >= 1.6 && (< 1.6.4 || < 1.7.1). But then we have to parse that, and calculate the resulting affects array, applying the distributive property, etc etc. Better to just avoid it, for what (I think) will be a less common scenario.

One complication: OSV spec says:

Ranges listed with type SEMVER should not overlap: since SEMVER is a strict linear ordering, it is always possible to simplify to non-overlapping ranges.

I do not know what the consequences of violating this restriction are. I don't see any reason why it would be required in practice, and (as we see) there are plausible scenarios where an overlap would be the most natural way of expressing the affected range. Nevertheless, we should probably follow suit and implement the same restriction. That means that the above scenario would properly be expressed as:

affects:
  - introduced: "1.6"
    fixed: "1.6.4"
  - introduced: "1.7"
    fixed: "1.7.1"

For vulnerabilities that are not (yet) fixed in the older branch, the OSV "limit" event closes the range. So in the above scenario, if it were fixed in 1.7.1 but not backported to the 1.6 release branch, we could say:

affects:
  - introduced: "1.6"
    limit: "1.7"
  - introduced: "1.7"
    fixed: "1.7.1"

Regarding projects living in Git, OSV schema defines a GIT range type. So there is no blocker on that end. If/when we decide to accept vulnerabilities for such projects, we can update our data model at that time.

mihaimaruseac · 2023-05-31T23:37:47Z

I am also +1 on the second approach.

akacase · 2023-05-31T23:38:45Z

I agree with @frasertweedale; this seems more straight-forward to implement.

frasertweedale · 2023-06-27T07:24:10Z

I think we're nearly there and I have a WIP for an OSV export command (PR in the next day or so).

Separately, I have engaged with osv.dev about getting them to ingest our data, and what
export format works best for them. It seems that the best approach is to export our OSV data
to a dedicated branch in our git repo. I will work on the CI for that in the next week or so.

oliverchang · 2023-06-27T08:25:04Z

Hey there! Chiming in here from the OSV side. Re "limit" in the OSV schema, this is essentially the same as "fixed" for linear version ranges. It was introduced primarily to handle git ranges (see the git example).

One more comment on the last example in #3 (comment). OSV uses a flat event approach rather than individual pair ranges, so you don't actually need to close the range for the branch that doesn't have a fixed range.

i.e. in OSV we could encode the final example like:

events:
- introduced: 1.6
- introduced: 1.7
- fixed: 1.7.4

Without needing to close the 1.6.x range, because these "events" describe a version timeline that "colors" the timeline with what's affected and what's not affected. You could also drop the "introduced: 1.7" part completely and have the same meaning algorithmically, but it may be helpful to keep that for informational purposes.

This way, ranges also can't ever technically overlap according to our evaluation algorithm as well. This part of the spec talking about overlapping ranges is outdated and we need to remove that.

frasertweedale · 2023-06-27T08:36:42Z

@oliverchang what if the issue was introduced in v1.6.5 and v1.7.1, but v1.7 was not affected, and the issue was not yet fixed in the 1.6.x series. In this case...

events:
- introduced: 1.6.5
- introduced: 1.7.1
- fixed: 1.7.4

... does not seem to be correct. It seems necessary to explicitly close the "first" range so that v1.7 is not included.

Unless I am missing something, limit seems necessary in more cases than just the Git scenario.

frasertweedale · 2023-06-27T08:44:21Z

@oliverchang follow-up question: does the OSV system handle multiple ranges (of the same type), e.g.:

"ranges": [ {
    "type": "ECOSYSTEM",
    "events": [ { "introduced": "1.6.5" }, { "limit": "1.7" } ]
  },
  {
    "type": "ECOSYSTEM"
    "events": [ { "introduced": "1.7.1" }, { "fixed": "1.7.4" } ]
} ]

Or is it required to "flatten" them into a single array of events? (The osv-schema certainly admits the above structure).

oliverchang · 2023-06-27T08:45:54Z

That case would have to be expressed as:

events:
- introduced: 1.6.5
- fixed: 1.7
- introduced: 1.7.1
- fixed: 1.7.4

My comment around not requiring limit is more that it's equivalent to fixed for linear version ranges. It's only for more complex version trees (i.e. git) where it makes a difference, and we typically recommend just sticking with "fixed". See the examples for "limit" here.

oliverchang · 2023-06-27T08:47:58Z

@oliverchang follow-up question: does the OSV system handle multiple ranges (of the same type), e.g.:
"ranges": [ {
    "type": "ECOSYSTEM",
    "events": [ { "introduced": "1.6.5" }, { "limit": "1.7" } ]
  },
  {
    "type": "ECOSYSTEM"
    "events": [ { "introduced": "1.7.1" }, { "fixed": "1.7.4" } ]
} ]
Or is it required to "flatten" them into a single array of events? (The osv-schema certainly admits the above structure).

Flattening is certainly preferred (but not required) as it results in a more concise entry, and completely avoids potentially overlapping ranges. It can also simplify the encoding for this example:

affects:
  - introduced: "1.6"
    limit: "1.7"
  - introduced: "1.7"
    fixed: "1.7.1"

Because you can just omit the limit: "1.7" and still have it work as intended.

We actually started the OSV schema with something similar to what's being described here (separate range pairs), but ended up simplifying this to a flat array of events as that fixed a bunch of gotchas and problems while still being just as expressive.

frasertweedale · 2023-06-27T08:50:17Z

@oliverchang OK, yes I see now that operationally "limit" and "fixed" are equivalent for this scenario. Semantically (linguistically?) it is quite confusing! I'll consider if we should update our "native" schema, as well as how we express the ranges in OSV.

oliverchang · 2023-06-27T08:59:56Z

@oliverchang OK, yes I see now that operationally "limit" and "fixed" are equivalent for this scenario. Semantically (linguistically?) it is quite confusing! I'll consider if we should update our "native" schema, as well as how we express the ranges in OSV.

Indeed, we'll look at how we can clarify the difference in the spec.

Also one more note re flattening as a follow up to my last comment: It should be a very trivial transformation, because you can just concatenate the individual ECOSYSTEM ranges into a flat list and still have it mean the same thing (assuming the ranges don't overlap).

This reverts commit 6b95156. Per discussion at [1], the OSV project prefers to (ab?)use the "fixed" event type to limit a range, even when there is no fix. Revert the commit. We will add better documentation in a subsequent commit. [1]: haskell#3 (comment)

Add the `Security.Advisories.Convert.OSV` module, which defines the conversion from our `Advisory` data type to the OSV `Model`. Currently, no database-specific or ecosystem-specific fields are set. Whether or how to use those fields is a matter for future discussion. Re-export the *aeson* encode and decode functions from `Security.OSV`, for convenience. Add the `osv` subcommand to `hsec-tools`. It works in the same way as `check`, but emits the encoded OSV JSON data. Later commits will add the CI workflows to generate and publish the OSV data. Fixes: haskell#3

Add the `Security.Advisories.Convert.OSV` module, which defines the conversion from our `Advisory` data type to the OSV `Model`. Currently, no database-specific or ecosystem-specific fields are set. Whether or how to use those fields is a matter for future discussion. Add the `osv` subcommand to `hsec-tools`. It works in the same way as `check`, but emits the encoded OSV JSON data. Later commits will add the CI workflows to generate and publish the OSV data. Fixes: haskell#3

This reverts commit 6b95156. Per discussion at [1], the OSV project prefers to (ab?)use the "fixed" event type to limit a range, even when there is no fix. Revert the commit. We will add better documentation in a subsequent commit. [1]: #3 (comment)

This reverts commit 6b95156. Per discussion at [1], the OSV project prefers to (ab?)use the "fixed" event type to limit a range, even when there is no fix. Revert the commit. We will add better documentation in a subsequent commit. [1]: haskell#3 (comment)

Add the `Security.Advisories.Convert.OSV` module, which defines the conversion from our `Advisory` data type to the OSV `Model`. Currently, no database-specific or ecosystem-specific fields are set. Whether or how to use those fields is a matter for future discussion. Add the `osv` subcommand to `hsec-tools`. It works in the same way as `check`, but emits the encoded OSV JSON data. Later commits will add the CI workflows to generate and publish the OSV data. Fixes: haskell#3

mihaimaruseac self-assigned this May 17, 2023

TristanCacqueray mentioned this issue Jun 7, 2023

Update affected version format #29

Merged

1 task

oliverchang mentioned this issue Jun 27, 2023

Difference between "fixed" and "limit" is confusing ossf/osv-schema#169

Open

frasertweedale mentioned this issue Jun 27, 2023

implement OSV conversion #73

Merged

1 task

mihaimaruseac closed this as completed in 795630c Jun 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OSV data model compatibility #3

OSV data model compatibility #3

frasertweedale commented May 16, 2023

TristanCacqueray commented May 31, 2023

akacase commented May 31, 2023

TristanCacqueray commented May 31, 2023

mihaimaruseac commented May 31, 2023

TristanCacqueray commented May 31, 2023

frasertweedale commented May 31, 2023

mihaimaruseac commented May 31, 2023

akacase commented May 31, 2023

frasertweedale commented Jun 27, 2023 •

edited

Loading

oliverchang commented Jun 27, 2023

frasertweedale commented Jun 27, 2023 •

edited

Loading

frasertweedale commented Jun 27, 2023

oliverchang commented Jun 27, 2023

oliverchang commented Jun 27, 2023 •

edited

Loading

frasertweedale commented Jun 27, 2023

oliverchang commented Jun 27, 2023

OSV data model compatibility #3

OSV data model compatibility #3

Comments

frasertweedale commented May 16, 2023

TristanCacqueray commented May 31, 2023

akacase commented May 31, 2023

TristanCacqueray commented May 31, 2023

mihaimaruseac commented May 31, 2023

TristanCacqueray commented May 31, 2023

frasertweedale commented May 31, 2023

mihaimaruseac commented May 31, 2023

akacase commented May 31, 2023

frasertweedale commented Jun 27, 2023 • edited Loading

oliverchang commented Jun 27, 2023

frasertweedale commented Jun 27, 2023 • edited Loading

frasertweedale commented Jun 27, 2023

oliverchang commented Jun 27, 2023

oliverchang commented Jun 27, 2023 • edited Loading

frasertweedale commented Jun 27, 2023

oliverchang commented Jun 27, 2023

frasertweedale commented Jun 27, 2023 •

edited

Loading

frasertweedale commented Jun 27, 2023 •

edited

Loading

oliverchang commented Jun 27, 2023 •

edited

Loading