
MeetingMinutes2021

Keshav Priyadarshi edited this page Feb 16, 2024 · 3 revisions

We meet online on Tuesday at 16:00 UTC as a reference. See https://www.timeanddate.com/worldclock/meeting.html to get the time in your timezone.

Join us at https://meet.jit.si/VulnerableCode

The current meeting notes is at:

Here are the running meeting notes:

Meeting on Saturday 2021-12-04 at 10:30 UTC

Agenda:

  • Improver review
  • Speeding up the importer-improver structure migration
  • GitHub externships

Participants:

  • Philippe (@pombredanne)
  • Hritik (@hritik14)

Improver review

Quick reference to review is available at https://gist.github.com/Hritik14/d02a2c24a50e0afcaa219cc4bf8abef9

Speeding up the importer-improver structure migration

Hritik:

With the importer framework nearly ready in this structure, we can move forward with rewriting other importers such as nvd and ubuntu. In parallel, we can also start development of OvalDataSource and GitDataSource.

GitHub externships

Philippe:

Mail sent to github for inquiry. We'll try to participate as an org.

Meeting on Thursday 2021-10-14 at 12:45 UTC

Agenda:

  • Review improvers
  • Hacktoberfest twitter VulnerableCode logo
  • Univers
  • GSoC guest invite - Why do open source

Participants:

  • Philippe (@pombredanne)
  • Hritik (@hritik14)

Review improvers

Do we want the OSV design in the database as well? How do we solve the nginx multiple-branch problem? No, we'll carry on with the current model and use the qualifiers in PackageURL to specify the branches. That way, a package in a different branch is essentially a different package and is displayed accordingly as a fix for that branch type.
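A minimal sketch of the qualifier idea, using only the standard library. The `Purl` class is a hypothetical, simplified stand-in for PackageURL, and the `branch` qualifier key and the version numbers are assumptions chosen for illustration, not something the minutes fix.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Purl:
    """Hypothetical, simplified stand-in for a PackageURL (no version field)."""

    type: str
    name: str
    # Qualifiers are stored as a tuple of (key, value) pairs so instances stay hashable.
    qualifiers: Tuple[Tuple[str, str], ...] = ()

    def __str__(self) -> str:
        text = f"pkg:{self.type}/{self.name}"
        if self.qualifiers:
            text += "?" + "&".join(f"{key}={value}" for key, value in self.qualifiers)
        return text


def nginx(branch: str) -> Purl:
    # "branch" is an assumed qualifier key; the minutes do not fix its name.
    return Purl("generic", "nginx", (("branch", branch),))


stable = nginx("stable")
mainline = nginx("mainline")

# Different qualifiers mean different packages, so each branch carries its own fix.
fixed_in = {stable: "1.20.1", mainline: "1.21.0"}

print(stable)    # pkg:generic/nginx?branch=stable
print(mainline)  # pkg:generic/nginx?branch=mainline
```

Because the two objects compare unequal, they can be stored and displayed as separate packages without any change to the data model.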

Univers

More detailed info at https://github.com/nexB/univers/blob/386eb32468c75ecac25ec872ea004b3257962946/VERSION-RANGE-SPEC.rst

GSoC guest invite - Why do open source

Invitation accepted: https://www.youtube.com/watch?v=VNL8eO6Phj8

Meeting on Tuesday 2021-10-05 at 07:00 UTC

Agenda:

  • Hacktoberfest
  • VulnerableCode
  • Univers

Participants:

  • Philippe (@pombredanne)
  • Hritik (@hritik14)

Hacktoberfest

Hritik and Tushar will be setting up the ground for hacktoberfest. Looking for good-first-issues is mostly the task.

VulnerableCode

Nothing significant has changed since the last call. Hritik will carry on with the importer-improver framework.

Univers

Philippe is refactoring the codebase. Commits yet to be pushed.

Meeting on Wednesday 2021-09-18 at 10:45 UTC

Agenda:

  • Preview RTD before merging (local RTD via docker)
  • Merging new importer-improver model

Participants:

  • Philippe (@pombredanne)
  • Hritik (@hritik14)

Merging new importer-improver model

Hritik is going on a two-week holiday, so the new model needs to be merged ASAP. We decided to reach a checkpoint today and push the latest version so that Philippe can improve on it in the meantime.

Preview RTD before merging

Pinged Ayan to look into it

Meeting on Friday 2021-09-17 at 09:00 UTC

Agenda:

  • Review improvers
  • Advisory data structure
  • Support http proxies, ticket: https://github.com/nexB/vulnerablecode/issues/559
  • Preview RTD before merging (local RTD via docker)

Participants:

  • Hritik (@hritik14)
  • Philippe (@pombredanne)

Review improvers / Advisory data structure

Batch processing of advisories needs to be avoided. Philippe:

Each Improver has a method to process a single Advisory model instance, such as Improver.get_inferences(self, advisory) -> Inference, or something like that.

The framework then iterates on an Improver-provided query set, such as Improver.get_interesting_advisories(self) -> QuerySet, that holds the Advisories it is interested in.

In the framework, an atomic transaction updates both the Advisory (e.g. the date and, later, a log of improvements, with select for update) and whatever is updated or created from the Inference.

More: https://github.com/nexB/vulnerablecode/pull/525#issuecomment-921722348
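The loop described above could look roughly like the following sketch. It uses sqlite3 from the standard library instead of Django so that it runs on its own. The table layout, the `NotYetImprovedImprover` class and the `run_improver` function are assumptions made for illustration, and `get_inferences` returns a list here, which the minutes leave open. SQLite has no select for update, so row locking is left out.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple

AdvisoryRow = Tuple[int, str]  # (id, data); stands in for an Advisory model instance


@dataclass
class Inference:
    advisory_id: int
    package: str
    confidence: int


class Improver:
    """Hypothetical base class; the method names follow the minutes."""

    def get_interesting_advisories(self, conn) -> List[AdvisoryRow]:
        raise NotImplementedError

    def get_inferences(self, advisory: AdvisoryRow) -> List[Inference]:
        raise NotImplementedError


class NotYetImprovedImprover(Improver):
    def get_interesting_advisories(self, conn):
        query = "SELECT id, data FROM advisory WHERE improved_on IS NULL"
        return conn.execute(query).fetchall()

    def get_inferences(self, advisory):
        advisory_id, data = advisory
        # Toy logic: the advisory data is taken to be the affected package.
        return [Inference(advisory_id, package=data, confidence=100)]


def run_improver(conn, improver, now):
    # Advisories are processed one at a time, never as a batch.
    for advisory in improver.get_interesting_advisories(conn):
        inferences = improver.get_inferences(advisory)
        # One transaction per advisory: the timestamp and the inferred rows
        # are committed together or rolled back together.
        with conn:
            conn.execute("UPDATE advisory SET improved_on = ? WHERE id = ?", (now, advisory[0]))
            conn.executemany(
                "INSERT INTO inference VALUES (?, ?, ?)",
                [(i.advisory_id, i.package, i.confidence) for i in inferences],
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE advisory (id INTEGER PRIMARY KEY, data TEXT, improved_on TEXT)")
conn.execute("CREATE TABLE inference (advisory_id INTEGER, package TEXT, confidence INTEGER)")
conn.executemany(
    "INSERT INTO advisory (data) VALUES (?)",
    [("pkg:generic/nginx",), ("pkg:pypi/django",)],
)
conn.commit()

run_improver(conn, NotYetImprovedImprover(), "2021-09-17T09:00:00")
```

In Django the same shape would presumably use `transaction.atomic()` around the per-advisory update.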

Also, some basic refactoring was discussed.

Support http proxies

This is not our prime objective right now. We could deal with it after a brief thought. Let's take this up in next session as well.

Meeting on Thursday 2021-09-14 at 14:00 UTC

Agenda:

Participants:

  • Hritik (@hritik14)
  • Philippe (@pombredanne)

Nginx version notations

Hritik contacted Nginx over their mailing list to clarify their version notations like 1.21.0+. We got the following reply:

The 1.21.0+ notation means "1.21.0 and newer", or, more
formally, "1.21.0 and derived versions".  This includes all
future nginx versions on the mainline branch, and all future
stable branches (which aren't yet created).

More: https://mailman.nginx.org/pipermail/nginx/2021-September/061039.html

Further, according to Nginx (https://www.nginx.com/blog/nginx-1-16-1-17-released/):

- Mainline is the active development branch where the latest features and bug fixes get added. It is denoted by an odd number in the second part of the version number, for example 1.17.
- Stable receives fixes for high‑severity bugs, but is not updated with new features. It is denoted by an even number in the second part of the version number, for example 1.16.0.

This information needs to be accounted for in the nginx importer.
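A small sketch of how an importer might apply the two rules above. The function names are hypothetical, and reading "1.21.0 and newer" as a plain greater-or-equal comparison is a simplifying assumption; the reply's "derived versions" wording may call for branch-aware logic.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in text.split("."))


def nginx_branch(version: str) -> str:
    """An odd second component means mainline, an even one means stable."""
    minor = parse_version(version)[1]
    return "mainline" if minor % 2 else "stable"


def matches(version: str, notation: str) -> bool:
    """Check a version against a notation such as '1.21.0+' or '1.20.1'."""
    if notation.endswith("+"):
        # Simplification: "and newer" is read as a plain >= comparison.
        return parse_version(version) >= parse_version(notation[:-1])
    return parse_version(version) == parse_version(notation)


print(nginx_branch("1.17.0"))         # mainline
print(nginx_branch("1.16.0"))         # stable
print(matches("1.22.0", "1.21.0+"))   # True
```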

Review improvers - VULCOIDS

The function to check whether there is an existing VULCOID for an advisory with an allotted CVE is missing. We need to implement it, i.e. effectively move the VULCOID from vulnerability_id to old_vulnerability_id.

Naming: VersionSpecifier and VersionRange

The currently named VersionRange should be renamed to VersionConstraint, and a VersionRange should be implemented to represent an actual range with upper and lower bounds. The canonical string representation needs some discussion, and an RFC needs to be drafted for it. The currently proposed representations are in the ticket: https://github.com/nexB/univers/issues/8
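To make the proposed split concrete, here is a rough sketch of a single constraint versus a range with two bounds. Only the class names come from the discussion above. The fields, the tuple-based versions and the `matches` method are assumptions, and this is not the univers API.

```python
import operator
from dataclasses import dataclass
from typing import Optional, Tuple

Version = Tuple[int, ...]

_OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}


@dataclass(frozen=True)
class VersionConstraint:
    """A single comparison against one version, e.g. '>= 1.0.0'."""

    comparator: str
    version: Version

    def matches(self, candidate: Version) -> bool:
        return _OPERATORS[self.comparator](candidate, self.version)


@dataclass(frozen=True)
class VersionRange:
    """An actual range: an optional lower and an optional upper bound."""

    lower: Optional[VersionConstraint] = None
    upper: Optional[VersionConstraint] = None

    def __contains__(self, candidate: Version) -> bool:
        bounds = (self.lower, self.upper)
        return all(bound.matches(candidate) for bound in bounds if bound is not None)


one_x = VersionRange(
    lower=VersionConstraint(">=", (1, 0, 0)),
    upper=VersionConstraint("<", (2, 0, 0)),
)

print((1, 5, 0) in one_x)  # True
print((2, 0, 0) in one_x)  # False
```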

Meeting on Tuesday 2021-09-07 at 14:00 UTC

Agenda:

  • Review improvers
  • Preview RTD before merging (local RTD via docker)

Review Improvers

Reviewed at: https://github.com/nexB/vulnerablecode/pull/525#pullrequestreview-745189344

Meeting on Thursday 2021-09-02 at 14:00 UTC

Agenda:

Review improvers

Partial review done, more to come next day

Consider adopting the OSV API as alternative output

Marked as low priority. Let's do it at some later stage.

drf-spectacular vs redoc

redoc licensing is a huge mess due to webpack (e.g. some licenses are not mentioned or enforced properly: http://tartarus.org/~martin/PorterStemmer/js.txt, feross). A detailed review needs to be done, and all licenses should be mentioned and enforced explicitly. We should create a ticket about all unknown/unenforced packages and propagate it upstream. Ticket: https://github.com/nexB/vulnerablecode/issues/549

Decide on a uniform regular time for meetings

Philippe is comfortable with before 3pm CET on Tuesday

Meeting on Tuesday 2021-08-17 at 14:00 UTC

Agenda:

  • Decide on the structure of improvers

Participants:

  • @Hritik14
  • @pombredanne
  • @sbs2001

Decide on the structure of improvers

The importers would now directly insert the Advisories into the database in the following format. Creating database relationships and populating Vulnerability, Packages, PackageRelatedVulnerability etc. would be the job of a default improver.

import dataclasses
from typing import List, Optional

from django.db import models
from packageurl import PackageURL

# Reference and VersionSpecifier are existing project types; their imports are omitted here.


class Advisory(models.Model):
    date_published = models.DateField()
    date_collected = models.DateField()
    source = models.CharField()
    improved_on = models.DateTimeField()
    # data would contain a data_source.Advisory
    data = models.JSONField()


# Defined before AdvisoryData because AdvisoryData's annotations refer to it.
@dataclasses.dataclass(order=True, frozen=True)
class AffectedPackage:
    # this package MUST NOT have a version
    package: PackageURL
    # the version specifier contains the version scheme as is: semver:>=1,3,4
    version_specifier: VersionSpecifier


@dataclasses.dataclass(order=True)
class AdvisoryData:
    summary: str
    vulnerability_id: Optional[str] = None
    fix_packages: List[AffectedPackage] = dataclasses.field(default_factory=list)
    affected_packages: List[AffectedPackage] = dataclasses.field(default_factory=list)
    references: List[Reference] = dataclasses.field(default_factory=list)

There would be two kinds of improvers:

  • Default improver - Populates the tables and creates the concrete relationships mentioned above
  • Inference generating improvers - These improvers take metrics like time travel and create new relationships with respective confidence scores.

The Advisory model above must never be written by an improver (as of now). It would be a true upstream data that is populated by the importers and used by the improvers to generate appropriate relationships.

Meeting on Tuesday 2021-08-10 at 14:00 UTC

Agenda:

  • Deployment
  • Importer restructure
  • Hand written migrations (not generated by makemigrations, eg: package)
  • Dumping import_yielder
  • Any update on nix tests?

Deployment

Let's take this next week

Importer restructure

PackageRelatedVulnerability

  • Have data source - could be a text field of one message per line or a JSON field with a list of strings showing where it came from
  • If new information comes, add that information and then change the confidence; if a new advisory comes which is similar to an improvement, then overwrite the improvement confidence to max
  • Add confidence here
  • Higher confidence overrides lower confidence
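The confidence rules above can be sketched as a plain dataclass. The field names, the 0 to 100 scale and the two method names are assumptions for illustration; the real model would be a Django model.

```python
from dataclasses import dataclass, field
from typing import List

MAX_CONFIDENCE = 100  # assumed scale, not defined in the minutes


@dataclass
class PackageRelatedVulnerability:
    package: str
    vulnerability: str
    confidence: int
    # One message per entry, showing where the relationship came from.
    sources: List[str] = field(default_factory=list)

    def add_evidence(self, source: str, confidence: int) -> None:
        """Record new information; a higher confidence overrides a lower one."""
        self.sources.append(source)
        if confidence > self.confidence:
            self.confidence = confidence

    def confirm(self, source: str) -> None:
        """An advisory matching an earlier improvement raises confidence to the maximum."""
        self.sources.append(source)
        self.confidence = MAX_CONFIDENCE


relation = PackageRelatedVulnerability(
    package="pkg:generic/nginx",
    vulnerability="VULN-EXAMPLE",  # placeholder identifier
    confidence=40,
    sources=["time travel improver"],
)
relation.add_evidence("range heuristic", 30)  # lower confidence: value stays at 40
relation.add_evidence("second improver", 60)  # higher confidence: value becomes 60
relation.confirm("upstream advisory")         # confirmed: value becomes 100
```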

Properly comment the fields in a dataclass

Hand written migrations (not generated by makemigrations, eg: package)

Not hand written. They come from the mixin imported with: from packageurl.contrib.django.models import PackageURLMixin

Dumping importer_yielder

Any update on nix tests?

No

Meeting on Wednesday 2021-07-28 at 14:15 UTC

Participants:

  • @pombredanne
  • @Hritik14
  • @Divya

Agenda:

  • Refactor / Redesign importers to strictly import
  • Dump set_api from DataSource
    • Knowing all the existing versions is not necessary to collect Advisories
    • Strictly no inference
  • Revisit updated_advisories vs added_advisories
  • Consistent naming for fixed / vulnerable packages. No other package type should exist in a DataSource
  • Should a container be used instead of make postgres in scancodeIO ? - https://github.com/nexB/scancode.io/blob/4bebd7ae88ecaa00eb526bd831530c497903faf8/.github/workflows/ci.yml#L15-L28
  • Should Dockerfiles create virtualenvs as well ?
  • ScancodeIO uses FROM python:3.9 in Dockerfile
    • Docker upstream says: "This is the defacto image. If you are unsure about what your needs are, you probably want to use this one. It is designed to be used both as a throw away container (mount your source code and start the container to start your app), as well as the base to build other images off of."
    • Docker upstream also recommends not using the -slim version unless there are space constraints
    • Docker images are reusable
    • Using a -slim image requires explicitly installing build-essential, libpq-dev, git and, in future, svn
    • Most of the aforementioned packages come preinstalled in buster
    • Drastically increases Docker build time, as our image then doesn't reuse the already installed Python image

Refactor / Redesign importers to strictly import

Dump set_api from DataSource: Yes. Separate them. We cannot skip importing a vulnerability just because we don't have version info. We need the advisories however we get them. The code should be cleanly organized in separate modules/functions.

importer_yielder

Dump this. We can have data related to an importer inside the importers themselves. Problems with importer_yielder:

  • Doesn't make sure the importer class is actually a class
  • No type checking
  • This is a setting thing. It could appear as a Django setting (perhaps), which would enable/activate importers with fully qualified class names with module prefixes.

Let's discuss this later in detail.
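A possible shape for the settings-based approach, sketched with the standard library only. `ENABLED_IMPORTERS` and `load_class` are hypothetical names, and `json.JSONDecoder` merely stands in for an importer class so that the example runs without the project.

```python
import importlib
import inspect

# Hypothetical setting: fully qualified class names with module prefixes.
ENABLED_IMPORTERS = ["json.JSONDecoder"]


def load_class(dotted_path: str, base: type = object) -> type:
    """Import a class by dotted path and check that it really is a class of the expected type."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"not a fully qualified name: {dotted_path!r}")
    obj = getattr(importlib.import_module(module_path), class_name)
    if not inspect.isclass(obj) or not issubclass(obj, base):
        raise TypeError(f"{dotted_path!r} is not a subclass of {base.__name__}")
    return obj


importer_classes = [load_class(path) for path in ENABLED_IMPORTERS]
print(importer_classes[0].__name__)  # JSONDecoder
```

Passing the project's importer base class as `base` would cover both missing checks. Django also ships `django.utils.module_loading.import_string` for the import step itself.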

Revisit updated_advisories vs added_advisories

  • Do not return sets in both of the functions (TBD at later stage)
  • This should be handled by the framework before doing an update to the advisory. Something that could exist at the insertion time of importer (process_advisory).
  • Incremental update should be an attribute of a specific importer which has that capability.
  • Importers have to return advisories via one single function, with some marks on the Advisory if it is new or updated. Importers with incremental support will have a different run frequency, different entry points etc.

Consistent naming for fixed / vulnerable packages

Inspired from https://tinyurl.com/vuln-json we'll use affects and fixed.

Should a container be used instead of make postgres in scancodeIO

Let's create an issue about control of versions (here postgres version) in both places. Then fix them at a later stage with similar fixes.

Should Dockerfiles create virtualenvs as well ?

Yes. Even ScancodeIO should do that. TODO: Create ticket : https://github.com/nexB/scancode.io/blob/4bebd7ae88ecaa00eb526bd831530c497903faf8/Dockerfile#L52

ScancodeIO uses FROM python:3.9 in Dockerfile

  • In CI tests: be explicit - 3.8.11-buster
  • In Docker: let's support 3.8
  • A test for docker-build and run

Meeting on Tuesday 2021-07-27 at 13:00 UTC

Participants:

  • @pombredanne
  • @Hritik14

Minutes:

Next Meeting agenda Wednesday 2021-07-21 at 13:00 UTC

Meeting on Wednesday 2021-07-21 at 13:00 UTC

Participants:

  • @pombredanne
  • @Hritik14
  • @sbs2001
  • @Divya

The proposed agenda is:

security review status

  • All tickets are entered

Server / infrastructure

  • Delayed for now, will deploy on provisioned server. Philippe to work on it sometime between this week and next week
  • No more Django migration reset

SVN for git tags: which approach or dep using

  • using `svn --xml ls /tags/` should be enough
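As an illustration, the XML listing can be parsed with the standard library. The sample below imitates the layout that `svn ls --xml` prints; the repository URL, tag names and dates are made up, and `list_tags` is a hypothetical helper that is defined but not called here because it needs the svn client.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Made-up sample imitating the layout of `svn ls --xml` output.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<lists>
<list path="https://svn.example.org/repo/tags">
<entry kind="dir">
<name>1.0.0</name>
<commit revision="10"><author>alice</author><date>2021-01-05T10:00:00.000000Z</date></commit>
</entry>
<entry kind="dir">
<name>1.1.0</name>
<commit revision="25"><author>bob</author><date>2021-03-12T08:30:00.000000Z</date></commit>
</entry>
</list>
</lists>
"""


def parse_tags(xml_bytes: bytes) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (tag name, commit date) pairs for every directory entry."""
    root = ET.fromstring(xml_bytes)
    tags = []
    for entry in root.iter("entry"):
        if entry.get("kind") != "dir":
            continue
        tags.append((entry.findtext("name"), entry.findtext("commit/date")))
    return tags


def list_tags(tags_url: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Hypothetical helper: run the svn client against a tags URL and parse its output."""
    result = subprocess.run(
        ["svn", "ls", "--xml", tags_url], check=True, capture_output=True
    )
    return parse_tags(result.stdout)


print(parse_tags(SAMPLE))
```

The commit dates are what a later version-resolution step would need, since they say which tags existed at a given time.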

Zip file in testcases

These take a lot of space and cannot be diffed. Text files are better. See https://github.com/nexB/vulnerablecode/pull/393#discussion_r668006209. The code in the rust and npm importers likely needs refactoring to use a different approach; feel free to refactor.

Conclusion: tests and code refactoring may make the zip file usage moot.

Add API rate limiting and auth

Revert time traveling - how to go about it ?

Definition: finding the set of versions that existed for a package at a point in the past when the advisory was published.

  • Advisory without correct/actionable package/version info

    • goes into a review queue, or error log
    • especially if we are missing which version of a package the bug is fixed in
  • Advisory with well defined set of impacted and fixed packages

    • there is no problem and we should import this alright
  • Advisory with only one of impacted or fixed packages, both well defined (e.g. no open ranges)

    • we should import this alright

    • Case1: only fix and no impacted:

      • do we want to time travel to find the version(s) that were impacted just before this?
      • there could be a version marked as impacted, namely the one prior to the fixed version. This may often be ambiguous, and it is inferred: we should store it stating that it is inferred/concluded and not a hard fact, and we should have a log or queue entry so this issue can be revisited.
      • may be best done outside of importers proper, as some batch inference job. See https://github.com/nexB/vulnerablecode/issues/500
    • Case2: only impacted and no fix:

      • we should import this alright, this is normal
      • we should "time travel" and create relationships with the set of versions that matched the impacted closed/well defined range at
    • it is OK to have no fix

    • it is OK to have knowledge of when the bug was introduced:

      • do we want to time travel to fin
  • Advisory with impacted or fixed packages using open ranges

Conclusion of the likely best approach for version ranges

  • If we have concrete enumerated exact versions, then we can store Vuln<>Package relationship as a fact.
  • Otherwise this is some version range and its resolution is always an inference.
  • With time travel we provide a high degree of confidence in the inference, otherwise the inference is weak.
  • All inferences may need some level of review (guided by the confidence of the inference)
  • Each importer should have an explicit separation between the code to import and the code to resolve version ranges (and time travel).
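A rough sketch of time travel and of the fact versus inference split from the conclusion above. The release dates, the numeric confidence levels and both function names are assumptions made for illustration.

```python
from datetime import date
from typing import Dict, List, Tuple

# Assumed confidence levels; the minutes only say "fact", "high" and "weak".
FACT, STRONG, WEAK = 100, 80, 30


def versions_known_at(release_dates: Dict[str, date], published: date) -> List[str]:
    """Time travel: the versions that already existed when the advisory was published."""
    known = [version for version, released in release_dates.items() if released <= published]
    # Order by release date rather than by comparing version strings.
    return sorted(known, key=release_dates.__getitem__)


def classify(enumerated: bool, time_travelled: bool) -> Tuple[str, int]:
    """Exact enumerated versions are facts; any resolved range is an inference."""
    if enumerated:
        return ("fact", FACT)
    return ("inference", STRONG if time_travelled else WEAK)


releases = {
    "1.0": date(2021, 1, 10),
    "1.1": date(2021, 3, 2),
    "1.2": date(2021, 6, 20),
}
known = versions_known_at(releases, published=date(2021, 4, 1))

print(known)                  # ['1.0', '1.1']
print(classify(False, True))  # ('inference', 80)
```

Keeping these two functions apart from the code that fetches advisories mirrors the separation between importing and range resolution asked for in the last point.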

TODO:

Meeting on Thursday 2021-07-15 at 12:00 UTC

Participants:

  • @pombredanne
  • @Hritik14

The proposed agenda is:

  • documentation of new importers
  • alpine package versions
  • sorting imports
  • issue #494

These two items were not discussed:

  • security review status
  • server and infra

Documentation of new importers

  • we should progressively adopt the same documentation structure as scancode.io
  • Docker files: we should progressively copy the same structure and approach as in scancode.io
  • Also README.rst needs to be beefed up and

Alpine package versions: notes and TODO

TODO

  • create ticket to refactor importer_yielder.py
  • revisit time travel. It should not be part of the importers: data improvement, inference and refinements should be handled by a separate tool

Pending PRs

Let's close the ones that are not from Hritik14

We pinged:

This is also related to https://github.com/nexB/vulnerablecode/pull/436. Eventually we need a common way to obtain a range of versions for a given importer (or rather for the job that fixes data after imports).

Other topics:

Meeting on Monday 2021-06-14 at 09:00 UTC

Participants:

  • Shivam @sbs2001
  • Philippe @pombredanne

Agenda:

  1. Code review of https://github.com/nexB/vulnerablecode/pulls/467 (Shivam)
  2. Integration for significant volume of vulnerability lookups (Shivam)
  3. Getting to clean slate to avoid upcoming merge conflicts (Philippe)

"Time travel" is to find out the range of affected package versions "at the time" of publication of an advisory.

Follow ups:

integration for significant volume of vulnerability lookups

To integrate VulnerableCode (VC) in ScanCode.io and other tools that can make intensive lookups on many packages, we would need a way to efficiently mirror part of the VC data through the API and reuse the VC models on the client application side (when based on the same stack, as is the case with ScanCode.io). Other things to consider:

  • using pub/sub or webhooks?

We need an issue for this. Design is needed

  1. was not discussed.