For packages, enable sorting versions in database queries, or store additional computed fields #1549

Closed
pombredanne opened this issue Aug 13, 2024 · 3 comments · Fixed by #1686

Comments

@pombredanne
Member

For packages, I would like to enable the sorting of versions using database queries.

Why? We have performance issues such as:

Sorting by version is needed to determine the next and the latest non-vulnerable versions. Today this means converting a queryset into a list and sorting it with a key function that converts each version string to a univers version object.

We could consider storing the version relative order as a model field and order the model based on that field to get an always sorted package model.

There are a few considerations for this:

  1. Using an integer field would require either leaving gaps or renumbering the field whenever a version is added between two existing versions. A float field would instead let us pick a value between two existing versions when inserting a new one.
  2. We would need to know if a version is a pre-release, possibly as a package boolean flag.

With these two extra fields, each time a new package version is added, we would recompute its ordering once by sorting versions using univers. We would then assign the float field a value that falls between its two neighboring versions, before the first version, or after the last version, and save the package.
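The float ordering idea described above can be sketched as follows. This is only an illustration under stated assumptions: `rank_for_new_version` and `version_key` are hypothetical names, and `version_key` is a simplistic stand-in for the univers comparison that only handles dotted numeric versions.

```python
from bisect import bisect_left


def version_key(version):
    # Stand-in for a univers version object: dotted numeric versions only.
    return tuple(int(part) for part in version.split("."))


def rank_for_new_version(sorted_keys, sorted_ranks, new_key):
    """Return a float rank that places new_key among already ranked versions.

    sorted_keys and sorted_ranks are parallel lists, both in ascending order.
    """
    if not sorted_keys:
        return 0.0
    position = bisect_left(sorted_keys, new_key)
    if position == 0:
        # Older than every known version: go below the first rank.
        return sorted_ranks[0] - 1.0
    if position == len(sorted_keys):
        # Newer than every known version: go above the last rank.
        return sorted_ranks[-1] + 1.0
    # Otherwise take the midpoint of the two neighboring ranks.
    return (sorted_ranks[position - 1] + sorted_ranks[position]) / 2.0


versions = ["1.0.0", "1.2.0", "2.0.0"]
keys = [version_key(v) for v in versions]
ranks = [0.0, 1.0, 2.0]

between = rank_for_new_version(keys, ranks, version_key("1.1.0"))
before = rank_for_new_version(keys, ranks, version_key("0.9.0"))
after = rank_for_new_version(keys, ranks, version_key("3.0.0"))
```

One caveat with this approach: floats have finite precision, so repeatedly inserting between the same two neighbors eventually exhausts the available midpoints, and the ranks for that package would then need to be renumbered.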

We could then have a database-backed queryset with a sort order on versions, and an optional filter on pre-releases. We should then be able to directly query for the next and latest non-vulnerable versions (possibly with an annotate) without having an intermediate break in the queryset chain because of the sort by version.
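To illustrate the kind of lookup this would enable, here is a plain-Python sketch of finding the next and latest non-vulnerable versions once every version carries a sortable rank. The `PackageVersion` class, its field names (including `is_prerelease`), and `non_vulnerable_after` are assumptions made for this example; in the database this would roughly correspond to a filter plus an ordering on the rank field.

```python
from dataclasses import dataclass


@dataclass
class PackageVersion:
    version: str
    version_rank: float
    is_vulnerable: bool = False
    is_prerelease: bool = False


def non_vulnerable_after(packages, current, include_prereleases=False):
    """Return (next, latest) non-vulnerable versions ranked above current."""
    candidates = sorted(
        (
            p
            for p in packages
            if p.version_rank > current.version_rank
            and not p.is_vulnerable
            and (include_prereleases or not p.is_prerelease)
        ),
        key=lambda p: p.version_rank,
    )
    if not candidates:
        return None, None
    return candidates[0], candidates[-1]


packages = [
    PackageVersion("1.0", 0.0, is_vulnerable=True),
    PackageVersion("1.1", 1.0, is_vulnerable=True),
    PackageVersion("1.2", 2.0),
    PackageVersion("2.0rc1", 3.0, is_prerelease=True),
    PackageVersion("2.0", 4.0),
]

next_fixed, latest_fixed = non_vulnerable_after(packages, packages[0])
```

No version string is parsed at lookup time here: the comparison relies only on the stored rank, which is the point of precomputing it.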

Alternatively, or concurrently, we could also store version-related computed flags and relationships on a package instance, such as:

  • is_vulnerable if a package version is affected by any vulnerability (though this is now computed as an annotation that seems fast enough)
  • next_non_vulnerable and latest_non_vulnerable could be stored relationships that are updated when new versions and new vulnerabilities are added (though these could be computed as an annotation if we enable storing versions order as suggested above)
  • affected_by and fixing relationships to vulnerabilities to replace the PackageRelatedVulnerability many-to-many relationship and its problematic fix boolean field, as tracked separately in:
@pombredanne
Member Author

This is completed. We merged:

The design is that an improver continuously updates a version_rank field on packages, so that we can query the data with an "order_by" clause and get good performance.

@pombredanne
Member Author

@TG1999 can you elaborate a bit on the final design, for reference?

@TG1999
Contributor

TG1999 commented Dec 18, 2024

Design for the pipeline and how we retrieve packages in sorted order:

  • Added a pipeline, ComputeVersionRankPipeline. This pipeline takes packages grouped by type, namespace and name.
  • For each group, we use the univers range for the type of that group and sort all the packages with that univers range.
  • Then we enumerate the sorted packages, assign a rank to each one, and store it as the package's version rank.

If a package is added after the pipeline has run, we re-index all the packages in the group that package belongs to.

In the Package model, we have an ordering clause that ensures packages always come back in sorted order:

ordering = ["type", "namespace", "name", "version_rank", "version", "qualifiers", "subpath"]
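The grouping, ranking, and ordering steps described in this comment can be sketched in plain Python as follows. This is not the actual pipeline code: `Package`, `version_key`, `compute_version_ranks`, and `ordering_key` are hypothetical names, `version_key` is a simplistic stand-in for the univers comparison, starting ranks at 1 is an assumption, and the `qualifiers` and `subpath` fields are left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Package:
    type: str
    namespace: str
    name: str
    version: str
    version_rank: int = 0


def version_key(version):
    # Stand-in for a univers version object: dotted numeric versions only.
    return tuple(int(part) for part in version.split("."))


def compute_version_ranks(packages):
    """Group packages by (type, namespace, name) and rank each group."""
    groups = defaultdict(list)
    for package in packages:
        groups[(package.type, package.namespace, package.name)].append(package)
    for group in groups.values():
        group.sort(key=lambda p: version_key(p.version))
        # Assumption: ranks start at 1 within each group.
        for rank, package in enumerate(group, start=1):
            package.version_rank = rank


def ordering_key(package):
    # Mirrors the start of the ordering list above, using the stored rank.
    return (
        package.type,
        package.namespace,
        package.name,
        package.version_rank,
        package.version,
    )


packages = [
    Package("pypi", "", "django", "4.2"),
    Package("pypi", "", "django", "10.0"),
    Package("pypi", "", "django", "3.2.1"),
    Package("npm", "", "lodash", "4.17.21"),
    Package("npm", "", "lodash", "4.2.0"),
]
compute_version_ranks(packages)

# A version added later triggers a re-rank of its whole group.
packages.append(Package("pypi", "", "django", "3.10"))
compute_version_ranks(packages)

ordered = [(p.name, p.version) for p in sorted(packages, key=ordering_key)]
```

Because `version_rank` comes before `version` in the ordering, versions such as 3.10 and 10.0 sort by their computed rank, not by string comparison, which would place "10.0" before "3.2.1".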


2 participants