Releases: kernelci/kcidb
v9
Another major release. Most-visible changes are listed below.
After this release we'll be improving our CI/CD to shorten our development cycle, so we can make smaller and more frequent releases.
Schema
- Switch to using v4 schema, released with kcidb-io v3. Changes from v3 schema include:
  - Rename `revisions` to `checkouts` to better represent what is actually submitted, improve correlation, and prevent data loss. The checkouts are identified purely by origin-generated IDs, similarly to builds and tests. The commit hash only appears in the `git_commit_hash` field now, and the patchset hash gets its own field.
    NOTE: the submitting CI systems that test and send revisions more than once are urged to upgrade to v4 schema to avoid revision ID-inherited checkouts overwriting each other.
  - Add `patchset_hash` field to checkouts to store the patchset hash, which was previously a part of revision ID.
    NOTE: you need to set `patchset_hash` to an empty string if you have no patches applied on top of the commit you checked out, otherwise your data might not appear in reports and dashboards.
  - Rename the checkout's `patch_mboxes` field to `patchset_files` to better correspond to the new `patchset_hash` field.
  - Rename all `description` fields to `comment`. The `description` name had the meaning of describing each object overall. However, we have other, dedicated fields describing objects in detail, and we'd rather use those to generate our own description, consistently, regardless of the submitter, and use the `comment` field to augment that description.
  - Add `log_url` field to tests. It is meant to contain the URL pointing to a plain-text log file with the highest-level overview of the test's execution, similar to the `log_url` field in builds and checkouts. All the other log and output files should go into `output_files`.
  - Add `log_excerpt` field to all objects, meant to contain the part of the object's log (normally referenced by `log_url`) that was most relevant to its status. E.g. patch errors for a failed checkout, compiler errors for a failed build, error messages for a failed test. It could also be `git am` output for a successful checkout, the last hundred lines of a successful build, or a test suite summary for a successful test.
  - Remove the `publishing_time` field from checkouts, as nobody is sending it, it's not really possible to know a commit's publishing time in git, and there are no maillist-posted patches being submitted yet, for which that could be possible.
- Support validating I/O JSON against a specific schema version with `kcidb-validate`. Thank you, @pawiecz!
- Support outputting a specific version of the schema with `kcidb-schema`. Thank you, @effulgentstar!
- Support specifying the version of the schema to upgrade I/O data to, with `kcidb-upgrade`.
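As an illustration of the `revisions` to `checkouts` change, the sketch below splits an old-style revision ID (a commit hash, optionally followed by `+` and a patchset hash) into the two separate v4 fields. The function name is hypothetical and this is not the actual kcidb-io upgrade code; it only shows why `patchset_hash` ends up as an empty string when no patches are applied.

```python
def revision_id_to_checkout_fields(revision_id):
    """Split an old-style revision ID into v4-style checkout hash fields.

    Hypothetical helper for illustration only, not part of kcidb-io.
    """
    # partition() leaves the patchset part empty when there is no "+"
    commit_hash, _, patchset_hash = revision_id.partition("+")
    return {
        "git_commit_hash": commit_hash,
        # An empty string stands for "no patches applied on top of the commit"
        "patchset_hash": patchset_hash,
    }


if __name__ == "__main__":
    # Made-up hashes, shortened for readability
    print(revision_id_to_checkout_fields("c0ffee"))
    print(revision_id_to_checkout_fields("c0ffee+deadbeef"))
```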
Database
- Separate the database client and database drivers. This allows implementing support for more databases, and pseudo-databases.
  Switch the library to accepting a single string specifying the driver and its parameters for opening a database, instead of BigQuery-specific project ID and dataset name. Switch all the database-accessing command-line tools to accepting just one option: `-d`/`--database`, specifying the driver and its parameters, instead of the two BigQuery-specific options: `-p`/`--project` and `-d`/`--dataset`.
  E.g. instead of running `kcidb-query -p kernelci-production -d kernelci05 -c redhat:122398712`, run `kcidb-query -d bigquery:kernelci-production.kernelci05 -c redhat:122398712`.
  Use the `--database-help` option with any database-accessing tool to print documentation on all drivers and their parameters (thank you, @amfelso).
- Add `null` driver, which just discards loaded data, and returns no data for queries, which is useful for testing and development.
- Add SQLite database driver (`sqlite`), supporting all the operations we use on BigQuery. This simplifies development and testing of subscriptions and notifications by removing the need for BigQuery access.
- Add `json` database driver - an extension of the SQLite driver, always storing the database in-memory, and pre-loading it with JSON I/O data from stdin. This lets us implement command-line tools simulating notification generation directly from the JSON generated by a CI system, without the need to create or access a database explicitly.
- Add object de-duplication when either loading into, or querying from the database. As previously, if there are two objects with the same ID being loaded into, or queried from the database, and a field's value is present in both of them (is not NULL in both of them), then the used value will be picked out of those two non-deterministically.
- Replace BigQuery tables with views returning de-duplicated objects. Prefix the original table names with `_`. This makes querying the BigQuery database easier in code, manually, and in our Grafana dashboards.
- Remove support for querying database objects using LIKE patterns matching their IDs, from both the library and the command-line tools, since nothing and nobody was using that, and since that simplifies the code.
- Remove the `kcidb-db-complement` tool, since the "complement" operation is no longer required by the new ORM. Thank you, @mharyam!
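To make the single database string more concrete, here is a minimal sketch of splitting such a string into a driver name and its parameters. It assumes the `driver:parameters` shape seen in the `bigquery:kernelci-production.kernelci05` example; the function name is hypothetical, and how KCIDB itself parses the string (and what each driver expects as parameters) is best checked with `--database-help`.

```python
def split_database_spec(spec):
    """Split a "driver:parameters" string into (driver, parameters).

    Hypothetical helper; the "driver:parameters" form is inferred from
    the release-note example, not taken from the KCIDB source.
    """
    driver, _, params = spec.partition(":")
    if not driver:
        raise ValueError(f"No driver name in database specification: {spec!r}")
    # A bare driver name (assumed possible, e.g. "null") carries no parameters
    return driver, params or None


if __name__ == "__main__":
    print(split_database_spec("bigquery:kernelci-production.kernelci05"))
    print(split_database_spec("null"))
```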
ORM
- Implement a new ORM layer to support representing results of any query as Python objects (e.g. revisions aggregated from checkouts), and summarizing results (e.g. giving a build/test PASS/FAIL for a revision). Use a custom "pattern" syntax inside the ORM and with command-line tools, to specify the objects to query or notify about.
  E.g. `>checkout[redhat:12398712]#>*#` pattern matches the checkout with ID `redhat:12398712` and all its children objects (builds and tests), and e.g. `>test[kernelci:8768ad33f]<*$` matches the ultimate parent (revision) of a test with ID `kernelci:8768ad33f`.
  Use the `--pattern-help` option with any ORM-using tool (e.g. `kcidb-notify`) to print the pattern's ABNF syntax and some examples.
- Add `kcidb-oo-query` tool, which outputs the internal object-oriented representation of database objects matching the specified ORM "pattern", and is useful for debugging and developing the ORM layer.
Notifications
- Rework our notifications to aggregate results coming from multiple CI systems for the same revision, and to summarize build and test results into a compact message. Support subscription-specific notification templates, allowing sharing and reusing of various pieces and macros with others.
- Add a minimal HTML version to notification messages, to force some clients (e.g. GMail and groups.io) to use fixed-width fonts, for correct formatting. Thank you, @effulgentstar!
- Remove the `kcidb-summarize` and `kcidb-describe` tools, since the notion of "canonical" text rendering of database objects has been removed from the new ORM.
- Add `kcidb-ingest` tool, which generates notifications for objects created or modified by loading the input data into a (temporary) database. This emulates the notification-generation process deployed to Google Cloud without requiring it, and helps with developing and testing subscriptions and notifications.
Miscellaneous
- Fold the `kcidb-mq-publisher-*` and `kcidb-mq-subscriber-*` tools into `kcidb-mq-io-publisher` and `kcidb-mq-io-subscriber` respectively. This reduces the number of KCIDB executables.
- Add `kcidb-mq-pattern-publisher` and `kcidb-mq-pattern-subscriber` tools for managing ORM Pattern message queues used in our Google Cloud deployment.
- Automate Google Cloud deployment and start doing test deployments in CI.
v8
Another major release. Changes include:
- Support processing JSON streams for all command-line tools. Now it's possible to feed multiple JSON report objects, one after another, into a single KCIDB command, and have them processed appropriately. RFC 7464 is supported as well. This removes the overhead of starting the tool (and connecting to the cloud) for every submitted report.
- Make `kcidb-merge` accept the reports to merge on standard input, as a JSON stream, instead of expecting them as file arguments.
- Make `kcidb-notify` accept the "new" reports on standard input, as a JSON stream, instead of expecting them as file arguments.
- Support splitting the data retrieved from the database into multiple reports, limited by the number of objects, when using the library or the command-line tools. This allows retrieving large amounts of data without running out of memory. Support output either as simple concatenated-JSON streams, one-report-per-line, or using the RFC 7464 format.
- Extract the kcidb.io package into a separate distribution called `kcidb-io`, to minimize the number of dependencies required for validating report data. See its v1 and v2 release notes for changes since kcidb v7. One important change brought by this is enabling enforcement of `format` rules in JSON schema for fields containing URLs, timestamps, and email addresses. If any of those were incorrect in your data before, now they will fail to validate.
- Make sure the report is successfully sent to the message queue before returning from the `submit()`/`publish()` function in `kcidb` library, to avoid data loss. Before this the report could be handled later by a separate thread for the purpose of batching multiple submissions. Provide a function (`future_publish()`) to still allow batching and delayed submission.
- Make `kcidb-submit` and `kcidb-mq-publisher-publish` print "submission IDs" (message queue message IDs) of each sent report. Note that due to batching the IDs could be printed with a delay, even after multiple following reports were accepted, but they would still be printed in order.
- Reduce the amount of internal consistency verification in KCIDB code, by default. This improves performance when processing multiple/large datasets.
- Ignore Syzbot test results in PoC subscriptions until we implement issues/incidents and can handle its frequent test failures.
- Stop sorting JSON object keys in command-line tool output. The order will change, but will mostly stay stable.
- Add `SUBMISSION_HOWTO.md` explaining how to start submitting to KCIDB.
- Add a minimal `Dockerfile` for a container with KCIDB installed.
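For readers unfamiliar with JSON streams, the sketch below shows one way to read several JSON values from a single text, covering both plain concatenated JSON and the RFC 7464 framing (each value preceded by the RS character, `0x1E`). It works on a whole string rather than incrementally, uses only the standard `json` module, and is not how KCIDB itself implements stream processing; the function name is hypothetical.

```python
import json

RS = "\x1e"  # RFC 7464 record separator


def iter_json_stream(text):
    """Yield each JSON value found in a concatenated or RFC 7464 stream.

    Hypothetical helper for illustration; reads from an in-memory string.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace and record separators between values
        while pos < len(text) and (text[pos].isspace() or text[pos] == RS):
            pos += 1
        if pos >= len(text):
            return
        # raw_decode() returns the value and the index just past it
        value, pos = decoder.raw_decode(text, pos)
        yield value


if __name__ == "__main__":
    for report in iter_json_stream('{"a": 1}{"b": 2}\n\x1e{"c": 3}\n'):
        print(report)
```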
v7
A minor release, including the following changes:
- Switch the notification templates to working with v3 schema introduced in the previous release. Before that they would produce nonsense when describing revisions.
- Implement `kcidb_load_queue` - a Google Cloud Function optimizing the submission loading to avoid exceeding BigQuery load job quota and stalling. Pull submissions from the queue explicitly, providing more information on the speed of processing and the outstanding data, than the previously-used Google Cloud Function retry system would. Explicit pulling also allows holding submissions in the queue while upgrading or debugging without losing data. The new implementation could still hit the quota, but the probability of that is low, and so is complexity. Rename the previous implementation to `kcidb_load_message`.
- Optimize I/O data merging to speed up 3+ dataset cases dramatically. This is particularly useful for bundling submissions before loading to BigQuery in `kcidb_load_queue`.
- Add `kcidb-count` tool outputting the total number of objects (revisions/builds/tests) contained in I/O data. The underlying implementation is used to calculate cut-off point when collecting submissions to load in `kcidb_load_queue`.
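The object count mentioned above can be pictured with a short sketch. It assumes I/O data is a dictionary whose top-level list values (e.g. `revisions`, `builds`, `tests`) hold the objects, next to a non-list `version` entry; both the layout and the function name are assumptions for illustration, not the actual `kcidb-count` implementation.

```python
def count_objects(io_data):
    """Return the total number of objects in all top-level object lists.

    Hypothetical helper; assumes every top-level list holds report objects.
    """
    return sum(
        len(value) for value in io_data.values() if isinstance(value, list)
    )


if __name__ == "__main__":
    # Made-up sample data in the assumed layout
    sample = {
        "version": {"major": 3, "minor": 0},
        "revisions": [{"id": "r1"}],
        "builds": [{"id": "b1"}, {"id": "b2"}],
        "tests": [],
    }
    print(count_objects(sample))
```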
v6
A major release of KCIDB. Changes include:
- Implement schema v3.0, with the following changes. See the attached `kcidb.v3.0.schema.json` for details.
  - Re-introduce the separate `origin` field, but keep the origin in IDs as necessary, regardless.
  - Tighten the definition of the revision ID: now it must be the commit hash, optionally followed by a plus (`+`) character and a sha256 hash identifying the applied patches. This allows correlating received reports across submitters.
  - Add `tree_name` field to revisions, containing the widely-recognized name of the base code's sub-tree. E.g. "mainline", or "net-next".
  - Rename revision fields `git_repository_commit_name` and `git_repository_commit_hash` to `git_commit_name` and `git_commit_hash` respectively, making them easier to read and not linked to the containing repository.
  - Require Git repository URLs to start with either `https://` (preferably) or `git://`.
- Add `kcidb-notify` tool, taking new (and existing) I/O data and outputting `NUL`-terminated notification messages. Could be used to debug notifications, or as an alternative way of generating them in production.
- Add support for merging I/O data, and the corresponding `kcidb-merge` tool. Could be useful for merging smaller submissions together into bigger ones.
- Add support for specifying logging level to every command-line tool. Nothing much is logged yet, only queries executed by the database client. The default level is `NONE`, disabling any logging.
- Add minimal logging to Google Cloud Functions, set `INFO` as the log level.
- Log data coming to Google Cloud Functions with `DEBUG` level.
- Add a dummy subscription for mainline tree failures.
- Support sending notifications to selected subscriptions only in Google Cloud Functions, select "mainline".
- Support querying objects using exact IDs (both for library and command-line tools), in addition to LIKE patterns, which works much faster.
- Switch to querying exact object IDs in notification generation, speeding it up dramatically.
- Add `X-KCIDB-Notification-ID` header to notification messages, containing the (unique) notification ID.
- Support and require specifying the Firestore collection path with the spooled notifications, both for Google Cloud Functions and the `kcidb-spool-wipe` tool.
The IDs in the existing dataset were updated for the new schema using the attached `update-revision-ids` script.
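The tightened revision ID and Git URL rules from schema v3.0 can be approximated with a couple of checks. The sketch below assumes 40-digit lowercase hexadecimal commit hashes (SHA-1) and 64-digit lowercase sha256 patch hashes; the exact patterns live in `kcidb.v3.0.schema.json`, and the names here are hypothetical.

```python
import re

# Assumed shape: 40 hex digits, optionally "+" and 64 hex digits (sha256)
REVISION_ID_RE = re.compile(r"[0-9a-f]{40}(\+[0-9a-f]{64})?")


def is_valid_revision_id(revision_id):
    """Check a revision ID against the assumed v3.0 shape."""
    return REVISION_ID_RE.fullmatch(revision_id) is not None


def is_valid_git_repository_url(url):
    """Check that a Git repository URL uses an allowed scheme prefix."""
    return url.startswith(("https://", "git://"))


if __name__ == "__main__":
    # Made-up hashes built from repeated digits
    print(is_valid_revision_id("a" * 40))
    print(is_valid_revision_id("a" * 40 + "+" + "b" * 64))
    print(is_valid_git_repository_url("https://example.com/linux.git"))
```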
v5
Another major release of kcidb includes:
- Switch to report submission via Google Cloud Pub/Sub message queue. This speeds up submission considerably and allows implementing report notifications.
  However, this also changes the parameters required for submission: instead of BigQuery dataset name (e.g. `kernelci03`), these now should be the Google Cloud project ID (`kernelci`) and the Pub/Sub topic name (`kernelci_new`). OTOH, these parameters won't need to be updated whenever we switch to a new dataset.
  The required query parameters stay the same.
  The Client interface in the library changes accordingly.
  See `kcidb-submit --help` and `kcidb-query --help` output for details, as well as the code documentation for `kcidb.Client` class.
- Implement preliminary report notification system, with two dummy subscriptions and e-mails sent to kernelci-results-staging@groups.io. Spool the generated notifications in Google Cloud Firestore database, to avoid sending the same notification twice. Implement subscriptions as Python modules matching the report objects (revisions/builds/tests) of interest and generating notifications.
- Add `kcidb-spool-wipe` tool for removing (old) notifications from the notification spool.
- Add two tools for producing a summary and a description of a report object: `kcidb-summarize` and `kcidb-describe` respectively. These take report data on the standard input, and the name of the object list, plus optional IDs of objects to process on the command line. They output a text summary or a text description of the object(s), the same way as they would appear in a notification e-mail. These could be used for testing both the data you submit and the report generation.
- Add support for querying particular objects from the database, using SQL LIKE patterns for IDs. Also allow querying the matching objects' parents and/or children. See `kcidb-query --help` output for details.
- Add `kcidb-db-dump` tool for dumping the whole database unconditionally, doing the previous job of `kcidb-db-query`, which acquires the same object selection parameters as `kcidb-query` does.
- Fix `kcidb-db-complement` tool and `kcidb.db.Client.complement()` function to not produce a combinatorial explosion when fetching multiple copies of the same object from the database.
v4
v3
A major release with lots of changes, including the below.
- Add I/O schema v2.0. Changes below.
  - Merge `*origin` and `*origin_id` fields into `*id` fields, for all objects.
  - Explicitly prohibit resource file names from containing directory names (i.e. the `/` character) to allow using them in `Content-Disposition: filename=` headers.
- Implement upgrading data from older to newer schema versions, automatically. Add a tool for manual upgrading of I/O data, called `kcidb-upgrade`.
- Add storing the latest I/O schema version in the BigQuery dataset, when initializing it. Prohibit loading and querying data into/from the dataset if its major version is not the same as the major version of KCIDB's latest I/O schema.
- Add a test catalog file (`tests.yaml`) containing identifiers for tests submitted by CI systems (CKI mostly so far), and a tool for validating the catalog, called `kcidb-tests-validate`.
- To make space for adding tools for other subsystems, rename `kcidb-init` to `kcidb-db-init` and `kcidb-cleanup` to `kcidb-db-cleanup`. Make `kcidb-submit` and `kcidb-query` tools implementation-agnostic, and add implementation-specific `kcidb-db-load` and `kcidb-db-query`.
- Add an experimental tool called `kcidb-db-complement`, which takes I/O data and returns it with all missing, but referenced objects added. The mechanism is to be used for processing subscriptions and generating reports, but the tool may be deprecated later.
- Implement prototype tools for managing message queues and communicating through them:
  - `kcidb-mq-publisher-init`
  - `kcidb-mq-publisher-cleanup`
  - `kcidb-mq-publisher-publish`
  - `kcidb-mq-subscriber-init`
  - `kcidb-mq-subscriber-cleanup`
  - `kcidb-mq-subscriber-pull`
- Implement a placeholder for the future Google Cloud Functions module communicating with a message queue. To be used for intercepting submissions and generating notifications with reports.
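The `*origin`/`*origin_id` merge in schema v2.0 can be sketched as follows. The sketch assumes the merged ID takes the `origin:origin_id` form suggested by IDs such as `redhat:122398712` elsewhere in these notes, and that every `*origin_id` field has a matching `*origin` field; the function name is hypothetical and this is not the actual upgrade code.

```python
def merge_origin_fields(obj):
    """Merge each "*origin"/"*origin_id" field pair into one "*id" field.

    Hypothetical helper; assumes the "origin:origin_id" ID format and
    that every "*origin_id" key has a matching "*origin" key.
    """
    merged = {}
    for key, value in obj.items():
        if key.endswith("origin_id"):
            prefix = key[: -len("origin_id")]
            merged[prefix + "id"] = f"{obj[prefix + 'origin']}:{value}"
        elif not key.endswith("origin"):
            # Copy every other field unchanged; "*origin" fields are dropped
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Made-up build object with its own and its parent revision's origin
    print(merge_origin_fields({
        "origin": "redhat",
        "origin_id": "122398712",
        "revision_origin": "redhat",
        "revision_origin_id": "abc123",
        "valid": True,
    }))
```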