Better data format versioning (aka data_format_version) #423

Closed
bassosimone opened this issue Mar 30, 2020 · 3 comments
Assignees
Labels
effort/L Large effort enhancement New feature or request ooni/probe-engine Issues related to github.com/ooni/probe-engine ooni/spec Issues related to github.com/ooni/spec priority/high High priority

Comments

@bassosimone
Contributor

bassosimone commented Mar 30, 2020

Based on a conversation with @FedericoCeratto, I propose that we always keep data_format_version at 0.2.0. We need to recognize that data_format_version should version only the top-level keys; otherwise it quickly becomes madness.

Overall objective: we want to declare the top-level data format, and we want to be able to experiment with new "sub" data formats (e.g. network events), especially before we've blessed a stable "sub" data format. So we want to be able to declare that the outside envelope is reasonably stable, while letting the inside of some specific subkey vary until we're okay with it. Consider, for example, how the version number of WebSockets varied a bunch of times before settling into a stable version.

Because I have already blessed a bunch of version numbers, and sometimes submitted measurements using them, we probably need to use 0.5.0 as the next data format version. Since numbers are just numbers, I believe this is fine. In the current spec there is a long explanation describing differences in the data_format_version, which I'll shorten to explain that the next version MUST be 0.5.0 and that the numbers in between were consumed by me doing some experiments.

To properly roll back, I need to make sure that probe-engine emits all the top-level fields we are supposed to emit with 0.2.0. I also need to modify the spec to represent the status quo at 0.2.0 and specifically make sure we mention the resolver_ip field as optional.

To properly handle what happens inside of test_keys, we will document in the spec that a probe should include the following data structure among the top-level keys:

{
  "extensions": {
    "<name>": 1
  }
}

where we provide, on an optional basis, the version of each extension we use inside of the test keys. I should also document what one is expected to see if there are no explicit extensions (which is basically the status quo as of Measurement Kit v0.9.0). Then I should make sure that each extended data format, e.g., the HTTP template, clearly spells out its version. Finally, I should make sure that probe-engine emits this structure for each experiment.

Each experiment will be responsible for filling the extensions field with the extensions it currently includes in the measurement itself. The versioning will be documented in the spec.
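One way to picture this division of responsibility is the following Go sketch. Everything in it is an assumption made for illustration: the `Experiment` interface, the `webExperiment` type, the `DeclareExtensions` helper, and the extension names `dnst` and `httpt` do not come from this issue.

```go
package main

import "fmt"

// Experiment is a hypothetical interface: each experiment reports which
// extensions (name -> integer version) it writes into test_keys.
type Experiment interface {
	Name() string
	Extensions() map[string]int64
}

// webExperiment is a made-up experiment using two made-up extensions.
type webExperiment struct{}

func (webExperiment) Name() string { return "web_example" }

func (webExperiment) Extensions() map[string]int64 {
	return map[string]int64{"dnst": 1, "httpt": 1}
}

// DeclareExtensions builds the top-level "extensions" value for a
// measurement produced by the given experiment. It copies the map so
// that later changes by the experiment cannot alter what was declared.
func DeclareExtensions(e Experiment) map[string]int64 {
	out := make(map[string]int64)
	for name, version := range e.Extensions() {
		out[name] = version
	}
	return out
}

func main() {
	fmt.Println(DeclareExtensions(webExperiment{}))
}
```

The point of the sketch is only that the experiment, not the surrounding engine, knows which sub-formats it emits, so the declaration lives next to the code that produces them.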

Part of this is important to do now: we don't want production probes to emit experimental data formats. The extensions part could maybe wait? (But I don't see a reason for that.)

@bassosimone bassosimone added enhancement New feature or request priority/high High priority effort/L Large effort ooni/spec Issues related to github.com/ooni/spec ooni/probe-engine Issues related to github.com/ooni/probe-engine labels Mar 30, 2020
@bassosimone bassosimone self-assigned this Mar 30, 2020
@FedericoCeratto

I'm summarizing here some principles that were discussed:

  • data_format_version describes a subset of key-value fields and their semantics. Semantics are fixed for a given version.
  • Additional key-value fields can be added to the data structure and are out of the scope of data_format.
  • Extensions effectively describe upper layers above the existing data_format. Each extension's version number is an incrementing integer, which allows quick comparison, and it increases independently of data_format and of the other extensions.
  • Extensions cannot make assumptions about the presence or version of other extensions.
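On the consumer side, these principles could translate into a check like the Go sketch below. The function `ExtensionAtLeast` and the extension names are hypothetical, and treating a higher version as acceptable to a consumer that expects a lower one is an assumption; the principles above only say that integer versions allow quick comparison.

```go
package main

import "fmt"

// ExtensionAtLeast reports whether the declared extensions include the
// named extension at minVersion or later. It inspects only that one
// entry, so it makes no assumption about any other extension.
func ExtensionAtLeast(declared map[string]int64, name string, minVersion int64) bool {
	version, found := declared[name]
	return found && version >= minVersion
}

func main() {
	// Made-up extension names and versions for illustration.
	declared := map[string]int64{"httpt": 2, "dnst": 1}
	fmt.Println(ExtensionAtLeast(declared, "httpt", 2))
	fmt.Println(ExtensionAtLeast(declared, "dnst", 2))
	fmt.Println(ExtensionAtLeast(declared, "tlshandshake", 1))
}
```

A measurement with no `extensions` key can be passed as a nil map, in which case every check simply reports false.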

@bassosimone bassosimone changed the title Better version number for data format Better data format versioning (data_format_version and beyond) Mar 30, 2020
@bassosimone bassosimone changed the title Better data format versioning (data_format_version and beyond) Better data format versioning (aka data_format_version) Mar 30, 2020
@bassosimone
Contributor Author

cc: @anadahz with whom I recently discussed this topic

bassosimone added a commit that referenced this issue Apr 1, 2020
This is part of the plan discussed with @FedericoCeratto
here #423

I am landing this part of the change early so we don't release
a probe-cli that uses such annotation.

More changes to follow, after probe-engine 0.9.0, but I am not
going to guarantee 0.10.0 will happen before stable cli.
bassosimone added a commit that referenced this issue Apr 6, 2020
bassosimone added a commit that referenced this issue Apr 6, 2020
bassosimone added a commit that referenced this issue Apr 6, 2020
bassosimone added a commit that referenced this issue Apr 6, 2020
* model: let Input be `null` if empty

Part of #423

* Add standard annotations to the measurement

Part of #423

* model/model.go: sort variables by JSON serialization name

Part of #423

* Declare measurement extensions and version

See #423

* repair tests

* More robust implementation
@bassosimone
Contributor Author

This is done. I will pin probe-cli to the master of probe-engine so we have the most recent data format version, which is not strictly necessary but useful to have.
