Roll back to v0.2.0 and introduce data format extensions
bassosimone committed Apr 6, 2020
1 parent 13db93c commit aca9679
Showing 7 changed files with 91 additions and 98 deletions.
96 changes: 30 additions & 66 deletions data-formats/README.md
@@ -2,7 +2,7 @@

| Authors | Arturo Filastò et al. |
|------------|-----------------------|
| Version | 0.4.0 |
| Version | 0.2.0 |
| Maintainer | Simone Basso |

## Overview
@@ -21,15 +21,8 @@ of more fields generated by the experiment itself. That is:

```JavaScript
{
"data_format_version": "0.3.1",
// other toplevel keys that are part of the base data format
// ...
"test_keys": {
// keys written by test templates
// ...
// keys written by each experiment
// ...
}
"data_format_version": "0.2.0",
"test_keys": {}
}
```

@@ -42,30 +35,14 @@ emit `null`s) unless this has been explicitly documented in the field descriptio

## Data format version

For compatibility with the legacy OONI pipeline, starting with `data_format_version`
`0.4.0`, an OONI probe SHOULD send `data_format_version` equal to `0.2.0`. In that
case, the probe MUST use the `real_data_format_version` annotation to declare the
real data format version being used:

```JavaScript
{
"annotations": {
"real_data_format_version": "0.4.0"
},
"data_format_version": "0.2.0",
// other toplevel keys that are part of the base data format
// ...
"test_keys": {
// keys written by test templates
// ...
// keys written by each experiment
// ...
}
}
```
The current `data_format_version` is `0.2.0`. This applies only to the keys
in the external envelope. Since 2020-04-06, the `extensions` top-level key describes
the data formats contained inside the `test_keys` (see below).

The annotation makes it possible to distinguish between probes using data format
version `0.2.0` and probes using subsequent versions.
Between November 2019 and April 2020, experimental versions of OONI Probe used
data format versions ranging from `0.2.1` to `0.4.0`. Since 2020-04-06, the version
is back to `0.2.0`. Because experimental versions of OONI already used these larger
version numbers, the next major data format version will be `0.5.0`.
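
A consumer of measurements may need to cope with both conventions described in this section. The following is a minimal sketch, not part of the specification: it assumes a measurement already parsed into a Python `dict`, and the helper names `effective_data_format_version` and `declared_extensions` are hypothetical.

```python
from typing import Any, Dict, Optional


def effective_data_format_version(measurement: Dict[str, Any]) -> Optional[str]:
    """Return the data format version the probe actually used, if known."""
    annotations = measurement.get("annotations") or {}
    # Experimental probes could send "0.2.0" in the envelope and declare the
    # real version through the `real_data_format_version` annotation.
    real = annotations.get("real_data_format_version")
    if isinstance(real, str) and real:
        return real
    # Otherwise the envelope value is the only information available.
    return measurement.get("data_format_version")


def declared_extensions(measurement: Dict[str, Any]) -> Dict[str, int]:
    """Return a copy of the `extensions` map, or an empty map when absent."""
    return dict(measurement.get("extensions") or {})
```

Checking the annotation first keeps measurements from the experimental period distinguishable from measurements that really use `0.2.0`.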

## Example

@@ -75,11 +52,15 @@ The following is a valid JSON that was edited for brevity.
{
"annotations": {
"platform": "macos",
"real_data_format_version": "0.4.0"
},
"data_format_version": "0.2.0",
"extensions": {
"dnst": 0,
"httpt": 0,
"tcpconnect": 0
},
"input": null,
"measurement_start_time": "2020-01-10 17:25:19",
"test_runtime": 4.426603178,
"probe_asn": "AS30722",
"probe_cc": "IT",
"probe_ip": "127.0.0.1",
@@ -181,6 +162,7 @@ The following is a valid JSON that was edited for brevity.
"telegram_web_status": "ok"
},
"test_name": "telegram",
"test_runtime": 4.426603178,
"test_start_time": "2020-01-10 17:25:19",
"test_version": "0.0.4"
}
@@ -191,22 +173,26 @@ In this example:
- all toplevel keys belong to the base data format.

- the `agent` and `requests` keys within the `test_keys`
belong to the HTTP data format.
belong to the HTTP data format, which is declared as
`httpt` in the `extensions` map.

- the `queries` key within the `test_keys` belongs to
the DNS data format.
the DNS data format, which is declared as `dnst` in the
`extensions` map.

- the `tcp_connect` key within the `test_keys` belongs
to the TCPConnect data format.
to the TCPConnect data format, which is declared as
`tcpconnect` in the `extensions` map.

- all the other keys within `test_keys` are generated
by the `telegram` experiment.
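
The attribution described in the list above can be sketched in code. This is only an illustration under stated assumptions: the `EXTENSION_KEYS` table lists just the keys named in this example (the extension specs define the full sets), and the `classify_test_keys` helper and the `"experiment"` label are hypothetical.

```python
from typing import Any, Dict

# Keys named in the example above, grouped by the extension that owns them.
# Assumption: this table is deliberately incomplete.
EXTENSION_KEYS = {
    "httpt": {"agent", "requests"},
    "dnst": {"queries"},
    "tcpconnect": {"tcp_connect"},
}


def classify_test_keys(measurement: Dict[str, Any]) -> Dict[str, str]:
    """Map each key in `test_keys` to the declared extension that owns it."""
    declared = measurement.get("extensions") or {}
    test_keys = measurement.get("test_keys") or {}
    owners = {}
    for key in test_keys:
        owner = "experiment"  # default: written by the experiment itself
        for name, keys in EXTENSION_KEYS.items():
            # Only extensions declared in the `extensions` map are considered.
            if name in declared and key in keys:
                owner = name
                break
        owners[key] = owner
    return owners
```

Keys that belong to no declared extension fall back to the experiment, which matches the last item of the list.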

## Index

This directory contains the specification of:
This directory contains the specification of
[the base data format](df-000-base.md) as well as
of the following extensions:

- [the common data format](df-000-base.md)
- [the HTTP data format](df-001-httpt.md)
- [the DNS data format](df-002-dnst.md)
- [the Scapy data format](df-003-scapy.md)
@@ -225,29 +211,7 @@ See the [nettests](../nettests) directory for the experiments' specs.
- `0.2.0` [2016-01-27]: the new JSON format. OONI Probe CLI v2.x and OONI
Probe Mobile when using Measurement Kit as the measurement engine.

- `0.2.1` [2019-11-11]: should have been `0.3.0` because we moved the
`resolver_ip` field to toplevel keys. Was briefly used by unstable OONI
Probe CLI v3.0.0.

- `0.3.0` [2019-12-02]: same as `0.2.1` but renamed for correctness.

- `0.3.1` [2019-12-29]: added the `resolver_asn` and
`resolver_network_name` toplevel keys.

- `0.3.2` [2020-01-10]: initially used to deprecate `test_start_time`, but
that was a mistake, and it was reverted. Thus this is now an alias for `0.3.1`.

- `0.3.3` [2020-01-11]: document TLSHandshake and NetworkEvents templates,
add timing support via `t` to DNS and TCPConnect templates, allow cross
referencing DNS, TCPConnect, NetworkEvents and possibly TLSHandshake using
the `conn_id` and `dial_id` identifiers.

- `0.3.4` [2020-01-11]: add `transaction_id` to link the HTTP template
with DNS template, TCPConnect template, TLSHandshake template, and
NetworkEvents template. Make sure we clarify which IDs may be reused
and which are expected to appear only once per measurement session.

- `0.4.0` [2020-02-10]: clients SHOULD set `data_format_version` to
`0.2.0` to avoid breaking the legacy pipeline. If they do so, they
MUST set the `real_data_format_version` annotation to the real data
format version that is being used.
Between `2019-11-11` and `2020-04-06`, experimental versions of OONI used
the following versions: `0.2.1`, `0.3.0`, `0.3.1`, `0.3.2`, `0.3.3`, `0.3.4`,
and `0.4.0`. Since `2020-04-06`, `0.2.0` is again used by both experimental and
stable versions of OONI Probe.
69 changes: 41 additions & 28 deletions data-formats/df-000-base.md
@@ -11,6 +11,7 @@ directory's [README](README.md) for the basic concepts.
"backend_version": "",
"bucket_date": "2019-10-10",
"data_format_version": "0.3.1",
"extensions": {},
"id": "bc1ff44a-04e7-45e0-81a6-46bc95c7c6b0",
"input": "http://example.com/",
"input_hashes": [],
@@ -28,7 +29,7 @@ directory's [README](README.md) for the basic concepts.
"software_name": "ooniprobe-ios",
"software_version": "2.1.0",
"test_helpers": {},
"test_keys": {},
"test_keys": {},
"test_name": "web_connectivity",
"test_runtime": 2.2955930233,
"test_start_time": "2019-10-10 23:58:13",
@@ -50,30 +51,37 @@ of course are not supposed to emit this field.
- `data_format_version` (`string`): indicates the data format version. See
[README.md](README.md) for the current version and for the versions history.

- `extensions` (`map[string]int`; optional): SHOULD describe the extensions
to the base data format included in the `test_keys` field. The name of an
extension is obtained directly from the name of the file describing the extension
in this directory, with the `df-xxx-` prefix and the `.md` suffix removed. A
probe SHOULD describe the extensions included in its measurements.

- `id` (`string`; optional): client-generated UUID4 identifying this measurement
in the context of a set of measurements (i.e. a report). Consumers of OONI data
SHOULD NOT trust this identifier to uniquely identify the measurement. This
identifier is only meaningful for measurements that have not been submitted to
an OONI collector yet. In fact, OONI collectors SHOULD clear this field to avoid
any potential confusion caused by it.

- `input` (`string`; optional; nullable): if this experiment accepts any input, the
- `input` (`string`; nullable): if this experiment accepts any input, the
input that was used to produce this measurement. For example, the Web
Connectivity experiment uses URLs as input.
Connectivity experiment uses URLs as input. Otherwise, this field SHOULD
be present and set to `null`.

- `input_hashes` (`[]string`; optional; deprecated): historical field that
used to contain the SHA256s of all inputs provided to the experiment. Modern
implementations, e.g. Measurement Kit, typically emit an empty list. All
clients using `v0.2.1`, or greater, of the data format SHOULD NOT emit this field at all.
modern clients SHOULD NOT emit this field at all.

- `measurement_start_time` (`string`): time when this measurement was
started in UTC, using the `"2006-01-02 08:04:05"` format. Note that
ooniprobe <= 1.4.0 generates skewed time information.

- `options` (`[]string`; optional; deprecated): list of options passed on the
command line when running this specific experiment. Modern implementations,
e.g. Measurement Kit, typically emit an empty list here. All clients using
`v0.2.1` or greater of the data format SHOULD NOT emit this field at all.
e.g. Measurement Kit, typically emit an empty list here. All modern clients
SHOULD NOT emit this field at all.

- `probe_asn` (`string`): AS Number of the probe (prefixed by AS, e.g.,
`"AS1234"`), or `"AS0"` if the user does not want to share their ASN.
@@ -83,8 +91,7 @@ e.g. Measurement Kit, typically emit an empty list here. All clients using

- `probe_city` (`string`; optional; deprecated): name of the city where the
measurement was run. If the user does not want to share this information,
this field should be set to `null` by `v0.2.0` clients. Clients using `v0.2.1`
or greater SHOULD NOT emit this field.
this field was historically set to `null`; modern clients SHOULD NOT emit it.

- `probe_ip` (`string`): IP address of the probe, or `"127.0.0.1"` if
the user does not want to share their IP.
@@ -96,14 +103,14 @@ course are not supposed to emit this field.
- `report_id` (`string`): identifier of a set of related measurements
generated by OONI backends when submitting one or more measurements.

- `resolver_asn` (`string`; since `0.3.1`): like `probe_asn` but for
- `resolver_asn` (`string`; optional; since `2019-12-29`): like `probe_asn` but for
`resolver_ip` rather than for `probe_ip`.

- `resolver_ip` (`string`; since `0.2.1`): IP of the DNS resolver used by
- `resolver_ip` (`string`; optional; since `2019-11-11`): IP of the DNS resolver used by
the probe, as determined by the measurement engine.

- `resolver_network_name` (`string`; since `0.3.1`): like `probe_network_name`
but for `resolver_ip` rather than for `probe_ip`.
- `resolver_network_name` (`string`; optional; since `2019-12-29`): like
`probe_network_name` but for `resolver_ip` rather than for `probe_ip`.

- `software_name` (`string`): name of the software that has generated
this specific measurement (e.g., `"ooniprobe"`).
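
Some of the field rules shown in this diff can be checked mechanically. The following is a rough sketch rather than a reference validator: the helper names are hypothetical, and the timestamp layout `YYYY-MM-DD hh:mm:ss` is an assumption inferred from the example values (such as `"2020-01-10 17:25:19"`).

```python
import re
from datetime import datetime, timezone
from typing import Any, Dict, List

ASN_PATTERN = re.compile(r"^AS[0-9]+$")  # e.g. "AS1234", or "AS0" when not shared
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"        # assumed from the example timestamps


def parse_measurement_time(value: str) -> datetime:
    """Parse a timestamp such as "2020-01-10 17:25:19" as a UTC datetime."""
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)


def check_base_fields(measurement: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    asn = measurement.get("probe_asn")
    if not isinstance(asn, str) or not ASN_PATTERN.match(asn):
        problems.append("probe_asn should look like 'AS1234'")
    try:
        parse_measurement_time(measurement.get("measurement_start_time"))
    except (TypeError, ValueError):
        problems.append("measurement_start_time is missing or malformed")
    # `input` is nullable but SHOULD be present, set to null when unused.
    if "input" not in measurement:
        problems.append("input should be present (null when unused)")
    # `extensions` is optional; when present it maps names to integers.
    extensions = measurement.get("extensions", {})
    if not isinstance(extensions, dict) or not all(
        isinstance(k, str) and isinstance(v, int) and not isinstance(v, bool)
        for k, v in extensions.items()
    ):
        problems.append("extensions should map strings to integers")
    return problems
```

Returning a list of problems instead of raising lets a caller log every issue found in a measurement in one pass.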
@@ -159,32 +166,32 @@ SHOULD NOT assume that clients using `data_format_version < 0.2.1` have always
used string values. Since `v0.2.1` clients MUST always use string values. A
client SHOULD always add to the map of annotations:

- `engine_name` (`string`): the name of the measurement engine
- `engine_name` (`string`): the name of the measurement engine

- `engine_version` (`string`): the version of the measurement engine
- `engine_version` (`string`): the version of the measurement engine

- `engine_version_full` (`string`): the version of the measurement
engine as generated by `git describe --tags`
- `engine_version_full` (`string`): the version of the measurement
engine as generated by `git describe --tags`

- `network_type` (`string`): one of:
- `network_type` (`string`): one of:

- `mobile`: when OONI Probe Mobile is using 2G/3G/4G/5G networks.
- `mobile`: when OONI Probe Mobile is using 2G/3G/4G/5G networks.

- `wifi`: when OONI Probe Mobile is using Wi-Fi networks.
- `wifi`: when OONI Probe Mobile is using Wi-Fi networks.

- `platform` (`string`): one of:
- `platform` (`string`): one of:

- `android`
- `android`

- `ios`
- `ios`

- `lepidopter`
- `lepidopter`

- `linux`
- `linux`

- `macos`
- `macos`

- `windows`
- `windows`
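
The constraints on annotations listed above (string values, and a closed set of values for `network_type` and `platform`) can be checked with a few lines of code. This is a sketch under the assumption that annotations arrive as a parsed `dict`; the `check_annotations` helper is hypothetical.

```python
from typing import Any, Dict, List

# Values enumerated in the list above.
KNOWN_NETWORK_TYPES = {"mobile", "wifi"}
KNOWN_PLATFORMS = {"android", "ios", "lepidopter", "linux", "macos", "windows"}


def check_annotations(annotations: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an annotations map."""
    problems = []
    for key, value in annotations.items():
        # Clients MUST always use string values.
        if not isinstance(value, str):
            problems.append(f"annotation {key!r} is not a string")
    platform = annotations.get("platform")
    if isinstance(platform, str) and platform not in KNOWN_PLATFORMS:
        problems.append(f"unknown platform {platform!r}")
    network_type = annotations.get("network_type")
    if isinstance(network_type, str) and network_type not in KNOWN_NETWORK_TYPES:
        problems.append(f"unknown network_type {network_type!r}")
    return problems
```

Unknown values are reported rather than rejected outright, since older clients are not guaranteed to follow these rules.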

## Test Helpers

@@ -218,9 +225,14 @@ because it was not relevant for this discussion.
"annotations": {
"platform": "macos"
},
"data_format_version": "0.3.1",
"data_format_version": "0.2.0",
"extensions": {
"dnst": 0,
"httpt": 0,
"tcpconnect": 0
},
"input": null,
"measurement_start_time": "2020-01-10 17:25:19",
"test_runtime": 4.426603178,
"probe_asn": "AS30722",
"probe_cc": "IT",
"probe_ip": "127.0.0.1",
@@ -232,6 +244,7 @@ because it was not relevant for this discussion.
"software_version": "0.1.0-dev",
"test_keys": {},
"test_name": "telegram",
"test_runtime": 4.426603178,
"test_start_time": "2020-01-10 17:25:19",
"test_version": "0.0.4"
}
6 changes: 5 additions & 1 deletion data-formats/df-001-httpt.md
@@ -4,6 +4,10 @@ This document describes the keys with `test_keys` that all experiments
using HTTP SHOULD populate, possibly using directly the specific template
code. See this directory's [README](README.md) for the basic concepts.

| Name       | `httpt` |
|------------|---------|
| Version    | 0       |
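
The `Name` declared here follows the rule stated in the base data format: it is the spec file name without its `df-xxx-` prefix and `.md` suffix. A small sketch of that derivation follows; the `extension_name` helper and the exact file name pattern (three digits, then a lowercase name) are assumptions made for illustration.

```python
import re

# Assumed pattern for spec file names in this directory: df-NNN-<name>.md
_SPEC_FILE = re.compile(r"^df-[0-9]{3}-(?P<name>[a-z0-9]+)\.md$")


def extension_name(filename: str) -> str:
    """Derive an extension name from a spec file name such as df-001-httpt.md."""
    match = _SPEC_FILE.match(filename)
    if match is None:
        raise ValueError(f"not a data format spec file: {filename!r}")
    return match.group("name")
```

Note that `df-000-base.md` also matches the pattern, but it describes the base data format rather than an extension, so a caller would probably want to skip it.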

## Specification

```JSON
@@ -90,7 +94,7 @@ instance of `tor` that we may be using for measuring.

In case we have the following headers:

```
```bash
> Foo: bar
> Foo: <binary-data-here>
```
6 changes: 5 additions & 1 deletion data-formats/df-002-dnst.md
@@ -4,6 +4,10 @@ This document describes the keys with `test_keys` that all experiments
using DNS SHOULD populate, possibly using directly the specific template
code. See this directory's [README](README.md) for the basic concepts.

| Name       | `dnst` |
|------------|--------|
| Version    | 0      |

## Specification

```JSON
@@ -115,7 +119,7 @@ mailbox of the person responsible for this zone.
that should be elapsed before the zone should be retried in case of failure.

- `serial_number` (`int`; only for SOA answers): version number
of the original copy of the zone.
of the original copy of the zone.

- `ttl` (`int`; nullable): the TTL if known, otherwise `null`.

4 changes: 4 additions & 0 deletions data-formats/df-005-tcpconnect.md
@@ -5,6 +5,10 @@ performing TCP connects SHOULD populate, possibly using directly the
specific template code. See this directory's [README](README.md) for the
basic concepts.

| Name       | `tcpconnect` |
|------------|--------------|
| Version    | 0            |

## Specification

```JSON
4 changes: 3 additions & 1 deletion data-formats/df-006-tlshandshake.md
@@ -4,7 +4,9 @@ This document describes the keys with `test_keys` that all experiments
using TLS SHOULD populate, possibly using directly the specific template
code. See this directory's [README](README.md) for the basic concepts.

This data format is available since data format version 0.3.3.
| Name       | `tlshandshake` |
|------------|----------------|
| Version    | 0              |

## Specification

4 changes: 3 additions & 1 deletion data-formats/df-008-netevents.md
@@ -4,7 +4,9 @@ This document describes the keys with `test_keys` that experiments
MAY use to include network-level events. See this directory's
[README](README.md) for the basic concepts.

This data format is available since data format version 0.3.3.
| Name       | `netevents` |
|------------|-------------|
| Version    | 0           |

## Specification
