Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: sync data format with measurexlite #2238

Closed
bassosimone opened this issue Aug 26, 2022 · 2 comments
Closed

spec: sync data format with measurexlite #2238

bassosimone opened this issue Aug 26, 2022 · 2 comments
Assignees
Labels
cleanup There's need to cleanup stuff a bit data quality documentation Improvements or additions to documentation enhancement improving existing code or new feature methodology issues related to the testing methodology ooni/probe-engine priority/high research prototype techdebt This issue describes technical debt technical task technical tasks e.g. deployment

Comments

@bassosimone
Copy link
Contributor

bassosimone commented Aug 26, 2022

We want to ensure that web_connectivity@v0.5.x and measurexlite generate a data format that is compliant with what we have written inside the spec. If not, one of them or both need to change, to improve clarity.

It's good to perform this kind of activities now that web_connectivity@v0.5.x has not been released on a stable version of ooniprobe yet. This fact also holds for other experiments. But we must ensure we don't break existing experiments: they should continue to emit the same dataformat they used to do before.

We also need to make sure we are not going to emit optional fields, such as t0 when they're not available.

@bassosimone bassosimone self-assigned this Aug 26, 2022
bassosimone added a commit to ooni/probe-cli that referenced this issue Aug 26, 2022
In a pure step-by-step model, we don't need to trace HTTP round trips like we did before. We _may_ want in the future to also have some form of HTTP tracing (see #868 for a prototype) but doing that is currently not in scope for moving forward the step-by-step design. For this reason, I only added a public convenience function for formatting an OONI spec compatible request. I also added new fields, which should be documented inside the ooni/spec repository (see ooni/probe#2238).

Required by ooni/probe#2237
bassosimone added a commit to ooni/probe-cli that referenced this issue Aug 27, 2022
See what we documented at ooni/spec#257

Reference issue: ooni/probe#2238

While there, bump webconnectivity@v0.5 version because this change
has an impact onto the generated data format.
@bassosimone bassosimone added enhancement improving existing code or new feature documentation Improvements or additions to documentation priority/high technical task technical tasks e.g. deployment research prototype methodology issues related to the testing methodology data quality ooni/probe-engine techdebt This issue describes technical debt cleanup There's need to cleanup stuff a bit and removed triage labels Aug 27, 2022
bassosimone added a commit to ooni/spec that referenced this issue Aug 27, 2022
This pull request documents more precise stdlib resolver naming that will benefit web_connectivity@v0.5.

See ooni/probe#2238, ooni/probe#2237, ooni/probe-cli#885.

While there, fix the template for new PRs.
bassosimone added a commit to ooni/probe-cli that referenced this issue Aug 27, 2022
See what we documented at ooni/spec#257

Reference issue: ooni/probe#2238

See also the related ooni/spec PR: ooni/spec#257

See also ooni/probe#2237

While there, bump webconnectivity@v0.5 version because this change
has an impact onto the generated data format.

The drop in coverage is unavoidable because we've written some
tests for `measurex` to ensure we deal with DNS resolvers and transport
names correctly depending on the splitting policy we use.

(However, `measurex` is only used for the `tor` experiment and, per
the step-by-step design document, new experiments should use
`measurexlite` instead, so this is hopefully fine(TM).)

While there, fix a broken integration test that does not run in `-short` mode.
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 2, 2022
This issue was mentioned in ooni/probe#2137.

I double checked the spec and this field is not mentioned. We will
update it when we do ooni/probe#2238.
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 2, 2022
This issue was mentioned in ooni/probe#2137.

I double checked the spec and this field is not mentioned. We will
update it when we do ooni/probe#2238.
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
Basically, `t0` and `transaction_id` should be optional. Version 0.4.x
of web_connectivity should not include them, version 0.5.x should.

There is a technical reason why v0.4.x should not include them. The code
it is based on, tracex, does not record these two fields.

Whereas, v0.5.x, uses measurexlite, which records these two fields.

Part of ooni/probe#2238
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
This diff adds the following fields to webconnectivity@v0.5:

1. agent, always set to "redirect" (legacy field);

2. client_resolver, properly initialized w/ the resolver's IPv4 address;

3. retries, legacy field always set to null;

4. socksproxy, legacy field always set to null.

Part of ooni/probe#2238
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
The general idea behind this field is that we would be able
in the future to tweak the data model for some fields, by declaring
we're using a later version, so it seems useful to add it.

See ooni/probe#2238
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
This diff fixes a bug where measurexlite was using "tls" as the
protocol for the TLS handshake when using TCP.

While this choice _could_ make sense, the rest of the code we have
written so far uses "tcp" instead.

Using "tcp" makes more sense because it allows you to search for
the same endpoint across different events by checking for the same
network and for the same endpoint rather than special casing TLS
handshakes for using "tls" when the endpoint is "tcp".

See ooni/probe#2238
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
* fix(model/archival.go): more optional keys

Basically, `t0` and `transaction_id` should be optional. Version 0.4.x
of web_connectivity should not include them, version 0.5.x should.

There is a technical reason why v0.4.x should not include them. The code
it is based on, tracex, does not record these two fields.

Whereas, v0.5.x, uses measurexlite, which records these two fields.

Part of ooni/probe#2238

* fix(webconnectivity@v0.5): add more fields

This diff adds the following fields to webconnectivity@v0.5:

1. agent, always set to "redirect" (legacy field);

2. client_resolver, properly initialized w/ the resolver's IPv4 address;

3. retries, legacy field always set to null;

4. socksproxy, legacy field always set to null.

Part of ooni/probe#2238

* fix(webconnectivity@v0.5): register extensions

The general idea behind this field is that we would be able
in the future to tweak the data model for some fields, by declaring
we're using a later version, so it seems useful to add it.

See ooni/probe#2238

* fix(measurexlite): use tcp or quic for tls handshake network

This diff fixes a bug where measurexlite was using "tls" as the
protocol for the TLS handshake when using TCP.

While this choice _could_ make sense, the rest of the code we have
written so far uses "tcp" instead.

Using "tcp" makes more sense because it allows you to search for
the same endpoint across different events by checking for the same
network and for the same endpoint rather than special casing TLS
handshakes for using "tls" when the endpoint is "tcp".

See ooni/probe#2238

* chore: run alltests.yml for "alltestsbuild" branches

Part of ooni/probe#2238
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
#942)

* fix(model/archival.go): more optional keys

Basically, `t0` and `transaction_id` should be optional. Version 0.4.x
of web_connectivity should not include them, version 0.5.x should.

There is a technical reason why v0.4.x should not include them. The code
it is based on, tracex, does not record these two fields.

Whereas, v0.5.x, uses measurexlite, which records these two fields.

Part of ooni/probe#2238

* fix(webconnectivity@v0.5): add more fields

This diff adds the following fields to webconnectivity@v0.5:

1. agent, always set to "redirect" (legacy field);

2. client_resolver, properly initialized w/ the resolver's IPv4 address;

3. retries, legacy field always set to null;

4. socksproxy, legacy field always set to null.

Part of ooni/probe#2238

* fix(webconnectivity@v0.5): register extensions

The general idea behind this field is that we would be able
in the future to tweak the data model for some fields, by declaring
we're using a later version, so it seems useful to add it.

See ooni/probe#2238

* fix(measurexlite): use tcp or quic for tls handshake network

This diff fixes a bug where measurexlite was using "tls" as the
protocol for the TLS handshake when using TCP.

While this choice _could_ make sense, the rest of the code we have
written so far uses "tcp" instead.

Using "tcp" makes more sense because it allows you to search for
the same endpoint across different events by checking for the same
network and for the same endpoint rather than special casing TLS
handshakes for using "tls" when the endpoint is "tcp".

See ooni/probe#2238

* chore: run alltests.yml for "alltestsbuild" branches

Part of ooni/probe#2238
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
Code based on urlgetter had this event and we would like to have this
event with step-by-step code as well.

Because there's no tracing for HTTP when using step-by-step, we will
need to include emitting these events inside the boilerplate.

By doing that, we emit events out of order, so make sure we sort
them by T, which is "the moment when the event was collected".

Part of ooni/probe#2238
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
Code based on urlgetter had this event and we would like to have this
event with step-by-step code as well.

Because there's no tracing for HTTP when using step-by-step, we will
need to include emitting these events inside the boilerplate.

By doing that, we emit events out of order, so make sure we sort
them by T, which is "the moment when the event was collected".

Part of ooni/probe#2238
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
…done (#943)

Code based on urlgetter had this event and we would like to have this
event with step-by-step code as well.

Because there's no tracing for HTTP when using step-by-step, we will
need to include emitting these events inside the boilerplate.

By doing that, we emit events out of order, so make sure we sort
them by T, which is "the moment when the event was collected".

Part of ooni/probe#2238
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
We're bumping the version number to reflect recent improvements in the
data format implemented in these pull requests:

- #942

- #943

- #944

Reference issue: ooni/probe#2238
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
chore: web_connectivity v0.5.5

We're bumping the version number to reflect recent improvements in the
data format implemented in these pull requests:

- #942

- #943

- #944

Reference issue: ooni/probe#2238
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
chore: web_connectivity v0.5.5

We're bumping the version number to reflect recent improvements in the
data format implemented in these pull requests:

- #942

- #943

- #944

Reference issue: ooni/probe#2238
bassosimone added a commit to ooni/spec that referenced this issue Sep 8, 2022
bassosimone added a commit to ooni/spec that referenced this issue Sep 8, 2022
@bassosimone
Copy link
Contributor Author

One last chunk of work is required here: changing "quic" to "udp" in TLS handshake and HTTP results to simplify joins for tabular formats.

bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
This diff changes the data format to prefer "udp" to "quic"
everywhere we were previously using "quic".

Previously, the code inconsistently used "quic" for operations
where we knew we were using "quic" and "udp" otherwise (e.g.,
for generic operations like ReadFrom).

While it would be more correct to say that a specific HTTP
request used "quic" rather than "udp", using "udp" consistently
allows one to see how distinct events such as ReadFrom and an
handshake all refer to the same address, port, and protocol
triple. Therefore, this change makes it easier to programmatically
unpack a single measurement and create endpoint stats.

Before implementing this change, I discussed the problem with
@hellais who mentioned that ooni/data is not currently using the
"quic" string anywhere. I know that ooni/pipeline also doesn't
rely on this string. The only users of this feature have been
research-oriented experiments such as urlgetter, for which such
a change would actually be acceptable.

See ooni/probe#2238
bassosimone added a commit to ooni/spec that referenced this issue Sep 8, 2022
See ooni/probe-cli#946 where we also
explain why we're doing this change.

Reference issue: ooni/probe#2238
bassosimone added a commit to ooni/spec that referenced this issue Sep 8, 2022
See ooni/probe-cli#946 where we also
explain why we're doing this change.

Reference issue: ooni/probe#2238
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
This diff changes the data format to prefer "udp" to "quic" everywhere we were previously using "quic".

Previously, the code inconsistently used "quic" for operations where we knew we were using "quic" and "udp" otherwise (e.g., for generic operations like ReadFrom).

While it would be more correct to say that a specific HTTP request used "quic" rather than "udp", using "udp" consistently allows one to see how distinct events such as ReadFrom and an handshake all refer to the same address, port, and protocol triple. Therefore, this change makes it easier to programmatically unpack a single measurement and create endpoint stats.

Before implementing this change, I discussed the problem with @hellais who mentioned that ooni/data is not currently using the "quic" string anywhere. I know that ooni/pipeline also doesn't rely on this string. The only users of this feature have been research-oriented experiments such as urlgetter, for which such a change would actually be acceptable.

See ooni/probe#2238 and ooni/spec#262.
bassosimone added a commit to ooni/probe-cli that referenced this issue Sep 8, 2022
This diff changes the data format to prefer "udp" to "quic" everywhere we were previously using "quic".

Previously, the code inconsistently used "quic" for operations where we knew we were using "quic" and "udp" otherwise (e.g., for generic operations like ReadFrom).

While it would be more correct to say that a specific HTTP request used "quic" rather than "udp", using "udp" consistently allows one to see how distinct events such as ReadFrom and an handshake all refer to the same address, port, and protocol triple. Therefore, this change makes it easier to programmatically unpack a single measurement and create endpoint stats.

Before implementing this change, I discussed the problem with @hellais who mentioned that ooni/data is not currently using the "quic" string anywhere. I know that ooni/pipeline also doesn't rely on this string. The only users of this feature have been research-oriented experiments such as urlgetter, for which such a change would actually be acceptable.

See ooni/probe#2238 and ooni/spec#262.
@bassosimone
Copy link
Contributor Author

This work has now been completed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup There's need to cleanup stuff a bit data quality documentation Improvements or additions to documentation enhancement improving existing code or new feature methodology issues related to the testing methodology ooni/probe-engine priority/high research prototype techdebt This issue describes technical debt technical task technical tasks e.g. deployment
Projects
None yet
Development

No branches or pull requests

1 participant