-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spec: sync data format with measurexlite #2238
Labels
cleanup
There's need to cleanup stuff a bit
data quality
documentation
Improvements or additions to documentation
enhancement
improving existing code or new feature
methodology
issues related to the testing methodology
ooni/probe-engine
priority/high
research prototype
techdebt
This issue describes technical debt
technical task
technical tasks e.g. deployment
Comments
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Aug 26, 2022
In a pure step-by-step model, we don't need to trace HTTP round trips like we did before. We _may_ want in the future to also have some form of HTTP tracing (see #868 for a prototype) but doing that is currently not in scope for moving forward the step-by-step design. For this reason, I only added a public convenience function for formatting an OONI spec compatible request. I also added new fields, which should be documented inside the ooni/spec repository (see ooni/probe#2238). Required by ooni/probe#2237
4 tasks
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Aug 27, 2022
See what we documented at ooni/spec#257 Reference issue: ooni/probe#2238 While there, bump webconnectivity@v0.5 version because this change has an impact onto the generated data format.
bassosimone
added
enhancement
improving existing code or new feature
documentation
Improvements or additions to documentation
priority/high
technical task
technical tasks e.g. deployment
research prototype
methodology
issues related to the testing methodology
data quality
ooni/probe-engine
techdebt
This issue describes technical debt
cleanup
There's need to cleanup stuff a bit
and removed
triage
labels
Aug 27, 2022
bassosimone
added a commit
to ooni/spec
that referenced
this issue
Aug 27, 2022
This pull request documents more precise stdlib resolver naming that will benefit web_connectivity@v0.5. See ooni/probe#2238, ooni/probe#2237, ooni/probe-cli#885. While there, fix the template for new PRs.
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Aug 27, 2022
See what we documented at ooni/spec#257 Reference issue: ooni/probe#2238 See also the related ooni/spec PR: ooni/spec#257 See also ooni/probe#2237 While there, bump webconnectivity@v0.5 version because this change has an impact onto the generated data format. The drop in coverage is unavoidable because we've written some tests for `measurex` to ensure we deal with DNS resolvers and transport names correctly depending on the splitting policy we use. (However, `measurex` is only used for the `tor` experiment and, per the step-by-step design document, new experiments should use `measurexlite` instead, so this is hopefully fine(TM).) While there, fix a broken integration test that does not run in `-short` mode.
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 2, 2022
This issue was mentioned in ooni/probe#2137. I double checked the spec and this field is not mentioned. We will update it when we do ooni/probe#2238.
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 2, 2022
This issue was mentioned in ooni/probe#2137. I double checked the spec and this field is not mentioned. We will update it when we do ooni/probe#2238.
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
Basically, `t0` and `transaction_id` should be optional. Version 0.4.x of web_connectivity should not include them, version 0.5.x should. There is a technical reason why v0.4.x should not include them. The code it is based on, tracex, does not record these two fields. Whereas, v0.5.x, uses measurexlite, which records these two fields. Part of ooni/probe#2238
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
This diff adds the following fields to webconnectivity@v0.5: 1. agent, always set to "redirect" (legacy field); 2. client_resolver, properly initialized w/ the resolver's IPv4 address; 3. retries, legacy field always set to null; 4. socksproxy, legacy field always set to null. Part of ooni/probe#2238
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
The general idea behind this field is that we would be able in the future to tweak the data model for some fields, by declaring we're using a later version, so it seems useful to add it. See ooni/probe#2238
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
This diff fixes a bug where measurexlite was using "tls" as the protocol for the TLS handshake when using TCP. While this choice _could_ make sense, the rest of the code we have written so far uses "tcp" instead. Using "tcp" makes more sense because it allows you to search for the same endpoint across different events by checking for the same network and for the same endpoint rather than special casing TLS handshakes for using "tls" when the endpoint is "tcp". See ooni/probe#2238
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
* fix(model/archival.go): more optional keys Basically, `t0` and `transaction_id` should be optional. Version 0.4.x of web_connectivity should not include them, version 0.5.x should. There is a technical reason why v0.4.x should not include them. The code it is based on, tracex, does not record these two fields. Whereas, v0.5.x, uses measurexlite, which records these two fields. Part of ooni/probe#2238 * fix(webconnectivity@v0.5): add more fields This diff adds the following fields to webconnectivity@v0.5: 1. agent, always set to "redirect" (legacy field); 2. client_resolver, properly initialized w/ the resolver's IPv4 address; 3. retries, legacy field always set to null; 4. socksproxy, legacy field always set to null. Part of ooni/probe#2238 * fix(webconnectivity@v0.5): register extensions The general idea behind this field is that we would be able in the future to tweak the data model for some fields, by declaring we're using a later version, so it seems useful to add it. See ooni/probe#2238 * fix(measurexlite): use tcp or quic for tls handshake network This diff fixes a bug where measurexlite was using "tls" as the protocol for the TLS handshake when using TCP. While this choice _could_ make sense, the rest of the code we have written so far uses "tcp" instead. Using "tcp" makes more sense because it allows you to search for the same endpoint across different events by checking for the same network and for the same endpoint rather than special casing TLS handshakes for using "tls" when the endpoint is "tcp". See ooni/probe#2238 * chore: run alltests.yml for "alltestsbuild" branches Part of ooni/probe#2238
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
#942) * fix(model/archival.go): more optional keys Basically, `t0` and `transaction_id` should be optional. Version 0.4.x of web_connectivity should not include them, version 0.5.x should. There is a technical reason why v0.4.x should not include them. The code it is based on, tracex, does not record these two fields. Whereas, v0.5.x, uses measurexlite, which records these two fields. Part of ooni/probe#2238 * fix(webconnectivity@v0.5): add more fields This diff adds the following fields to webconnectivity@v0.5: 1. agent, always set to "redirect" (legacy field); 2. client_resolver, properly initialized w/ the resolver's IPv4 address; 3. retries, legacy field always set to null; 4. socksproxy, legacy field always set to null. Part of ooni/probe#2238 * fix(webconnectivity@v0.5): register extensions The general idea behind this field is that we would be able in the future to tweak the data model for some fields, by declaring we're using a later version, so it seems useful to add it. See ooni/probe#2238 * fix(measurexlite): use tcp or quic for tls handshake network This diff fixes a bug where measurexlite was using "tls" as the protocol for the TLS handshake when using TCP. While this choice _could_ make sense, the rest of the code we have written so far uses "tcp" instead. Using "tcp" makes more sense because it allows you to search for the same endpoint across different events by checking for the same network and for the same endpoint rather than special casing TLS handshakes for using "tls" when the endpoint is "tcp". See ooni/probe#2238 * chore: run alltests.yml for "alltestsbuild" branches Part of ooni/probe#2238
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
Code based on urlgetter had this event and we would like to have this event with step-by-step code as well. Because there's no tracing for HTTP when using step-by-step, we will need to include emitting these events inside the boilerplate. By doing that, we emit events out of order, so make sure we sort them by T, which is "the moment when the event was collected". Part of ooni/probe#2238
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
Code based on urlgetter had this event and we would like to have this event with step-by-step code as well. Because there's no tracing for HTTP when using step-by-step, we will need to include emitting these events inside the boilerplate. By doing that, we emit events out of order, so make sure we sort them by T, which is "the moment when the event was collected". Part of ooni/probe#2238
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
…done (#943) Code based on urlgetter had this event and we would like to have this event with step-by-step code as well. Because there's no tracing for HTTP when using step-by-step, we will need to include emitting these events inside the boilerplate. By doing that, we emit events out of order, so make sure we sort them by T, which is "the moment when the event was collected". Part of ooni/probe#2238
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
We're bumping the version number to reflect recent improvements in the data format implemented in these pull requests: - #942 - #943 - #944 Reference issue: ooni/probe#2238
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
chore: web_connectivity v0.5.5 We're bumping the version number to reflect recent improvements in the data format implemented in these pull requests: - #942 - #943 - #944 Reference issue: ooni/probe#2238
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
chore: web_connectivity v0.5.5 We're bumping the version number to reflect recent improvements in the data format implemented in these pull requests: - #942 - #943 - #944 Reference issue: ooni/probe#2238
bassosimone
added a commit
to ooni/spec
that referenced
this issue
Sep 8, 2022
bassosimone
added a commit
to ooni/spec
that referenced
this issue
Sep 8, 2022
One last chunk of work is required here: changing "quic" to "udp" in TLS handshake and HTTP results to simplify joins for tabular formats. |
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
This diff changes the data format to prefer "udp" to "quic" everywhere we were previously using "quic". Previously, the code inconsistently used "quic" for operations where we knew we were using "quic" and "udp" otherwise (e.g., for generic operations like ReadFrom). While it would be more correct to say that a specific HTTP request used "quic" rather than "udp", using "udp" consistently allows one to see how distinct events such as ReadFrom and an handshake all refer to the same address, port, and protocol triple. Therefore, this change makes it easier to programmatically unpack a single measurement and create endpoint stats. Before implementing this change, I discussed the problem with @hellais who mentioned that ooni/data is not currently using the "quic" string anywhere. I know that ooni/pipeline also doesn't rely on this string. The only users of this feature have been research-oriented experiments such as urlgetter, for which such a change would actually be acceptable. See ooni/probe#2238
bassosimone
added a commit
to ooni/spec
that referenced
this issue
Sep 8, 2022
See ooni/probe-cli#946 where we also explain why we're doing this change. Reference issue: ooni/probe#2238
bassosimone
added a commit
to ooni/spec
that referenced
this issue
Sep 8, 2022
See ooni/probe-cli#946 where we also explain why we're doing this change. Reference issue: ooni/probe#2238
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
This diff changes the data format to prefer "udp" to "quic" everywhere we were previously using "quic". Previously, the code inconsistently used "quic" for operations where we knew we were using "quic" and "udp" otherwise (e.g., for generic operations like ReadFrom). While it would be more correct to say that a specific HTTP request used "quic" rather than "udp", using "udp" consistently allows one to see how distinct events such as ReadFrom and an handshake all refer to the same address, port, and protocol triple. Therefore, this change makes it easier to programmatically unpack a single measurement and create endpoint stats. Before implementing this change, I discussed the problem with @hellais who mentioned that ooni/data is not currently using the "quic" string anywhere. I know that ooni/pipeline also doesn't rely on this string. The only users of this feature have been research-oriented experiments such as urlgetter, for which such a change would actually be acceptable. See ooni/probe#2238 and ooni/spec#262.
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Sep 8, 2022
This diff changes the data format to prefer "udp" to "quic" everywhere we were previously using "quic". Previously, the code inconsistently used "quic" for operations where we knew we were using "quic" and "udp" otherwise (e.g., for generic operations like ReadFrom). While it would be more correct to say that a specific HTTP request used "quic" rather than "udp", using "udp" consistently allows one to see how distinct events such as ReadFrom and an handshake all refer to the same address, port, and protocol triple. Therefore, this change makes it easier to programmatically unpack a single measurement and create endpoint stats. Before implementing this change, I discussed the problem with @hellais who mentioned that ooni/data is not currently using the "quic" string anywhere. I know that ooni/pipeline also doesn't rely on this string. The only users of this feature have been research-oriented experiments such as urlgetter, for which such a change would actually be acceptable. See ooni/probe#2238 and ooni/spec#262.
This work has now been completed! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
cleanup
There's need to cleanup stuff a bit
data quality
documentation
Improvements or additions to documentation
enhancement
improving existing code or new feature
methodology
issues related to the testing methodology
ooni/probe-engine
priority/high
research prototype
techdebt
This issue describes technical debt
technical task
technical tasks e.g. deployment
We want to ensure that web_connectivity@v0.5.x and measurexlite generate a data format that is compliant with what we have written inside the spec. If not, one of them or both need to change, to improve clarity.
It's good to perform this kind of activities now that web_connectivity@v0.5.x has not been released on a stable version of ooniprobe yet. This fact also holds for other experiments. But we must ensure we don't break existing experiments: they should continue to emit the same dataformat they used to do before.
We also need to make sure we are not going to emit optional fields, such as
t0
when they're not available.The text was updated successfully, but these errors were encountered: