-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
measurex: unify data model with engine/netx #2035
Labels
cleanup
There's need to cleanup stuff a bit
ooni/probe-engine
priority/medium
refactoring
techdebt
This issue describes technical debt
wontfix
when an issue won't be addressed (add a comment to the issue as to why this is the case)
Comments
bassosimone
added
priority/medium
refactoring
ooni/probe-engine
techdebt
This issue describes technical debt
cleanup
There's need to cleanup stuff a bit
labels
Feb 22, 2022
3 tasks
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Jun 2, 2022
There are two reasons why this is beneficial: 1. github.com/google/go-cmp is more annoying to use for comparing data structures when there are interfaces to compare. Sure, there's a recipe for teaching it to compare errors, but how about making the errors trivially comparable instead? 2. if we want to send errors over the network, JSON serialization works but we cannot unmarshal the resulting string back to an error, so how about making this representation trivial to serialize (we are not going this now, but we need this property for websteps and it may be sensible to try to avoid to have duplicate code because of that -- measurex currently duplicates many tracex functionality and this is quite unfortunate because it slows development down) Additionally, if an error is a string: 3. we cannot very easily use a switch for comparing its possible values with "" representing the absence of errors, while it is more complex to do the same when using a nullable string or even and error 4. if a type is not nullable, it's easier to write safe code for it and we may want to refactor experiments to use the internal representation of measurements for more robust processing code For all these reasons, let's internally use strings in tracex. The overall aim here is to reduce the duplicated code between pre and post-measurex measurements (see ooni/probe#2035).
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Jun 2, 2022
There are two reasons why this is beneficial: 1. github.com/google/go-cmp is more annoying to use for comparing data structures when there are interfaces to compare. Sure, there's a recipe for teaching it to compare errors, but how about making the errors trivially comparable instead? 2. if we want to send errors over the network, JSON serialization works but we cannot unmarshal the resulting string back to an error, so how about making this representation trivial to serialize (we are not going this now, but we need this property for websteps and it may be sensible to try to avoid to have duplicate code because of that -- measurex currently duplicates many tracex functionality and this is quite unfortunate because it slows development down) Additionally, if an error is a string: 3. we can very easily use a switch for comparing its possible values with "" representing the absence of errors, while it is more complex to do the same when using a nullable string or even an error (i.e., an interface) 4. if a type is not nullable, it's easier to write safe code for it and we may want to refactor experiments to use the internal representation of measurements for more robust processing code For all these reasons, let's internally use strings in tracex. The overall aim here is to reduce the duplicated code between pre and post-measurex measurements (see ooni/probe#2035).
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Jun 2, 2022
By just storing the raw certificate we simplify the internal data structure we use. In turn, this enables us to write better unit tests using github.com/google/go-cmp where we can construct the expected result and compare with that. (Yeah, in principle we could also construct the full certificate but I'm not sure it's worth the effort since we basically only care about the raw certificate.) The general idea here is to make tracex more tested. Once it's more tested, I will create separate structs for each event, which is something that measurex also does. Once that is done, we can start ensuring that the code in measurex and the code in tracex do the same thing in terms of storing observations. When also this is done, we can then rewrite measurex to use tracex directly. The overall goal is ooni/probe#2035.
bassosimone
added a commit
to ooni/probe-cli
that referenced
this issue
Jun 2, 2022
By just storing the raw certificate we simplify the internal data structure we use. In turn, this enables us to write better unit tests using github.com/google/go-cmp where we can construct the expected result and compare with that. (Yeah, in principle we could also construct the full certificate but I'm not sure it's worth the effort since we basically only care about the raw certificate.) The general idea here is to make tracex more tested. Once it's more tested, I will create separate structs for each event, which is something that measurex also does. Once that is done, we can start ensuring that the code in measurex and the code in tracex do the same thing in terms of storing observations. When also this is done, we can then rewrite measurex to use tracex directly. The overall goal is ooni/probe#2035.
bassosimone
changed the title
websteps: unify data model with archival and engine/netx
measurex: unify data model with engine/netx
Jun 14, 2022
This issue has now become a WONTFIX because the step-by-step design calls for freezing all measurements packages, for getting rid of the flat data model in new experiments, and for improving the archival data model to add support for missing fields. |
bassosimone
added
the
wontfix
when an issue won't be addressed (add a comment to the issue as to why this is the case)
label
Jun 17, 2022
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
cleanup
There's need to cleanup stuff a bit
ooni/probe-engine
priority/medium
refactoring
techdebt
This issue describes technical debt
wontfix
when an issue won't be addressed (add a comment to the issue as to why this is the case)
This issue is about unifying the data model used by websteps/measurex with the ones used by engine/netx. We explain below why we have three data models and what is the plan. We'll close this issue when we'll be using a single data model.
Data models
Let's start off by describing all the data models that we have currently. There are two data models of interest. The former is the archival data model (
./internal/model/archival.go
). This data model describes how we should exchange data with the backend. The latter is the flat data model used internally to represent measurements.They have conflicting requirements. The archival data model is basically the historical representation of measurements we've been using since we switched OONI Probe from YAML to JSON. While it would be nice to change this data model to become something more regular and simple to parse (e.g., by getting rid of nullable types and enforcing strict typing policies), we are currently not in a position to do that. Unfortunately, this data model is quite annoying to use internally for processing measurements, because it'a not regular and does not have strict typing. So, there have been a bunch of experiments at using a more regular and simpler to use "internal" data model. I call this model "flat" because it aims to reduce nesting.
As an historical note, we have more than one such "flat" data model implementation. While this is, of course, suboptimal, I guess it's part of the baggage of experimenting and trying to figure out what a good flat data model would look like.
This observation that we have more than one implementation smoothly transitions us to the next section of this issue.
Implementations
Now, onto the implementations (as of ooni/probe-cli@ce40127). The
engine/netx
package has its own flat data model at./internal/engine/netx/trace
and is already using the./internal/model/archival.go
data model to represent data serialized for the OONI backend. The main issue withengine/netx
's flat data model is that every observation shares the same structure, which makes processing and serialization complex. We had the idea some time ago to just submit this kind of observations and let the backend do its magic. Hence, the design leaned towards flattening any kind of event into the same event structure. But we learned with time that this does not body well with processing internally the observations to measurement results (or "probe opinions", if you wish). In fact, the resulting code is quite messy and it looks more like Python than Go code, which is bad in terms of safety and mental baggage when reading it. For this reason, when we started working onwebsteps
, we started experimenting with even more regular data models.The
internal/archival
data format defines its own flat data model and also uses./internal/model/archival.go
. This code is currently unused. It starts as a refactoring of the code inmeasurex
where we also added unit testing. The nice property of the flat representation provided by this package is that every event type has its own model. This fact greatly simplifies processing of events by type, which is useful when you're producing "probe opinions".The
internal/measurex
package has its own definition of both the archival and the flat data models. The code is very similar to the code inside ofinternal/archival
since the latter derives frommeasurex
code. Because it's quite similar tointernal/archival
, also this flat data model is simpler to process and evaluate for the probe.Dynamics
Even before opening this issue, the plan was to refactor code from
measurex
intointernal/archival
and ensuring that all parts of the codebase useinternal/archival
for both using a flat data model and producing measurements using the archival data model.However, because this refactoring is still in progress, we have three data models at the moment.
(I should have documented this issue earlier, but the second best moment to document it appears to be now.)
The plan
In going forward, the plan is roughly the following:
measurex
: as documented in websteps: changes to the oddity model #2034, there are some issues with this concept, so we can safely back it out and perhaps make it an "internal" aspect of a flat data model that just helps us to compare similar measurements.measurex
to depend oninternal/archival
: the main obstacles here are (i) the fact thatinternal/archival
does not have a concept of oddity (but we're removing oddity for now, so we're fine) and (ii) the fact thatmeasurex
uses its flat data model as an exchange format with its test helper. So, regarding (ii), the main problem is that the flat data model ofinternal/measurex
represents errors aserror
, which is a type that cannot round trip when using JSON. I spent some time thinking about this problem and I think that the right way to go about it is using more regular types. So, we basically need the flat data model ofinternal/archival
to reduce as much as possible type complexity. Regarding errors, this would mean abandoning theOptional<String>
type and instead just represent the absence of error as the empty string. Once these changes are made, we basically have the possibility of completely rewritingmeasurex
(and hencewebsteps
) in terms ofinternal/archival
.engine/netx
to use theinternal/archival
package as opposed to its own flat data model.Because these steps are not very small, it may be that each of them requires its own issue.
The text was updated successfully, but these errors were encountered: