Bug 1876219 -- Consider attachments with newer fields unparsable #6107
components/suggest/src/rs.rs
@@ -272,16 +280,21 @@ pub(crate) struct DownloadedPocketSuggestion {
    #[serde(rename = "highConfidenceKeywords")]
    pub high_confidence_keywords: Vec<String>,
    pub score: f64,
    #[allow(unused)]
    #[serde(default)]
    pub description: String,
This was specified in the tests, but not used in the code. I'm guessing this is part of the real-world payload, but we just don't need it for anything. Is that correct?
That's correct!
components/suggest/src/rs.rs
// We use deny_unknown_fields for all attachment types so that unrecognized fields result in
// `UnparsableRecord` entries. This means that if we add new attachment fields, older clients will
// ignore the suggestions until they know how to parse them.
#[serde(deny_unknown_fields)]
Hmmm, I think this is going a bit too far the other way: it means that older clients will stop being able to display suggestions that they previously could, as soon as we add new fields to them.
I wonder if we can use `#[serde(other)]` to soak up any unknown fields, then check if the map is empty. If it is, we know we can understand the suggestion completely; if it's not, we store the fields we do know, and remember the ID to refetch after a schema version upgrade.
The high-level helpers you added in #6106 would need to be taught about this new case: a record could now be in `unparsable_records` and in `suggestions.record_id` if we understood some of its fields. That makes me wonder if we actually want to track these record IDs in a separate meta key, because they're "partially parsable", not "unparsable".
WDYT?
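Here is a minimal, dependency-free sketch of the "soak up unknown fields, then check if the map is empty" idea. One caveat about serde: `#[serde(other)]` is its catch-all for unknown *enum variants*. Collecting unknown *struct fields* is normally done with a `#[serde(flatten)]` field of map type, for example `HashMap<String, serde_json::Value>`. So that the sketch runs without serde, it models a parsed attachment as a `BTreeMap` of field names to string values. `ParseOutcome` and `split_fields` are hypothetical names and are not the PR's code.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Result of checking one attachment's fields against the schema this client
/// knows about. (Hypothetical type for illustration.)
#[derive(Debug, PartialEq)]
pub enum ParseOutcome {
    /// Every field was recognized: the suggestion is fully understood.
    Complete(BTreeMap<String, String>),
    /// Some fields were recognized and can be stored now. The unknown field
    /// names signal that the record should be refetched after an upgrade.
    Partial {
        known: BTreeMap<String, String>,
        unknown: BTreeSet<String>,
    },
}

/// Splits `fields` into those named in `known_fields` and the leftovers,
/// roughly what a `#[serde(flatten)]` catch-all map would end up holding.
pub fn split_fields(fields: BTreeMap<String, String>, known_fields: &[&str]) -> ParseOutcome {
    let mut known = BTreeMap::new();
    let mut unknown = BTreeSet::new();
    for (name, value) in fields {
        if known_fields.contains(&name.as_str()) {
            known.insert(name, value);
        } else {
            // Only the name matters for deciding to refetch; drop the value.
            unknown.insert(name);
        }
    }
    if unknown.is_empty() {
        ParseOutcome::Complete(known)
    } else {
        ParseOutcome::Partial { known, unknown }
    }
}
```

An empty `unknown` set means the suggestion is fully understood. A non-empty one means the known fields can be stored now and the record ID remembered for a later refetch.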
I think that makes a lot of sense. I updated the PR to follow those semantics.
The one thing I didn't do was create a separate meta key for this. I think it might be simpler to keep one key and rename it from "unparsable records" to "records to retry" or something like that. Do we need to differentiate between the two cases? If so, maybe we could still keep one key but add a field that distinguishes between the two.
Adding another meta key also seems fine to me, I just wanted to discuss it more with you before committing to anything.
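The sketch below shows one reading of the "one key, plus a field that distinguishes the two cases" option. It works in memory and does not touch the real meta table. `RetryReason` and `RecordsToRetry` are hypothetical names, and the sketch does not cover how the value would be serialized into the meta key.

```rust
use std::collections::BTreeMap;

/// Why a record should be fetched again after an upgrade. (Hypothetical.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetryReason {
    /// The attachment could not be parsed at all; nothing was ingested.
    Unparsable,
    /// The attachment parsed, but had fields this client does not know.
    UnknownFields,
}

/// Contents of a single "records to retry" key: record ID -> reason.
#[derive(Debug, Default)]
pub struct RecordsToRetry {
    records: BTreeMap<String, RetryReason>,
}

impl RecordsToRetry {
    /// Records (or overwrites) the reason for retrying `record_id`.
    pub fn add(&mut self, record_id: &str, reason: RetryReason) {
        self.records.insert(record_id.to_string(), reason);
    }

    /// Every ID to refetch, regardless of reason, in sorted order.
    pub fn ids(&self) -> Vec<&str> {
        self.records.keys().map(String::as_str).collect()
    }

    /// IDs whose known fields were still ingested ("partially parsable").
    pub fn partially_ingested(&self) -> Vec<&str> {
        self.records
            .iter()
            .filter(|(_, reason)| **reason == RetryReason::UnknownFields)
            .map(|(id, _)| id.as_str())
            .collect()
    }
}
```

The retry logic can keep using `ids()`. Code that cares about partially ingested records, such as the #6106 helpers, can filter on the reason.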
I left a comment that `deny_unknown_fields` could be too much, but let's discuss more. Requesting changes so that I don't forget to check back later 😅
Force-pushed from c845011 to 5c86e4a
Codecov Report: All modified and coverable lines are covered by tests ✅
Coverage diff:
            main     #6107
Coverage  84.07%    84.07%
Files        117       117
Lines      15630     15630
Hits       13141     13141
Misses      2489      2489
Looks good, just a few comments for your consideration.
To clarify, this will mark the record as unparsable if we see any unknown fields in its attachment payload(s), though the recognized fields would still be ingested as they are today. Is that correct?
components/suggest/src/store.rs
ingestion_handler(dao, record_id, attachment.suggestions())
}),
match serde_json::from_slice::<OneOrMany<T>>(&attachment_data) {
Ok(attachments) => self.ingest_record(
Nit: calling those `attachments` might confuse readers; `attachment_payload` could be better IMHO.
Yes, that's how it should work.
If we see an attachment with an unknown field, we now both ingest it and add it to the `unparsable_records` meta. The reasoning is that an unknown field signals that the attachment likely contains data from a future schema that we don't understand yet. Adding it to the `unparsable_records` meta forces us to retry sometime in the future.
I also removed the `SuggestAttachment` type. I think the inner `OneOrMany` better describes what this is doing, so let's just use that. Also, I wanted to use `SuggestAttachment` as the name of the trait that all attachment types implement.
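Below is a rough sketch of the ingest-and-flag flow, written without serde so it runs standalone. `OneOrMany` mirrors the shape used in the PR, a single value or a list. In the PR it is deserialized with `serde_json::from_slice`; here it is built by hand. The trait below is a hypothetical stand-in for `SuggestAttachment`: it drops the `DeserializeOwned` bound and adds an `unknown_fields` method. `ExampleAttachment` and `ingest_payload` are also hypothetical, and a `Vec` stands in for the database.

```rust
/// Either a single value or a list of values, the two shapes an
/// attachment payload may take.
#[derive(Debug)]
pub enum OneOrMany<T> {
    One(T),
    Many(Vec<T>),
}

impl<T> OneOrMany<T> {
    /// Flattens either shape into a Vec so callers handle one case.
    pub fn into_vec(self) -> Vec<T> {
        match self {
            OneOrMany::One(value) => vec![value],
            OneOrMany::Many(values) => values,
        }
    }
}

/// Hypothetical stand-in for the trait implemented by attachment types.
pub trait SuggestAttachment: std::fmt::Debug {
    /// Names of payload fields this client did not recognize.
    fn unknown_fields(&self) -> &[String];
}

#[derive(Debug)]
pub struct ExampleAttachment {
    pub title: String,
    pub unknown: Vec<String>,
}

impl SuggestAttachment for ExampleAttachment {
    fn unknown_fields(&self) -> &[String] {
        &self.unknown
    }
}

/// Ingests every item in the payload, and returns whether the record should
/// also be queued for a retry because some item had unknown fields.
pub fn ingest_payload<T: SuggestAttachment>(payload: OneOrMany<T>, store: &mut Vec<T>) -> bool {
    let mut needs_retry = false;
    for item in payload.into_vec() {
        needs_retry |= !item.unknown_fields().is_empty();
        // Recognized data is ingested either way.
        store.push(item);
    }
    needs_retry
}
```

Flattening through `into_vec` lets the ingestion loop treat both payload shapes the same way. The returned flag is what would decide whether the record ID goes into the `unparsable_records` meta.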
Force-pushed from 5c86e4a to ea1f18b
LGTM, r=nanj.
This would allow us to "tolerate" more schema changes that don't necessarily require a version bump and an entire db re-ingestion. Very cool!
LGTM! One suggestion about changing the Boolean to an enum, but the rest looks great—thanks for doing this!
@@ -405,7 +405,7 @@ where
     let data = self.settings_client.get_attachment(&attachment.location)?;
     writer.write(|dao| {
         dao.put_icon(icon_id, &data)?;
-        dao.handle_ingested_record(record)
+        dao.handle_ingested_record(record, false)
I wonder if we could make this an enum instead of a Boolean, or split up `handle_ingested_record` into `handle_ingested_record` and `handle_ingested_record_with_unknown_fields`?
`handle_ingested_record(record, false)` feels a bit too close to a "Boolean trap" to me: it's not super obvious from context what `false` means.
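This sketch shows the enum option using a simplified DAO that only tracks record IDs. `ParsedFields` and this `Dao` are hypothetical. The point is that the call site states its intent instead of passing a bare `false`.

```rust
/// Replaces the bare `bool` argument with a self-describing enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ParsedFields {
    /// Every field in the attachment was recognized.
    Fully,
    /// Some fields were not recognized; retry after a schema upgrade.
    Partially,
}

/// Hypothetical, simplified DAO that only tracks record IDs.
#[derive(Debug, Default)]
pub struct Dao {
    pub ingested: Vec<String>,
    pub to_retry: Vec<String>,
}

impl Dao {
    pub fn handle_ingested_record(&mut self, record_id: &str, parsed: ParsedFields) {
        self.ingested.push(record_id.to_string());
        // Only partially understood records are queued for a refetch.
        if parsed == ParsedFields::Partially {
            self.to_retry.push(record_id.to_string());
        }
    }
}
```

A call like `dao.handle_ingested_record(id, ParsedFields::Fully)` can be read without looking up the function signature, which is not true of `handle_ingested_record(record, false)`.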
/// Represents either a single value, or a list of values. This is used to
/// deserialize downloaded attachments.
/// Implemented by the various attachment types
pub trait SuggestAttachment: DeserializeOwned + std::fmt::Debug {
This is a lovely use of traits!
I'm going to close this one because we decided to stop tracking unparsable records. The current system, where we throw away all suggestion records and re-download them when the schema changes, is working well enough, and the unparsable records code adds more complexity than we want.
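The approach this comment settles on, throwing away all suggestions and re-downloading them when the schema changes, can be sketched as a version check. `Store`, `ensure_schema`, and the value of `CURRENT_SCHEMA_VERSION` are assumptions for illustration. A real store would persist both the version and the suggestions in its database.

```rust
/// Schema version this client understands (assumed value for illustration).
pub const CURRENT_SCHEMA_VERSION: u32 = 2;

/// Hypothetical in-memory store.
#[derive(Debug, Default)]
pub struct Store {
    pub schema_version: u32,
    pub suggestions: Vec<String>,
}

impl Store {
    /// If the stored version differs from the client's, drops every
    /// suggestion so they are all re-downloaded. Returns true when a full
    /// re-ingest is needed.
    pub fn ensure_schema(&mut self) -> bool {
        if self.schema_version == CURRENT_SCHEMA_VERSION {
            return false;
        }
        self.suggestions.clear();
        self.schema_version = CURRENT_SCHEMA_VERSION;
        true
    }
}
```

This strategy needs no per-record bookkeeping. The trade-off is that every schema bump costs a full re-download.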
I got this working by adding `deny_unknown_fields` to all attachment structs that implement `Deserialize`. The tests pass, but I'm not sure I'm doing the right thing with the various serde structs.
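Here is what `deny_unknown_fields` does to an attachment, modeled without serde. Serde returns an error as soon as it meets a field the struct does not declare, so the whole attachment fails to parse and none of its suggestions are ingested. `check_strict` and `StrictParseError` are hypothetical names, and the sketch only looks at field names.

```rust
/// Error produced when strict parsing meets an undeclared field.
#[derive(Debug, PartialEq)]
pub enum StrictParseError {
    UnknownField(String),
}

/// Fails on the first field not in `known_fields`, mirroring the
/// all-or-nothing behavior of `#[serde(deny_unknown_fields)]`.
pub fn check_strict(field_names: &[&str], known_fields: &[&str]) -> Result<(), StrictParseError> {
    for name in field_names {
        if !known_fields.contains(name) {
            return Err(StrictParseError::UnknownField(name.to_string()));
        }
    }
    Ok(())
}
```

With this behavior, a single new field added on the server makes older clients drop an attachment they could otherwise have used.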
Pull Request checklist
- … `[ci full]` to the PR title.
- Branch builds: add `[firefox-android: branch-name]` to the PR title.