Bug 1876219 -- Consider attachments with newer fields unparsable #6107

bendk · 2024-02-05T19:57:33Z

I got this working by adding deny_unknown_fields to all attachment structs that implement Deserialize.

The tests pass, but I'm not sure I'm doing the right thing with the various serde structs.

Pull Request checklist

Breaking changes: This PR follows our breaking change policy
- This PR follows the breaking change policy:
  - This PR has no breaking API changes, or
  - There are corresponding PRs for our consumer applications that resolve the breaking changes and have been approved
Quality: This PR builds and tests run cleanly
- Note:
  - For changes that need extra cross-platform testing, consider adding [ci full] to the PR title.
  - If this pull request includes a breaking change, consider cutting a new release after merging.
Tests: This PR includes thorough tests or an explanation of why it does not
Changelog: This PR includes a changelog entry in CHANGELOG.md or an explanation of why it does not need one
- Any breaking changes to Swift or Kotlin binding APIs are noted explicitly
Dependencies: This PR follows our dependency management guidelines
- Any new dependencies are accompanied by a summary of the due diligence applied in selecting them.

Branch builds: add [firefox-android: branch-name] to the PR title.

bendk · 2024-02-05T19:58:43Z

components/suggest/src/rs.rs

@@ -272,16 +280,21 @@ pub(crate) struct DownloadedPocketSuggestion {
    #[serde(rename = "highConfidenceKeywords")]
    pub high_confidence_keywords: Vec<String>,
    pub score: f64,
+    #[allow(unused)]
+    #[serde(default)]
+    pub description: String,


This was specified in the tests, but not used in the code. I'm guessing this is part of the real-world payload, but we just don't need it for anything. Is that correct?

That's correct!

linabutler · 2024-02-06T00:48:27Z

components/suggest/src/rs.rs

+// We use deny_unknown_fields for all attachment types so that unrecognized fields result in
+// `UnparsableRecord` entries.  This means that if we add new attachment fields, older clients will
+// ignore the suggestions until they know how to parse them.
+#[serde(deny_unknown_fields)]


Hmmm, I think this is going a bit too far the other way—it means that older clients will stop being able to display suggestions that they previously could, as soon as we add new fields to them.

I wonder if we can use #[serde(other)] to soak up any unknown fields, then check if the map is empty. If it is, we know we can understand the suggestion completely; if it's not, we store the fields we do know, and remember the ID to refetch after a schema version upgrade.

The high-level helpers you added in #6106 would need to be taught about this new case—a record could now be in unparsable_records and in suggestions.record_id if we understood some of its fields—which is making me wonder if we actually want to track these record IDs in a separate meta key, because they're "partially parsable", not "unparsable".

WDYT?
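The difference between the two approaches discussed above can be sketched without serde. This is only an illustration under stated assumptions: `RawAttachment`, `KNOWN_FIELDS`, `Suggestion`, `Parsed`, `parse_strict`, and `parse_lenient` are hypothetical names, and a plain string map stands in for the JSON payload. In real serde code, extra fields are usually captured with a `#[serde(flatten)]` map field; `#[serde(other)]` applies to enum variants.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a downloaded attachment: field name -> raw value.
type RawAttachment = BTreeMap<String, String>;

/// The fields this (hypothetical) client version understands.
const KNOWN_FIELDS: [&str; 2] = ["title", "url"];

#[derive(Debug, Clone, PartialEq)]
struct Suggestion {
    title: String,
    url: String,
}

#[derive(Debug, PartialEq)]
enum Parsed {
    /// Every field was recognized.
    Complete(Suggestion),
    /// Known fields were kept; the listed field names were not understood.
    Partial(Suggestion, Vec<String>),
}

/// Extracts the known fields, or returns `None` if a required one is missing.
fn known_part(raw: &RawAttachment) -> Option<Suggestion> {
    Some(Suggestion {
        title: raw.get("title")?.clone(),
        url: raw.get("url")?.clone(),
    })
}

/// Names of fields outside `KNOWN_FIELDS`, in sorted order.
fn unknown_fields(raw: &RawAttachment) -> Vec<String> {
    raw.keys()
        .filter(|k| !KNOWN_FIELDS.contains(&k.as_str()))
        .cloned()
        .collect()
}

/// Strict mode (the `deny_unknown_fields` behavior): any unknown field makes
/// the whole attachment unparsable.
fn parse_strict(raw: &RawAttachment) -> Option<Suggestion> {
    if unknown_fields(raw).is_empty() {
        known_part(raw)
    } else {
        None
    }
}

/// Lenient mode: keep what we understand and report what we don't.
fn parse_lenient(raw: &RawAttachment) -> Option<Parsed> {
    let suggestion = known_part(raw)?;
    let unknown = unknown_fields(raw);
    Some(if unknown.is_empty() {
        Parsed::Complete(suggestion)
    } else {
        Parsed::Partial(suggestion, unknown)
    })
}
```

With a payload that gains a new `subtitle` field, `parse_strict` gives up on a suggestion the client could previously display, while `parse_lenient` still returns the known fields along with the names it could not interpret.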

I think that makes a lot of sense. I updated the PR to follow those semantics.

The one thing I didn't do was create a separate meta key for this. I think it might be simpler to keep 1 key and to change the naming from "unparsable records" to "records to retry" or something like that. Do we need to differentiate between the two cases? If so, maybe we could still keep one key but add a field that distinguishes between the two.

Adding another meta key also seems fine to me, I just wanted to discuss it more with you before committing to anything.

linabutler

I left a comment that deny_unknown_fields could be too much, but let's discuss more—requesting changes so that I don't forget to check back later 😅

codecov-commenter · 2024-02-06T21:45:26Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (41367fd) 84.07% compared to head (5c86e4a) 84.07%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #6107   +/-   ##
=======================================
  Coverage   84.07%   84.07%           
=======================================
  Files         117      117           
  Lines       15630    15630           
=======================================
  Hits        13141    13141           
  Misses       2489     2489

ncloudioj

Looks good, just a few comments for your consideration.

To clarify, this will mark the record as unparsable if we see any unknown fields in its attachment payload(s), though the recognized fields would still be ingested as the current behavior. Is that correct?

components/suggest/src/rs.rs

ncloudioj · 2024-02-12T17:42:34Z

components/suggest/src/store.rs

-                ingestion_handler(dao, record_id, attachment.suggestions())
-            }),
+        match serde_json::from_slice::<OneOrMany<T>>(&attachment_data) {
+            Ok(attachments) => self.ingest_record(


Nit: calling those `attachments` might confuse readers; `attachment_payload` could be better IMHO.

bendk · 2024-02-12T19:16:01Z

To clarify, this will mark the record as unparsable if we see any unknown fields in its attachment payload(s), though the recognized fields would still be ingested as the current behavior. Is that correct?

Yes that's how it should work.

If we see an attachment with an unknown field, we now both ingest it and add it to the `unparsable_records` meta. The reasoning is that an unknown field signals that the attachment likely has data from a future schema that we don't understand yet. Adding it to the `unparsable_records` meta forces us to retry sometime in the future.

Removed the `SuggestAttachment` type. I think the inner `OneOrMany` better describes what this is doing, so let's just use that. Also, I wanted to use `SuggestAttachment` as the name of the trait that all attachment types implement.
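The ingest-and-retry bookkeeping described above might look roughly like the following. This is a sketch, not the PR's code: `Store`, `ingest`, and `records_to_retry` are hypothetical names, and in-memory collections stand in for the database and its meta table.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory stand-in for the suggestions database.
#[derive(Default)]
struct Store {
    /// Record ID -> suggestion titles ingested from that record.
    suggestions: BTreeMap<String, Vec<String>>,
    /// IDs of records to fetch again later (the `unparsable_records` meta).
    unparsable_records: BTreeSet<String>,
}

impl Store {
    /// Ingests the fields we understand. If the attachment also carried
    /// unknown fields, remembers the record ID so it can be retried later.
    fn ingest(&mut self, record_id: &str, titles: Vec<String>, unknown_fields: &[String]) {
        self.suggestions.insert(record_id.to_string(), titles);
        if !unknown_fields.is_empty() {
            self.unparsable_records.insert(record_id.to_string());
        }
    }

    /// Record IDs that should be downloaded again, in sorted order.
    fn records_to_retry(&self) -> Vec<&str> {
        self.unparsable_records.iter().map(String::as_str).collect()
    }
}
```

A record with unknown fields ends up in both collections here, which is the "partially parsable" case raised earlier in the review.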

ncloudioj

LGTM, r=nanj.

This would allow us to "tolerate" more schema changes that don't necessarily require a version bump and an entire db re-ingestion. Very cool!

linabutler

LGTM! One suggestion about changing the Boolean to an enum, but the rest looks great—thanks for doing this!

linabutler · 2024-02-12T21:52:37Z

components/suggest/src/store.rs

@@ -405,7 +405,7 @@ where
                    let data = self.settings_client.get_attachment(&attachment.location)?;
                    writer.write(|dao| {
                        dao.put_icon(icon_id, &data)?;
-                        dao.handle_ingested_record(record)
+                        dao.handle_ingested_record(record, false)


I wonder if we could make this an enum instead of a Boolean, or split up handle_ingested_record into handle_ingested_record and handle_ingested_record_with_unknown_fields?

handle_ingested_record(record, false) feels a bit too close to a "Boolean trap" to me—it's not super obvious from context what false means.
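One possible shape for the enum variant of this suggestion is sketched below. It assumes the Boolean indicates whether unknown fields were seen; `UnknownFields` and the simplified `Dao` are hypothetical and only model what the sketch needs.

```rust
/// Hypothetical replacement for the Boolean argument.
#[derive(Debug, Clone, Copy, PartialEq)]
enum UnknownFields {
    Absent,
    Present,
}

/// Simplified stand-in for the DAO; it only tracks record IDs.
#[derive(Default)]
struct Dao {
    ingested: Vec<String>,
    retry: Vec<String>,
}

impl Dao {
    /// Marks a record as ingested and, if it had unknown fields,
    /// also queues it for a later retry.
    fn handle_ingested_record(&mut self, record_id: &str, unknown_fields: UnknownFields) {
        self.ingested.push(record_id.to_string());
        match unknown_fields {
            UnknownFields::Present => self.retry.push(record_id.to_string()),
            UnknownFields::Absent => {}
        }
    }
}
```

A call site then reads `dao.handle_ingested_record(id, UnknownFields::Absent)`, which states its meaning without a look at the function signature.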

linabutler · 2024-02-12T21:54:34Z

components/suggest/src/rs.rs

-/// Represents either a single value, or a list of values. This is used to
-/// deserialize downloaded attachments.
+/// Implemented by the various attachment types
+pub trait SuggestAttachment: DeserializeOwned + std::fmt::Debug {


This is a lovely use of traits!
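For readers without the full diff, the pairing of `OneOrMany` with an attachment trait could be sketched as follows. The variant names, the `kind` method, `PocketAttachment`, and `kinds` are assumptions made for illustration; the trait in the diff also requires `DeserializeOwned`, which is left out here to keep the sketch free of the serde dependency.

```rust
use std::fmt::Debug;

/// Either a single value or a list of values.
#[derive(Debug, PartialEq)]
enum OneOrMany<T> {
    One(T),
    Many(Vec<T>),
}

impl<T> OneOrMany<T> {
    /// Normalizes both shapes to a `Vec` so callers handle one case.
    fn into_vec(self) -> Vec<T> {
        match self {
            OneOrMany::One(value) => vec![value],
            OneOrMany::Many(values) => values,
        }
    }
}

/// Simplified, hypothetical version of the trait from the diff.
trait SuggestAttachment: Debug {
    /// Hypothetical method: a label for the kind of suggestion.
    fn kind(&self) -> &'static str;
}

/// Hypothetical attachment type used only for this example.
#[derive(Debug)]
struct PocketAttachment {
    title: String,
}

impl SuggestAttachment for PocketAttachment {
    fn kind(&self) -> &'static str {
        "pocket"
    }
}

/// Works for any attachment type, whichever shape was downloaded.
fn kinds<T: SuggestAttachment>(payload: OneOrMany<T>) -> Vec<&'static str> {
    payload.into_vec().iter().map(|a| a.kind()).collect()
}
```

The generic bound lets one ingestion path serve every attachment type, while `into_vec` hides whether the payload held one item or several.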

bendk · 2024-02-28T21:40:06Z

I'm going to close this one because we decided to stop tracking unparsable records. The current system where we throw away all suggestion records and re-download them when the schema changes is working well enough and the unparsable records code adds more complexity than we want.

bendk requested a review from a team February 5, 2024 20:03

linabutler reviewed Feb 6, 2024

linabutler suggested changes Feb 6, 2024

bendk force-pushed the suggest-new-attachment-fields branch 2 times, most recently from c845011 to 5c86e4a Compare February 6, 2024 21:37

ncloudioj reviewed Feb 12, 2024

bendk force-pushed the suggest-new-attachment-fields branch from 5c86e4a to ea1f18b Compare February 12, 2024 19:28

bendk requested a review from linabutler February 12, 2024 19:31

ncloudioj approved these changes Feb 12, 2024

linabutler approved these changes Feb 12, 2024

bendk closed this Feb 28, 2024

linabutler mentioned this pull request Feb 28, 2024

Bug 1876208 - API for dismissing suggestions #6147

Merged
