Bug 1876219 -- Consider attachments with newer fields unparsable #6107

Closed
wants to merge 1 commit into from

Conversation

bendk
Contributor

@bendk bendk commented Feb 5, 2024

I got this working by adding deny_unknown_fields to all attachment structs that implement Deserialize.

The tests pass, but I'm not sure I'm doing the right thing with the various serde structs.
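
To illustrate what `deny_unknown_fields` changes, here is a minimal, dependency-free sketch of the strict behavior. It does not use serde: the `RawFields` map, the `Suggestion` struct, its two fields, and the `fields` helper are all hypothetical stand-ins for a decoded attachment. They are chosen only to show that a single undeclared key makes the whole record fail to parse.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a decoded JSON object: field name -> raw value.
type RawFields = BTreeMap<String, String>;

/// Hypothetical attachment type with two declared fields.
#[derive(Debug, PartialEq)]
struct Suggestion {
    title: String,
    url: String,
}

#[derive(Debug, PartialEq)]
enum StrictParseError {
    MissingField(&'static str),
    UnknownField(String),
}

/// Models `deny_unknown_fields`: any key the struct does not declare is an error.
fn parse_strict(mut raw: RawFields) -> Result<Suggestion, StrictParseError> {
    let title = raw
        .remove("title")
        .ok_or(StrictParseError::MissingField("title"))?;
    let url = raw
        .remove("url")
        .ok_or(StrictParseError::MissingField("url"))?;
    // Whatever is left over was not declared on `Suggestion`.
    if let Some(name) = raw.into_keys().next() {
        return Err(StrictParseError::UnknownField(name));
    }
    Ok(Suggestion { title, url })
}

/// Small helper for building inputs in examples.
fn fields(pairs: &[(&str, &str)]) -> RawFields {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

With this model, a payload containing only `title` and `url` parses, while the same payload plus a `newField` key is rejected outright, which is the behavior the PR initially relied on to produce `UnparsableRecord` entries.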

Pull Request checklist

  • Breaking changes: This PR follows our breaking change policy
    • This PR follows the breaking change policy:
      • This PR has no breaking API changes, or
      • There are corresponding PRs for our consumer applications that resolve the breaking changes and have been approved
  • Quality: This PR builds and tests run cleanly
    • Note:
      • For changes that need extra cross-platform testing, consider adding [ci full] to the PR title.
      • If this pull request includes a breaking change, consider cutting a new release after merging.
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry in CHANGELOG.md or an explanation of why it does not need one
    • Any breaking changes to Swift or Kotlin binding APIs are noted explicitly
  • Dependencies: This PR follows our dependency management guidelines
    • Any new dependencies are accompanied by a summary of the due diligence applied in selecting them.

Branch builds: add [firefox-android: branch-name] to the PR title.

@@ -272,16 +280,21 @@ pub(crate) struct DownloadedPocketSuggestion {
#[serde(rename = "highConfidenceKeywords")]
pub high_confidence_keywords: Vec<String>,
pub score: f64,
#[allow(unused)]
#[serde(default)]
pub description: String,
Contributor Author

This was specified in the tests, but not used in the code. I'm guessing this is part of the real-world payload, but we just don't need it for anything. Is that correct?

Contributor

That's correct!

@bendk bendk requested a review from a team February 5, 2024 20:03
// We use deny_unknown_fields for all attachment types so that unrecognized fields result in
// `UnparsableRecord` entries. This means that if we add new attachment fields, older clients will
// ignore the suggestions until they know how to parse them.
#[serde(deny_unknown_fields)]
Contributor

Hmmm, I think this is going a bit too far the other way—it means that older clients will stop being able to display suggestions that they previously could, as soon as we add new fields to them.

I wonder if we can use #[serde(other)] to soak up any unknown fields, then check if the map is empty. If it is, we know we can understand the suggestion completely; if it's not, we store the fields we do know, and remember the ID to refetch after a schema version upgrade.

The high-level helpers you added in #6106 would need to be taught about this new case—a record could now be in unparsable_records and in suggestions.record_id if we understood some of its fields—which is making me wonder if we actually want to track these record IDs in a separate meta key, because they're "partially parsable", not "unparsable".

WDYT?
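
The "soak up unknown fields, then check if the map is empty" idea can be sketched as follows. This is a dependency-free model under stated assumptions, not the PR's code: `RawFields`, `Suggestion`, `Outcome`, and `parse_lenient` are hypothetical names. In serde itself, leftover fields are usually collected with `#[serde(flatten)]` on a map-typed field, whereas `#[serde(other)]` is an attribute for enum variants.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a decoded JSON object: field name -> raw value.
type RawFields = BTreeMap<String, String>;

#[derive(Debug, PartialEq)]
struct Suggestion {
    title: String,
    url: String,
    /// Keys this client did not recognize (the "soaked up" map).
    unknown: RawFields,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Every field was understood; nothing to retry.
    Complete(Suggestion),
    /// Known fields are usable now; remember the record ID and refetch later.
    Partial(Suggestion),
    /// A required field is missing, so nothing can be ingested.
    Unparsable,
}

/// Parses the fields we know and keeps the rest instead of rejecting them.
fn parse_lenient(mut raw: RawFields) -> Outcome {
    let (Some(title), Some(url)) = (raw.remove("title"), raw.remove("url")) else {
        return Outcome::Unparsable;
    };
    // Everything still in `raw` is a field this client does not know about.
    let suggestion = Suggestion { title, url, unknown: raw };
    if suggestion.unknown.is_empty() {
        Outcome::Complete(suggestion)
    } else {
        Outcome::Partial(suggestion)
    }
}
```

A caller would ingest the suggestion for both `Complete` and `Partial`, and additionally record the ID for a later refetch in the `Partial` case, so older clients keep displaying what they already understand.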

Contributor Author

I think that makes a lot of sense. I updated the PR to follow those semantics.

The one thing I didn't do was create a separate meta key for this. I think it might be simpler to keep 1 key and to change the naming from "unparsable records" to "records to retry" or something like that. Do we need to differentiate between the two cases? If so, maybe we could still keep one key but add a field that distinguishes between the two.

Adding another meta key also seems fine to me, I just wanted to discuss it more with you before committing to anything.

Contributor

@linabutler linabutler left a comment

I left a comment that deny_unknown_fields could be too much, but let's discuss more—requesting changes so that I don't forget to check back later 😅

@bendk bendk force-pushed the suggest-new-attachment-fields branch 2 times, most recently from c845011 to 5c86e4a Compare February 6, 2024 21:37
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (41367fd) 84.07% compared to head (5c86e4a) 84.07%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6107   +/-   ##
=======================================
  Coverage   84.07%   84.07%           
=======================================
  Files         117      117           
  Lines       15630    15630           
=======================================
  Hits        13141    13141           
  Misses       2489     2489           


Member

@ncloudioj ncloudioj left a comment

Looks good, just a few comments for your consideration.

To clarify, this will mark the record as unparsable if we see any unknown fields in its attachment payload(s), though the recognized fields would still be ingested as the current behavior. Is that correct?

ingestion_handler(dao, record_id, attachment.suggestions())
}),
match serde_json::from_slice::<OneOrMany<T>>(&attachment_data) {
Ok(attachments) => self.ingest_record(
Member

Nit: calling those attachments might confuse the readers, attachment_payload could be better IMHO.

@bendk
Contributor Author

bendk commented Feb 12, 2024

> To clarify, this will mark the record as unparsable if we see any unknown fields in its attachment payload(s), though the recognized fields would still be ingested as the current behavior. Is that correct?

Yes that's how it should work.

If we see an attachment with an unknown field, we now both ingest it and
add it to the `unparsable_records` meta.  The reasoning is that an
unknown field signals that the attachment likely has data from a future
schema that we don't understand yet.  Adding it to the
`unparsable_records` meta forces us to retry sometime in the future.

Removed the `SuggestAttachment` type.  I think the inner `OneOrMany`
better describes what this is doing, so let's just use that.  Also I
wanted to use `SuggestAttachment` as the name of the trait that all
attachment types implement.
@bendk bendk force-pushed the suggest-new-attachment-fields branch from 5c86e4a to ea1f18b Compare February 12, 2024 19:28
@bendk bendk requested a review from linabutler February 12, 2024 19:31
Member

@ncloudioj ncloudioj left a comment

LGTM, r=nanj.

This would allow us to "tolerate" more schema changes that don't necessarily require a version bump and an entire db re-ingestion. Very cool!

Contributor

@linabutler linabutler left a comment

LGTM! One suggestion about changing the Boolean to an enum, but the rest looks great—thanks for doing this!

@@ -405,7 +405,7 @@ where
let data = self.settings_client.get_attachment(&attachment.location)?;
writer.write(|dao| {
dao.put_icon(icon_id, &data)?;
-dao.handle_ingested_record(record)
+dao.handle_ingested_record(record, false)
Contributor

I wonder if we could make this an enum instead of a Boolean, or split up handle_ingested_record into handle_ingested_record and handle_ingested_record_with_unknown_fields?

handle_ingested_record(record, false) feels a bit too close to a "Boolean trap" to me—it's not super obvious from context what false means.
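
One way the enum suggestion might look is sketched below. It is an illustration under assumptions rather than the component's API: the `UnknownFields` enum, the in-memory `Dao` with its `records_to_retry` set, and the `&str` record ID are hypothetical, and the real DAO is not shown in this thread.

```rust
use std::collections::BTreeSet;

/// Says whether the attachment carried fields this client did not recognize.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnknownFields {
    Absent,
    Present,
}

/// Hypothetical in-memory stand-in for the DAO.
#[derive(Debug, Default)]
struct Dao {
    /// IDs of records that should be fetched again later.
    records_to_retry: BTreeSet<String>,
}

impl Dao {
    /// Records that `record_id` was ingested. The enum makes the call site
    /// self-describing, unlike a bare `true` or `false`.
    fn handle_ingested_record(&mut self, record_id: &str, unknown_fields: UnknownFields) {
        match unknown_fields {
            // Fully understood: the record no longer needs a retry.
            UnknownFields::Absent => {
                self.records_to_retry.remove(record_id);
            }
            // Partially understood: keep the ID so it is retried later.
            UnknownFields::Present => {
                self.records_to_retry.insert(record_id.to_string());
            }
        }
    }
}
```

A call then reads as `dao.handle_ingested_record(id, UnknownFields::Absent)`, so a reader does not have to look up the signature to learn what the second argument means.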

/// Represents either a single value, or a list of values. This is used to
/// deserialize downloaded attachments.
/// Implemented by the various attachment types
pub trait SuggestAttachment: DeserializeOwned + std::fmt::Debug {
Contributor

This is a lovely use of traits!

@bendk
Contributor Author

bendk commented Feb 28, 2024

I'm going to close this one because we decided to stop tracking unparsable records. The current system where we throw away all suggestion records and re-download them when the schema changes is working well enough and the unparsable records code adds more complexity than we want.
