Bug 1882381 - Switch to using the data path for the suggest DB #6146

Merged 1 commit into mozilla:main from suggest-use-data-path on Mar 12, 2024

bendk
Contributor

@bendk bendk commented Feb 27, 2024

  • Use the data_path rather than cache_path for the suggest DB. This is prep for storing the suggestion dismissal data in the DB, which should not be reset on schema upgrades.
  • Don't always drop and recreate the database when the schema upgrades. Instead, I'm hoping we can use the code from SuggestDao.clear to delete the suggestion data so that we re-ingest it.
  • Other than adding the dismissed_suggestions table, this doesn't implement any of the suggestion dismissal functionality.
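
For illustration, the upgrade approach in the second bullet could look roughly like the sketch below. It is written in Python against SQLite rather than the component's Rust, and it is not the code in this PR: the `suggestions` columns, the `SCHEMA_VERSION` value, the use of `PRAGMA user_version`, and the `clear_suggestions` helper (a stand-in for `SuggestDao.clear`) are all assumptions.

```python
import sqlite3

# Hypothetical target schema version; the real component tracks its own.
SCHEMA_VERSION = 2

# Simplified schema: the real `suggestions` table has more columns.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS suggestions (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS dismissed_suggestions (
    url_hash INTEGER PRIMARY KEY
);
"""


def clear_suggestions(conn: sqlite3.Connection) -> None:
    """Delete only data that can be re-ingested (stand-in for SuggestDao.clear)."""
    conn.execute("DELETE FROM suggestions")


def upgrade(conn: sqlite3.Connection) -> None:
    """Bring the DB to SCHEMA_VERSION without dropping user data."""
    conn.executescript(CREATE_SQL)
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    if version != SCHEMA_VERSION:
        # Suggestions are re-ingested later; dismissals are left untouched.
        clear_suggestions(conn)
        conn.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
        conn.commit()
```

Calling `upgrade` on a connection to an older database empties `suggestions` but leaves every row in `dismissed_suggestions` in place, which is the property the move to the data path relies on.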

Pull Request checklist

  • Breaking changes: This PR follows our breaking change policy
    • This PR follows the breaking change policy:
      • This PR has no breaking API changes, or
      • There are corresponding PRs for our consumer applications that resolve the breaking changes and have been approved
  • Quality: This PR builds and tests run cleanly
    • Note:
      • For changes that need extra cross-platform testing, consider adding [ci full] to the PR title.
      • If this pull request includes a breaking change, consider cutting a new release after merging.
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry in CHANGELOG.md or an explanation of why it does not need one
    • Any breaking changes to Swift or Kotlin binding APIs are noted explicitly
  • Dependencies: This PR follows our dependency management guidelines
    • Any new dependencies are accompanied by a summary of the due diligence applied in selecting them.

Branch builds: add [firefox-android: branch-name] to the PR title.

@bendk bendk requested review from linabutler and a team February 27, 2024 19:31
@bendk bendk force-pushed the suggest-use-data-path branch 2 times, most recently from 9850b52 to cf14873 Compare February 27, 2024 19:34
@bendk
Contributor Author

bendk commented Feb 27, 2024

Putting this up slightly early for feedback, but don't merge until Android and iOS are using the SuggestStoreBuilder (iOS PR, Android PR).

@bendk bendk force-pushed the suggest-use-data-path branch from cf14873 to 5576618 Compare February 27, 2024 19:36
@codecov-commenter

codecov-commenter commented Feb 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.08%. Comparing base (f44e9d0) to head (b112ff7).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6146   +/-   ##
=======================================
  Coverage   84.08%   84.08%           
=======================================
  Files         117      117           
  Lines       15629    15629           
=======================================
  Hits        13141    13141           
  Misses       2488     2488           

@bendk
Contributor Author

bendk commented Feb 27, 2024

See #6147 for how this will be used.

@bendk bendk force-pushed the suggest-use-data-path branch from 5576618 to e461055 Compare March 6, 2024 19:19
@bendk bendk force-pushed the suggest-use-data-path branch from e461055 to 001495f Compare March 6, 2024 19:24
@bendk
Contributor Author

bendk commented Mar 6, 2024

Pushing this one out again because:

  • The Android and iOS PRs are both merged! Nothing is blocking this anymore
  • I realized that my scheme to automatically delete some of the tables but keep the rest was really fragile. I don't think it would work correctly if we ever added or removed tables from the temp tables list. Instead, I'm hoping we can use the code from SuggestDao.clear to do what we want. Does that make sense?

@bendk bendk force-pushed the suggest-use-data-path branch 2 times, most recently from 1dad61b to 58508d6 Compare March 8, 2024 16:10
@@ -121,6 +121,10 @@ pub const SQL: &str = "
description TEXT NOT NULL,
FOREIGN KEY(suggestion_id) REFERENCES suggestions(id) ON DELETE CASCADE
);

CREATE TABLE dismissed_suggestions (
url TEXT PRIMARY KEY
Member

Directly indexing URLs might lead to sub-optimal query performance, especially against large tables, because the vast majority of them begin with "https://www.". We could either index the reversed URLs or follow Places' approach of indexing URL hashes instead.

CREATE TABLE dismissed_urls (
    url_hash INTEGER NOT NULL,
    url TEXT NOT NULL,
    PRIMARY KEY (url_hash, url)  -- WITHOUT ROWID tables require a primary key
) WITHOUT ROWID;

CREATE INDEX idx_dismissed_urls_url_hash ON dismissed_urls (url_hash);

SELECT
  1
FROM
  dismissed_urls
WHERE
  url_hash = MD5(target_url)  -- Much faster index lookup
AND
  url = target_url  -- Needed to avoid hash collisions

Alternatively, for dismissal records, we often only store the URL hashes in the browser as the collision rate should be super low, and the collision impact (i.e. not serving a suggestion) is normally acceptable.

What do you think?

Contributor Author

That makes sense to me. I switched to just storing the hash.
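
A sketch of what hash-only storage might look like, in Python against SQLite. SQLite has no built-in `MD5()` function, so the hash is computed in application code here. Truncating the digest to its first 8 bytes so that it fits SQLite's signed 64-bit `INTEGER`, and the `url_hash` and `is_dismissed` helper names, are assumptions made for illustration and not necessarily what the PR does.

```python
import hashlib
import sqlite3


def url_hash(url: str) -> int:
    """Map a URL to a signed 64-bit integer derived from its MD5 digest.

    Assumption: only the first 8 bytes of the digest are kept so the value
    fits in an SQLite INTEGER column.
    """
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def is_dismissed(conn: sqlite3.Connection, url: str) -> bool:
    """Return True if the URL's hash is recorded as dismissed."""
    row = conn.execute(
        "SELECT 1 FROM dismissed_suggestions WHERE url_hash = ?",
        (url_hash(url),),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dismissed_suggestions (url_hash INTEGER PRIMARY KEY)")
conn.execute(
    "INSERT INTO dismissed_suggestions (url_hash) VALUES (?)",
    (url_hash("https://www.example.com/a"),),
)
```

Because only the hash is stored, a colliding URL would also be reported as dismissed. As noted in the review, the cost of that is one suggestion not being shown.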

@bendk bendk force-pushed the suggest-use-data-path branch 2 times, most recently from 86e5f0d to 63242e6 Compare March 11, 2024 17:09
@bendk bendk requested a review from ncloudioj March 11, 2024 17:10
Member

@ncloudioj ncloudioj left a comment

LGTM, just a few clarifying questions.

components/suggest/src/schema.rs (outdated)
-- Just store the MD5 hash of the dismissed suggestion. The collision rate is low and the
-- impact of a collision is not showing a suggestion, which is not that bad.
CREATE TABLE dismissed_suggestions (
url_hash INTEGER PRIMARY KEY
Member

@ncloudioj ncloudioj Mar 11, 2024

This looks good, just a note that we might want to use INSERT OR IGNORE for inserts to avoid spurious insert failures due to hash collisions.
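
To make the difference concrete, here is a small sketch in Python against SQLite using the same one-column table. The `make_db`, `dismiss`, and `dismiss_strict` helpers are hypothetical names; only the table definition and `INSERT OR IGNORE` come from this thread.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory DB with the one-column dismissal table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dismissed_suggestions (url_hash INTEGER PRIMARY KEY)"
    )
    return conn


def dismiss(conn: sqlite3.Connection, url_hash: int) -> None:
    """Record a dismissal; a duplicate hash is silently skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO dismissed_suggestions (url_hash) VALUES (?)",
        (url_hash,),
    )


def dismiss_strict(conn: sqlite3.Connection, url_hash: int) -> None:
    """Plain INSERT: a duplicate hash raises sqlite3.IntegrityError."""
    conn.execute(
        "INSERT INTO dismissed_suggestions (url_hash) VALUES (?)",
        (url_hash,),
    )
```

With `dismiss`, dismissing the same suggestion twice (or two URLs whose hashes collide) leaves a single row and raises nothing, whereas `dismiss_strict` fails on the second call with a primary key violation.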

@bendk bendk force-pushed the suggest-use-data-path branch from 63242e6 to 5e0c1d3 Compare March 12, 2024 13:49
Member

@ncloudioj ncloudioj left a comment

r+, thanks!

- Use the `data_path` rather than `cache_path` for the suggest DB.  This
  is prep for storing the suggestion dismissal data in the DB, which
  should not be reset on schema upgrades.
- Don't always drop and recreate the database when the schema upgrades.
  Instead, I'm hoping we can use the code from `SuggestDao.clear` to
  delete the suggestion data so that we re-ingest it.
- Other than adding the `dismissed_suggestions` table, this doesn't
  implement any of the suggestion dismissal functionality.
@bendk bendk force-pushed the suggest-use-data-path branch from 5e0c1d3 to b112ff7 Compare March 12, 2024 16:14
@bendk bendk enabled auto-merge March 12, 2024 16:16
@bendk bendk added this pull request to the merge queue Mar 12, 2024
Merged via the queue into mozilla:main with commit 222573c Mar 12, 2024
16 checks passed
@bendk bendk deleted the suggest-use-data-path branch March 12, 2024 16:48