-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1882381 - Switch to using the data path for the suggest DB #6146
Conversation
9850b52
to
cf14873
Compare
Putting this up slightly early for feedback, but don't merge until Android and iOS is using the |
cf14873
to
5576618
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #6146 +/- ##
=======================================
Coverage 84.08% 84.08%
=======================================
Files 117 117
Lines 15629 15629
=======================================
Hits 13141 13141
Misses 2488 2488 ☔ View full report in Codecov by Sentry. |
See #6147 for how this will be used. |
5576618
to
e461055
Compare
e461055
to
001495f
Compare
Pushing this one out again because:
|
1dad61b
to
58508d6
Compare
components/suggest/src/schema.rs
Outdated
@@ -121,6 +121,10 @@ pub const SQL: &str = " | |||
description TEXT NOT NULL, | |||
FOREIGN KEY(suggestion_id) REFERENCES suggestions(id) ON DELETE CASCADE | |||
); | |||
|
|||
CREATE TABLE dismissed_suggestions ( | |||
url TEXT PRIMARY KEY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Directly indexing URLs might lead to sub-optimal query performance as vast majority of them begin with "https://www." especially against large tables. We could either index the reversed URLs or follow Places' wisdom of indexing on URL hashes instead.
CREATE TABLE dismissed_urls (
url_hash INTEGER NOT NULL,
url TEXT NOT NULL,
) WITHOUT ROWID;
CREATE INDEX idx_dismissed_urls_url_hash ON dismissed_urls (url_hash);
SELECT
1
FROM
dismissed_urls
WHERE
url_hash = MD5(target_url) -- Much faster index lookup
AND
url = target_url -- Needed to avoid hash collisions
Alternatively, for dismissal records, we often only store the URL hashes in the browser as the collision rate should be super low, and the collision impact (i.e. not serving a suggestion) is normally acceptable.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me. I switched to just storing the hash.
86e5f0d
to
63242e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few clarifying questions.
-- Just store the MD5 hash of the dismissed suggestion. The collision rate is low and the | ||
-- impact of a collision is not showing a suggestion, which is not that bad. | ||
CREATE TABLE dismissed_suggestions ( | ||
url_hash INTEGER PRIMARY KEY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, just a note that we might want to use INSERT OR IGNORE
for inserts to avoid spurious insert failures due to hash collisions.
63242e6
to
5e0c1d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r+, thanks!
- Use the `data_path` rather than `cache_path` for the suggest DB. This is prep for for storing the suggestion dismissal data in the DB, which should not be reset on schema upgrades. - Don't always drop and recreate the database when the schema upgrades. Instead, I'm hoping we can use the code from `SuggestDao.clear` to delete the suggestion data so that we re-ingest it. - Other than adding the `dismissed_suggestions` table, this doesn't implement any of the suggestion dismissal functionality.
5e0c1d3
to
b112ff7
Compare
data_path
rather thancache_path
for the suggest DB. This is prep for for storing the suggestion dismissal data in the DB, which should not be reset on schema upgrades.SuggestDao.clear
to delete the suggestion data so that we re-ingest it.dismissed_suggestion
table, this doesn't implement any of the suggestion dismissal functionality.Pull Request checklist
[ci full]
to the PR title.Branch builds: add
[firefox-android: branch-name]
to the PR title.