-
Notifications
You must be signed in to change notification settings - Fork 898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Firestore: Optimize local cache sync when resuming a query that had docs deleted #7229
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…st with spec tests (#7107)
We'll need to wait for #7228 to be merged before this to fix the node es bundle. |
dconeybe
changed the title
Bloom filter
Firestore: Optimize local cache sync when resuming a query that had docs deleted.
Apr 18, 2023
dconeybe
changed the title
Firestore: Optimize local cache sync when resuming a query that had docs deleted.
Firestore: Optimize local cache sync when resuming a query that had docs deleted
Apr 18, 2023
…those sdks merge their bloom filter support
dconeybe
approved these changes
Apr 18, 2023
dconeybe
added a commit
to firebase/firebase-android-sdk
that referenced
this pull request
Apr 24, 2023
…ocal cache sync when resuming a query that had docs deleted)
Merged
dconeybe
added a commit
to firebase/firebase-ios-sdk
that referenced
this pull request
Apr 26, 2023
FYI This was released on April 27, 2023 in v9.21.0: https://firebase.google.com/support/release-notes/js#version_9210_-_april_27_2023 |
For a discussion about the implementation details of this PR, see firebase/firebase-ios-sdk#12270. |
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Implement an optimization in Firestore when resuming a query where documents have either been deleted or no longer match the query on the server (a.k.a. "removed"). The optimization avoids re-running the entire query just to figure out which documents were deleted/removed in most cases.
Background Information
When a Firestore query is sent to the server, the server replies with the documents in the result set and a "resume token". The result set and the resume token are stored locally on the client. If the same query is resumed at a later time, such as by a later call to
getDocs()
or a listener registered viaonSnapshot()
reconnects, then the client sends the same query to the server, but this time includes the resume token. To save on network bandwidth, the server only replies with the documents that have changed since the timestamp encoded in the resume token. Additionally, if the query is resumed within 30 minutes, and persistence is enabled, then the customer is only billed for the delta, and not the entire result set (see https://firebase.google.com/docs/firestore/pricing#listens for the official and most up-to-date details on pricing).The problem is that if some documents in the result set were deleted or removed (i.e. changed to no longer match the query) then the server simply does not observe their presence in the result set and does not send updates for them. This leaves the client's cache in an inconsistent state because it still contains the deleted/removed documents. To work around this cache inconsistency, the server also replies with an "existence filter", a count of the documents that matched the query on the server. The client then compares this count with the number of documents that match the query in its local cache. If those counts are the same then all is good and the result set is raised via a snapshot; however, if the counts do not match then this is called an "existence filter mismatch" and the client re-runs the entire query from scratch, without a resume token, to figure out which documents in its local cache were deleted or removed. Then, the deleted or removed documents go into "limbo" and individual document reads are issued for each of the limbo documents to bring them into sync with the server.
The inefficiency is realized when the client "re-runs the entire query from scratch". This is inefficient for 2 reasons: (1) it re-transmits documents that were just sent when the query was resumed, wasting network bandwidth and (2) it results in being billed for document reads of the entire result set.
The Optimization
To avoid this expensive re-running of the query from scratch the server has been modified to also reply with the names of the documents that had not changed since the timestamp encoded in the resume token. With this additional information, the client can determine which documents in its local cache were deleted or removed, and directly put them into "limbo" without having to re-run the entire query from scratch.
The document names sent from the server are encoded in a data structure called a "bloom filter". A bloom filter is a size-efficient way to encode a "set" of strings. The size efficiency comes at the cost of correctness; that is, when testing for membership in a bloom filter it may incorrectly report that a value is contained in the bloom filter when in fact it is not (a.k.a. a "false positive"). The probability of this happening is made to be exceptionally low by tweaking the parameters of the bloom filter. However, when a false positive does happen then the client is forced to fall back to a full requery. But eliminating the vast majority of the full requeries is an overall win.
Googlers see go/firestore-ttl-deletion-protocol-changes for full details.
Spec tests ported to Android in firebase/firebase-android-sdk#4929 and to iOS in firebase/firebase-ios-sdk#11185
The entire feature was ported to Android in firebase/firebase-android-sdk#4982 and to iOS in firebase/firebase-ios-sdk#11457.