Firestore: Optimize local cache sync when resuming a query that had docs deleted #7229

milaGGL · 2023-04-14T18:35:40Z

Implement an optimization in Firestore when resuming a query where documents have either been deleted or no longer match the query on the server (a.k.a. "removed"). In most cases, the optimization avoids re-running the entire query just to figure out which documents were deleted or removed.

Background Information

When a Firestore query is sent to the server, the server replies with the documents in the result set and a "resume token". The result set and the resume token are stored locally on the client. If the same query is resumed at a later time, such as by a later call to getDocs() or when a listener registered via onSnapshot() reconnects, then the client sends the same query to the server, but this time includes the resume token. To save on network bandwidth, the server only replies with the documents that have changed since the timestamp encoded in the resume token. Additionally, if the query is resumed within 30 minutes, and persistence is enabled, then the customer is only billed for the delta, and not the entire result set (see https://firebase.google.com/docs/firestore/pricing#listens for the official and most up-to-date details on pricing).
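As a rough illustration of the resume flow described above, here is a toy in-memory model. It is a sketch only: `FakeServer`, `Doc`, `ListenResponse`, and the numeric resume token are hypothetical names made up for this example and do not reflect the Firestore wire protocol or SDK internals.

```typescript
// Toy model of query resumption. All names here are hypothetical.
interface Doc {
  name: string;
  updateTime: number;
}

interface ListenResponse {
  docs: Doc[];
  resumeToken: number;
}

class FakeServer {
  private docs = new Map<string, Doc>();
  private clock = 0;

  // Creates or updates a document and advances the logical clock.
  set(name: string): void {
    this.clock += 1;
    this.docs.set(name, { name, updateTime: this.clock });
  }

  // Without a resume token the full result set is returned;
  // with one, only documents changed after the token's timestamp.
  listen(resumeToken?: number): ListenResponse {
    const since = resumeToken ?? 0;
    const docs = Array.from(this.docs.values()).filter(
      d => d.updateTime > since
    );
    return { docs, resumeToken: this.clock };
  }
}

const server = new FakeServer();
server.set("coll/a");
server.set("coll/b");

// Initial run: the whole result set plus a resume token.
const first = server.listen();

// One document changes while the client is away.
server.set("coll/b");

// Resumed run: only the delta since the resume token.
const resumed = server.listen(first.resumeToken);
```

In this model the resumed response carries only `coll/b`, which is the bandwidth saving the resume token is meant to provide.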

The problem is that if some documents in the result set were deleted or removed (i.e. changed to no longer match the query) then the server simply does not observe their presence in the result set and does not send updates for them. This leaves the client's cache in an inconsistent state because it still contains the deleted/removed documents. To work around this cache inconsistency, the server also replies with an "existence filter", a count of the documents that matched the query on the server. The client then compares this count with the number of documents that match the query in its local cache. If those counts are the same then all is good and the result set is raised via a snapshot; however, if the counts do not match then this is called an "existence filter mismatch" and the client re-runs the entire query from scratch, without a resume token, to figure out which documents in its local cache were deleted or removed. Then, the deleted or removed documents go into "limbo" and individual document reads are issued for each of the limbo documents to bring them into sync with the server.
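The count comparison described above can be sketched as follows. This is a minimal illustration, not the SDK's code: `ExistenceFilter`, `FilterResult`, and `checkExistenceFilter` are hypothetical names chosen for the example.

```typescript
// Hypothetical shapes for illustration; not the SDK's internal types.
interface ExistenceFilter {
  // Number of documents matching the query on the server.
  count: number;
}

type FilterResult = "consistent" | "mismatch";

// Compares the server's count with the number of locally cached
// documents that match the query.
function checkExistenceFilter(
  localMatchingDocs: ReadonlySet<string>,
  filter: ExistenceFilter
): FilterResult {
  return localMatchingDocs.size === filter.count ? "consistent" : "mismatch";
}

const localCache = new Set(["coll/a", "coll/b", "coll/c"]);

// The server reports 2 matching documents, but 3 are cached locally:
// one was deleted or removed, and the count alone cannot say which.
const afterDeletion = checkExistenceFilter(localCache, { count: 2 });

// Counts agree, so the cached result set can be raised via a snapshot.
const noChange = checkExistenceFilter(localCache, { count: 3 });
```

Note that a bare count only signals that something is stale. It does not identify the stale documents, which is why a mismatch previously forced a full requery.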

The inefficiency is realized when the client "re-runs the entire query from scratch". This is inefficient for two reasons: (1) it re-transmits documents that were just sent when the query was resumed, wasting network bandwidth, and (2) it results in being billed for document reads of the entire result set.

The Optimization

To avoid this expensive re-running of the query from scratch, the server has been modified to also reply with the names of the documents that have not changed since the timestamp encoded in the resume token. With this additional information, the client can determine which documents in its local cache were deleted or removed, and directly put them into "limbo" without having to re-run the entire query from scratch.

The document names sent from the server are encoded in a data structure called a "bloom filter". A bloom filter is a size-efficient way to encode a set of strings. The size efficiency comes at the cost of correctness: when testing for membership, a bloom filter may incorrectly report that a value is contained in it when in fact it is not (a.k.a. a "false positive"). The probability of this happening is kept exceptionally low by tuning the parameters of the bloom filter. When a false positive does happen, the client is forced to fall back to a full requery; even so, eliminating the vast majority of full requeries is an overall win.
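To make the idea concrete, here is a minimal bloom filter sketch and one way a client could apply it. Several things are assumptions made for illustration: the hash function (FNV-1a with double hashing), the filter parameters, the names `SimpleBloomFilter` and `applyBloomFilter`, and the use of a count check to detect a false positive. The sketch also assumes, for simplicity, that no documents were modified between the two runs, so the filter covers every document still in the result set. It is not the implementation in this PR.

```typescript
// Illustrative bloom filter. The hash function and parameters are
// assumptions for this sketch, not the ones Firestore uses.
class SimpleBloomFilter {
  private readonly bits: Uint8Array;
  private readonly bitCount: number;
  private readonly hashCount: number;

  constructor(bitCount: number, hashCount: number) {
    this.bitCount = bitCount;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(Math.ceil(bitCount / 8));
  }

  // 32-bit FNV-1a hash, perturbed by a seed to derive two hash values.
  private static fnv1a(value: string, seed: number): number {
    let hash = (0x811c9dc5 ^ seed) >>> 0;
    for (let i = 0; i < value.length; i++) {
      hash ^= value.charCodeAt(i);
      hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    return hash;
  }

  // Double hashing: the i-th bit index is (h1 + i * h2) mod bitCount.
  private indexes(value: string): number[] {
    const h1 = SimpleBloomFilter.fnv1a(value, 0);
    const h2 = (SimpleBloomFilter.fnv1a(value, 0x9e3779b9) | 1) >>> 0;
    const result: number[] = [];
    for (let i = 0; i < this.hashCount; i++) {
      result.push((h1 + i * h2) % this.bitCount);
    }
    return result;
  }

  add(value: string): void {
    for (const index of this.indexes(value)) {
      this.bits[index >> 3] |= 1 << (index & 7);
    }
  }

  // May return a false positive, but never a false negative.
  mightContain(value: string): boolean {
    return this.indexes(value).every(
      index => (this.bits[index >> 3] & (1 << (index & 7))) !== 0
    );
  }
}

interface ResumeResult {
  removed: string[];
  needsFullRequery: boolean;
}

// Cached names that the filter rejects are treated as deleted/removed.
// If the remaining count still disagrees with the server's count, a
// false positive hid a removed document, so fall back to a full requery.
function applyBloomFilter(
  cachedNames: string[],
  unchanged: SimpleBloomFilter,
  expectedCount: number
): ResumeResult {
  const removed = cachedNames.filter(name => !unchanged.mightContain(name));
  const remaining = cachedNames.length - removed.length;
  return { removed, needsFullRequery: remaining !== expectedCount };
}

// Server side: two documents are still in the result set.
const unchanged = new SimpleBloomFilter(1024, 5);
unchanged.add("coll/a");
unchanged.add("coll/b");
const result = applyBloomFilter(["coll/a", "coll/b", "coll/c"], unchanged, 2);

// A deliberately saturated one-bit filter reports every name as present,
// which forces the fallback path.
const saturated = new SimpleBloomFilter(1, 1);
saturated.add("coll/a");
const fallback = applyBloomFilter(["coll/a", "coll/b", "coll/c"], saturated, 2);
```

The one-bit filter is only there to exercise the fallback deterministically. With realistically sized parameters the false positive rate is tiny, so the common path is the first one, where the stale names are identified directly and no full requery is needed.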

Googlers: see go/firestore-ttl-deletion-protocol-changes for full details.

Spec tests were ported to Android in firebase/firebase-android-sdk#4929 and to iOS in firebase/firebase-ios-sdk#11185.

The entire feature was ported to Android in firebase/firebase-android-sdk#4982 and to iOS in firebase/firebase-ios-sdk#11457.

dconeybe · 2023-04-15T00:05:14Z

We'll need to wait for #7228 to be merged before this to fix the node es bundle.

dconeybe · 2023-04-28T19:45:17Z

FYI This was released on April 27, 2023 in v9.21.0: https://firebase.google.com/support/release-notes/js#version_9210_-_april_27_2023

dconeybe · 2024-01-16T15:25:33Z

For a discussion about the implementation details of this PR, see firebase/firebase-ios-sdk#12270.

milaGGL and others added 30 commits November 14, 2022 17:47

Update proto to include BloomFilter (#6780)

0ccea52

Merge branch 'master' into mila/BloomFilter

dd8b061

Merge branch 'master' into mila/BloomFilter

d6c5756

Merge branch 'master' into mila/BloomFilter

a2c1c24

add BloomFilter class (#6795)

c17af51

Add bloom filter to existence filter and watchFilters spec builder (#6839)

a3fb711

Merge branch 'master' into mila/BloomFilter

4bd34f1

Merge branch 'master' into mila/BloomFilter

eaef9da

Merge branch 'master' into mila/BloomFilter

036849f

Add expectedCount to Target in listen request (#6854)

9e49b4c

Merge branch 'master' into mila/BloomFilter

dd66835

Merge branch 'master' into mila/BloomFilter

5335e7a

apply bloomFilter while handling existence filter mismatch (#6897)

333aac9

Merge branch 'master' into mila/BloomFilter

7e2f069

Merge branch 'master' into mila/BloomFilter

c128eaa

add "no-ios", "no-android" to invalid base64 bitmap spec test case

dee7744

Optimize bloom filter application (#6992)

277f8e1

Merge branch 'master' into mila/BloomFilter

d66f4ca

Merge remote-tracking branch 'origin/master' into HEAD

dbed11f

base64.ts: tweak the comment about why we're doing custom base64 length validation

a3cd73d

Merge remote-tracking branch 'origin/master' into HEAD

839ccf7

Merge branch 'master' into mila/BloomFilter

b7072d1

Mila/bloom filter add integration test (#7045)

97ca3ee

Merge remote-tracking branch 'origin/master' into HEAD

5bd6142

skip the integration test when using emulator

7b410e6

Merge remote-tracking branch 'origin/master' into mila/BloomFilter

8a5faad

Fix expectedCount encoding in grpc (#7087)

65e5452

Merge remote-tracking branch 'origin/master' into HEAD

a4ee560

Merge remote-tracking branch 'origin/master' into mila/BloomFilter

465e8df

Add 'existence-filter-mismatch-bloom' to listen request labels and test with spec tests (#7107)

5c2ec00

Merge remote-tracking branch 'origin/master' into mila/BloomFilter

256cb3e

dconeybe changed the title ~~Bloom filter~~ Firestore: Optimize local cache sync when resuming a query that had docs deleted. Apr 18, 2023

dconeybe changed the title ~~Firestore: Optimize local cache sync when resuming a query that had docs deleted.~~ Firestore: Optimize local cache sync when resuming a query that had docs deleted Apr 18, 2023

dconeybe added 2 commits April 18, 2023 14:50

Merge remote-tracking branch 'origin/master' into mila/BloomFilter

bbc1944

[no ci]

f6ef44c

dconeybe mentioned this pull request Apr 18, 2023

Send target to backend before local processing #5980

Merged

dconeybe added 6 commits April 18, 2023 19:38

Format bloom filter golden test JSON files with prettier

a3ccdeb

.changeset/swift-eels-change.md: update

f30c45f

Remove no-longer-needed exports of TargetBackend and TARGET_BACKEND

0c74eb0

bloom_filter_golden_test_data/README.md added

c238fd2

Disable bloom filter spec tests on android and ios; enable them once those sdks merge their bloom filter support

b8066f1

Merge remote-tracking branch 'origin/master' into mila/BloomFilter

7681f05

dconeybe approved these changes Apr 18, 2023

dconeybe added 4 commits April 18, 2023 16:11

Further tweak the change log entry

93dedd1

Merge remote-tracking branch 'origin/master' into mila/BloomFilter

80fc46d

listen_spec.test.ts: withGCEnabled(false) -> ensureManualLruGC()

f795005

Further tweak change log

787d523

milaGGL merged commit 98abcd5 into master Apr 19, 2023

milaGGL deleted the mila/BloomFilter branch April 19, 2023 16:32

dconeybe mentioned this pull request Apr 24, 2023

Firestore: use string values for TargetPurpose enum #7257

Merged

dconeybe added a commit to firebase/firebase-android-sdk that referenced this pull request Apr 24, 2023

Port spec test changes from firebase/firebase-js-sdk#7229 (Optimize local cache sync when resuming a query that had docs deleted)

0f870e9

dconeybe mentioned this pull request Apr 24, 2023

Firestore Spec Tests: Port JS PR 7229 (optimized query resumption using bloom filter) firebase/firebase-android-sdk#4929

Merged

google-oss-bot mentioned this pull request Apr 25, 2023

Version Packages #7266

Merged

dconeybe added a commit to firebase/firebase-ios-sdk that referenced this pull request Apr 26, 2023

update spec test json files from firebase/firebase-js-sdk#7229

9a10e2e

dconeybe mentioned this pull request Apr 26, 2023

Firestore Spec Tests: Port JS PR 7229 (optimized query resumption using bloom filter) firebase/firebase-ios-sdk#11185

Merged

milaGGL mentioned this pull request May 5, 2023

Firestore: Optimize local cache sync when resuming a query that had docs deleted firebase/firebase-android-sdk#4982

Merged

firebase locked and limited conversation to collaborators Jun 1, 2023
