reads don't complete with parallelism #2215

Open
michaelAtCoalesce opened this issue Oct 6, 2024 · 9 comments
Assignees
Labels
api: firestore Issues related to the googleapis/nodejs-firestore API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: enhancement

Comments

@michaelAtCoalesce

I have a collection with ~2000 documents, each ~10k–20k bytes, so roughly 20–40 megabytes in total.

When I submit a single get() of all documents in this collection via the Node.js SDK, I get a good response time: 11.54 seconds.

However, when I submit 4 of these requests at nearly the same time so that they run in parallel, the reads finish barely faster than if I had submitted them sequentially: ~44 seconds.

I would expect the concurrent get() case to be somewhat slower, but not to scale almost linearly with the number of requests.

[screenshot of timing/log output omitted]
  • OS: macOS
  • Node.js version: 20.11.1
  • @google-cloud/firestore version: 7.10.0

Steps to reproduce

Create a Firestore collection with enough documents of sufficient size. Execute a get() on that collection and confirm that it completes quickly as a single request. Then submit multiple concurrent get() read operations and notice that the total time scales almost linearly with the number of requests (a minimal sketch follows).
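
For illustration, a minimal repro sketch. The collection name `my-collection` and default application credentials are assumptions, not the reporter's actual setup:

```js
// Time a batch of concurrent collection-wide get() calls.
const { Firestore } = require('@google-cloud/firestore');

async function timeParallelGets(parallelism) {
  const firestore = new Firestore();
  const start = Date.now();
  await Promise.all(
    Array.from({ length: parallelism }, () =>
      firestore.collection('my-collection').get()
    )
  );
  console.log(`${parallelism} parallel get() call(s): ${Date.now() - start} ms`);
}

// Observed behavior: one call finishes in ~11.5 s, four in parallel take ~44 s.
timeParallelGets(1).then(() => timeParallelGets(4));
```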

@michaelAtCoalesce michaelAtCoalesce added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Oct 6, 2024
@product-auto-label product-auto-label bot added the api: firestore Issues related to the googleapis/nodejs-firestore API. label Oct 6, 2024
@michaelAtCoalesce
Author

michaelAtCoalesce commented Oct 6, 2024

I also tried using readOnly: true on a transaction, which didn't seem to help either, and I also tried pagination; it was no faster.
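
For illustration, a sketch of the kind of pagination meant here; the page size and collection path are arbitrary placeholders, not the values actually used:

```js
// Hypothetical paginated read: fetch the collection in fixed-size pages,
// ordered by document ID, using startAfter() cursors.
const { FieldPath } = require('@google-cloud/firestore');

async function readCollectionInPages(firestore, collectionPath, pageSize = 200) {
  const docs = [];
  let lastDoc = null;

  while (true) {
    let query = firestore
      .collection(collectionPath)
      .orderBy(FieldPath.documentId())
      .limit(pageSize);
    if (lastDoc) {
      query = query.startAfter(lastDoc);
    }
    const snapshot = await query.get();
    docs.push(...snapshot.docs);
    if (snapshot.size < pageSize) break;
    lastDoc = snapshot.docs[snapshot.docs.length - 1];
  }
  return docs;
}
```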

@tom-andersen
Contributor

You may want to try stream(). You should receive documents as they arrive, avoiding the delay of waiting for the entire result set to be buffered. Please let us know if you see a performance improvement by doing this.
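
For reference, a minimal sketch of consuming the same query with stream() instead of get(); the collection name is an assumption:

```js
// Stream documents as they arrive instead of buffering the whole result set.
const { Firestore } = require('@google-cloud/firestore');

const firestore = new Firestore();
let count = 0;

firestore
  .collection('my-collection')
  .stream()
  .on('data', (doc) => {
    // Each QueryDocumentSnapshot is emitted as soon as it is received.
    count += 1;
  })
  .on('end', () => {
    console.log(`Streamed ${count} documents`);
  })
  .on('error', (err) => {
    console.error('Stream failed:', err);
  });
```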

@tom-andersen tom-andersen added type: enhancement and removed type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Oct 8, 2024
@tom-andersen tom-andersen self-assigned this Oct 8, 2024
@tom-andersen
Contributor

@michaelAtCoalesce Are the 4 requests for the same query as the single request? If so, you are requesting 4 times as much data, and your network might be the bottleneck. In fact, the near-linear scaling likely indicates that the available bandwidth is saturated.

Firestore itself might also be the bottleneck, in which case it is worth understanding how traffic scales. Firestore will dynamically add more capacity as required, but this takes time.

https://firebase.google.com/docs/firestore/best-practices#ramping_up_traffic
https://firebase.google.com/docs/firestore/understand-reads-writes-scale#avoid_hotspots

@michaelAtCoalesce
Author

Yes, it's the same request. It's not that much data (~20 megabytes), so I don't think it's a matter of bandwidth; I'm on fast gigabit internet and a beefy machine. I think it's something related to the Firestore backend.

Is there something potentially going on with how reads occur in the backend? Doesn’t Firestore do some kind of optimistic locking on reads that might cause this kind of behavior if multiple readers of a collection are executing?

In this case, I’d be okay with an older snapshot of the data or one from a cache, as long as it was consistent. Is there a way to do that? I tried a readOnly transaction and it didn’t appear to help performance either.

@tom-andersen
Contributor

tom-andersen commented Oct 8, 2024

@michaelAtCoalesce I just noticed that you have localhost in your log output. Are you running against the emulator?

@michaelAtCoalesce
Author

> @michaelAtCoalesce I just noticed that you have localhost in your log output. Are you running against the emulator?

No, live Firestore

@tom-andersen
Contributor

tom-andersen commented Oct 8, 2024

> Is there something potentially going on with how reads occur in the backend? Doesn’t Firestore do some kind of optimistic locking on reads that might cause this kind of behavior if multiple readers of a collection are executing?

Your queries won't lock anything in read-only transactions, nor outside of transactions.

Optimistic concurrency doesn't use locks at all; it is what some of the other Firestore SDKs use.

This SDK only takes locks within a transaction, and much of that has been optimized away. From what I understand, you are not using any transactions, so locks should not be a concern here.

@tom-andersen
Contributor

> In this case, I’d be okay with an older snapshot of the data or one from a cache, as long as it was consistent. Is there a way to do that? I tried a readOnly transaction and it didn’t appear to help performance either.

There is an optimization where you specify a read time. By doing so, Firestore can serve the data from the closest replica (a sketch follows the links below).

See: https://firebase.google.com/docs/firestore/understand-reads-writes-scale#stale_reads
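
For illustration, a minimal sketch of a read-only transaction pinned to an explicit read time; the collection name and the 15-second staleness are assumptions:

```js
// Read-only transaction at an explicit (slightly stale) read time.
const { Firestore, Timestamp } = require('@google-cloud/firestore');

const firestore = new Firestore();

async function staleRead() {
  const readTime = Timestamp.fromMillis(Date.now() - 15000);
  return firestore.runTransaction(
    async (tx) => {
      const snapshot = await tx.get(firestore.collection('my-collection'));
      return snapshot.size;
    },
    { readOnly: true, readTime }
  );
}

staleRead().then((size) => console.log(`Read ${size} documents at a stale read time`));
```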

@michaelAtCoalesce
Author

michaelAtCoalesce commented Oct 16, 2024

Update: I did another test with two separate processes, submitting a parallel request through each process. They complete in parallel just fine, so it appears that something specific to executing parallel reads within a single Node process is causing this.

I'm also noticing that for a ~20 MB payload, memory usage goes up by about 800 megabytes (this is with the preferRest option). It may be that memory usage for this test case grows so quickly that it becomes a problem. It might be worth investigating why a single get() of ~20 megabytes of data causes a spike of ~800 megabytes in memory.
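
For illustration, a rough sketch of one way to observe the memory growth around a single get() with preferRest enabled; the collection name is a placeholder, not the actual test setup:

```js
// Measure RSS growth around one large collection read over REST transport.
const { Firestore } = require('@google-cloud/firestore');

async function measureMemory() {
  const firestore = new Firestore({ preferRest: true });
  const before = process.memoryUsage().rss;

  const snapshot = await firestore.collection('my-collection').get();

  const after = process.memoryUsage().rss;
  const grownMb = ((after - before) / (1024 * 1024)).toFixed(0);
  console.log(`${snapshot.size} documents read; RSS grew by ~${grownMb} MB`);
}

measureMemory();
```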
