Implement cassandra-journal.max-result-size via Paging #70
Comments
After reading the docs again, +1 to using the fetch-size on the driver (and simplifying the implementation). The current implementation came from a misunderstanding on my side: I thought the server always limits the size of a result set (using a default value if none is given), so that clients need to query for more results once they've iterated over a previous query.
You were right that LIMIT is independent of fetch size and that the server by default will return all the rows. LIMIT is mainly used for manual paging (like we do now) or for getting the top N results, and it means you need to issue another query based on the last row of the previous query. Prior to 2.0, paging didn't exist, so if you didn't specify a LIMIT you would get an OutOfMemory error in either the coordinator or your app, as C* just brought back all the rows. I deleted so much code when paging was added.

I believe the driver sets a default fetch-size, and without a LIMIT all the rows will come back (if you read them all into memory you of course still risk an OOM). To add confusion, cqlsh adds a LIMIT of 1000 to all your queries, though this may have been changed to use fetch-size, as I see cqlsh has a less-like interface for large results now. I need to confirm this.

TL;DR: we can remove LIMIT and just rely on fetch-size.
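A minimal sketch of that fetch-size approach, assuming the DataStax Java driver 2.1 called from Scala; the contact point, keyspace, table, and fetch size here are illustrative only, not the journal's actual schema:

```scala
import com.datastax.driver.core.{Cluster, SimpleStatement}
import scala.collection.JavaConverters._

object FetchSizeSketch extends App {
  // Hypothetical contact point and keyspace, for illustration only.
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect("akka")

  // No LIMIT: the driver fetches one page of `fetchSize` rows at a time.
  val stmt = new SimpleStatement(
    "SELECT * FROM messages WHERE persistence_id = ?", "some-id")
  stmt.setFetchSize(100) // could be wired to cassandra-journal.max-result-size

  val rs = session.execute(stmt)

  // Iterating past the current page transparently triggers a blocking
  // fetch of the next page, so memory use stays bounded by the page size
  // as long as we don't hold on to every Row.
  rs.iterator().asScala.foreach { row =>
    println(row)
  }

  cluster.close()
}
```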
OK, sounds good.
[#70] Use cassandra paging to implement max result size
At the moment a LIMIT query is used (a sketch of that pattern follows below). We could instead use the Cassandra paging fetch-size introduced in C* 2.0:
http://datastax.github.io/java-driver/2.1.7/features/paging/
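For contrast, a rough sketch of the current LIMIT-based approach, where each page is a separate query restarting from the last row seen. The query shape, table, and column names are hypothetical, not the journal's actual code:

```scala
import com.datastax.driver.core.{ResultSet, Session, SimpleStatement}

// Hypothetical manual-paging helper: after draining one page, the caller
// issues the next query starting from the last sequence number it saw.
def nextPage(session: Session, persistenceId: String,
             fromSequenceNr: Long, maxResultSize: Int): ResultSet =
  session.execute(new SimpleStatement(
    "SELECT * FROM messages WHERE persistence_id = ? " +
      s"AND sequence_nr >= ? LIMIT $maxResultSize",
    persistenceId, Long.box(fromSequenceNr)))
```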
Fetch-size paging is the approach used in the C* Spark Connector when bringing a large number of rows into Spark from C*.
It would remove one of the cases in the RowIterator. We're already doing synchronous queries, so this would be almost transparent (a call to next() would just block). You can also query the paging state to see how many rows can be taken without blocking, or get a future that completes when more rows are available.
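A sketch of those non-blocking checks, again against the driver 2.1 API; the threshold and the function name are made up for illustration:

```scala
import com.datastax.driver.core.ResultSet

// `rs` would be the ResultSet backing the RowIterator.
def prefetchIfNeeded(rs: ResultSet): Unit = {
  // Rows that next() can return without any network round trip.
  val ready = rs.getAvailableWithoutFetching
  if (ready < 10 && !rs.isFullyFetched) {
    // Start fetching the next page in the background; without this, a
    // next() past the current page would block on the fetch.
    rs.fetchMoreResults() // returns a ListenableFuture
  }
}
```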
WDYT?