
Implement cassandra-journal.max-result-size via Paging #70

Closed
chbatey opened this issue Jul 28, 2015 · 3 comments
chbatey commented Jul 28, 2015

At the moment a LIMIT query is used. We could instead use Cassandra's paging (fetch size), introduced in C* 2.0:

http://datastax.github.io/java-driver/2.1.7/features/paging/

This is used in the C* Spark Connector when bringing a large number of rows into Spark from C*.

It would remove one of the cases in the RowIterator. Since we're already doing synchronous queries, this would be almost transparent (a call to next() would just block). You can also query the paging state to see how many rows can be taken without blocking, or get a future that completes when more rows are available.
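To illustrate the behaviour described above, here is a minimal, self-contained sketch (not the real java-driver API; the class and method names are invented for illustration) of an iterator that transparently fetches the next page when the current one is exhausted, so next() just blocks while a page is being pulled:

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Stand-in for driver-side paging: rows arrive in pages, and hasNext()/next()
// transparently (synchronously) fetch the next page when the current one is used up.
class PagedRowIterator implements Iterator<Integer> {
    private final IntFunction<List<Integer>> fetchPage; // pageIndex -> rows; empty list = no more pages
    private final ArrayDeque<Integer> current = new ArrayDeque<>();
    private int pageIndex = 0;
    private boolean exhausted = false;

    PagedRowIterator(IntFunction<List<Integer>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    // Analogous to the driver's "how many rows can be taken without blocking":
    // rows consumable before another synchronous page fetch is needed.
    int availableWithoutFetching() {
        return current.size();
    }

    @Override
    public boolean hasNext() {
        while (current.isEmpty() && !exhausted) {
            List<Integer> page = fetchPage.apply(pageIndex++); // this is where next() would block
            if (page.isEmpty()) exhausted = true;
            else current.addAll(page);
        }
        return !current.isEmpty();
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.poll();
    }
}

public class PagingDemo {
    public static void main(String[] args) {
        int totalRows = 10, fetchSize = 4;
        // Fake "server": serves rows 0..9 in pages of fetchSize.
        PagedRowIterator it = new PagedRowIterator(page -> {
            int from = page * fetchSize;
            int to = Math.min(from + fetchSize, totalRows);
            return java.util.stream.IntStream.range(from, to).boxed()
                    .collect(java.util.stream.Collectors.toList());
        });
        int count = 0;
        while (it.hasNext()) { it.next(); count++; }
        System.out.println(count); // prints 10: all rows, fetched in pages of 4
    }
}
```

The caller never issues a follow-up query itself, which is what would let us drop that case from the RowIterator.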

WDYT?

krasserm commented Aug 1, 2015

After reading the docs again, +1 for using the fetch size on the driver (and simplifying the RowIterator).

The current implementation came from a misunderstanding on my side: I thought the server always limits the size of a result set (using a default value if none is given), so that clients need to query for more results once they've iterated over a previous query. I thought that LIMIT was independent of the driver's fetch size, but they seem to be the same. I somehow missed in the docs that the server doesn't limit the size of a result set at all. Is my understanding correct?


chbatey commented Aug 1, 2015

You were right that LIMIT is independent of fetch size and that the server by default will return all the rows.

LIMIT is mainly used for manual paging (like we are doing now) or for getting the top N results, and it means you need to issue another query based on the last row of the previous one.

Prior to 2.0, if you didn't specify a LIMIT (paging didn't exist yet), you would get an OutOfMemoryError in either the coordinator or your app, as C* just brought back all the rows. I deleted so much code when paging was added.

I believe the driver sets a default fetch size, and without a LIMIT all the rows will come back (and if you read them all into memory you of course still risk an OOM).

To add to the confusion, cqlsh adds a LIMIT of 1000 to all your queries, though this may have been changed to use the fetch size, as I see cqlsh now has a less-like interface for large results. I need to confirm this.

TL;DR: we can remove LIMIT and just rely on the fetch size.
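For contrast, the manual LIMIT-based paging described above (reissue the query starting after the last row of the previous page) can be sketched without the driver. The query shape and names here are invented for illustration; a real implementation would run a CQL SELECT with a LIMIT against C*:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of manual LIMIT-based paging, the pattern being replaced: fetch up to
// `limit` rows, then reissue the query starting after the last row seen.
public class ManualLimitPaging {
    // Stand-in for something like "SELECT ... WHERE seqNr > ? LIMIT ?" against
    // a table holding sequence numbers 1..totalRows (purely illustrative).
    static List<Long> queryPage(long lastSeen, int limit, long totalRows) {
        List<Long> rows = new ArrayList<>();
        for (long seqNr = lastSeen + 1; seqNr <= totalRows && rows.size() < limit; seqNr++)
            rows.add(seqNr);
        return rows;
    }

    static long countAllRows(int limit, long totalRows) {
        long count = 0, lastSeen = 0;
        while (true) {
            List<Long> page = queryPage(lastSeen, limit, totalRows);
            if (page.isEmpty()) break;            // previous page was the last one
            count += page.size();
            lastSeen = page.get(page.size() - 1); // restart after the last row seen
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countAllRows(100, 250)); // prints 250, fetched in 3 pages
    }
}
```

With the driver's fetch size doing the paging, this whole reissue-from-last-row loop disappears from application code.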


krasserm commented Aug 1, 2015

OK, sounds good.

@krasserm krasserm modified the milestone: 0.4 Sep 1, 2015
krasserm added a commit that referenced this issue Sep 3, 2015
[#70] Use cassandra paging to implement max result size
@chbatey chbatey closed this as completed Sep 3, 2015
jypma pushed a commit to jypma/akka-persistence-cassandra that referenced this issue Sep 10, 2015