Group CQL keys into a single query [cql-tests] [tp-tests] #3879

porunov · 2023-07-05T15:32:04Z

Benchmarking

Local single instance Cassandra node

CQL benchmarks for local single instance Cassandra node is located here: https://gist.github.com/porunov/b0804e789ad6af56625230785ae8698b

There 2 tests for master branch (with enabled and disabled slice query grouping) as well as 4 tests with different configurations of this PR.

The observations from local testing is that generally performance increases 10-20% for cases when there are only keys grouping used (i.e. multi-key cases but not multi-slice cases).
I noticed that when there are few keys to be grouped (less than 500) then performance decreases (up to 20% performance decrease). I imagine it's due to less parallelism on deserialization side (1 thread instead of CPUs * 2 threads) as well as increased amount of data to be fetched (when grouping keys we need to return key beck per each column fetched).
Another observation is that when both slice and keys grouping is triggered for a specific query the performance also drops about 5%. It could be that it's harder for Cassandra to process a query with two IN operators, but it's hard to verify the real reason.

Overall, from the above tests it's clearly seen that:

When keys grouping is disabled - the performance is the same as in master branch (i.e. no regression and no improvements).
When keys grouping is enabled most of the queries with more than 700 keys gain additional performance improvements (mostly 10-20%, but goes up to 40% for some cases).
There are some cases when performance drops with enabled keys grouping.

Distributed cluster

At home I started a distributed ScyllaDB cluster on 3 different machines and started JanusGraph benchmarking on the 4th machine via:

# Machine 1:
docker run --name scyllaTest --network host --volume /var/lib/scylla:/var/lib/scylla -d scylladb/scylla:5.2.4 --developer-mode=0 --smp 4 --memory 28G --api-address 0.0.0.0 --listen-address 192.168.68.100
# Machine 2:
docker run --name scyllaTest --network host --volume /var/lib/scylla:/var/lib/scylla -d scylladb/scylla:5.2.4 --developer-mode=0 --smp 4 --memory 28G --api-address 0.0.0.0 --seeds=192.168.68.100,192.168.68.65,192.168.68.54 --listen-address 192.168.68.65
# Machine 3
docker run --name scyllaTest --network host --volume /var/lib/scylla:/var/lib/scylla -d scylladb/scylla:5.2.4 --developer-mode=0 --smp 4 --memory 28G --api-address 0.0.0.0 --seeds=192.168.68.100,192.168.68.65,192.168.68.54 --listen-address 192.168.68.54

Unfortunately, processors, disks, and RAM are different and have different performance characteristics. Moreover this was a home WIFI network up to 1800 Mbps with possible network congestion from neighbor devices. Thus, this testing should be taken with a grain of salt because it could be influenced by a number of factors.

I couldn't finish all the tests yet, but quick first tests of getAdjacentVerticesLocalCounts show that when keys grouping is enabled the performance actually drops 5-8%. I did run this test multiple times and the results are consistent (5%-8% percent performance decrease).
It seems that the amount of queries in the distributed environment becomes less important that the amount of data fetched. Seems the additional key fetching per each column influence performance more than I expected.
It's hard to verify if the performance decrease is really related to the fact that additional data is being fetched or the way IN operator is processed on the storage node side.
After a meeting with AstraDB developers they ensured me that this way of grouping partition keys together is legit and they usually don't recommend using IN operator because users usually don't group partition keys by same token ranges. However, in our case, it should be OK. Nevertheless, they warned that the checksum verification by coordinator node could still potentially be executed by multiple queries which could potentially affect performance, but should be verified case by case.
Another grey area for now is internal parallelism on the IN query when executed by a node. It isn't obvious weather Cassandra / ScyllaDB utilizes multiple cores for processing a single IN query or not. In case only a single CPU is utilized for IN query but multiple CPUs are utilized to process separate queries then the internal parallelism of separate queries could potentially be higher (in non overloaded clusters).

Conclusion

Performance improvement of the keys grouping feature is questionable. It might improve performance in some cases and decrease performance in other cases.
However, the obvious benefit is that this gives users a choice of enabling keys grouping for Serverless cases when users pay per read for each CQL query. In some cases performance might be not the main factor and the price reduction could be more needed.

I propose to have this option be disabled by default and users should enable it only for cases when they are sure they will gain benefits from grouping CQL queries.

Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

Is there an issue associated with this PR? Is it referenced in the commit message?
Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?

For code changes:

Have you written and/or updated unit tests to verify your changes?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
If applicable, have you updated the LICENSE.txt file, including the main LICENSE.txt file in the root of this repository?
If applicable, have you updated the NOTICE.txt file, including the main NOTICE.txt file found in the root of this repository?

For documentation related changes:

Have you ensured that format looks appropriate for the output in which it is rendered?

li-boxuan · 2023-07-13T04:36:20Z

docs/configs/janusgraph-cfg.md

+| storage.cql.grouping.slice-allowed | If `true` this allows multiple Slice queries which are allowed to be performed as non-range queries (i.e. direct equality operation) to be grouped together into a single CQL query via `IN` operator. Notice, currently only operations to fetch properties with Cardinality.SINGLE are allowed to be performed as non-range queries (edges fetching or properties with Cardinality SET or LIST won't be grouped together).
+If this option if `false` then each Slice query will be executed in a separate asynchronous CQL query even when grouping is allowed. | Boolean | true | MASKABLE |
+| storage.cql.grouping.slice-limit | Maximum amount of grouped together slice queries into a single CQL query.
+Notice, for ScyllaDB this option should not exceed the maximum number of distinct clustering key restrictions per query which can be changed by ScyllaDB configuration option `max-partition-key-restrictions-per-query` (https://enterprise.docs.scylladb.com/branch-2022.2/faq.html#how-can-i-change-the-maximum-number-of-in-restrictions). For AstraDB this limit is set to 20 and should be asked to be changed by AstraDB side via `partition_keys_in_select_failure_threshold` and `in_select_cartesian_product_failure_threshold` threshold configurations (https://docs.datastax.com/en/astra-serverless/docs/plan/planning.html#_cassandra_yaml).


Users might be confused about should be asked to be changed by AstraDB side. My first impression was one can ask AstraDB customer support to change this, but it seems the value is a fixed number and cannot be configured.

The value is generally fixed, but you may ask AstraDB customer support and they may modify some threshold restrictions for your own use case.
That’s of course case by case, and it’s not guaranteed they will increase those numbers for your specific case.
However, it’s possible.
I will try to make it more clear in this message that it’s generally fixed, but there is a chance they will increase those limits per your request.

Changed the description, so that it would be clear for AstraDB side.

li-boxuan · 2023-07-13T04:53:54Z

...src/main/java/org/janusgraph/diskstorage/cql/service/GroupingAsyncQueryExecutionService.java

+                                              ExecutorService executorService,
+                                              QueryBackPressure queryBackPressure) {
+        this.session = session;
+        sliceGroupingLimit = configuration.get(SLICE_GROUPING_LIMIT) < 1 ? 1 : configuration.get(SLICE_GROUPING_LIMIT);


Suggested change

sliceGroupingLimit = configuration.get(SLICE_GROUPING_LIMIT) < 1 ? 1 : configuration.get(SLICE_GROUPING_LIMIT);

sliceGroupingLimit = Math.min(configuration.get(SLICE_GROUPING_LIMIT), 1);

I think it should be ‘max’ instead of ‘min’ here, but I agree that it looks better like you suggested. I will change it.

Oh yeah it should be max

Applied Math.max

li-boxuan · 2023-07-13T05:30:56Z

janusgraph-cql/src/test/java/org/janusgraph/graphdb/cql/CQLGraphTest.java

+        clopen(configuration);
+
+        // test batching for `values()`
+        TraversalMetrics profile = testLimitedBatch(() -> graph.traversal().V(cs).barrier(elementsAmount).values("singleProperty1", "singleProperty2", "singleProperty3", "setProperty", "listProperty"), configuration);


I have two questions:

testLimitedBatch only compares results with and without batching. In the context of this PR, we would want to compare the results with and without key/column grouping.

Can one have one query that requests different properties (columns) from different vertices (keys)? For example, g.V('v1').values('prop1').as('v1Prop1').V('v2').values('prop2').as('v2Prop2').select('v1Prop1', 'v2Prop2'). I am not sure if a query like this will trigger your new logic at all, but I would like to make sure the grouping doesn't assume we request the same columns for different keys.

That makes sense. I will change to use a new method which compares results also with and without keys grouping.

This logic is applied to a single query like ‘values(“prop1”, “prop2”)’ only because that’s how multi-query is working. However, we may have some data already cached for some vertices in transaction level cache of db-level cache which will modify what we are actually querying from the storage backend. Nevertheless, those will be different query groups. The grouping logic is aware that there might be different columns asked for different set of keys (thats what query group means there). Grouping logic won’t ever fetch columns which were not explicitly requested for specific keys.

Nevertheless, those will be different query groups. The grouping logic is aware that there might be different columns asked for different set of keys (thats what query group means there)

Cool 👍

Updated this test to test each configuration with enabled / disabled grouping.

li-boxuan · 2023-07-13T05:32:50Z

We should test this feature against as many cloud providers as possible. I can test it with Amazon Keyspace.

porunov · 2023-07-13T07:30:34Z

We should test this feature against as many cloud providers as possible. I can test it with Amazon Keyspace.

Thank you! That will be very helpful!
On my side I will test it with AstraDB side.
I had a meeting with AstraDB developers and they said that generally they don’t recommend using ‘IN’ query because users often abuse its usage by providing partition keys of different nodes inside ‘IN’ query which results in poor performance.
However, in this case, as we are grouping keys ourselves by the same token ranges, it should be good enough to reduce pricing and don’t affect performance too much.

The one thing to notice, is queries with small number of keys. Generally you queries could be routed to different replicas. If you request 5 keys (for example) and assuming each separate CQL query takes 1 ms, then grouped query could potentially take longer than 1 ms because it is executed on the single node. I’m not sure how IN queries is executed inside that node (i.e. if it uses more than one CPU or not for evaluation), but this also could affect performance. Nevertheless, with queries where same token range keys amount be more than ‘CPUs over all replicas of that token range * grouping limit size’ - performance should not suffer theoretically.
I would say it’s a balance which users should find and decide which ‘keys-min’ and ‘keys-limit’ to use for their case.

Now I also realised that for ‘janusgraph-scylla’ driver, token range only grouping won’t be enough because ScyllaDB driver is also shard aware. We should provide different grouping strategy for ScyllaDB and for token-aware only implementations (like Cassandra, AstraDB, etc.). Thus, I will update the code slightly to be able to select different grouping strategies (i.e. Token-Aware strategy or Token-And-Shard-Aware strategy or user provided custom strategy). I.e. when ScyllaDB driver is used we should group keys by token range + shard id.
This change won’t affect Cassandra / AstraDB / Amazon Keyspaces performance testing or grouping approach.

porunov · 2023-07-14T14:30:31Z

For ScyllaDB driver there is no an easy way to determine Shard id without making this API on ScyllaDB public. The issue to track this feature on ScyllaDB is: scylladb/java-driver#232

Discussion thread: https://forum.scylladb.com/t/how-to-get-node-sharding-information-using-scylla-java-driver/696

I propose to just making non-optimized strategy versions of grouping keys for ScyllaDB as for now (i.e. always execute each key separately for ScyllaDB driver), but when ScyllaDB has this API public, we can improve grouping implementation for ScyllaDB to also group tokens by shards in addition to grouping by Tokens.

porunov · 2023-07-14T21:42:13Z

@li-boxuan I added a new configuration option storage.cql.grouping.keys-class to be able to provide custom grouping strategies. There is only one default strategy right now which groups keys by token ranges only. This strategy won't be efficient for ScyllaDB (unless nodes with 1 CPU only are used) because this strategy ignores Shards concept which ScyllaDB have. I mentioned it in the documentation. When scylladb/java-driver#232 is resolved then we will be able to add a more optimized grouping for janusgraph-scylla storage implementation.

li-boxuan · 2023-07-20T07:17:38Z

The feature storage.cql.grouping.keys-allowed does not work for Amazon Keyspace:

PER PARTITION LIMIT is not supported
Type ':help' or ':h' for help.
Display stack trace? [yN]y
com.datastax.oss.driver.api.core.servererrors.InvalidQueryException: PER PARTITION LIMIT is not supported
	at com.datastax.oss.driver.api.core.servererrors.InvalidQueryException.copy(InvalidQueryException.java:48)
	at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
	at com.datastax.oss.driver.internal.core.cql.CqlPrepareSyncProcessor.process(CqlPrepareSyncProcessor.java:59)
	at com.datastax.oss.driver.internal.core.cql.CqlPrepareSyncProcessor.process(CqlPrepareSyncProcessor.java:31)
	at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
	at com.datastax.oss.driver.api.core.cql.SyncCqlSession.prepare(SyncCqlSession.java:206)
	at org.janusgraph.diskstorage.cql.service.GroupingAsyncQueryExecutionService.<init>(GroupingAsyncQueryExecutionService.java:150)

but as long as this option is turned off, everything seems to be working as usual, so I am happy with this PR. It would be even better if you could mention that this feature does not work for Amazon Keyspace.

porunov · 2023-07-21T18:38:05Z

The feature storage.cql.grouping.keys-allowed does not work for Amazon Keyspace:

PER PARTITION LIMIT is not supported
Type ':help' or ':h' for help.
Display stack trace? [yN]y
com.datastax.oss.driver.api.core.servererrors.InvalidQueryException: PER PARTITION LIMIT is not supported
	at com.datastax.oss.driver.api.core.servererrors.InvalidQueryException.copy(InvalidQueryException.java:48)
	at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
	at com.datastax.oss.driver.internal.core.cql.CqlPrepareSyncProcessor.process(CqlPrepareSyncProcessor.java:59)
	at com.datastax.oss.driver.internal.core.cql.CqlPrepareSyncProcessor.process(CqlPrepareSyncProcessor.java:31)
	at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
	at com.datastax.oss.driver.api.core.cql.SyncCqlSession.prepare(SyncCqlSession.java:206)
	at org.janusgraph.diskstorage.cql.service.GroupingAsyncQueryExecutionService.<init>(GroupingAsyncQueryExecutionService.java:150)

but as long as this option is turned off, everything seems to be working as usual, so I am happy with this PR. It would be even better if you could mention that this feature does not work for Amazon Keyspace.

Thank you @li-boxuan for checking it with Amazon Keyspace. I will surely reference that this feature doesn't work with Amazon Keyspaces.
I tested it with AstraDB (with keys group limit = 20).
When grouping keys by up to 20 keys via IN operator the individual query becomes up to 5 times slower. However, the query returns data for 20 keys together. The thing to notice, is that the amount of data returned for our use-cases increases almost 80% (I thought it will be smaller for some reason). This is due to the fact that we now return partition-key with each row. The amount of data sent decreases 29% (which is good), however, as the amount of data received is 13 times bigger than the amount of data sent - thus, it's not that noticeable.
For this statistics you might have impression that the queries are now 5 times slower and return more redundant data. However it's actually opposite. As our use-cases usually involve querying data for many vertices, we see that the overall performance of our heavy queries increased 22%! Even so, the performance improvement isn't the main goal here, but it's a nice bonus. The main reason of grouping partition keys together is cost optimization for Serverless deployments like AstraDB. Even so we do return 80% more data from AstraDB now, it actually costs much less due to the fact that we simply have up to 20 times less CQL queries for some of the use-cases.
Overall this feature brings positive influence for some of the use-cases , but I think this feature isn't for everyone and it really depends from use-case to use-case if this feature makes sense or not.
I think for less loaded use-cases users would prefer to disable this feature to achieve better query parallelism with potentially better performance, but for loaded use-cases where users struggle with exhausted connection channels or reaching the back-pressure limit this feature might help.

Screenshots below show the moment of enabling grouping feature.

porunov · 2023-07-22T12:57:27Z

@li-boxuan I added information about Amazon Keyspace as well as added one more grouping strategy which groups by same replica sets. The difference between grouping partition keys by token ranges and replica sets doesn't have a big difference, but knowing that a single node may have more than 1 token range grouping by same nodes may be slightly better (i.e. less groups will be created). However grouping by same token ranges means that those tokens leave close to each other on a disk (potentially less disk seeks). Overall the difference between these grouping strategy is small, but one grouping strategy might be preferred to another in some situations.
I still want to add Shard-aware grouping strategy (i.e. 3rd grouping strategy) when ScyllaDB opens this information in their driver (scylladb/java-driver#232), but it will probably be in the follow up PRs.

li-boxuan

I did another pass on the comments and most of them are just nitpicks.

docs/changelog.md

docs/configs/janusgraph-cfg.md

.../java/org/janusgraph/diskstorage/cql/function/slice/AsyncCQLMultiKeyMultiColumnFunction.java

...c/main/java/org/janusgraph/diskstorage/cql/function/slice/AsyncCQLMultiKeySliceFunction.java

...src/main/java/org/janusgraph/diskstorage/cql/service/GroupingAsyncQueryExecutionService.java

porunov · 2023-07-23T15:24:37Z

Thank you @li-boxuan ! I force pushed the changes to resolve your comments here: https://github.com/JanusGraph/janusgraph/compare/93d0c4357e0af0d3072c89ba050c907b605c2f6e..f984038d6d7c056d4d9ad7397bf4153c683031cf

Btw. The failing test dist-11 isn't related to this PR and it currently fails on master. A separate issue about the dist-11 failing CI is #3894

li-boxuan

LGTM 👍

…ests] Fixes JanusGraph#3863 Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

porunov added this to the Release v1.0.0 milestone Jul 5, 2023

porunov force-pushed the feature/cql-key-grouping-by-partition branch from 3438649 to 94981f3 Compare July 5, 2023 15:38

janusgraph-bot added the cla: external Externally-managed CLA label Jul 5, 2023

porunov force-pushed the feature/cql-key-grouping-by-partition branch 3 times, most recently from 95bf7f4 to 4ffdd10 Compare July 6, 2023 18:01

porunov mentioned this pull request Jul 8, 2023

Group keys by token ranges for CQL storage backend and execute a single SliceQuery for them via IN operator #3863

Closed

porunov force-pushed the feature/cql-key-grouping-by-partition branch 3 times, most recently from 397d65e to 7227a24 Compare July 8, 2023 19:22

porunov changed the title ~~Group CQL keys into a single query~~ Group CQL keys into a single query [cql-tests] [tp-tests] Jul 11, 2023

porunov force-pushed the feature/cql-key-grouping-by-partition branch 3 times, most recently from 1d5e7bc to 3fad6ae Compare July 11, 2023 22:40

porunov marked this pull request as ready for review July 12, 2023 00:30

porunov requested a review from a team July 12, 2023 00:30

li-boxuan reviewed Jul 13, 2023

View reviewed changes

porunov force-pushed the feature/cql-key-grouping-by-partition branch from 3fad6ae to f40ad5b Compare July 14, 2023 21:36

porunov force-pushed the feature/cql-key-grouping-by-partition branch from f40ad5b to 93d0c43 Compare July 22, 2023 12:50

li-boxuan reviewed Jul 23, 2023

View reviewed changes

porunov force-pushed the feature/cql-key-grouping-by-partition branch from 93d0c43 to f984038 Compare July 23, 2023 15:21

li-boxuan approved these changes Jul 23, 2023

View reviewed changes

Group CQL same token range keys into a single query [cql-tests] [tp-t…

a069074

…ests] Fixes JanusGraph#3863 Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

porunov force-pushed the feature/cql-key-grouping-by-partition branch from f984038 to a069074 Compare August 4, 2023 13:24

porunov merged commit 6d3ec8d into JanusGraph:master Aug 4, 2023
189 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Group CQL keys into a single query [cql-tests] [tp-tests] #3879

Group CQL keys into a single query [cql-tests] [tp-tests] #3879

porunov commented Jul 5, 2023 •

edited

Loading

li-boxuan Jul 13, 2023

porunov Jul 13, 2023

porunov Jul 14, 2023

li-boxuan Jul 13, 2023

porunov Jul 13, 2023

li-boxuan Jul 13, 2023

porunov Jul 14, 2023

li-boxuan Jul 13, 2023

porunov Jul 13, 2023

li-boxuan Jul 13, 2023

porunov Jul 14, 2023

li-boxuan commented Jul 13, 2023

porunov commented Jul 13, 2023

porunov commented Jul 14, 2023

porunov commented Jul 14, 2023

li-boxuan commented Jul 20, 2023

porunov commented Jul 21, 2023

porunov commented Jul 22, 2023

li-boxuan left a comment

porunov commented Jul 23, 2023 •

edited

Loading

li-boxuan left a comment

	sliceGroupingLimit = configuration.get(SLICE_GROUPING_LIMIT) < 1 ? 1 : configuration.get(SLICE_GROUPING_LIMIT);
	sliceGroupingLimit = Math.min(configuration.get(SLICE_GROUPING_LIMIT), 1);

Group CQL keys into a single query [cql-tests] [tp-tests] #3879

Group CQL keys into a single query [cql-tests] [tp-tests] #3879

Conversation

porunov commented Jul 5, 2023 • edited Loading

Benchmarking

Local single instance Cassandra node

Distributed cluster

Conclusion

For all changes:

For code changes:

For documentation related changes:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

li-boxuan commented Jul 13, 2023

porunov commented Jul 13, 2023

porunov commented Jul 14, 2023

porunov commented Jul 14, 2023

li-boxuan commented Jul 20, 2023

porunov commented Jul 21, 2023

porunov commented Jul 22, 2023

li-boxuan left a comment

Choose a reason for hiding this comment

porunov commented Jul 23, 2023 • edited Loading

li-boxuan left a comment

Choose a reason for hiding this comment

porunov commented Jul 5, 2023 •

edited

Loading

porunov commented Jul 23, 2023 •

edited

Loading