Adding cassandra key value storage #961

irach-ramos · 2024-09-20T15:12:14Z

No description provided.

golem-test-framework/src/config/cli.rs

golem-common/src/config.rs

golem-test-framework/src/config/env.rs

golem-worker-executor-base/src/cassandra.rs

noise64 · 2024-09-26T15:32:40Z

golem-worker-executor-base/src/cassandra.rs

+ })
+ }
+
+ pub async fn create_docker_schema(&self) -> Result<(), String> {


Is this Docker-specific?

Yes, it is.
I assume users will already have Cassandra installed; this is only for tests.

It's OK to only run this in tests, but that does not make it Docker-specific: this code creates the schema on any Cassandra connection.

Also, it is key-value-store specific, so it should live in the keyvalue/cassandra module.

Well, for now only the kv implementation is done, which is why this is kv-specific, but we'll add indexed and blob too.

I think that, for now, we need to move cassandra.rs from the top level of golem-worker-executor-base into storage, as we did for sqlite_types.rs, which contains all the SQL statements for the 3 types of storage.

@vigoo @noise64 WDYT ?

noise64 · 2024-09-26T15:34:43Z

golem-worker-executor-base/src/cassandra.rs

+ self.session
+ .query_unpaged(
+ Query::new(format!(
+ r#"


Can these either be indented to match the surrounding source (I know they are not auto-formatted, but we can keep them in place manually) or start at zero indent?

noise64 · 2024-09-26T16:00:36Z

golem-worker-executor-base/src/cassandra.rs

+ }
+
+ pub fn with(&self, svc_name: &'static str, api_name: &'static str) -> CassandraLabelledApi {
+ CassandraLabelledApi {


What is the intended relation between CassandraSession, CassandraLabelledApi, CassandraKeyValueStorage?

I'm a bit confused: CassandraSession and CassandraLabelledApi suggest some generic functionality, but AFAIU all of the specific key-value storage implementation is in CassandraLabelledApi. That wraps a CassandraSession, which contains the specific key-value schema, while CassandraKeyValueStorage calls CassandraSession and wraps it in a CassandraLabelledApi before every call.

Maybe I'm missing something, but:

I see no reason for a separate CassandraSession; the schema is tied to CassandraKeyValueStorage.

I think the work done by CassandraLabelledApi could be a generic wrapper method in CassandraKeyValueStorage, instead of duplicating the whole API.

And I think the queries should live in CassandraKeyValueStorage.

I do see now that we have RedisLabelled in golem-common, but that is not at the "IndexedKeyValue" level; it is a generic Redis wrapper (of course, because Redis is itself a key-value store, the two layers match somewhat). I think we can have CassandraLabelled, but it should be in common and at the level of Cassandra operations (e.g. execute a query with the common labels and a custom "query-name" label).

@vigoo wdyt?

The problem with moving CassandraLabelled into common is that it would add new dependencies to the CLI, which I remember we want to avoid. That's why I kept everything in the worker-executor modules until the refactor lets us move all the common code into common without impacting the CLI or other crates.

Well, we also have service-base, and/or we can use features.
Which refactor is that?

The redis code in common adds metrics and logging to the redis client and it just directly wraps the redis operations, independent of any actual use case.

I see that this top-level cassandra module contains key-value store specific queries, which is definitely not something we want. Everything key-value store specific should be in the storage/keyvalue/cassandra module.

If you want to have common metrics and logging for Cassandra, like we have for Redis, then you can wrap the Cassandra library, but that wrapper should only instrument (wrap) the library's functions and not add any logic on top of it.

Where this "common" Cassandra wrapper lives is another question; the top level of worker-executor-base is definitely not a good place for it. Until we need it in another service, we can keep it in worker-executor-base, but let's at least move it to the storage module.
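To illustrate the "generic wrapper method" idea from this thread, here is a minimal sketch, not the PR's actual code. `Labelled`, `CallRecord`, and `record` are hypothetical names, the `records` vector stands in for a real metrics/logging sink, and the call is synchronous for brevity, whereas the real driver calls are async. The wrapper only measures and labels an operation; it contains no queries or table names.

```rust
use std::time::{Duration, Instant};

/// Hypothetical record of one instrumented call (stand-in for a metric/log entry).
#[derive(Debug, Clone, PartialEq)]
pub struct CallRecord {
    pub svc_name: &'static str,
    pub api_name: &'static str,
    pub op_name: &'static str,
    pub success: bool,
    pub elapsed: Duration,
}

/// Hypothetical instrumentation helper. It carries the common labels once,
/// so callers do not pass svc_name/api_name to every operation.
pub struct Labelled {
    svc_name: &'static str,
    api_name: &'static str,
    pub records: Vec<CallRecord>,
}

impl Labelled {
    pub fn new(svc_name: &'static str, api_name: &'static str) -> Self {
        Self { svc_name, api_name, records: Vec::new() }
    }

    /// Runs `op`, records its outcome and duration, and returns the result unchanged.
    pub fn record<T, E>(
        &mut self,
        op_name: &'static str,
        op: impl FnOnce() -> Result<T, E>,
    ) -> Result<T, E> {
        let start = Instant::now();
        let result = op();
        self.records.push(CallRecord {
            svc_name: self.svc_name,
            api_name: self.api_name,
            op_name,
            success: result.is_ok(),
            elapsed: start.elapsed(),
        });
        result
    }
}

fn main() {
    let mut labelled = Labelled::new("key_value", "get");
    // The closures stand in for calls into the database driver.
    let ok: Result<u32, String> = labelled.record("select_value", || Ok(42));
    let err: Result<u32, String> = labelled.record("select_value", || Err("timeout".to_string()));
    println!("{:?} {:?}, {} calls recorded", ok, err, labelled.records.len());
}
```

With a helper like this, the key-value storage can keep its own queries and pass each one through `record`, rather than having a second type that mirrors the whole storage API.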

noise64 · 2024-09-30T10:13:24Z

golem-worker-executor-base/src/storage/cassandra.rs

+ set_tracing: bool,
+}
+
+impl CassandraSession {


This struct and impl are still specific to our key-value store schema, not a generic Cassandra session. Anything that uses the kv_store table (and similar tables) should be named and placed as part of the KVStore implementation, and only things that use generic Cassandra primitives should be named as generic Cassandra code.

Yes, as I tried to explain yesterday, this is a common struct, like CassandraLabelledApi, which will contain all the kv and indexed storage queries, tables, and functions.
This PR covers KV only; I already have the Indexed-related changes for this file and others.
Once this PR is approved I'll create the Indexed storage PR, and you will probably see what I mean then.

What we are asking for is to follow the structure used for Redis:

One layer wraps the 3rd party library and adds metrics and logging - without adding any higher level operations (no actual queries, just the same operations provided by the 3rd party library, wrapped)

KV store implementation built on this, only containing KV store related queries

Indexed store implementations built on this, only containing indexed store related queries
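The three layers listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `CqlSession`, `RecordingSession`, `KeyValueStore`, and `IndexedStore` are hypothetical names, the session is an in-memory stand-in rather than the real driver, and the column layouts and the `index_store` table name are invented for the example (only `kv_store` is mentioned in this review).

```rust
use std::cell::RefCell;

/// Layer 1 (hypothetical): generic session. A real version would wrap the
/// driver's own operations and add metrics/logging, with no queries of its own.
pub trait CqlSession {
    fn execute(&self, query_name: &'static str, cql: &str) -> Result<(), String>;
}

/// In-memory stand-in that only remembers which statements were executed.
#[derive(Default)]
pub struct RecordingSession {
    pub executed: RefCell<Vec<(&'static str, String)>>,
}

impl CqlSession for RecordingSession {
    fn execute(&self, query_name: &'static str, cql: &str) -> Result<(), String> {
        self.executed.borrow_mut().push((query_name, cql.to_string()));
        Ok(())
    }
}

/// Layer 2a: key-value store. Owns the KV schema and KV queries only.
pub struct KeyValueStore<'a, S: CqlSession> {
    session: &'a S,
    keyspace: String,
}

impl<'a, S: CqlSession> KeyValueStore<'a, S> {
    pub fn new(session: &'a S, keyspace: &str) -> Self {
        Self { session, keyspace: keyspace.to_string() }
    }

    pub fn create_schema(&self) -> Result<(), String> {
        // Illustrative column layout, not the PR's actual schema.
        let cql = format!(
            "CREATE TABLE IF NOT EXISTS {}.kv_store \
             (namespace text, key text, value blob, PRIMARY KEY (namespace, key))",
            self.keyspace
        );
        self.session.execute("kv_create_schema", &cql)
    }
}

/// Layer 2b: indexed store. Owns the indexed schema and indexed queries only.
pub struct IndexedStore<'a, S: CqlSession> {
    session: &'a S,
    keyspace: String,
}

impl<'a, S: CqlSession> IndexedStore<'a, S> {
    pub fn new(session: &'a S, keyspace: &str) -> Self {
        Self { session, keyspace: keyspace.to_string() }
    }

    pub fn create_schema(&self) -> Result<(), String> {
        // Illustrative table name and columns, not the PR's actual schema.
        let cql = format!(
            "CREATE TABLE IF NOT EXISTS {}.index_store \
             (namespace text, key text, id bigint, value blob, PRIMARY KEY ((namespace, key), id))",
            self.keyspace
        );
        self.session.execute("indexed_create_schema", &cql)
    }
}

fn main() {
    let session = RecordingSession::default();
    KeyValueStore::new(&session, "golem").create_schema().unwrap();
    IndexedStore::new(&session, "golem").create_schema().unwrap();
    for (name, cql) in session.executed.borrow().iter() {
        println!("{}: {}", name, cql);
    }
}
```

The point of the split is that the session type never mentions a table, so each store can be created, tested, and configured on its own.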

I see. We can probably create another ticket to fix the sqlite implementation, as it is done in the same way as Cassandra.
There is a struct like this:
SqliteLabelledApi { svc_name: &'static str, api_name: &'static str, pool: SqlitePoolx, }
It avoids passing common parameters like svc_name and api_name to the metrics and logging functionality in every method, so for simplicity this Labelled struct holds the queries for all 3 storage kinds. Since we are going to change that for Cassandra, I think we need to change it for sqlite too.

WDYT ?

Yes, the sqlite one should also use similar naming and structure.

noise64

Added some notes about the Cassandra laziness and about creating follow-up issues.

noise64 · 2024-10-04T08:37:16Z

golem-test-framework/src/config/mod.rs

@@ -54,6 +56,7 @@ pub trait TestDependencies {
 self.rdb().kill();
 self.redis_monitor().kill();
 self.redis().kill();
+ self.cassandra().kill();


Now that cassandra is lazy, will this boot up Cassandra just to kill it?
If so, I think we should use an Option behind a lock (RwLock or Mutex) instead of a lazy cell.
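A minimal sketch of the concern above, assuming a hypothetical `CassandraSlot` holder and a `FakeCassandra` stand-in for the test instance (neither name comes from the PR). With an `Option` behind a `Mutex`, `kill` can check whether anything was ever started, whereas dereferencing a lazy cell would initialise it first.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for a running Cassandra test instance.
pub struct FakeCassandra;

/// Hypothetical holder: the instance exists only after `ensure_started`.
pub struct CassandraSlot {
    instance: Mutex<Option<FakeCassandra>>,
    pub boots: AtomicUsize,
    pub kills: AtomicUsize,
}

impl CassandraSlot {
    pub fn new() -> Self {
        Self {
            instance: Mutex::new(None),
            boots: AtomicUsize::new(0),
            kills: AtomicUsize::new(0),
        }
    }

    /// Starts the instance on first use; later calls reuse it.
    pub fn ensure_started(&self) {
        let mut guard = self.instance.lock().unwrap();
        if guard.is_none() {
            self.boots.fetch_add(1, Ordering::SeqCst);
            *guard = Some(FakeCassandra);
        }
    }

    pub fn is_running(&self) -> bool {
        self.instance.lock().unwrap().is_some()
    }

    /// Kills the instance only if it was started; never boots one.
    pub fn kill(&self) {
        if let Some(_instance) = self.instance.lock().unwrap().take() {
            self.kills.fetch_add(1, Ordering::SeqCst);
        }
    }
}

fn main() {
    let unused = CassandraSlot::new();
    unused.kill(); // nothing was started, so nothing is booted or killed
    println!("boots after kill on unused slot: {}", unused.boots.load(Ordering::SeqCst));

    let used = CassandraSlot::new();
    used.ensure_started();
    used.kill();
    println!("running after kill: {}", used.is_running());
}
```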

noise64 · 2024-10-04T08:39:15Z

golem-worker-executor-base/src/storage/cassandra.rs

+ })
+ }
+
+ pub async fn create_schema(&self) -> Result<(), String> {


I would still move the (test) schema creation out of the common cassandra (and sqlite) code, and even separate it by usage (for sqlite), but if we create an issue for it, then I'm okay with it for now.

I think there is only one common schema for the storage. Where do you want to move the schema creation to,
and how do you want to separate it?

Based on the storage type (kv, blob, indexed), moving each part to the corresponding package.
With that, ideally we can select separately what to use and only install what is needed (even if it is only for tests).

I don't think it makes sense, because I don't think we will ever use kv storage with sqlite and indexed storage in Cassandra, or any combination like that, and I don't see any advantage in performance, space usage, or anything else. But I did it because I think we need to move on with this PR.

The three storage "types" (kv, indexed, blob) should be completely separate and separately configurable. Even though sqlite + cassandra, for example, is not a likely combination, we may have more backend implementations in the future that it makes sense to combine in ways we don't see right now. In general, using something different for the kv-store (basically caching data) and the indexed-store (our primary storage layer for durable execution) completely makes sense to me.
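As a rough illustration of "separately configurable", the sketch below picks a backend per storage type and derives which schemas have to be installed. `Backend`, `StorageConfig`, and `required_schemas` are hypothetical names for this example and do not reflect Golem's actual configuration types; which backends need a schema is also an assumption.

```rust
/// Hypothetical set of backends; a real configuration would carry connection details.
#[derive(Debug, Clone, PartialEq)]
pub enum Backend {
    InMemory,
    Redis,
    Sqlite,
    Cassandra,
}

/// Hypothetical configuration: each storage type chooses its backend independently.
#[derive(Debug, Clone, PartialEq)]
pub struct StorageConfig {
    pub key_value: Backend,
    pub indexed: Backend,
    pub blob: Backend,
}

impl StorageConfig {
    /// Lists the (backend, storage type) pairs whose schema must be created.
    /// Assumes only the SQL/CQL backends need an explicit schema.
    pub fn required_schemas(&self) -> Vec<(Backend, &'static str)> {
        vec![
            (&self.key_value, "kv"),
            (&self.indexed, "indexed"),
            (&self.blob, "blob"),
        ]
        .into_iter()
        .filter(|(backend, _)| matches!(**backend, Backend::Sqlite | Backend::Cassandra))
        .map(|(backend, kind)| (backend.clone(), kind))
        .collect()
    }
}

fn main() {
    let config = StorageConfig {
        key_value: Backend::Redis,
        indexed: Backend::Cassandra,
        blob: Backend::InMemory,
    };
    // Only the indexed schema for Cassandra has to be installed here.
    println!("{:?}", config.required_schemas());
}
```

Keeping schema creation next to each storage type is what makes a selection like this possible: a deployment (or a test) installs only the pieces it has configured.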

irach-ramos force-pushed the cassandra-key-value-storage-impl branch 2 times, most recently from fb27567 to a230e1b Compare September 23, 2024 15:34

noise64 requested changes Sep 26, 2024


irach-ramos force-pushed the cassandra-key-value-storage-impl branch from 887abdb to 1a13dd2 Compare September 27, 2024 08:23

irach-ramos requested review from noise64 and vigoo September 27, 2024 08:31

noise64 reviewed Sep 30, 2024


irach-ramos requested a review from noise64 October 1, 2024 08:52

irach-ramos force-pushed the cassandra-key-value-storage-impl branch from bc07cf9 to 58d07f1 Compare October 2, 2024 12:32

noise64 reviewed Oct 4, 2024


irach-ramos requested a review from noise64 October 4, 2024 14:40

irach-ramos added 8 commits October 10, 2024 13:54

Adding cassandra key value storage

52f7303

Fix sharding-tests

73c733b

code review

225035c

code review changes

26c60a4

code review changes

b0ee55f

change lazy to optional initialisation

517f5bc

move schema creation to kv storage

08d3800

merge

a616ae9

irach-ramos force-pushed the cassandra-key-value-storage-impl branch from 9e24fc4 to a616ae9 Compare October 10, 2024 14:39

Merge branch 'main' into cassandra-key-value-storage-impl

2ad6e13


irach-ramos commented Sep 20, 2024

noise64 Sep 26, 2024

irach-ramos Sep 27, 2024

vigoo Sep 27, 2024

irach-ramos Sep 27, 2024

noise64 Sep 26, 2024

noise64 Sep 26, 2024

noise64 Sep 26, 2024 (edited)

noise64 Sep 26, 2024 (edited)

irach-ramos Sep 27, 2024

noise64 Sep 27, 2024

vigoo Sep 27, 2024

noise64 Sep 30, 2024

irach-ramos Oct 1, 2024

vigoo Oct 1, 2024

irach-ramos Oct 1, 2024

noise64 Oct 1, 2024

noise64 left a comment

noise64 Oct 4, 2024

noise64 Oct 4, 2024

irach-ramos Oct 4, 2024

noise64 Oct 4, 2024 (edited)

irach-ramos Oct 7, 2024

vigoo Oct 7, 2024
