Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding cassandra key value storage #961

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

irach-ramos
Copy link
Contributor

No description provided.

@irach-ramos irach-ramos force-pushed the cassandra-key-value-storage-impl branch 2 times, most recently from fb27567 to a230e1b Compare September 23, 2024 15:34
golem-test-framework/src/config/cli.rs Show resolved Hide resolved
golem-common/src/config.rs Show resolved Hide resolved
golem-test-framework/src/config/env.rs Show resolved Hide resolved
golem-test-framework/src/config/env.rs Outdated Show resolved Hide resolved
golem-test-framework/src/config/env.rs Show resolved Hide resolved
golem-worker-executor-base/src/cassandra.rs Outdated Show resolved Hide resolved
})
}

pub async fn create_docker_schema(&self) -> Result<(), String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this docker specific?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is.
I guess the user will have installed Cassandra, this is just for test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok to only run this in tests, but that does not make it docker specific, this is code that creates the schema in any cassandra connection.

Also it is key-value-store specific, so we should have it in the keyvalue/cassandra module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, for now it is just the kv implementation that is done, that's why we have kv specific, but we'll add indexed and blob too.

I think we need to move, for now, cassandra.rs from top level golem-executor-base to storage as we have done for sqlite_types.rs, which contains all sql related statement for the 3 types of storage.

@vigoo @noise64 WDYT ?

self.session
.query_unpaged(
Query::new(format!(
r#"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can these be either indented as the current source is indented (i know that these are not auto formatted, but we can keep them in place manually), or start them at zero indent?

}

pub fn with(&self, svc_name: &'static str, api_name: &'static str) -> CassandraLabelledApi {
CassandraLabelledApi {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the intended relation between CassandraSession, CassandraLabelledApi, CassandraKeyValueStorage?

iIm a bit confused, as the CassandraSession and CassandraLabelledApi suggest some generic functionality, but AFAIU all the specific key values storage implementation is in CassandraLabelledApi, which wraps a CassandraSession, which contains the specific key-value schema, and CassandraKeyValueStorage is calling CassandraSession, and wrapping it before every call into a CassandraLabelledApi.

Maybe I'm missing something, but

  • I see no reason for a separate CassandraSession, the schema is tied to CassandraKeyValueStorage
  • I think the work done by CassandraLabelledApi could be a generic wrapper method in CassandraKeyValueStorage, instead of duplicating all the API
  • and I think the queries should live in CassandraKeyValueStorage

Copy link
Contributor

@noise64 noise64 Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I do see now that we have in RedisLabbelled in golem-common, but that is not "IndexedKeyValue" level, rather a generic redis wrapper (ofc because redis is a key value too in general, the two layer matches somewhat). I think we can have CassandraLabelled, but that should be in common, and on the level of cassandra operations (e.g. execute query with the common labels and a custom "query-name" label)

Copy link
Contributor

@noise64 noise64 Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vigoo wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem to move CassandraLabelled into common is that we are going to have new dependencies in CLI, which I remember we want to avoid, that's why I kept everything in executor-worker modules until the refactor comes to be able to move all common stuff into common without impacting CLI or others.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, we also have service base and / or we can use features.
which refactor is that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The redis code in common adds metrics and logging to the redis client and it just directly wraps the redis operations, independent of any actual use case.

I see that in this top-level cassandra module there is key-value store specific queries, which is definitely not something we want. Everything key-value store specific should be in the storage/keyvalue/cassandra module.

If you want to have common metrics and logging for cassandra, like we have for redis, then you can wrap the cassandra library but that should just instrument (wrap) the library's functions and not add any logic on top if it.

Where this "common" cassandra wrapper is is another question, top-level worker-executor-base is definitely not a good place for it. Until we need it in any other service, we can keep it in worker-executor-base, but let's move it to the storage module at least.

set_tracing: bool,
}

impl CassandraSession {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this struct and impl is still specific to our key value store schema, and not a generic cassandra session. anything that is using the kv_store table etc. tables should be named and placed as part of the KVStore implementation, and only things that are using generic cassandra primitives should be named as such cassandra

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as I tried to explain yesterday, this is a common struct as CassandraLabelledApi, which will contain all the kv and indexed storage queries, tables, and functions.
This PR is KV related, and I have the Indexed related changes for this file and other files.
Once this PR is approved I'll create the Indexed storage PR, probably you will see what I mean at that time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we are trying to ask is to follow the structure of how it was done for Redis:

  • One layer wraps the 3rd party library and adds metrics and logging - without adding any higher level operations (no actual queries, just the same operations provided by the 3rd party library, wrapped)
  • KV store implementation built on this, only containing KV store related queries
  • Indexed store implementations built on this, only containing indexed store related queries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, probably we can create another ticket to fix sqlite implementation as it is done in the same way as Cassandra.
There is a struct like this
SqliteLabelledApi { svc_name: &'static str, api_name: &'static str, pool: SqlitePoolx, }
to avoid passing common parameters like svc_name & api_name for every method to the metrics and logging functionality, so this Labelled is holding the 3 storage kind of queries for simplicity, but it seems we are going to change that in Cassandra, then I think we need to change also in sqlite.

WDYT ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the sqlite one should also use similar naming and structuring

@irach-ramos irach-ramos force-pushed the cassandra-key-value-storage-impl branch from bc07cf9 to 58d07f1 Compare October 2, 2024 12:32
Copy link
Contributor

@noise64 noise64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some notes about the cassandra layzness and for creating followup issues

@@ -54,6 +56,7 @@ pub trait TestDependencies {
self.rdb().kill();
self.redis_monitor().kill();
self.redis().kill();
self.cassandra().kill();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that cassandra is lazy, will this boot up cassandra just to kill it?
if so, i think we should use RwMutex Option instead of lazy cell

})
}

pub async fn create_schema(&self) -> Result<(), String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would still move the (test) schema creations out of the common cassandra (and sqlite) code, and even separate them by usage (for sqlite), but if we create an issue for it, then i'm okay with it for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is only one common schema for the storage, to where you want to move the schema creation ? ,
how you want to separate them ?

Copy link
Contributor

@noise64 noise64 Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on the storage type: kv, blob, indexed; and move to that package.
with that ideally we can select separately what to use, and only install what is needed (even if it is only for tests)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense, because I don't think we will never use kv storage with sqlite and indexed storage in Cassandra or some approach like that, and I don't see neither any advantage in performance or space usage or something useful, but I did it because I think we need to move on with this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The three storage "types" (kv, indexed, blob) should be completely separate and separately configurable. Even though sqlite + cassandra for example is not a likely combination, it is possible that we will have more backend implementations in the future that make sense to combine them in ways we don't see right now. In general, using something different for kv-store (basically caching data) and indexed-store (our primary storage layer for durable execution) completely makes sense to me.

@irach-ramos irach-ramos force-pushed the cassandra-key-value-storage-impl branch from 9e24fc4 to a616ae9 Compare October 10, 2024 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants