Metadata queries with configurable server-side timeouts #1171
base: branch-0.15.x
It's in legacy serialization testing code, so it's going to be removed soon anyway. (cherry picked from commit 0eb98bc)
As you can see, our SSL and Authenticate workflows fail, because the ScyllaDB run there is so old that it does not recognize
I guess we urgently need to create new images, to really test that the driver is compatible with those features in new ScyllaDB versions, not 4-year-old ones... cc @dkropachev
3271538 to a5adae0
From what I see these are regular images, based on scylladb/scylla/4.3.rc0 with some patches at
Done up here
LGTM. Nice piece of code - especially the test cases
scylla/src/transport/topology.rs
Outdated
/// Tests that ControlConnection enforces the provided custom timeout
/// iff ScyllaDB is the target node (else ignores the custom timeout).
#[cfg(not(scylla_cloud_tests))]
#[tokio::test]
#[ntest::timeout(2000)]
async fn test_custom_timeouts() {
    setup_tracing();

    let proxy_addr = SocketAddr::new(scylla_proxy::get_exclusive_local_address(), 9042);
    let uri = std::env::var("SCYLLA_URI").unwrap_or_else(|_| "127.0.0.1:9042".to_string());
    let node_addr: SocketAddr = resolve_hostname(&uri).await;
❓ This test requires a running Scylla / C* cluster, right? Is it possible to avoid this?
It is possible to avoid this by using the proxy in the dry mode. We could mock the whole CQL handshake (OPTIONS -> SUPPORTED -> STARTUP -> READY -> REGISTER -> READY) and then intercept `QUERY`s and `PREPARE`s as before. An additional gain would be to test multiple scenarios at once, i.e. sharded and non-sharded endpoints, independently of the actual cluster being deployed for tests.
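The handshake mocking described above could be sketched roughly as follows. This is only an illustration: the opcode values are those of the CQL native protocol, but `MockAction` and `mock_action` are hypothetical names and are not part of `scylla_proxy`, whose actual rule API is not shown here.

```rust
/// CQL native-protocol opcodes relevant to the control-connection handshake.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Opcode {
    Startup = 0x01,
    Ready = 0x02,
    Options = 0x05,
    Supported = 0x06,
    Query = 0x07,
    Prepare = 0x09,
    Register = 0x0B,
}

/// What a (hypothetical) dry-mode mock does with an incoming request frame.
#[derive(Debug, PartialEq, Eq)]
enum MockAction {
    /// Answer right away with a forged response of the given opcode.
    Respond(Opcode),
    /// Hand the frame over to the test so it can inspect the statement.
    Intercept,
    /// A response opcode; a client should never send it.
    Unexpected,
}

/// Maps a request opcode to the mock's reaction, following the sequence
/// OPTIONS -> SUPPORTED -> STARTUP -> READY -> REGISTER -> READY.
fn mock_action(request: Opcode) -> MockAction {
    match request {
        Opcode::Options => MockAction::Respond(Opcode::Supported),
        Opcode::Startup | Opcode::Register => MockAction::Respond(Opcode::Ready),
        Opcode::Query | Opcode::Prepare => MockAction::Intercept,
        Opcode::Ready | Opcode::Supported => MockAction::Unexpected,
    }
}

fn main() {
    // Walk through the handshake requests, then one intercepted QUERY.
    let requests = [
        Opcode::Options,
        Opcode::Startup,
        Opcode::Register,
        Opcode::Query,
    ];
    for request in requests {
        println!("{:?} -> {:?}", request, mock_action(request));
    }
}
```

Because no real node is involved, the forged SUPPORTED frame can advertise either a sharded or a non-sharded endpoint, which is what would let one test cover both scenarios.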
I've made the test use the dry mode to avoid dependency on a running cluster.
impl ControlConnection {
    async fn query_metadata(
        self,
        connect_port: u16,
        keyspace_to_fetch: &[String],
        fetch_schema: bool,
    ) -> Result<Metadata, QueryError> {
        let peers_query = self.clone().query_peers(connect_port);
        let keyspaces_query = self.query_keyspaces(keyspace_to_fetch, fetch_schema);
Commit: "topology: make metadata fetchers methods on ControlConnection"
🔧 For me it is extremely weird and unintuitive to have multiple impl blocks for a struct (ControlConnection), scattered in various places of 2 modules (control_connection and topology).
Can we please put all the `impl ControlConnection` into the `control_connection` module, and preferably make them a single `impl` block?
https://www.reddit.com/r/rust/comments/w5l320/is_having_multiple_impl_blocks_idiomatic/
It's a perfectly valid way of logical separation. In this particular case, we separate metadata-related functionalities (defined in the impl block in the main `metadata.rs` module) from the lower-level functionalities (defined in the impl block in the `control_connection` module). Also, what we get is that methods defined outside the `control_connection` module can't access mod-private fields and methods of `ControlConnection`.
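A minimal, self-contained sketch of that separation, using simplified stand-in types (the `node` field and the `run` and `peers_query` methods are hypothetical, not the driver's real items):

```rust
mod control_connection {
    /// Simplified stand-in for the real type; `node` is module-private.
    pub struct ControlConnection {
        node: String,
    }

    // Lower-level functionality lives next to the private state.
    impl ControlConnection {
        pub fn new(node: &str) -> Self {
            Self {
                node: node.to_owned(),
            }
        }

        /// Pretends to run a statement and reports where it was sent.
        pub fn run(&self, statement: &str) -> String {
            format!("{} <- {}", self.node, statement)
        }
    }
}

mod metadata {
    use super::control_connection::ControlConnection;

    // A second inherent impl block for the same type, in another module.
    // It can only go through the public API: writing `self.node` here
    // would be a compile error, because the field is private to
    // `control_connection`.
    impl ControlConnection {
        pub fn peers_query(&self) -> String {
            self.run("SELECT peer FROM system.peers")
        }
    }
}

fn main() {
    let conn = control_connection::ControlConnection::new("127.0.0.1:9042");
    println!("{}", conn.peers_query());
}
```

The higher-level method is forced to use `run`, so anything `run` adds (such as a timeout clause) cannot be bypassed by accident from the metadata code.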
impl ControlConnection {
    async fn query_metadata(
        self,
        connect_port: u16,
        keyspace_to_fetch: &[String],
        fetch_schema: bool,
    ) -> Result<Metadata, QueryError> {
        let peers_query = self.clone().query_peers(connect_port);
🌱 When porting the PR to the `main` branch, after applying my previous comment about ControlConnection impls, I think it would be a good idea to extract the `control_connection` module to a separate file.
I think `ControlConnection` could be extracted to a separate file, but with those higher-level methods (`query_peers` etc.) left in the `metadata.rs` file. The reason is that `ControlConnection` itself is a medium, which should be logically separate from the logic performing queries using it.
Without this derive, one cannot call unwrap_err() on query_iter()'s result.
This makes the code more rusty and leverages the type system. Also, it extracts the constants for shared use in the next commit.
Tests need this function to mock ScyllaDB's SUPPORTED frames. For use with the proxy, especially in the dry mode.
a5adae0 to 6ab1f65
v1.1: resolved comments,
The test asserts that for ScyllaDB the timeout is enforced (if set) and for Cassandra it is always ignored, in the following cases:
- when explicitly disabled (no custom timeout follows),
- when explicitly set to some (only set for ScyllaDB),
The timeout is now stored in SessionConfig and passed down to the MetadataReader, which sets up all its `ControlConnection`s with that timeout.
The test asserts that for ScyllaDB the timeout is enforced (if set) and for Cassandra it is always ignored, in the following cases:
- when explicitly disabled (no custom timeout follows),
- when explicitly set to some (only set for ScyllaDB),
- when left as implicit default (only set for ScyllaDB).
6ab1f65 to 0a53197
There is no point in merging this into the 0.15 branch; we're not going to release another 0.15 release.
Motivation
Some users tune their server-side timeouts so that they are tighter. For clusters with a large schema, however, this sometimes made schema queries time out. This PR excludes metadata queries from the default server-side timeout by overriding it with a custom one, to prevent timeouts upon querying metadata.
What's done
Adds a `USING TIMEOUT` clause to metadata queries when applicable. By metadata I mean schema + topology. (I did it for both for consistency, even though only schema was required in the issue.)
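As a rough illustration of "when applicable", here is a sketch assuming the clause is appended as text at the end of the statement and only when the target node is ScyllaDB. The struct below is a simplified stand-in, and its fields and the `with_timeout` method are hypothetical names, not the driver's actual implementation.

```rust
use std::time::Duration;

/// Simplified stand-in for a control connection: it only remembers what
/// kind of node it talks to and the configured custom timeout, if any.
struct ControlConnection {
    is_scylla: bool,
    custom_timeout: Option<Duration>,
}

impl ControlConnection {
    /// Appends `USING TIMEOUT <n>ms` only if the node is ScyllaDB and a
    /// custom timeout is configured; otherwise returns the statement as is.
    fn with_timeout(&self, statement: &str) -> String {
        match (self.is_scylla, self.custom_timeout) {
            (true, Some(timeout)) => {
                format!("{} USING TIMEOUT {}ms", statement, timeout.as_millis())
            }
            _ => statement.to_owned(),
        }
    }
}

fn main() {
    let statement = "SELECT keyspace_name FROM system_schema.keyspaces";
    let two_seconds = Some(Duration::from_secs(2));

    let scylla = ControlConnection { is_scylla: true, custom_timeout: two_seconds };
    let cassandra = ControlConnection { is_scylla: false, custom_timeout: two_seconds };
    let disabled = ControlConnection { is_scylla: true, custom_timeout: None };

    println!("{}", scylla.with_timeout(statement)); // clause appended
    println!("{}", cassandra.with_timeout(statement)); // unchanged
    println!("{}", disabled.with_timeout(statement)); // unchanged
}
```

Keeping the decision in one place means every fetch that goes through the wrapper gets the same treatment, with no per-query opt-in to forget.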
Note: this is purposefully opened against `branch-0.15.x`, as it has been considered an urgent fix to real user problems. We're going to release a minor (0.15.2) with this. After it's accepted and #1166 is done, I'll port this to `main`.
Implementation
To ensure that no fetches are accidentally omitted from having the timeout added, both now and in the future, I created a new abstraction: `ControlConnection`, which is just a wrapper over `Arc<Connection>` that exposes some methods of `Arc<Connection>` (`query_iter`, `execute_iter`, `prepare`, and some minor getters), taking care of adding the timeout before execution if applicable.
A corresponding setting is added to `SessionConfig`, and `SessionBuilder` gets a new method for configuring it.
Testing
I added two tests:
`ControlConnection`, which asserts that for ScyllaDB the timeout is enforced (if set) and for Cassandra is always ignored, in the following cases:
Fixes: #1052