implement cass_session_get_metrics and enable some integration tests #280
base: master
Conversation
Force-pushed from 55ca2d9 to 3b5129d
Ok, it looks like the […]. I've enabled the test under […].
v1.1: Enabled.
v1.2: Well, I gave it a shot, but it failed for one of the Scylla releases… so unfortunately it is flaky (and will be, unless we address microsecond granularity on the rust-driver side). Disabled this test again.
> I'm not sure how to address this one. Is there any way to force higher >=1ms latencies in the test?

What about your idea with setting coalescing to > 1ms?
It is not reliable enough. Notice that there may be a burst of requests that are handled (and appended to the request queue) just before the […].
Hmm… When is the "before" measurement of the latency done? If it is done before the LBP is consulted, you could put a sleep(1ms) in a custom LBP.
Another possible place for that would be the history listener.
I see that we measure latency per attempt, and we completely ignore failing attempts when measuring it. Is that the expected behaviour?
I have no idea, tbh. @piodul? This means that both approaches (LBP, history listener) are not viable, right?
LBP will not work for sure. But […]
Me neither. What do other drivers measure?
v1.3: Applied the `HistoryListener` hack.
Rebased on master.
We zero-initialize deprecated fields, so we retain binary compatibility with cpp-driver.
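A minimal sketch of that idea, using a simplified stand-in for the `stats` block of `CassMetrics` (the real struct lives in cpp-driver's `cassandra.h`; which fields count as deprecated is an assumption here):

```cpp
#include <cstdint>

// Simplified stand-in for the stats block of cpp-driver's CassMetrics.
// The deprecated fields stay in the struct so its layout does not change.
struct Stats {
  uint64_t total_connections;
  uint64_t available_connections;                 // deprecated, not tracked by rust-driver
  uint64_t exceeded_pending_requests_water_mark;  // deprecated, not tracked by rust-driver
  uint64_t exceeded_write_bytes_water_mark;       // deprecated, not tracked by rust-driver
};

// Fill only the fields the Rust driver actually tracks; value-initialization
// zeroes the deprecated ones, so no uninitialized bytes are handed to the user
// (which valgrind would otherwise flag) and the ABI stays cpp-driver-compatible.
Stats make_stats(uint64_t total_connections) {
  Stats s{};
  s.total_connections = total_connections;
  return s;
}
```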
Since we implemented metrics, we can enable the HeartbeatFailed test with some adjustments to log filtering. This test seems to fail under valgrind, which is why I enable it to run next to the other tests that cannot be run under valgrind.

Note: the original test seems to be flaky for Cassandra. The following scenario occurred:
1. node2 is paused,
2. the keepaliver notifies the pool refiller about that,
3. the refiller removes the connection to node2 (metrics::total_connections -= 1),
4. in the test, we read get_metrics().total_connections < initial_connections and leave the loop,
5. the refiller tries to open a connection again (metrics::total_connections += 1),
6. we read get_metrics().total_connections again and expect total_connections to be less than initial_connections - but it is not.

To combat this, I adjusted the test so that the same metrics snapshot is used both to leave the loop and to make the assertion. With that, the aforementioned "unlucky" scenario cannot happen.
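A sketch of the adjustment, written as a test fragment in the style of the diff shown further below (not standalone code; `timed_out()` stands in for the test's timer check):

```cpp
// Flaky pattern: the loop exit and the assertion read the metrics separately,
// so the refiller can reopen the connection in between (steps 5-6 above).
//
//   while (session_.metrics().stats.total_connections >= initial_connections && !timed_out());
//   EXPECT_LT(session_.metrics().stats.total_connections, initial_connections);
//
// Adjusted pattern: one snapshot per iteration drives both the exit condition
// and the assertion, so a reconnect after leaving the loop cannot break it.
CassMetrics metrics = session_.metrics();
while (metrics.stats.total_connections >= initial_connections && !timed_out()) {
  metrics = session_.metrics();
}
EXPECT_LT(metrics.stats.total_connections, initial_connections);
```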
To be quite honest, I thought we'd be able to enable more tests, but:
1. StatsConnections - requires `cass_cluster_set_num_threads_io`, `cass_cluster_set_core_connections_per_host` and `cass_cluster_set_constant_reconnect`.
2. ErrorsConnectionTimeouts - requires `cass_cluster_set_core_connections_per_host`.
3. SpeculativeExecutionRequests - requires `cass_session_get_speculative_execution_metrics`.
4. Requests - this one is interesting. It turns out that cpp-driver stores latency stats as microseconds, while rust-driver stores them as milliseconds. Because of that, the mean and median latency are rounded to 0 (at least on my machine). The test expects them to be greater than 0, which makes sense assuming the driver collects the stats with microsecond precision. I'm not sure how to address this one. Is there any way to force higher >=1ms latencies in the test? (A small illustration of the rounding follows below.)
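A tiny standalone illustration of the rounding problem (the 700µs figure is just an example of a typical local round-trip, not a measured value):

```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint64_t latency_us = 700;                // sub-millisecond latency, typical on localhost
  uint64_t latency_ms = latency_us / 1000;  // millisecond granularity, as in rust-driver
  assert(latency_ms == 0);                  // mean/median built from such samples stay at 0
  return 0;
}
```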
Without this, valgrind complains about access to uninitialized memory.
Why does this test not work without adjustments? Because rust-driver collects latencies with millisecond granularity, most of the latencies during the tests in a local setup are rounded to 0ms. Therefore, we need to somehow simulate higher latencies during the test. There is one piece of code that the user controls and that is executed in rust-driver between the start and end time measurements - namely `HistoryListener::log_attempt_start`. This is where we can add a sleep to simulate higher latencies in a local setup. And so, I implemented a `SleepingHistoryListener` that does just that. In addition, I implemented a testing API to set such a listener on the statement. The "Requests" test is adjusted accordingly and enabled.

Note: since all latencies during the test in a local setup are now expected to be around 1ms, I loosened the stddev check to be `>= 0` instead of `> 0`.
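A rough sketch of what the adjusted checks might look like on the test side (the name of the hook that attaches the sleeping listener is hypothetical; the actual testing API is whatever this PR introduces):

```cpp
// Hypothetical testing hook: attach a SleepingHistoryListener to the statement,
// so every attempt sleeps ~1ms inside HistoryListener::log_attempt_start and
// the recorded latency is at least 1ms even on a fast local setup.
set_sleeping_history_listener(statement, /*sleep_ms=*/1);

// ... run the usual batch of requests from the Requests test ...

CassMetrics metrics = session_.metrics();
EXPECT_GT(metrics.requests.mean, 0u);    // no longer rounded down to 0
EXPECT_GT(metrics.requests.median, 0u);
EXPECT_GE(metrics.requests.stddev, 0u);  // loosened: all latencies cluster around 1ms
```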
Rebased on master (sped up CI).
```cpp
Cluster cluster =
    default_cluster().with_connection_heartbeat_interval(1).with_connection_idle_timeout(5);
connect(cluster);

cass_uint64_t initial_connections = session_.metrics().stats.total_connections;
pause_node(2);
start_timer();
while (session_.metrics().stats.total_connections >= initial_connections &&
```

```cpp
CassMetrics metrics = session_.metrics();
```
⛏️
```diff
- CassMetrics metrics = session_.metrics();
+ CassMetrics const metrics = session_.metrics();
```
Why? We update the metrics in a loop.
@wprzytula I addressed all of your comments. There was nothing to change.
Fixes: #257
Ref: #132
This PR implements `cass_session_get_metrics` and enables two integration tests.

Metrics tests
To be quite honest, I thought we'd be able to enable more tests, but:
1. StatsConnections - requires `cass_cluster_set_num_threads_io`, `cass_cluster_set_core_connections_per_host` and `cass_cluster_set_constant_reconnect`.
2. ErrorsConnectionTimeouts - requires `cass_cluster_set_core_connections_per_host`.
3. SpeculativeExecutionRequests - requires `cass_session_get_speculative_execution_metrics`.
4. Requests - this one is interesting. It turns out that cpp-driver stores latency stats as microseconds, while rust-driver stores them as milliseconds. Because of that, the mean and median latency are rounded to 0 (at least on my machine). The test expects them to be greater than 0, which makes sense assuming the driver collects the stats with microsecond precision. ~~I'm not sure how to address this one. Is there any way to force higher >=1ms latencies in the test?~~ Thanks to @Lorak-mmk's suggestion, I was able to apply the `HistoryListener` hack in the test - this allows us to enforce >=1ms latencies for the requests.

Pre-review checklist
[ ] I have implemented Rust unit tests for the features/changes introduced.
[ ] I have enabled the new tests in `.github/workflows/build.yml` in `gtest_filter`.
[ ] I have enabled the new tests in `.github/workflows/cassandra.yml` in `gtest_filter`.