Add HDR histogram support to c-s frontend #95

Open
soyacz opened this issue Jun 6, 2024 · 5 comments

@soyacz

soyacz commented Jun 6, 2024

Running the elasticity test (grow-shrink) failed with:

2024-06-06 14:26:46.373: (DisruptionEvent Severity.ERROR) period_type=end event_id=5d608d77-aaa3-4259-832f-aa1eace72925 duration=30m43s: nemesis_name=GrowShrinkClusterParallel target_node=Node perf-latency-grow-shrink-ubuntu-db-node-dcfb0b61-3 [34.201.20.143 | 10.12.0.246] errors='NoneType' object has no attribute 'get_percentile_to_value_dict'
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/csrangehistogram.py", line 339, in _build_histograms_summary_with_interval_by_tag
end_interval).build_histogram_summary_by_tag(path, hdr_tag)
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/csrangehistogram.py", line 332, in build_histogram_summary_by_tag
return _CSRangeHistogramBuilder._get_summary_for_operation_by_hdr_tag(histogram)
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/csrangehistogram.py", line 301, in _get_summary_for_operation_by_hdr_tag
if parsed_summary := _CSRangeHistogramBuilder._convert_raw_histogram(histogram.histogram, histogram.start_time,
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/csrangehistogram.py", line 311, in _convert_raw_histogram
if percentiles := histogram.get_percentile_to_value_dict(PERCENTILES):
AttributeError: 'NoneType' object has no attribute 'get_percentile_to_value_dict'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5202, in wrapper
result = method(*args[1:], **kwargs)
File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 4056, in disrupt_grow_shrink_cluster_parallel
self.steady_state_latency()
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 215, in wrapped
result["hdr"] = args[0].tester.get_cs_range_histogram_by_interval(stress_operation=workload,
File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 3709, in get_cs_range_histogram_by_interval
return make_cs_range_histogram_summary_by_interval(
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/csrangehistogram.py", line 64, in make_cs_range_histogram_summary_by_interval
return builder.build_histograms_summary_with_interval(path, interval)
File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/csrangehistogram.py", line 197, in build_histograms_summary_with_interval
if res := future.result():
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
AttributeError: 'NoneType' object has no attribute 'get_percentile_to_value_dict'

and later:

2024-06-06 15:01:30.372: (TestFrameworkEvent Severity.ERROR) period_type=one-time event_id=5247dc39-d640-4e71-b661-027bda159ef6, source=PerformanceRegressionTest.test_latency_write_with_nemesis (performance_regression_test.PerformanceRegressionTest)() message=Traceback (most recent call last):
File "/home/ubuntu/scylla-cluster-tests/performance_regression_test.py", line 626, in test_latency_write_with_nemesis
self.run_workload(stress_cmd=self.params.get('stress_cmd_w'), nemesis=True)
File "/home/ubuntu/scylla-cluster-tests/performance_regression_test.py", line 343, in run_workload
check_latency()
File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 3175, in check_latency_during_ops
latency_results = json.load(file)
File "/usr/local/lib/python3.10/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/usr/local/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
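
The second failure is what `json.load` raises when the file it reads is empty, which is presumably a consequence of the first error leaving the results file without content. A minimal sketch of a more tolerant loader follows; `load_latency_results` is a hypothetical helper for illustration, not a function from scylla-cluster-tests.

```python
import json
from pathlib import Path


def load_latency_results(path: str) -> dict:
    """Load a JSON results file, treating a missing or empty file as no results."""
    file_path = Path(path)
    if not file_path.exists() or file_path.stat().st_size == 0:
        # Nothing was written (e.g. the histogram step failed earlier).
        return {}
    with file_path.open(encoding="utf-8") as file:
        try:
            return json.load(file)
        except json.JSONDecodeError:
            # Truncated or otherwise malformed content: report no results.
            return {}
```

Whether silently returning an empty result is acceptable depends on the test; a caller may prefer to log the condition or fail with a clearer message.
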
Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor dcfb0b61-f2b7-486f-9d17-53ecc028830f
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs dcfb0b61-f2b7-486f-9d17-53ecc028830f

Logs:

No logs captured during this run.

Jenkins job URL
Argus

@fruch

fruch commented Jun 6, 2024

It seems there are a few assumptions here about data that isn't available, i.e. HDR histograms.

I would look for a way to skip it for cql-stress.
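
One way to skip it could be a guard in front of the percentile lookup. The sketch below is only an illustration under assumptions: `convert_raw_histogram`, `FakeHistogram`, and the `PERCENTILES` values are stand-ins, not the actual scylla-cluster-tests code. Only the `get_percentile_to_value_dict` method name is taken from the traceback in the report.

```python
from typing import Optional

# Assumed percentile list; the real PERCENTILES constant may differ.
PERCENTILES = [50, 90, 95, 99, 99.9]


class FakeHistogram:
    """Hypothetical stand-in for a decoded HDR histogram object."""

    def __init__(self, values):
        self._values = sorted(values)

    def get_percentile_to_value_dict(self, percentiles):
        # Simple rank-based lookup over the recorded values.
        if not self._values:
            return {}
        last = len(self._values) - 1
        return {
            p: self._values[min(last, int(len(self._values) * p / 100))]
            for p in percentiles
        }


def convert_raw_histogram(histogram: Optional[FakeHistogram]) -> Optional[dict]:
    """Return a percentile summary, or None when no histogram is available."""
    if histogram is None:
        # The stress tool produced no HDR data, so there is nothing to summarize.
        return None
    percentiles = histogram.get_percentile_to_value_dict(PERCENTILES)
    return percentiles or None
```

With such a guard the caller receives `None` instead of an `AttributeError`, so it still has to decide how to report a run without latency histograms.
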

@soyacz
Author

soyacz commented Jun 7, 2024

Yes, a simple workaround is to disable HDR histogram analysis via a parameter, which is what I did for my testing.
Otherwise, we need to implement a workaround for cql-stress until this feature is available.

@piodul
Collaborator

piodul commented Jun 7, 2024

Just wanted to point out that there is support for outputting hdr histograms in cql-stress, but AFAIK it was only done for the scylla-bench frontend. Perhaps it shouldn't be too difficult to reuse the code for the cassandra-stress frontend if it's the same format.

roydahan changed the title from "error when parsing cql-stress results" to "Add HDR histogram support to c-s frontend" on Jun 23, 2024
roydahan transferred this issue from scylladb/scylla-cluster-tests on Jun 23, 2024
@soyacz
Author

soyacz commented Sep 20, 2024

It seems this is the last feature needed to allow cql-stress to replace c-s in perf tests. @roydahan

@roydahan
Collaborator

@dkropachev, @muzarski we will need to prioritize this sometime soon, so we can also start testing performance with the Rust driver.
