Create a track for tsdb data #222
tsdb/track.json
Outdated
  "corpora": [
    {
      "name": "tsdb",
      "#base-url": "https://rally-tracks.elastic.co/tsdb",
I've commented this out because I'd like a review on the tools first and then I'll upload the anonymized file.
Adds a new rally track for tsdb data. The data comes from a k8s cluster and we anonymize it.
@michaelbaamonde I assigned this to you, since you're also working on CI for rally-tracks.
@michaelbaamonde I had started the review already, assigning myself so that we don't duplicate the effort. But please feel free to add your thoughts!
The tools look good, thanks! As you said, I'd like to see the data now to test the track itself.
Were you looking for comments on the openness/anonymization balance or for someone to run the script? I've not done either of those things; please tell me if you would like me to.
Co-authored-by: Quentin Pradet <quentin.pradet@gmail.com>
@pquentin I've applied the changes you requested.
I've talked about the openness/anonymization balance with some security folks and have asked them for a review too. If you are interested in running the script you can - I'll link you to the non-anonymized data in another channel. I was mostly hoping for a review on style and to double check that the script does what I say it does. I figure more eyes are better.
I haven't yet had a chance to look over this in detail, but one thing that we need to figure out before merging is how to license this track. I'll start a conversation offline.
Thanks, the technical part looks good to me! Left two nits.
I tested this successfully, and got 45k docs/s on my laptop. I guess the only parts left to sort out are infosec and legal.
tsdb/README.md
Outdated
## TSDB Track

This data is anonymized monitoring data from elastic-apps designed to test
our TSDB project. TSDB needs us to be careful how we anonymize. Too much
Can you please mention somewhere prominent that by default it only works with Elasticsearch 8.x?
👍
tsdb/track.json
Outdated
        {
          "source-file": "documents.json",
          "document-count": 122613113,
          "#compressed-bytes": 4820107188,
For what it's worth, while I got the same value for uncompressed-bytes, I got a different value for compressed-bytes.
👍
I've never compressed the file! I'll fix that up.
…On Fri, Dec 3, 2021, 12:40 PM Quentin Pradet ***@***.***> wrote:
> ***@***.**** approved this pull request.
>
> Thanks, the technical part looks good to me! Left two nits.
>
> I tested this successfully, and got 45k docs/s on my laptop. I guess the only parts left to sort out are infosec and legal.
>
> In tsdb/README.md:
>
> > @@ -0,0 +1,121 @@
> > +## TSDB Track
> > +
> > +This data is anonymized monitoring data from elastic-apps designed to test
> > +our TSDB project. TSDB needs us to be careful how we anonymize. Too much
>
> Can you please mention somewhere prominent that by default it only works with Elasticsearch 8.x?
>
> In tsdb/track.json:
>
> > +  "description": "metricbeat information for elastic-app k8s cluster",
> > +  "indices": [
> > +    {
> > +      "name": "tsdb",
> > +      "body": "index.json"
> > +    }
> > +  ],
> > +  "corpora": [
> > +    {
> > +      "name": "tsdb",
> > +      "#base-url": "https://rally-tracks.elastic.co/tsdb",
> > +      "documents": [
> > +        {
> > +          "source-file": "documents.json",
> > +          "document-count": 122613113,
> > +          "#compressed-bytes": 4820107188,
>
> For what it's worth while I got the same amount of uncompressed-bytes, I got a different amount of compressed-bytes.
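For anyone reproducing the corpus numbers discussed in this thread, here is a minimal Python sketch of how `document-count`, `uncompressed-bytes` and `compressed-bytes` could be measured for a corpus file. It is not the script from this PR: the helper name `corpus_stats` is hypothetical, bz2 is only one of the possible compression formats, and the document count assumes one newline-terminated JSON document per line.

```python
import bz2
import os


def corpus_stats(path, chunk_size=1024 * 1024):
    """Compress `path` to `<path>.bz2` and return corpus metadata.

    Hypothetical helper for illustration. Assumes the corpus is
    newline-delimited JSON with a trailing newline after every document.
    """
    compressed_path = path + ".bz2"
    doc_count = 0
    # Stream the file in chunks so a multi-gigabyte corpus never has to
    # fit in memory.
    with open(path, "rb") as src, bz2.open(compressed_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            doc_count += chunk.count(b"\n")
            dst.write(chunk)
    return {
        "source-file": os.path.basename(compressed_path),
        "document-count": doc_count,
        "uncompressed-bytes": os.path.getsize(path),
        "compressed-bytes": os.path.getsize(compressed_path),
    }
```

The compressed size depends on the compressor and its settings, so two people can plausibly agree on `uncompressed-bytes` and still see different `compressed-bytes`; the value in `track.json` should come from the exact file that gets uploaded.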
tsdb/README.md
Outdated
* `number_of_replicas` (default: 0)
* `number_of_shards` (default: 1)
* `force_merge_max_num_segments` (default: unset): An integer specifying the max amount of segments the force-merge operation should use.
* `index_mode` (default: standard): Whether to make a standard index (`standard`) or time series index (`time_series`)
This should default to `time_series`.
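To make the suggested default concrete, here is a small Python sketch of how the four parameters listed in the README excerpt could be resolved against their defaults, with `index_mode` defaulting to `time_series` as requested in this comment. This is an illustration only: Rally resolves track parameters inside the track's own templates, and the names `DEFAULT_TRACK_PARAMS` and `resolve_track_params` are hypothetical.

```python
# Defaults taken from the README excerpt under review, except index_mode,
# which uses the default the reviewer asks for.
DEFAULT_TRACK_PARAMS = {
    "number_of_replicas": 0,
    "number_of_shards": 1,
    "force_merge_max_num_segments": None,  # unset by default
    "index_mode": "time_series",
}

VALID_INDEX_MODES = {"standard", "time_series"}


def resolve_track_params(overrides=None):
    """Merge user overrides into the defaults and validate index_mode."""
    params = dict(DEFAULT_TRACK_PARAMS)
    for name, value in (overrides or {}).items():
        if name not in params:
            raise KeyError(f"unknown track parameter: {name}")
        params[name] = value
    if params["index_mode"] not in VALID_INDEX_MODES:
        raise ValueError(f"unsupported index_mode: {params['index_mode']}")
    return params
```

With this shape, a user who wants a regular index would pass `{"index_mode": "standard"}` explicitly, and everyone else gets the time series index the track is meant to exercise.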
@pquentin I've pushed some changes, mostly picking up what you asked me to change, but also making it run in
Thanks!