Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a track for tsdb data #222

Merged
merged 14 commits into from
Jan 6, 2022
Merged

Create a track for tsdb data #222

merged 14 commits into from
Jan 6, 2022

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Nov 30, 2021

Adds a new rally track for tsdb data. The data comes from a k8s cluster
and we anonymize it.

tsdb/track.json Outdated
"corpora": [
{
"name": "tsdb",
"#base-url": "https://rally-tracks.elastic.co/tsdb",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've commented this out because I'd like a review on the tools first and then I'll upload the anonymized file.

Adds a new rally track for tsdb data. The data comes from a k8s cluster
and we anonymize it.
@dliappis
Copy link
Contributor

dliappis commented Dec 1, 2021

@michaelbaamonde I assigned this to you, since you're also working on CI for raly-tracks.

@pquentin pquentin self-assigned this Dec 1, 2021
@pquentin pquentin requested review from pquentin and removed request for michaelbaamonde December 1, 2021 11:05
@pquentin
Copy link
Member

pquentin commented Dec 1, 2021

@michaelbaamonde I had started the review already, assigning myself so that we don't duplicate the effort. But please feel free to add your thoughts!

Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tools look good, thanks! As you said, I'd like to see the data now to test the track itself.

Were you looking for comments on the openness/anonymization balance or for someone running the script? I've not done either of those things, please tell me if you would like me to.

tsdb/README.md Outdated Show resolved Hide resolved
tsdb/README.md Outdated Show resolved Hide resolved
tsdb/_tools/anonymize.py Show resolved Hide resolved
tsdb/_tools/anonymize.py Outdated Show resolved Hide resolved
tsdb/_tools/anonymize.py Outdated Show resolved Hide resolved
tsdb/_tools/anonymize.py Outdated Show resolved Hide resolved
tsdb/README.md Outdated Show resolved Hide resolved
nik9000 and others added 5 commits December 1, 2021 10:28
Co-authored-by: Quentin Pradet <quentin.pradet@gmail.com>
Co-authored-by: Quentin Pradet <quentin.pradet@gmail.com>
@nik9000
Copy link
Member Author

nik9000 commented Dec 1, 2021

@pquentin I've applied that changes you requested.

@nik9000 nik9000 requested a review from pquentin December 1, 2021 17:24
@nik9000
Copy link
Member Author

nik9000 commented Dec 1, 2021

Were you looking for comments on the openness/anonymization balance or for someone running the script? I've not done either of those things, please tell me if you would like me to.

I've talked about the openness/anonymization balance with some security folks and have asked them for a review too. If you are interested in running the script you can - I'll link you to the non-anonymized data in another channel. I was mostly hoping for a review on style and to double check that the script does what I say it does. I figure more eyes are better.

@michaelbaamonde
Copy link
Contributor

I haven't yet had a chance to look over this in detail, but one thing that we need to figure out before merging is how to license this track. I'll start a conversation offline.

Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the technical part looks good to me! Left two nits.

I tested this successfully, and got 45k docs/s on my laptop. I guess the only parts left to sort out are infosec and legal.

tsdb/README.md Outdated
## TSDB Track

This data is anonymized monitoring data from elastic-apps designed to test
our TSDB project. TSDB needs us to be careful how we anonymize. Too much
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please mention somewhere prominent that by default it only works with Elasticsearch 8.x?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:+!:

tsdb/track.json Outdated
{
"source-file": "documents.json",
"document-count": 122613113,
"#compressed-bytes": 4820107188,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth while I got the same amount of uncompressed-bytes, I got a different amount of compressed-bytes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@nik9000
Copy link
Member Author

nik9000 commented Dec 3, 2021 via email

tsdb/README.md Outdated
* `number_of_replicas` (default: 0)
* `number_of_shards` (default: 1)
* `force_merge_max_num_segments` (default: unset): An integer specifying the max amount of segments the force-merge operation should use.
* `index_mode` (default: standard): Whether to make a standard index (`standard`) or time series index (`time_series`)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should default to time_series.

@nik9000
Copy link
Member Author

nik9000 commented Jan 4, 2022

@pquentin I've pushed some changes, mostly picking up what you asked me to change, but also making it run in time_series mode by default and getting it up to date with ES's master branch. For the next while I think this'll be a "master branch only" sort of hting.

Copy link
Member

@pquentin pquentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@nik9000 nik9000 merged commit cec1475 into elastic:master Jan 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants