Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorder series key tagset #12391

Merged
merged 10 commits into from
Mar 7, 2019
Merged

Reorder series key tagset #12391

merged 10 commits into from
Mar 7, 2019

Conversation

e-dard
Copy link
Contributor

@e-dard e-dard commented Mar 6, 2019

This PR makes a breaking change to the 2.x line.

Currently we use the special tag keys _m and _f to index the measurement name and field key.
This results in keys ordered like this:

<orgbucketid>,ABBA=cool,_f=temp,_m=cpu,server=b
<orgbucketid>,ABBA=cool,_f=zz,_m=mem,server=a
<orgbucketid>,_f=temp,_m=cpu,server=a
<orgbucketid>,_f=yy,_m=mem,server=a
<orgbucketid>,_f=yy,_m=mem,server=b

With this change we will now use the null byte for measurement name, and byte 255 for the field key.

Keys will now be ordered as follows:

<orgbucketid>,\x00=cpu,ABBA=cool,server=b,\xff=temp
<orgbucketid>,\x00=cpu,server=a,\xff=temp
<orgbucketid>,\x00=mem,ABBA=cool,server=a,\xff=zz
<orgbucketid>,\x00=mem,server=a,\xff=yy
<orgbucketid>,\x00=mem,server=b,\xff=yy

which is preferable in terms of access patterns, and means we need two bytes less per series key.

Finally, this change will ensure that tag sets are ordered the same way in series keys on both the 1.x and 2.x lines. We can guarantee this because we will only ever be adding a tag pair to the beginning and end of a tag set. This should lead to more elegant and performant implementations of import/export of TSM data between the 1.x and 2.x lines.

Further, this PR adds another change:

  • all user provided tag keys, values and field key will be validated as only containing valid utf-8 characters.

@e-dard e-dard assigned nathanielc and unassigned nathanielc Mar 6, 2019
@e-dard e-dard requested review from nathanielc and benbjohnson and removed request for stuartcarnie March 6, 2019 16:28
Copy link
Contributor

@benbjohnson benbjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks solid to me. 💫

models/points.go Outdated Show resolved Hide resolved
storage/config.go Outdated Show resolved Hide resolved
storage/engine.go Show resolved Hide resolved
tsdb/explode.go Outdated Show resolved Hide resolved
Copy link
Contributor

@nathanielc nathanielc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked this over again LGTM

We will want to validate that all tag key/value data is valid unicode.
This commit changes the validation helper to only validate provided
tags, since measurements are currently very likely to contain invalid
utf-8 characters.

There are two exceptions to the tag validation: the validation of the
special tag keys for measurements and field keys.
The storage engine will now drop any points that contain invalid tag
data. Special tag keys for the measurement and field key will be
excepted from this validation.
@e-dard e-dard changed the title [WIP] Reorder series key tagset Reorder series key tagset Mar 7, 2019
Copy link
Contributor

@fntlnz fntlnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👼

@e-dard e-dard merged commit 27970f8 into master Mar 7, 2019
@e-dard e-dard deleted the er-tags branch March 7, 2019 11:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants