Reorder series key tagset #12391

e-dard · 2019-03-06T16:27:46Z

This PR makes a breaking change to the 2.x line.

Currently we use the special tag keys _m and _f to index the measurement name and field key.
This results in keys ordered like this:

<orgbucketid>,ABBA=cool,_f=temp,_m=cpu,server=b
<orgbucketid>,ABBA=cool,_f=zz,_m=mem,server=a
<orgbucketid>,_f=temp,_m=cpu,server=a
<orgbucketid>,_f=yy,_m=mem,server=a
<orgbucketid>,_f=yy,_m=mem,server=b

With this change we will now use the null byte (\x00) as the tag key for the measurement name, and byte 255 (\xff) as the tag key for the field key.

Keys will now be ordered as follows:

<orgbucketid>,\x00=cpu,ABBA=cool,server=b,\xff=temp
<orgbucketid>,\x00=cpu,server=a,\xff=temp
<orgbucketid>,\x00=mem,ABBA=cool,server=a,\xff=zz
<orgbucketid>,\x00=mem,server=a,\xff=yy
<orgbucketid>,\x00=mem,server=b,\xff=yy

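This ordering is preferable in terms of access patterns, since all series for a measurement are now contiguous, and it means we need two bytes less per series key, because each two-byte key (_m and _f) is replaced by a single byte.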
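To make the ordering and the two-byte saving concrete, here is a minimal sketch in Go. The `tag` type and the `seriesKey` helper are hypothetical and exist only for this illustration; they are not the encoder used in the codebase. The sketch relies only on the fact that Go compares strings byte-wise.

```go
package main

import (
	"fmt"
	"sort"
)

// tag is a hypothetical key/value pair used only for this illustration.
type tag struct{ key, value string }

// seriesKey joins a prefix and a tag set sorted by key (byte-wise).
// It is an illustrative stand-in, not the real series key encoder.
func seriesKey(prefix string, tags []tag) string {
	sorted := append([]tag(nil), tags...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].key < sorted[j].key })
	key := prefix
	for _, t := range sorted {
		key += "," + t.key + "=" + t.value
	}
	return key
}

func main() {
	user := []tag{{"server", "b"}, {"ABBA", "cool"}}

	// Old scheme: measurement and field key stored under _m and _f.
	oldKey := seriesKey("<orgbucketid>", append([]tag{{"_m", "cpu"}, {"_f", "temp"}}, user...))

	// New scheme: measurement under \x00, field key under \xff.
	newKey := seriesKey("<orgbucketid>", append([]tag{{"\x00", "cpu"}, {"\xff", "temp"}}, user...))

	fmt.Printf("%q\n", oldKey)
	fmt.Printf("%q\n", newKey)
	fmt.Println("bytes saved:", len(oldKey)-len(newKey))
}
```

In the old key the internal tags land between ABBA and server; in the new key the measurement is always first and the field key always last.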

Finally, this change will ensure that tag sets are ordered the same way in series keys on both the 1.x and 2.x lines. We can guarantee this because we will only ever be adding a tag pair to the beginning and end of a tag set. This should lead to more elegant and performant implementations of import/export of TSM data between the 1.x and 2.x lines.

Further, this PR adds another change:

all user-provided tag keys, tag values and field keys will be validated as containing only valid UTF-8.
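A minimal sketch of this kind of check, using the standard library's `unicode/utf8` package. The `tag` type and the `validTags` function are hypothetical names for illustration and are not the validation helper changed in this PR.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// tag is a hypothetical user-provided key/value pair.
type tag struct{ key, value []byte }

// validTags reports whether every key and value is valid UTF-8.
// Illustrative only: the internal \x00 and \xff keys are assumed to be
// added elsewhere and are not passed through this check.
func validTags(tags []tag) bool {
	for _, t := range tags {
		if !utf8.Valid(t.key) || !utf8.Valid(t.value) {
			return false
		}
	}
	return true
}

func main() {
	good := []tag{{[]byte("server"), []byte("a")}}
	bad := []tag{{[]byte("server"), []byte{0xff, 0xfe}}}

	fmt.Println(validTags(good)) // true
	fmt.Println(validTags(bad))  // false
}
```

A caller could use such a check to skip points whose tag data fails validation.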

benbjohnson

Looks solid to me. 💫

models/points.go

storage/config.go

storage/engine.go

tsdb/explode.go

nathanielc

I looked this over again. LGTM.

We will want to validate that all tag key/value data is valid unicode. This commit changes the validation helper to only validate provided tags, since measurements are currently very likely to contain invalid utf-8 characters. There are two exceptions to the tag validation: the validation of the special tag keys for measurements and field keys.

The storage engine will now drop any points that contain invalid tag data. Special tag keys for the measurement and field key will be excepted from this validation.

fntlnz

LGTM 👼

e-dard requested review from zeebo and stuartcarnie March 6, 2019 16:27

e-dard assigned nathanielc and unassigned nathanielc Mar 6, 2019

e-dard requested review from nathanielc and benbjohnson and removed request for stuartcarnie March 6, 2019 16:28

benbjohnson approved these changes Mar 6, 2019

zeebo reviewed Mar 6, 2019

models/points.go (outdated, resolved)

storage/config.go (outdated, resolved)

storage/engine.go (resolved)

tsdb/explode.go (outdated, resolved)

nathanielc approved these changes Mar 6, 2019

zeebo approved these changes Mar 6, 2019

e-dard force-pushed the er-tags branch 2 times, most recently from 3d3022b to 183741d Compare March 6, 2019 18:39

benbjohnson mentioned this pull request Mar 6, 2019

Merge point parse & explode #12377

Merged

e-dard force-pushed the er-tags branch from 183741d to 59f996e Compare March 6, 2019 20:30

nathanielc approved these changes Mar 6, 2019

e-dard added 9 commits March 7, 2019 09:56

Change location and value for internal tag keys

f029f16

ExplodePoints now complies with new keys

1cb20b6

Storage engine now validates all tags are utf-8

f21be14

The storage engine will now drop any points that contain invalid tag data. Special tag keys for the measurement and field key will be excepted from this validation.

Update emitted keys and tests

3f1bec0

Fix expected flux cases

8bdf857

Remove erroneous print

098ec71

Address PR feedback

582ed68

Fix test case

807ee67

e-dard force-pushed the er-tags branch from 59f996e to 807ee67 Compare March 7, 2019 09:56

e-dard changed the title ~~[WIP] Reorder series key tagset~~ Reorder series key tagset Mar 7, 2019

e-dard force-pushed the er-tags branch from 33ab71d to f5019ab Compare March 7, 2019 11:32

Update CHANGELOG

40b53d4

e-dard force-pushed the er-tags branch from f5019ab to 40b53d4 Compare March 7, 2019 11:35

fntlnz approved these changes Mar 7, 2019

e-dard merged commit 27970f8 into master Mar 7, 2019

e-dard deleted the er-tags branch March 7, 2019 11:51

e-dard mentioned this pull request Mar 12, 2019

Consider enforcing no leading underscore in tag keys #11961

Open
