Reorder series key tagset #12391
Looks solid to me. 💫
3d3022b to 183741d
I looked this over again. LGTM.
We will want to validate that all tag key/value data is valid Unicode. This commit changes the validation helper to validate only the provided tags, since measurements are currently very likely to contain invalid UTF-8 characters. There are two exceptions to tag validation: the special tag keys for the measurement and for the field key.
The storage engine will now drop any points that contain invalid tag data. Special tag keys for the measurement and field key will be excepted from this validation.
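A minimal sketch of the validation rule described in these two commits, not the storage engine's actual helper. The `Tag` type, the `validTags` function, and the two special-key constants are hypothetical names chosen for illustration; the constants assume the null byte and byte 255 keys proposed in this PR.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Tag is a hypothetical key/value pair; the real engine types may differ.
type Tag struct {
	Key, Value []byte
}

// Assumed special tag keys for the measurement name and the field key.
const (
	measurementTagKey = "\x00"
	fieldTagKey       = "\xff"
)

// validTags reports whether every provided tag key and value is valid UTF-8.
// The special measurement and field-key tags are skipped entirely.
func validTags(tags []Tag) bool {
	for _, t := range tags {
		k := string(t.Key)
		if k == measurementTagKey || k == fieldTagKey {
			continue // exempt from validation
		}
		if !utf8.Valid(t.Key) || !utf8.Valid(t.Value) {
			return false
		}
	}
	return true
}

func main() {
	kept := []Tag{
		{Key: []byte(measurementTagKey), Value: []byte("cpu\xc3")}, // invalid UTF-8, but exempt
		{Key: []byte("host"), Value: []byte("server01")},
		{Key: []byte(fieldTagKey), Value: []byte("usage")},
	}
	dropped := []Tag{{Key: []byte("host"), Value: []byte("srv\xff")}}

	// A caller would drop any point whose tags fail this check.
	fmt.Println(validTags(kept), validTags(dropped))
}
```

Skipping the special keys before the UTF-8 check matters in this sketch because byte 255 on its own is not valid UTF-8, so a field-key tag would otherwise always fail.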
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👼
This PR makes a breaking change to the 2.x line.
Currently we use the special tag keys `_m` and `_f` to index the measurement name and field key. This results in keys ordered like this:
With this change we will now use the null byte for the measurement name and byte `255` for the field key. Keys will now be ordered as follows:
which is preferable in terms of access patterns, and means we need two fewer bytes per series key.
Finally, this change will ensure that tag sets are ordered the same way in series keys on both the 1.x and 2.x lines. We can guarantee this because we will only ever be adding a tag pair to the beginning and end of a tag set. This should lead to more elegant and performant implementations of import/export of TSM data between the 1.x and 2.x lines.
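The ordering argument can be sketched as follows, assuming tag keys are compared bytewise. The user tag keys (`host`, `Zone`, `region`) and the `sortedKeys` helper are hypothetical and only illustrate the effect of switching from `_m`/`_f` to the null byte and byte 255.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns a sorted copy of the given tag keys.
// Go compares strings byte by byte, matching a bytewise key order.
func sortedKeys(keys ...string) []string {
	out := append([]string(nil), keys...)
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical user tag keys; "Zone" is capitalised on purpose,
	// because uppercase letters sort before '_' (0x5f).
	user := []string{"host", "Zone", "region"}

	// Old scheme: "_m" and "_f" land somewhere in the middle of the tag set.
	legacy := sortedKeys(append([]string{"_m", "_f"}, user...)...)

	// New scheme: byte 0 always sorts first and byte 255 always sorts last.
	updated := sortedKeys(append([]string{"\x00", "\xff"}, user...)...)

	fmt.Printf("%q\n", legacy)
	fmt.Printf("%q\n", updated)
}
```

In this sketch the user tags keep the same relative order they would have without the special keys, with the measurement pair in front and the field pair at the end. Each special key also shrinks from two bytes to one, which accounts for the two bytes saved per series key.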
Further, this PR adds another change: