-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP First draft for version field #58256
Conversation
638ada3
to
9b9b556
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first quick round, this looks like a great start. The encoding makes sense to me.
} | ||
System.arraycopy(min.bytes, 0, bytes, 0, BYTES); | ||
System.arraycopy(max.bytes, 0, bytes, BYTES, BYTES); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it take the BytesRef offset into account?
import org.elasticsearch.index.mapper.VersionEncoder; | ||
import org.elasticsearch.index.mapper.VersionEncoder.SortMode; | ||
|
||
public class VersionRangeField extends Field { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should rename it to Binary16RangeField or something like that since it doesn't do anything that is specific to versions?
+ VersionEncoder.decodeVersion(new BytesRef(min), SortMode.SEMVER) | ||
+ " : " | ||
+ VersionEncoder.decodeVersion(new BytesRef(max), SortMode.SEMVER) | ||
+ "]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can do that since the BytesRef may be truncated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure either, the alternative is only to print the BytesRef hex output if we need some "toString".
VERSION("version_range", LengthType.VARIABLE) { | ||
|
||
// TODO check if these are really safe min/max values | ||
private BytesRef MIN_VALUE = new BytesRef(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this helps not create an empty byte[]
private BytesRef MIN_VALUE = new BytesRef(0); | |
private BytesRef MIN_VALUE = new BytesRef(); |
public BytesRef nextDown(Object value) { | ||
// TODO currently no up/down | ||
return (BytesRef) value; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might want to look at NumericUtils#add in Lucene which does something along those lines
throw e; | ||
} | ||
} | ||
if (fieldType.indexOptions() != IndexOptions.NONE || fieldType.stored()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather like that we encode the first 16 bytes of the encoded version in points to get efficient range queries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added this as TODO, some pointers would be helpful
|
||
if (includeDefaults || ignoreMalformed.explicit()) { | ||
builder.field("ignore_malformed", ignoreMalformed.value()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this must support all options that are supported by the parser, so it looks like some of them are missing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added output for "mode" for now
// encoded string, need to re-encode | ||
return encodeVersion(((BytesRef) value).utf8ToString(), mode); | ||
} else { | ||
throw new IllegalArgumentException("Illegal value type: " + value.getClass()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe include the value as well in the error message?
// encode whether version has pre-release parts | ||
if (preReleaseId != null) { | ||
encodedVersion.append(PRERELESE_SEPARATOR_BYTE); // versions with pre-release part sort before ones without | ||
String[] preReleaseParts = preReleaseId.substring(1).split(DOT_SEPARATOR_REGEX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to support version strings that have consecutive dots, then I think we should use another split method as this one would skip the empty string in such a case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen reason for supporting consecutive dots yet, currently I would try to catch that in initial validation. Do you have something in mind that would require this?
if (first == false) { | ||
encodedVersion.append(DOT_SEPARATOR_BYTE); | ||
} | ||
boolean isNumeric = preReleasePart.chars().allMatch(x -> Character.isDigit(x)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should treat the empty string as a numeric or as a string, does the spec say anything about this corner case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specs say that "Identifiers MUST NOT be empty". Identifiers are the dot-separated parts, so I think we can check this in some form of validation already (e.g. check and error on consecutive dots like mentioned above)
This is a first draft of a new field type aimed at storing software version strings like e.g. versions following the Semantic Versioning scheme. The field should offer exact matching and range queries similar to e.g. what we currently have for IP fields.
This POC contains of some early ideas about how we could encode the version strings into a binary representation that would allow ordering according to e.g. the Semantiv Versioning specification (i.e. precedence in part 11. in https://semver.org) but also potentially allow for more flexibility. Early tests for a "version" field type and a "version_range" range type show how basic search and aggs are possible with this encoding.