WIP First draft for version field #58256

cbuescher · 2020-06-17T13:18:02Z

This is a first draft of a new field type for storing software version strings, e.g. versions following the Semantic Versioning scheme. The field should offer exact matching and range queries, similar to what we currently have for IP fields.
This POC contains some early ideas about how we could encode version strings into a binary representation that allows ordering according to e.g. the Semantic Versioning specification (i.e. precedence as defined in item 11 of https://semver.org) but also potentially allows for more flexibility. Early tests for a "version" field type and a "version_range" range type show that basic search and aggs are possible with this encoding.
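
For reference, the ordering that the binary encoding is meant to reproduce can be written down as a plain comparator. This is only a sketch of the precedence rules from item 11 of the SemVer spec, not the encoding used in this PR; the class name `SemverPrecedence` is made up for illustration, and it assumes well-formed `major.minor.patch[-prerelease][+build]` input.

```java
import java.util.Comparator;

/** Sketch of SemVer 2.0.0 precedence (spec item 11). Build metadata is ignored. */
class SemverPrecedence implements Comparator<String> {

    @Override
    public int compare(String a, String b) {
        String[] pa = parts(a);
        String[] pb = parts(b);
        // major, minor and patch are compared numerically, left to right
        String[] ca = pa[0].split("\\.");
        String[] cb = pb[0].split("\\.");
        for (int i = 0; i < 3; i++) {
            int c = Long.compare(Long.parseLong(ca[i]), Long.parseLong(cb[i]));
            if (c != 0) {
                return c;
            }
        }
        // a version with a pre-release part sorts before the same version without one
        if (pa[1] == null || pb[1] == null) {
            return pa[1] == null ? (pb[1] == null ? 0 : 1) : -1;
        }
        String[] ia = pa[1].split("\\.");
        String[] ib = pb[1].split("\\.");
        for (int i = 0; i < Math.min(ia.length, ib.length); i++) {
            int c = compareIdentifier(ia[i], ib[i]);
            if (c != 0) {
                return c;
            }
        }
        // all shared identifiers are equal: the longer pre-release list wins
        return Integer.compare(ia.length, ib.length);
    }

    /** Returns {core version, pre-release or null}, with build metadata stripped. */
    private static String[] parts(String version) {
        int plus = version.indexOf('+');
        if (plus >= 0) {
            version = version.substring(0, plus);
        }
        int dash = version.indexOf('-');
        return dash < 0
            ? new String[] { version, null }
            : new String[] { version.substring(0, dash), version.substring(dash + 1) };
    }

    private static int compareIdentifier(String x, String y) {
        boolean nx = isNumeric(x);
        boolean ny = isNumeric(y);
        if (nx && ny) {
            return Long.compare(Long.parseLong(x), Long.parseLong(y));
        }
        if (nx != ny) {
            return nx ? -1 : 1; // numeric identifiers have lower precedence
        }
        return x.compareTo(y); // ASCII order for alphanumeric identifiers
    }

    private static boolean isNumeric(String s) {
        return !s.isEmpty() && s.chars().allMatch(c -> c >= '0' && c <= '9');
    }
}
```

Any byte encoding that sorts the same way under unsigned byte comparison would make range queries and sorting work directly on the encoded values.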

jpountz

I did a first quick round; this looks like a great start. The encoding makes sense to me.

jpountz · 2020-06-18T11:27:18Z

server/src/main/java/org/apache/lucene/document/VersionRangeField.java

+        }
+        System.arraycopy(min.bytes, 0, bytes, 0, BYTES);
+        System.arraycopy(max.bytes, 0, bytes, BYTES, BYTES);
+    }


should it take the BytesRef offset into account?
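
A sketch of what honoring the offset could look like. To keep it runnable without Lucene, `Slice` is a hypothetical stand-in that mirrors the public `bytes`, `offset` and `length` fields of `BytesRef`; `RangePacking` and `pack` are likewise illustrative names, not code from this PR.

```java
/** Hypothetical stand-in for Lucene's BytesRef: a view into a possibly larger array. */
class Slice {
    final byte[] bytes;
    final int offset;
    final int length;

    Slice(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }
}

class RangePacking {
    static final int BYTES = 16;

    /** Packs min and max into one array, reading each from its own offset. */
    static byte[] pack(Slice min, Slice max) {
        if (min.length != BYTES || max.length != BYTES) {
            throw new IllegalArgumentException("expected " + BYTES + " bytes per bound");
        }
        byte[] packed = new byte[2 * BYTES];
        // start at the slice offset rather than at index 0 of the backing array
        System.arraycopy(min.bytes, min.offset, packed, 0, BYTES);
        System.arraycopy(max.bytes, max.offset, packed, BYTES, BYTES);
        return packed;
    }
}
```

Copying from index 0 only works while every incoming value happens to own its backing array; a value that is a view into a shared buffer would be read from the wrong position.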

jpountz · 2020-06-18T11:58:12Z

server/src/main/java/org/apache/lucene/document/VersionRangeField.java

+import org.elasticsearch.index.mapper.VersionEncoder;
+import org.elasticsearch.index.mapper.VersionEncoder.SortMode;
+
+public class VersionRangeField extends Field {


Maybe we should rename it to Binary16RangeField or something like that since it doesn't do anything that is specific to versions?

jpountz · 2020-06-18T14:59:06Z

server/src/main/java/org/apache/lucene/document/VersionRangeField.java

+          + VersionEncoder.decodeVersion(new BytesRef(min), SortMode.SEMVER)
+          + " : "
+          + VersionEncoder.decodeVersion(new BytesRef(max), SortMode.SEMVER)
+          + "]";


I don't think we can do that since the BytesRef may be truncated?

I wasn't sure either; the alternative is to print only the BytesRef hex output if we need some "toString".
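
A minimal sketch of the hex-only alternative, which never tries to decode possibly truncated bytes. `RangeToString`, `toHex` and `describe` are illustrative names, and the exact output format is an assumption.

```java
class RangeToString {

    /** Renders bytes[offset, offset + length) as space-separated lowercase hex. */
    static String toHex(byte[] bytes, int offset, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = offset; i < offset + length; i++) {
            if (i > offset) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    /** Builds a "field:[min : max]" description from the raw encoded bounds. */
    static String describe(String field, byte[] min, byte[] max) {
        return field + ":[" + toHex(min, 0, min.length) + " : " + toHex(max, 0, max.length) + "]";
    }
}
```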

jpountz · 2020-06-18T15:06:13Z

server/src/main/java/org/elasticsearch/index/mapper/RangeType.java

+    VERSION("version_range", LengthType.VARIABLE) {
+
+        // TODO check if these are really safe min/max values
+        private BytesRef MIN_VALUE = new BytesRef(0);


nit: this avoids allocating an empty byte[]

Suggested change

private BytesRef MIN_VALUE = new BytesRef(0);

private BytesRef MIN_VALUE = new BytesRef();

jpountz · 2020-06-18T15:10:19Z

server/src/main/java/org/elasticsearch/index/mapper/RangeType.java

+        public BytesRef nextDown(Object value) {
+            // TODO currently no up/down
+            return (BytesRef) value;
+        }


you might want to look at NumericUtils#add in Lucene which does something along those lines
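
For a fixed-width value, stepping up or down by one amounts to unsigned big-endian arithmetic with carry or borrow. The following is a plain-Java sketch of that idea and does not call `NumericUtils`; `ByteSteps` is an illustrative name. It assumes fixed-width values (such as a 16-byte prefix) and would not apply as-is to variable-length values.

```java
class ByteSteps {

    /** Smallest fixed-width value strictly greater than the input (unsigned, big-endian). */
    static byte[] nextUp(byte[] value) {
        byte[] result = value.clone();
        for (int i = result.length - 1; i >= 0; i--) {
            if (++result[i] != 0) {
                return result; // no carry into the next byte
            }
        }
        throw new ArithmeticException("overflow: value is already the maximum");
    }

    /** Largest fixed-width value strictly smaller than the input (unsigned, big-endian). */
    static byte[] nextDown(byte[] value) {
        byte[] result = value.clone();
        for (int i = result.length - 1; i >= 0; i--) {
            if (result[i]-- != 0) {
                return result; // no borrow from the next byte
            }
        }
        throw new ArithmeticException("underflow: value is already the minimum");
    }
}
```

With helpers like these, an exclusive bound can be turned into an inclusive one before building the range query.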

jpountz · 2020-06-18T15:30:48Z

server/src/main/java/org/elasticsearch/index/mapper/VersionStringFieldMapper.java

+                throw e;
+            }
+        }
+        if (fieldType.indexOptions() != IndexOptions.NONE || fieldType.stored())  {


I'd rather we encode the first 16 bytes of the encoded version in points to get efficient range queries

added this as TODO, some pointers would be helpful
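
One possible reading of the suggestion, as a sketch: points need a fixed width, so the variable-length encoded version is truncated or zero-padded to 16 bytes before indexing. `PointPrefix` and `toPoint` are illustrative names, and the claim that zero-padding is safe assumes values are compared as unsigned bytes.

```java
import java.util.Arrays;

class PointPrefix {
    static final int POINT_BYTES = 16;

    /** Truncates or zero-pads the encoded version to exactly POINT_BYTES bytes. */
    static byte[] toPoint(byte[] encoded) {
        // Arrays.copyOf truncates longer input and pads shorter input with zeros
        return Arrays.copyOf(encoded, POINT_BYTES);
    }
}
```

Under unsigned byte comparison this mapping preserves order only non-strictly: two versions that differ after the 16th byte get the same point. A range query on the points would therefore act as a fast pre-filter, and the bounds would still need to be checked against the full encoded value.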

jpountz · 2020-06-18T20:01:47Z

server/src/main/java/org/elasticsearch/index/mapper/VersionStringFieldMapper.java

+
+        if (includeDefaults || ignoreMalformed.explicit()) {
+            builder.field("ignore_malformed", ignoreMalformed.value());
+        }


this must support all options that are supported by the parser, so it looks like some of them are missing

added output for "mode" for now

jpountz · 2020-06-18T20:03:13Z

server/src/main/java/org/elasticsearch/index/mapper/VersionStringFieldMapper.java

+                // encoded string, need to re-encode
+                return encodeVersion(((BytesRef) value).utf8ToString(), mode);
+            } else {
+                throw new IllegalArgumentException("Illegal value type: " + value.getClass());


maybe include the value as well in the error message?
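
Something along these lines, as a sketch. `ValueCoercion` and `asVersionString` are illustrative names and only mimic the shape of the branch above.

```java
class ValueCoercion {

    /** Accepts String input and reports both type and value for anything else. */
    static String asVersionString(Object value) {
        if (value instanceof String) {
            return (String) value;
        }
        throw new IllegalArgumentException(
            "Illegal value type: " + value.getClass() + ", value: " + value);
    }
}
```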

jpountz · 2020-06-18T20:07:15Z

server/src/main/java/org/elasticsearch/index/mapper/VersionEncoder.java

+        // encode whether version has pre-release parts
+        if (preReleaseId != null) {
+            encodedVersion.append(PRERELESE_SEPARATOR_BYTE);  // versions with pre-release part sort before ones without
+            String[] preReleaseParts = preReleaseId.substring(1).split(DOT_SEPARATOR_REGEX);


If we want to support version strings that have consecutive dots, then I think we should use another split method as this one would skip the empty string in such a case.

I haven't seen a reason to support consecutive dots yet; currently I would try to catch that in initial validation. Do you have something in mind that would require this?
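
For reference, a small sketch of how `String#split` treats empty parts, assuming `DOT_SEPARATOR_REGEX` is `"\\."` (the constant's value is not shown in this diff). With the single-argument form, empty strings between consecutive dots are kept, but trailing empty strings are dropped; passing a negative limit keeps them all. `SplitDemo` is an illustrative name.

```java
class SplitDemo {

    /** Single-argument split: trailing empty strings are removed from the result. */
    static String[] splitDefault(String s) {
        return s.split("\\.");
    }

    /** Negative limit: every empty string is kept, including trailing ones. */
    static String[] splitKeepAll(String s) {
        return s.split("\\.", -1);
    }
}
```

So `"alpha..1"` yields three parts either way, while `"alpha.1.."` yields two parts with the default form and four with a limit of `-1`.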

jpountz · 2020-06-18T20:08:05Z

server/src/main/java/org/elasticsearch/index/mapper/VersionEncoder.java

+                if (first == false) {
+                    encodedVersion.append(DOT_SEPARATOR_BYTE);
+                }
+                boolean isNumeric = preReleasePart.chars().allMatch(x -> Character.isDigit(x));


I wonder if we should treat the empty string as a numeric or as a string, does the spec say anything about this corner case?

The spec says that "Identifiers MUST NOT be empty". Identifiers are the dot-separated parts, so I think we can check this in some form of validation already (e.g. check and error on consecutive dots, as mentioned above).
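
A sketch of such a check, assuming it runs on the pre-release string with the leading `-` already removed. `PreReleaseValidation` and `hasEmptyIdentifier` are illustrative names, not part of this PR.

```java
class PreReleaseValidation {

    /** Returns true if any dot-separated identifier is empty, e.g. "alpha..1" or "alpha.". */
    static boolean hasEmptyIdentifier(String preRelease) {
        // a limit of -1 keeps trailing empty strings, so "alpha." is caught as well
        for (String identifier : preRelease.split("\\.", -1)) {
            if (identifier.isEmpty()) {
                return true;
            }
        }
        return false;
    }
}
```

Rejecting these inputs up front means the encoder never has to decide whether an empty identifier counts as numeric or alphanumeric.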

cbuescher · 2020-07-17T13:15:46Z

@jpountz thanks for the reviews and guidance. I opened #59773 with the "version" field part of this draft and will follow up with a separate PR for the "version_range" work that's based on it, so I will close this draft.

cbuescher added the WIP label Jun 17, 2020

WIP First draft version field

9b9b556

cbuescher force-pushed the versionField-draft branch from 638ada3 to 9b9b556 Compare June 17, 2020 13:45

add prefix query

c743bb8

jpountz reviewed Jun 18, 2020

Christoph Büscher added 11 commits June 24, 2020 12:04

Merge branch 'master' into versionField-draft

e52aa7a

Iteration

e1c63e9

Enable lt, gt

d1c74f3

Add some more validation work

d006014

Start moving stuff to xpack

9fe2fc9

Merge branch 'master' into versionField-draft

0d59b24

Creating new xpack project

e5fd97a

line length

5f02f0e

Some simplifications to enums

79fafe4

WIP started on wildcard

629f824

Merge branch 'master' into versionField-draft

d089c69

rudolf mentioned this pull request Jul 6, 2020

Saved Objects should reduce the migrationVersion field count elastic/kibana#70815

Closed

Christoph Büscher added 8 commits July 8, 2020 17:30

WIP adding subfields

e017c8b

Experiment: index malformed values into subfield

9cda4f1

Remove 'natural' sort mode

ccd0687

iter

e81d8af

WIP points

9dacce6

WIP script docvalues

f0e105f

Merge branch 'master' into versionField-draft

bb643bd

WIP docvalues

7199cc9

ebeahan mentioned this pull request Jul 15, 2020

Adopting new version field type elastic/ecs#887

Closed

cbuescher closed this Jul 17, 2020
