Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse composite patterns using ClassicFormat.parseObject #40100

Merged
merged 24 commits into from
Mar 27, 2019

Conversation

pgomulka
Copy link
Contributor

@pgomulka pgomulka commented Mar 15, 2019

Java-time fails parsing composite patterns when first pattern matches
only the prefix of the input. It expects pattern in longest-shortest
order.
Joda does not suffer from this

Closes #39916

Java-time fails parsing composite patterns when first pattern matches
only the prefix of the input. It expects pattern in longest-shortest
order.
Joda does not suffer from this

closes elastic#39916
@pgomulka pgomulka self-assigned this Mar 15, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@pgomulka pgomulka changed the title Parse patterns with optionals using parseUnresolved Parse composite patterns using parseUnresolved Mar 15, 2019
@pgomulka
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-sample

@@ -1585,7 +1585,7 @@ static JavaDateFormatter merge(String pattern, List<DateFormatter> formatters) {
if (printer == null) {
printer = javaDateFormatter.getPrinter();
}
dateTimeFormatters.add(javaDateFormatter.getParser());
dateTimeFormatters.addAll(javaDateFormatter.getParsers());
Copy link
Contributor Author

@pgomulka pgomulka Mar 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a fan of this merge method. It extracts relatively low level java.time.DateTimeFormatter when it should stick with org.elasticsearch.common.time.DateFormatter abstraction as long as possible.
Possibly the roundUpBuilder should also be used inside JavaDateFormatter constructor?

also I suspect that roundUpBuilder will suffer from the same problem when it is a composite?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a fan of this merge method. It extracts relatively low level java.time.DateTimeFormatter when it should stick with org.elasticsearch.common.time.DateFormatter abstraction as long as possible.

Maybe we can refactor this in a separate PR then? Also, this code should only be there temporary? As soon as we get rid of Joda time in the code base, I expect that we can get rid of quite a few abstractions.

I suspect that roundUpBuilder will suffer from the same problem when it is a composite?

Is it intended to be used as a composite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intended to be used as a composite?

It is used as a composite in the merge method when constructing the roundUpParser with the appendOptional.
I can imagine we can have the same pattern as on the issue (yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS) but used with this parser. Will try to come up with a testcase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed that with @spinscale and there is no way we could create a pattern that would suffer from the same problem.
For index name calculations like prefix-{2010-01-01/d{yyyy-MM-dd||yyyy-MM-ddTHH}} it would fail parsing, as the only thing expected after | is the timezone (it would fail saying unexpected |)

@pgomulka
Copy link
Contributor Author

pgomulka commented Mar 18, 2019

this change is a bit slower.

on my laptop on this PR branch:

# Run complete. Total time: 00:02:05

Benchmark                             Mode  Cnt     Score    Error  Units
DateFormatterBenchmark.parseJavaDate  avgt   30   923.107 ± 31.714  ns/op
DateFormatterBenchmark.parseJodaDate  avgt   30  1001.870 ± 29.654  ns/op

vs on master

# Run complete. Total time: 00:02:05

Benchmark                             Mode  Cnt     Score    Error  Units
DateFormatterBenchmark.parseJavaDate  avgt   30   807.209 ± 10.100  ns/op
DateFormatterBenchmark.parseJodaDate  avgt   30  1023.225 ± 61.066  ns/op

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of suggestions.

@@ -1585,7 +1585,7 @@ static JavaDateFormatter merge(String pattern, List<DateFormatter> formatters) {
if (printer == null) {
printer = javaDateFormatter.getPrinter();
}
dateTimeFormatters.add(javaDateFormatter.getParser());
dateTimeFormatters.addAll(javaDateFormatter.getParsers());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a fan of this merge method. It extracts relatively low level java.time.DateTimeFormatter when it should stick with org.elasticsearch.common.time.DateFormatter abstraction as long as possible.

Maybe we can refactor this in a separate PR then? Also, this code should only be there temporary? As soon as we get rid of Joda time in the code base, I expect that we can get rid of quite a few abstractions.

I suspect that roundUpBuilder will suffer from the same problem when it is a composite?

Is it intended to be used as a composite?

}

@Override
public DateFormatter withLocale(Locale locale) {
//TODO not sure that we can keep that shortcut, and the one above
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not because previously we had a common locale for the composite parser and now it is possible that each one has a different locale (not sure whether we should allow this though and throw an IllegalArgumentException in the constructor if that it is the case?) Also, we should probably still keep this optimization in case we only have one parser I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. I don't think we have a code were we define composite formatters with different Locale
We could also refactor this into verifying all parsers having the same Locale or Zone before returning this.

}
}
}
return firstParser().parse(input);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this doing a second parse attempt with the first parser, if going through all the parsers did not work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I guess we could throw an exception after the loop. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just be sure via ctor check, that the size is never 0? maybe just use an array instead of an array list, than the below check would always be a fixed size check as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should never be 0, if no parsers are provided, the printer will be used.
The only reason for this >1 check is to at least when there is only one parser (the regular, not composite pattern) it is not parsing the input twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed, will add an exception indicating that all parsers has failed. This won't mislead that the first parser failed (as it is at the moment)

pgomulka added a commit to pgomulka/elasticsearch that referenced this pull request Mar 27, 2019
Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read.
 
closes elastic#39916
pgomulka added a commit to pgomulka/elasticsearch that referenced this pull request Mar 27, 2019
Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read.

closes elastic#39916
pgomulka added a commit that referenced this pull request Mar 27, 2019
) (#40501)

Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read.
 
closes #39916
backport #40100
pgomulka added a commit that referenced this pull request Mar 27, 2019
) (#40502)

Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read.
 
closes #39916
backport #40100
Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review here. I just have one ask for ensuring this works in all cases.

@@ -343,6 +343,17 @@ public void testDuellingFormatsValidParsing() {
assertSameDate("2012-W1-1", "weekyear_week_day");
}

public void testCompositeParsing(){
//in all these examples the second pattern will be used
assertSameDate("2014-06-06T12:01:02.123", "yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need some more tests:

  • 3 patterns
  • random ordering of many patterns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to add more test cases. Do you have anything specific in mind?
It won't harm adding testcases with 3 patterns, but I am not convinced we need a random ordering here. What scenario would it cover?

pgomulka added a commit that referenced this pull request Apr 4, 2019
) (#40503)

Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read.

closes #39916
backport #40100
pgomulka added a commit that referenced this pull request Sep 27, 2019
…#46654)

Currently DateMathParser with roundUp = true is relying on the DateFormatter build with combined optional sub parsers with defaulted fields (depending on the formatter). That means that for yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS
Java.time implementation expects optional parsers in order from most specific to least specific (reverse in the example above).
It is causing a problem because the first parsing succeeds but does not consume the full input. The second parser should be used.
We can work around this with keeping a list of RoundUpParsers and iterate over them choosing the one that parsed full input. The same approach we used for regular (non date math) in
relates #40100
The jdk is not considering this to be a bug https://bugs.openjdk.java.net/browse/JDK-8188771

Those below will expect this change first
relates #46242
relates #45284
pgomulka added a commit to pgomulka/elasticsearch that referenced this pull request Sep 27, 2019
…elastic#46654)

Currently DateMathParser with roundUp = true is relying on the DateFormatter build with
combined optional sub parsers with defaulted fields (depending on the formatter).
That means that for yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS
Java.time implementation expects optional parsers in order from most specific to
least specific (reverse in the example above).
It is causing a problem because the first parsing succeeds but does not consume the full input.
The second parser should be used.
We can work around this with keeping a list of RoundUpParsers and iterate over them choosing
the one that parsed full input. The same approach we used for regular (non date math) in
relates elastic#40100
The jdk is not considering this to be a bug https://bugs.openjdk.java.net/browse/JDK-8188771

Those below will expect this change first
relates elastic#46242
relates elastic#45284
pgomulka added a commit that referenced this pull request Sep 30, 2019
…654) (#47217)

Currently DateMathParser with roundUp = true is relying on the DateFormatter build with
combined optional sub parsers with defaulted fields (depending on the formatter).
That means that for yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS
Java.time implementation expects optional parsers in order from most specific to
least specific (reverse in the example above).
It is causing a problem because the first parsing succeeds but does not consume the full input.
The second parser should be used.
We can work around this with keeping a list of RoundUpParsers and iterate over them choosing
the one that parsed full input. The same approach we used for regular (non date math) in
relates #40100
The jdk is not considering this to be a bug https://bugs.openjdk.java.net/browse/JDK-8188771

Those below will expect this change first
relates #46242
relates #45284
backport #46654
pgomulka added a commit to pgomulka/elasticsearch that referenced this pull request Dec 9, 2019
…654) (elastic#47217)

Currently DateMathParser with roundUp = true is relying on the DateFormatter build with
combined optional sub parsers with defaulted fields (depending on the formatter).
That means that for yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS
Java.time implementation expects optional parsers in order from most specific to
least specific (reverse in the example above).
It is causing a problem because the first parsing succeeds but does not consume the full input.
The second parser should be used.
We can work around this with keeping a list of RoundUpParsers and iterate over them choosing
the one that parsed full input. The same approach we used for regular (non date math) in
relates elastic#40100
The jdk is not considering this to be a bug https://bugs.openjdk.java.net/browse/JDK-8188771

Those below will expect this change first
relates elastic#46242
relates elastic#45284
backport elastic#46654
pgomulka added a commit to pgomulka/elasticsearch that referenced this pull request Feb 7, 2020
…654) (elastic#47217)

Currently DateMathParser with roundUp = true is relying on the DateFormatter build with
combined optional sub parsers with defaulted fields (depending on the formatter).
That means that for yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS
Java.time implementation expects optional parsers in order from most specific to
least specific (reverse in the example above).
It is causing a problem because the first parsing succeeds but does not consume the full input.
The second parser should be used.
We can work around this with keeping a list of RoundUpParsers and iterate over them choosing
the one that parsed full input. The same approach we used for regular (non date math) in
relates elastic#40100
The jdk is not considering this to be a bug https://bugs.openjdk.java.net/browse/JDK-8188771

Those below will expect this change first
relates elastic#46242
relates elastic#45284
backport elastic#46654
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiple date formats in date_histogram aggregation
9 participants