-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parse composite patterns using ClassicFormat.parseObject #40100
Parse composite patterns using ClassicFormat.parseObject #40100
Conversation
Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest-shortest order. Joda does not suffer from this closes elastic#39916
Pinging @elastic/es-core-infra |
@elasticmachine run elasticsearch-ci/packaging-sample |
@@ -1585,7 +1585,7 @@ static JavaDateFormatter merge(String pattern, List<DateFormatter> formatters) { | |||
if (printer == null) { | |||
printer = javaDateFormatter.getPrinter(); | |||
} | |||
dateTimeFormatters.add(javaDateFormatter.getParser()); | |||
dateTimeFormatters.addAll(javaDateFormatter.getParsers()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a fan of this merge method. It extracts relatively low level java.time.DateTimeFormatter when it should stick with org.elasticsearch.common.time.DateFormatter abstraction as long as possible.
Possibly the roundUpBuilder should also be used inside JavaDateFormatter constructor?
also I suspect that roundUpBuilder will suffer from the same problem when it is a composite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a fan of this merge method. It extracts relatively low level java.time.DateTimeFormatter when it should stick with org.elasticsearch.common.time.DateFormatter abstraction as long as possible.
Maybe we can refactor this in a separate PR then? Also, this code should only be there temporary? As soon as we get rid of Joda time in the code base, I expect that we can get rid of quite a few abstractions.
I suspect that roundUpBuilder will suffer from the same problem when it is a composite?
Is it intended to be used as a composite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intended to be used as a composite?
It is used as a composite in the merge
method when constructing the roundUpParser with the appendOptional
.
I can imagine we can have the same pattern as on the issue (yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS
) but used with this parser. Will try to come up with a testcase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discussed that with @spinscale and there is no way we could create a pattern that would suffer from the same problem.
For index name calculations like prefix-{2010-01-01/d{yyyy-MM-dd||yyyy-MM-ddTHH}}
it would fail parsing, as the only thing expected after |
is the timezone (it would fail saying unexpected |
)
this change is a bit slower. on my laptop on this PR branch:
vs on master
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of suggestions.
server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java
Outdated
Show resolved
Hide resolved
@@ -1585,7 +1585,7 @@ static JavaDateFormatter merge(String pattern, List<DateFormatter> formatters) { | |||
if (printer == null) { | |||
printer = javaDateFormatter.getPrinter(); | |||
} | |||
dateTimeFormatters.add(javaDateFormatter.getParser()); | |||
dateTimeFormatters.addAll(javaDateFormatter.getParsers()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a fan of this merge method. It extracts relatively low level java.time.DateTimeFormatter when it should stick with org.elasticsearch.common.time.DateFormatter abstraction as long as possible.
Maybe we can refactor this in a separate PR then? Also, this code should only be there temporary? As soon as we get rid of Joda time in the code base, I expect that we can get rid of quite a few abstractions.
I suspect that roundUpBuilder will suffer from the same problem when it is a composite?
Is it intended to be used as a composite?
server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
@Override | ||
public DateFormatter withLocale(Locale locale) { | ||
//TODO not sure that we can keep that shortcut, and the one above |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not because previously we had a common locale for the composite parser and now it is possible that each one has a different locale (not sure whether we should allow this though and throw an IllegalArgumentException
in the constructor if that it is the case?) Also, we should probably still keep this optimization in case we only have one parser I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree. I don't think we have a code were we define composite formatters with different Locale
We could also refactor this into verifying all parsers having the same Locale
or Zone
before returning this.
server/src/main/java/org/elasticsearch/common/time/JavaDateMathParser.java
Outdated
Show resolved
Hide resolved
} | ||
} | ||
} | ||
return firstParser().parse(input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this doing a second parse attempt with the first parser, if going through all the parsers did not work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I guess we could throw an exception after the loop. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just be sure via ctor check, that the size is never 0
? maybe just use an array instead of an array list, than the below check would always be a fixed size check as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should never be 0, if no parsers are provided, the printer will be used.
The only reason for this >1 check is to at least when there is only one parser (the regular, not composite pattern) it is not parsing the input twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed, will add an exception indicating that all parsers has failed. This won't mislead that the first parser failed (as it is at the moment)
Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read. closes elastic#39916
Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read. closes elastic#39916
) (#40501) Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read. closes #39916 backport #40100
) (#40502) Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read. closes #39916 backport #40100
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review here. I just have one ask for ensuring this works in all cases.
@@ -343,6 +343,17 @@ public void testDuellingFormatsValidParsing() { | |||
assertSameDate("2012-W1-1", "weekyear_week_day"); | |||
} | |||
|
|||
public void testCompositeParsing(){ | |||
//in all these examples the second pattern will be used | |||
assertSameDate("2014-06-06T12:01:02.123", "yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need some more tests:
- 3 patterns
- random ordering of many patterns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to add more test cases. Do you have anything specific in mind?
It won't harm adding testcases with 3 patterns, but I am not convinced we need a random ordering here. What scenario would it cover?
) (#40503) Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read. closes #39916 backport #40100
…#46654) Currently DateMathParser with roundUp = true is relying on the DateFormatter build with combined optional sub parsers with defaulted fields (depending on the formatter). That means that for yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS Java.time implementation expects optional parsers in order from most specific to least specific (reverse in the example above). It is causing a problem because the first parsing succeeds but does not consume the full input. The second parser should be used. We can work around this with keeping a list of RoundUpParsers and iterate over them choosing the one that parsed full input. The same approach we used for regular (non date math) in relates #40100 The jdk is not considering this to be a bug https://bugs.openjdk.java.net/browse/JDK-8188771 Those below will expect this change first relates #46242 relates #45284
…elastic#46654) Currently DateMathParser with roundUp = true is relying on the DateFormatter build with combined optional sub parsers with defaulted fields (depending on the formatter). That means that for yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS Java.time implementation expects optional parsers in order from most specific to least specific (reverse in the example above). It is causing a problem because the first parsing succeeds but does not consume the full input. The second parser should be used. We can work around this with keeping a list of RoundUpParsers and iterate over them choosing the one that parsed full input. The same approach we used for regular (non date math) in relates elastic#40100 The jdk is not considering this to be a bug https://bugs.openjdk.java.net/browse/JDK-8188771 Those below will expect this change first relates elastic#46242 relates elastic#45284
…654) (#47217) Currently DateMathParser with roundUp = true is relying on the DateFormatter build with combined optional sub parsers with defaulted fields (depending on the formatter). That means that for yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS Java.time implementation expects optional parsers in order from most specific to least specific (reverse in the example above). It is causing a problem because the first parsing succeeds but does not consume the full input. The second parser should be used. We can work around this with keeping a list of RoundUpParsers and iterate over them choosing the one that parsed full input. The same approach we used for regular (non date math) in relates #40100 The jdk is not considering this to be a bug https://bugs.openjdk.java.net/browse/JDK-8188771 Those below will expect this change first relates #46242 relates #45284 backport #46654
…654) (elastic#47217) Currently DateMathParser with roundUp = true is relying on the DateFormatter build with combined optional sub parsers with defaulted fields (depending on the formatter). That means that for yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS Java.time implementation expects optional parsers in order from most specific to least specific (reverse in the example above). It is causing a problem because the first parsing succeeds but does not consume the full input. The second parser should be used. We can work around this with keeping a list of RoundUpParsers and iterate over them choosing the one that parsed full input. The same approach we used for regular (non date math) in relates elastic#40100 The jdk is not considering this to be a bug https://bugs.openjdk.java.net/browse/JDK-8188771 Those below will expect this change first relates elastic#46242 relates elastic#45284 backport elastic#46654
…654) (elastic#47217) Currently DateMathParser with roundUp = true is relying on the DateFormatter build with combined optional sub parsers with defaulted fields (depending on the formatter). That means that for yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS Java.time implementation expects optional parsers in order from most specific to least specific (reverse in the example above). It is causing a problem because the first parsing succeeds but does not consume the full input. The second parser should be used. We can work around this with keeping a list of RoundUpParsers and iterate over them choosing the one that parsed full input. The same approach we used for regular (non date math) in relates elastic#40100 The jdk is not considering this to be a bug https://bugs.openjdk.java.net/browse/JDK-8188771 Those below will expect this change first relates elastic#46242 relates elastic#45284 backport elastic#46654
Java-time fails parsing composite patterns when first pattern matches
only the prefix of the input. It expects pattern in longest-shortest
order.
Joda does not suffer from this
Closes #39916