Support Alternate Date Formats #258

Closed

Conversation

@GabeFernandez310 GabeFernandez310 commented Apr 10, 2023

Description

Supports using date formats obtained from OpenSearch. Contains some changes from the OpenSearchDataType rework (8b2f65d). Resolves some issues previously seen on the forum (see the Issues Resolved section below). This also adds support for custom formats in the index mapping. A valid custom format must carry enough information to be parsed as a Date, Time, or Timestamp; otherwise an exception is thrown when parsing fails.

CI is currently failing; the test failures are likely unrelated to these specific changes.

Based on the PoC (https://github.com/Bit-Quill/opensearch-project-sql/pull/169/files). It therefore fails to parse the same formats as those listed in the PoC.

Note: With this change, issues may be introduced where data is read differently if a user's mapping defines a format. This is because the plugin previously used a hardcoded formatter to parse the data, while with these changes it will use the formatter defined in the mapping.

Supported formats include the following:

epoch_millis
epoch_second
date_optional_time
strict_date_optional_time
strict_date_optional_time_nanos
basic_date
basic_date_time
basic_date_time_no_millis
basic_ordinal_date
basic_ordinal_date_time
basic_ordinal_date_time_no_millis
basic_time
basic_time_no_millis
basic_t_time
basic_t_time_no_millis
basic_week_date
strict_basic_week_date
basic_week_date_time
strict_basic_week_date_time
basic_week_date_time_no_millis
strict_basic_week_date_time_no_millis
date
strict_date
date_hour
strict_date_hour
date_hour_minute
strict_date_hour_minute
date_hour_minute_second
strict_date_hour_minute_second
date_hour_minute_second_fraction
strict_date_hour_minute_second_fraction
date_hour_minute_second_millis
strict_date_hour_minute_second_millis
date_time
strict_date_time
date_time_no_millis
strict_date_time_no_millis
hour
strict_hour
hour_minute
strict_hour_minute
hour_minute_second
strict_hour_minute_second
hour_minute_second_fraction
strict_hour_minute_second_fraction
hour_minute_second_millis
strict_hour_minute_second_millis
ordinal_date
strict_ordinal_date
ordinal_date_time
strict_ordinal_date_time
ordinal_date_time_no_millis
strict_ordinal_date_time_no_millis
time
strict_time
time_no_millis
strict_time_no_millis
t_time
strict_t_time
t_time_no_millis
strict_t_time_no_millis
week_date
strict_week_date
week_date_time
strict_week_date_time
week_date_time_no_millis
strict_week_date_time_no_millis
weekyear_week_day
strict_weekyear_week_day
year_month_day
strict_year_month_day
yyyy-MM-dd
HH:mm:ss
yyyy-MM-dd||uuuu-DDD
hour_minute_second||t_time
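
As a rough illustration of how a multi-format mapping such as `yyyy-MM-dd||uuuu-DDD` could be applied, the sketch below splits the format string on `||` and tries each custom pattern in turn. The class `MultiFormatDateParser` and its methods are hypothetical names used only for this example and are not taken from the plugin; the sketch uses plain `java.time` and handles custom patterns only, not named formats such as `epoch_millis`.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration only; not part of the plugin. */
final class MultiFormatDateParser {
  // Regex for the "||" separator used between formats in an index mapping.
  private static final String FORMAT_DELIMITER = "\\|\\|";

  /** Splits a mapping format string such as "yyyy-MM-dd||uuuu-DDD" into its parts. */
  static List<String> splitFormats(String mappingFormat) {
    return Arrays.stream(mappingFormat.split(FORMAT_DELIMITER))
        .map(String::trim)
        .collect(Collectors.toList());
  }

  /** Returns the date parsed by the first pattern that accepts the value. */
  static LocalDate parseDate(String value, String mappingFormat) {
    for (String pattern : splitFormats(mappingFormat)) {
      try {
        return LocalDate.parse(value, DateTimeFormatter.ofPattern(pattern));
      } catch (DateTimeParseException e) {
        // This pattern did not match; fall through to the next one.
      }
    }
    throw new IllegalArgumentException(
        String.format("Value \"%s\" matches none of the formats \"%s\"", value, mappingFormat));
  }
}
```

Under these assumptions, both `parseDate("1984-04-12", "yyyy-MM-dd||uuuu-DDD")` and `parseDate("1984-103", "yyyy-MM-dd||uuuu-DDD")` resolve to the same date, since day 103 of 1984 is April 12.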

The following formats are not supported because OpenSearch core provides no default formatters for them:

weekyear
strict_weekyear
year
strict_year
year_month
strict_year_month

Issues Resolved

opensearch-project#794
https://forum.opensearch.org/t/sql-select-fails-on-date-fields-format-epoch-second/11521

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

acarbonetto and others added 11 commits March 29, 2023 16:06
…Search types

Signed-off-by: Andrew Carbonetto <andrewc@bitquilltech.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>

public List<DateFormatter> getNamedFormatters(String formats) {
  return getFormatList(formats).stream().filter(f -> {
    try {

try...catch blocks are inefficient (compared with a boolean comparison) and shouldn't be used in loops or streams.
It is much better to filter on the patterns of acceptable strings.
If there are some edge cases, you can wrap the entire stream in a try...catch.
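
One possible reading of this suggestion is sketched below: classify each format with a set lookup instead of catching exceptions per element. `FormatNameFilter` and `KNOWN_NAMED_FORMATS` are hypothetical names, and the set holds only an illustrative subset of the named formats from the PR description. Replies elsewhere in this thread suggest that, without access to the non-exported OpenSearch core helpers, a lookup like this may not be enough to distinguish every formatter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch for illustration only; not the PR's implementation. */
final class FormatNameFilter {
  // Illustrative subset of named formats; a real list would need every supported name.
  private static final Set<String> KNOWN_NAMED_FORMATS = Set.of(
      "epoch_millis", "epoch_second", "date_optional_time",
      "strict_date_optional_time", "basic_date", "hour_minute_second", "t_time");

  private static List<String> split(String formats) {
    return Arrays.stream(formats.split("\\|\\|"))
        .map(String::trim)
        .collect(Collectors.toList());
  }

  /** Keeps only the entries that are known named formats (boolean check, no try...catch). */
  static List<String> namedFormats(String formats) {
    return split(formats).stream()
        .filter(KNOWN_NAMED_FORMATS::contains)
        .collect(Collectors.toList());
  }

  /** Treats every other entry as a candidate custom pattern. */
  static List<String> customPatterns(String formats) {
    return split(formats).stream()
        .filter(f -> !KNOWN_NAMED_FORMATS.contains(f))
        .collect(Collectors.toList());
  }
}
```

The trade-off is that the set must be kept in sync with the formats the plugin actually supports, whereas the try...catch approach defers that decision to the formatter factory.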

public List<DateTimeFormatter> getRegularFormatters() {
  return getFormatList(formatString).stream().map(f -> {
    try {
      return DateTimeFormatter.ofPattern(f);

same here. avoid using try...catch within a loop or stream.

Probably this is the only way to distinguish formatters.
I tried to reuse (call) some methods from OpenSearch core libs, but they are not exported.

Author

Haha I tried to do the exact same thing 😆. I agree. There probably is not another way to do this.

Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>

codecov bot commented Apr 18, 2023

Codecov Report

Merging #258 (4e9bb15) into integ-support-date-formats (b90065f) will decrease coverage by 2.81%.
The diff coverage is 100.00%.

@@                       Coverage Diff                        @@
##             integ-support-date-formats     #258      +/-   ##
================================================================
- Coverage                         99.98%   97.17%   -2.81%     
- Complexity                         2493     4136    +1643     
================================================================
  Files                               193      372     +179     
  Lines                              5720    10421    +4701     
  Branches                            359      707     +348     
================================================================
+ Hits                               5719    10127    +4408     
- Misses                                1      287     +286     
- Partials                              0        7       +7     

| Flag | Coverage Δ |
|---|---|
| sql-engine | 97.17% <100.00%> (-2.81%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| ...ch/sql/opensearch/client/OpenSearchNodeClient.java | 100.00% <100.00%> (ø) |
| ...ch/sql/opensearch/client/OpenSearchRestClient.java | 100.00% <100.00%> (ø) |
| ...h/sql/opensearch/data/type/OpenSearchDataType.java | 100.00% <100.00%> (ø) |
| ...h/sql/opensearch/data/type/OpenSearchDateType.java | 100.00% <100.00%> (ø) |
| ...h/sql/opensearch/data/type/OpenSearchTextType.java | 100.00% <100.00%> (ø) |
| ...nsearch/data/value/OpenSearchExprValueFactory.java | 100.00% <100.00%> (ø) |
| ...pensearch/sql/opensearch/mapping/IndexMapping.java | 100.00% <100.00%> (ø) |

... and 172 files with indirect coverage changes

Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
… With Datetime Types

Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
(cherry picked from commit c91f48d9763499940ecf302212fcd592aecb6018)
…earch-project-sql into dev-support-date-formats
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Comment on lines 99 to 102
new DateTimeFormatterBuilder()
    .appendOptional(SQL_LITERAL_DATE_TIME_FORMAT)
    .appendOptional(STRICT_DATE_OPTIONAL_TIME_FORMATTER)
    .appendOptional(STRICT_HOUR_MINUTE_SECOND_FORMATTER);

As I understand, you use 3 hardcoded formats instead of formats from mapping

Author

I think this should be fine? This is only slightly modified from how it was previously implemented (I modified it as a potential way to use user-defined formats). However, it might not even be needed anymore: constructTimestamp is only called in one place, when parseTimestampString fails to parse the value, so the call to constructTimestamp probably isn't needed. I will try to rework.

Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
…nsearch-project-sql into dev-support-date-formats
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>

acarbonetto commented Apr 27, 2023

@GabeFernandez310 can you please add a use case to the PR description?
It would also be worthwhile to list the supported date formats for this PR (link).

If there are any 'breaking' changes, please list them in the PR description. I believe the only breaking change would occur when a mapping defines a date format that was previously unused: a date may no longer resolve if it doesn't match the default format AND the format provided doesn't include the default type.

Honestly, the above change seems like it would be bad data anyways... so is this really a breaking change?
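
To make the scenario concrete, here is a minimal sketch of a value that one pattern accepts and another rejects. It assumes, purely for illustration, that the previously hardcoded formatter accepted `uuuu-MM-dd` while the mapping declares only `uuuu-DDD`; the class `MappingFormatCheck` is hypothetical and the plugin's actual default formatters may differ.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Hypothetical check for illustration only; not part of the plugin. */
final class MappingFormatCheck {
  /** Returns true if the value parses as a date under the given custom pattern. */
  static boolean parsesAsDate(String value, String pattern) {
    try {
      LocalDate.parse(value, DateTimeFormatter.ofPattern(pattern));
      return true;
    } catch (DateTimeParseException e) {
      // The value does not conform to this pattern.
      return false;
    }
  }
}
```

Here `parsesAsDate("1984-04-12", "uuuu-MM-dd")` succeeds while `parsesAsDate("1984-04-12", "uuuu-DDD")` does not, which is the kind of stored value that would read differently once the mapping's format is honored.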

@@ -483,6 +484,7 @@ public void joinQuerySelectOnlyOnOneTable() throws Exception {
assertContainsData(getDataRows(response), fields);
}

@Disabled("Disabled temporarily due to JSON format incompatibility with V2 and Legacy")

what do you mean by this? Is this test case now supported in V2 and it fails?

Author

@GabeFernandez310 GabeFernandez310 Apr 27, 2023

This may have been a mistake. I think this was failing at first due to the data type refactor. When I asked about it, JSON format incompatibility was suggested as the reason it suddenly started failing, so I used that as the description. This has since been fixed, so I will remove these annotations.

GabeFernandez310 and others added 2 commits April 27, 2023 11:26
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
* Keep up with refactoring in OpenSearch.

Signed-off-by: MaxKsyunz <maxk@bitquilltech.com>

* Updating code formatting.

Signed-off-by: MaxKsyunz <maxk@bitquilltech.com>

---------

Signed-off-by: MaxKsyunz <maxk@bitquilltech.com>
{"index": {}}
{"name": "yyyy-MM-dd||uuuu-DDD", "yyyy-MM-dd||uuuu-DDD": "1984-04-12"}
{"index": {}}
{"name": "hour_minute_second||t_time", "hour_minute_second||t_time": "09:07:42"}

Can we add multi-format field that covers both dates and times?

{"index": {}}
{"name": "HH:mm:ss", "HH:mm:ss": "09:07:42"}
{"index": {}}
{"name": "yyyy-MM-dd||uuuu-DDD", "yyyy-MM-dd||uuuu-DDD": "1984-04-12"}

We should have data that conforms to each of the provided date formats to make sure they all work,
i.e. add data that conforms to the uuuu-DDD format.

{"index": {}}
{"name": "yyyy-MM-dd||uuuu-DDD", "yyyy-MM-dd||uuuu-DDD": "1984-04-12"}
{"index": {}}
{"name": "hour_minute_second||t_time", "hour_minute_second||t_time": "09:07:42"}

please add data that is formatted by the t_time formatter

*/
public static Map<String, OpenSearchDataType> parseMapping(Map<String, Object> indexMapping) {
  Map<String, OpenSearchDataType> result = new LinkedHashMap<>();
  if (indexMapping != null) {

nit: I prefer to fail early on functions. e.g.

if (indexMapping == null) {
  return result;
}
...

switch (mappingType) {
  case Object:
  case Nested:
    if (innerMap.isEmpty()) {

I wonder if this should be applied prior to the switch as really any innerMap that's empty should just return the cached instance.

return new ExprTimestampValue(Instant.ofEpochMilli(value.longValue()));
return formatReturn(
returnFormat,
new ExprTimestampValue(Instant.ofEpochMilli(value.longValue())));
} else if (value.isString()) {

else not required

"Construct ExprTimestampValue from \"%s\" failed, unsupported date format.",
value.stringValue()),
ignored);
}
} else {

else not required, as the other options all return/throw

dt = (OpenSearchDateType) type;
returnFormat = dt.getExprType();
} else {
dt = OpenSearchDateType.of();

Why would this happen? I mean, when would we parseTimestamp on a non-date type?

Author

Probably needs to be reworked. Currently typeActionMap still uses an OpenSearchDataType created from an ExprCoreType in some places. I believe I tried to change this at some point, but was seeing test failures because in the case that an OpenSearchDataType is passed in, it was failing on the line

dt = (OpenSearchDateType) type;

because it could not cast the passed in type argument to an OpenSearchDateType.

This if-else branch may just need to be removed, and the code may need to be updated wherever the parseTimestamp function is called using a type argument that is not an OpenSearchDateType.

@@ -28,6 +28,7 @@
import java.util.Map;
import lombok.RequiredArgsConstructor;
import org.apache.lucene.index.LeafReaderContext;
import org.junit.Ignore;

revert this?

private static final String FORMAT_DELIMITER = "\\|\\|";


// a read-only collection of relations

update comment

Comment on lines 18 to 19
* Of type join with relations. See
* <a href="https://opensearch.org/docs/latest/opensearch/supported-field-types/join/">doc</a>

Update comment

Comment on lines 90 to 91
var res = new OpenSearchDateType(format);
return res;

Can return new ...

returnFormat,
new ExprTimestampValue(
new ExprTimeValue(LocalTime.from(parsed))
.timestampValue(new FunctionProperties(Instant.EPOCH, ZoneOffset.UTC))));

It works like LocalDate.now(), I suppose.
We need a way to use the FunctionProperties instance already created for this query here.
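
The underlying issue is that a time-only value has no date, so some "current date" has to be supplied when it is widened to a timestamp. The sketch below shows the general idea with plain `java.time`, using a `Clock` as a stand-in for the per-query context; `TimeToTimestamp` and `atQueryDate` are hypothetical names, and how the plugin's `FunctionProperties` would actually be passed in is not shown in this thread.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

/** Hypothetical sketch for illustration only; Clock stands in for per-query context. */
final class TimeToTimestamp {
  /**
   * Combines a time-only value with the date taken from the supplied clock,
   * so every value in one query resolves against the same "today".
   */
  static LocalDateTime atQueryDate(LocalTime time, Clock queryClock) {
    return time.atDate(LocalDate.now(queryClock));
  }
}
```

With a clock fixed at `Instant.EPOCH` in UTC, `09:07:42` becomes `1970-01-01T09:07:42`; passing the clock created for the query instead would yield that query's date.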

GumpacG and others added 8 commits April 28, 2023 08:00
Signed-off-by: GabeFernandez310 <Gabriel.Fernandez@improving.com>
…earch-project-sql into dev-support-date-formats
Signed-off-by: Guian Gumpac <guian.gumpac@improving.com>
Signed-off-by: Guian Gumpac <guian.gumpac@improving.com>
Signed-off-by: Guian Gumpac <guian.gumpac@improving.com>
Signed-off-by: Guian Gumpac <guian.gumpac@improving.com>

GumpacG commented May 8, 2023

Closing this as there will be a follow-up PR to isolate the support for datetime formats only.

@GumpacG GumpacG closed this May 8, 2023
@Yury-Fridlyand Yury-Fridlyand deleted the dev-support-date-formats branch July 31, 2023 18:49