Array values are preserved (opensearch-project#1300) (opensearch-project#3095)

Signed-off-by: Norman Jordan <norman.jordan@improving.com>
(cherry picked from commit e109417)
normanj-bitquill authored and penghuo committed Oct 25, 2024
1 parent f6ca54c commit b101b6a
Showing 23 changed files with 289 additions and 98 deletions.
@@ -28,6 +28,9 @@ public enum Key {
/** PPL Settings. */
PPL_ENABLED("plugins.ppl.enabled"),

+/** Query Settings. */
+FIELD_TYPE_TOLERANCE("plugins.query.field_type_tolerance"),
+
/** Common Settings for SQL and PPL. */
QUERY_MEMORY_LIMIT("plugins.query.memory_limit"),
QUERY_SIZE_LIMIT("plugins.query.size_limit"),
10 changes: 5 additions & 5 deletions docs/user/beyond/partiql.rst
@@ -202,11 +202,11 @@ Selecting top level for object fields, object fields of array value and nested f

os> SELECT city, accounts, projects FROM people;
fetched rows / total rows = 1/1
-+-----------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------+
-| city | accounts | projects |
-|-----------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------|
-| {'name': 'Seattle', 'location': {'latitude': 10.5}} | {'id': 1} | [{'name': 'AWS Redshift Spectrum querying'},{'name': 'AWS Redshift security'},{'name': 'AWS Aurora security'}] |
-+-----------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------+
++-----------------------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------+
+| city | accounts | projects |
+|-----------------------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------|
+| {'name': 'Seattle', 'location': {'latitude': 10.5}} | [{'id': 1},{'id': 2}] | [{'name': 'AWS Redshift Spectrum querying'},{'name': 'AWS Redshift security'},{'name': 'AWS Aurora security'}] |
++-----------------------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------+

Example 2: Selecting Deeper Levels
----------------------------------
47 changes: 10 additions & 37 deletions docs/user/ppl/general/datatypes.rst
@@ -39,8 +39,6 @@ The PPL support the following data types.
+---------------+
| timestamp |
+---------------+
-| datetime |
-+---------------+
| date |
+---------------+
| time |
@@ -114,7 +112,7 @@ Numeric values ranged from -2147483648 to +2147483647 are recognized as integer
Date and Time Data Types
========================

-The date and time data types are the types that represent temporal values and PPL plugin supports types including DATE, TIME, DATETIME, TIMESTAMP and INTERVAL. By default, the OpenSearch DSL uses date type as the only date and time related type, which has contained all information about an absolute time point. To integrate with PPL language, each of the types other than timestamp is holding part of temporal or timezone information, and the usage to explicitly clarify the date and time types is reflected in the datetime functions (see `Functions <functions.rst>`_ for details), where some functions might have restrictions in the input argument type.
+The date and time data types are the types that represent temporal values and PPL plugin supports types including DATE, TIME, TIMESTAMP and INTERVAL. By default, the OpenSearch DSL uses date type as the only date and time related type, which has contained all information about an absolute time point. To integrate with PPL language, each of the types other than timestamp is holding part of temporal or timezone information, and the usage to explicitly clarify the date and time types is reflected in the datetime functions (see `Functions <functions.rst>`_ for details), where some functions might have restrictions in the input argument type.


Date
@@ -141,19 +139,6 @@ Time represents the time on the clock or watch with no regard for which timezone
+------+-----------------------+----------------------------------------+


-Datetime
----------
-
-Datetime type is the combination of date and time. The conversion rule of date or time to datetime is described in `Conversion between date and time types`_. Datetime type does not contain timezone information. For an absolute time point that contains both date time and timezone information, see `Timestamp`_.
-
-+----------+----------------------------------+--------------------------------------------------------------+
-| Type | Syntax | Range |
-+==========+==================================+==============================================================+
-| Datetime | 'yyyy-MM-dd hh:mm:ss[.fraction]' | '0001-01-01 00:00:00.000000' to '9999-12-31 23:59:59.999999' |
-+----------+----------------------------------+--------------------------------------------------------------+
-
-
-
Timestamp
---------

@@ -183,38 +168,26 @@ The expr is any expression that can be iterated to a quantity value eventually,
Conversion between date and time types
--------------------------------------

-Basically the date and time types except interval can be converted to each other, but might suffer some alteration of the value or some information loss, for example extracting the time value from a datetime value, or convert a date value to a datetime value and so forth. Here lists the summary of the conversion rules that PPL plugin supports for each of the types:
+Basically the date and time types except interval can be converted to each other, but might suffer some alteration of the value or some information loss, for example extracting the time value from a timestamp value, or convert a date value to a timestamp value and so forth. Here lists the summary of the conversion rules that PPL plugin supports for each of the types:

Conversion from DATE
>>>>>>>>>>>>>>>>>>>>

- Since the date value does not have any time information, conversion to `Time`_ type is not useful, and will always return a zero time value '00:00:00'.

-- Conversion from date to datetime has a data fill-up due to the lack of time information, and it attaches the time '00:00:00' to the original date by default and forms a datetime instance. For example, the result to covert date '2020-08-17' to datetime type is datetime '2020-08-17 00:00:00'.
-
-- Conversion to timestamp is to alternate both the time value and the timezone information, and it attaches the zero time value '00:00:00' and the session timezone (UTC by default) to the date. For example, the result to covert date '2020-08-17' to datetime type with session timezone UTC is datetime '2020-08-17 00:00:00' UTC.
+- Conversion to timestamp is to alternate both the time value and the timezone information, and it attaches the zero time value '00:00:00' and the session timezone (UTC by default) to the date. For example, the result to covert date '2020-08-17' to timestamp type with session timezone UTC is timestamp '2020-08-17 00:00:00' UTC.


Conversion from TIME
>>>>>>>>>>>>>>>>>>>>

-- Time value cannot be converted to any other date and time types since it does not contain any date information, so it is not meaningful to give no date info to a date/datetime/timestamp instance.
-
-
-Conversion from DATETIME
->>>>>>>>>>>>>>>>>>>>>>>>
-
-- Conversion from datetime to date is to extract the date part from the datetime value. For example, the result to convert datetime '2020-08-17 14:09:00' to date is date '2020-08-08'.
-
-- Conversion to time is to extract the time part from the datetime value. For example, the result to convert datetime '2020-08-17 14:09:00' to time is time '14:09:00'.
-
-- Since the datetime type does not contain timezone information, the conversion to timestamp needs to fill up the timezone part with the session timezone. For example, the result to convert datetime '2020-08-17 14:09:00' with system timezone of UTC, to timestamp is timestamp '2020-08-17 14:09:00' UTC.
+- Time value cannot be converted to any other date and time types since it does not contain any date information, so it is not meaningful to give no date info to a date/timestamp instance.


Conversion from TIMESTAMP
>>>>>>>>>>>>>>>>>>>>>>>>>

-- Conversion from timestamp is much more straightforward. To convert it to date is to extract the date value, and conversion to time is to extract the time value. Conversion to datetime, it will extracts the datetime value and leave the timezone information over. For example, the result to convert datetime '2020-08-17 14:09:00' UTC to date is date '2020-08-17', to time is '14:09:00' and to datetime is datetime '2020-08-17 14:09:00'.
+- Conversion from timestamp is much more straightforward. To convert it to date is to extract the date value, and conversion to time is to extract the time value. For example, the result to convert timestamp '2020-08-17 14:09:00' UTC to date is date '2020-08-17', to time is '14:09:00'.


String Data Types
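The DATE and TIMESTAMP conversion rules documented in this hunk can be sketched with `java.time`. The class and method names below are hypothetical. The sketch assumes a fixed UTC session time zone and illustrates the documented rules only. It is not the PPL plugin's own conversion code.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

/**
 * Hypothetical illustration of the documented DATE/TIME/TIMESTAMP conversion rules.
 * Not the PPL plugin's implementation; all names are illustrative.
 */
final class TemporalConversionSketch {
  private TemporalConversionSketch() {}

  /** DATE to TIMESTAMP: attach midnight and the session time zone. */
  static ZonedDateTime dateToTimestamp(LocalDate date, ZoneOffset sessionZone) {
    return date.atStartOfDay(sessionZone);
  }

  /** DATE to TIME: a date has no time part, so the result is always 00:00:00. */
  static LocalTime dateToTime(LocalDate date) {
    return LocalTime.MIDNIGHT;
  }

  /** TIMESTAMP to DATE: keep only the calendar date. */
  static LocalDate timestampToDate(ZonedDateTime timestamp) {
    return timestamp.toLocalDate();
  }

  /** TIMESTAMP to TIME: keep only the time of day. */
  static LocalTime timestampToTime(ZonedDateTime timestamp) {
    return timestamp.toLocalTime();
  }

  public static void main(String[] args) {
    // Assumed session time zone: UTC, the documented default.
    System.out.println(dateToTimestamp(LocalDate.of(2020, 8, 17), ZoneOffset.UTC)); // 2020-08-17T00:00Z
    ZonedDateTime ts = ZonedDateTime.of(2020, 8, 17, 14, 9, 0, 0, ZoneOffset.UTC);
    System.out.println(timestampToDate(ts)); // 2020-08-17
    System.out.println(timestampToTime(ts)); // 14:09
  }
}
```

The sketch has no conversion out of TIME. The documentation states that a time value carries no date information, so that direction has nothing meaningful to produce.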
@@ -412,8 +385,8 @@ Select deeper level for object fields of array value which returns the first ele

os> source = people | fields accounts, accounts.id;
fetched rows / total rows = 1/1
-+------------+---------------+
-| accounts | accounts.id |
-|------------+---------------|
-| {'id': 1} | 1 |
-+------------+---------------+
++-----------------------+-------------+
+| accounts | accounts.id |
+|-----------------------+-------------|
+| [{'id': 1},{'id': 2}] | 1 |
++-----------------------+-------------+
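In the updated examples, selecting `accounts` returns the whole array, while `accounts.id` still resolves against the first element. The sketch below models that observable behavior with plain Java maps and lists standing in for a document. The class and method names are hypothetical, and this is not the plugin's path-resolution code.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical model of the documented output; documents are plain maps and lists. */
final class ArrayFieldSelectionSketch {
  private ArrayFieldSelectionSketch() {}

  /** Selecting the field itself: the stored value, including a whole array, is returned as-is. */
  static Object selectField(Map<String, Object> doc, String field) {
    return doc.get(field);
  }

  /** Selecting field.sub when field holds an array: only the first element is consulted. */
  static Object selectNested(Map<String, Object> doc, String field, String sub) {
    Object value = doc.get(field);
    if (value instanceof List) {
      List<?> list = (List<?>) value;
      value = list.isEmpty() ? null : list.get(0);
    }
    if (value instanceof Map) {
      return ((Map<?, ?>) value).get(sub);
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, Object> people =
        Map.of("accounts", List.of(Map.of("id", 1), Map.of("id", 2)));
    System.out.println(selectField(people, "accounts"));        // [{id=1}, {id=2}]
    System.out.println(selectNested(people, "accounts", "id")); // 1
  }
}
```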
@@ -14,6 +14,7 @@
import org.json.JSONArray;
import org.json.JSONObject;
import org.junit.Test;
+import org.opensearch.sql.common.setting.Settings;
import org.opensearch.sql.legacy.utils.StringUtils;

/**
@@ -79,9 +80,20 @@ public void testSelectNestedFieldItself() {
@Test
public void testSelectObjectFieldOfArrayValuesItself() {
JSONObject response = new JSONObject(query("SELECT accounts FROM %s"));
+verifyDataRows(response, rows(new JSONArray("[{\"id\":1},{\"id\":2}]")));
+}

-// Only the first element of the list of is returned.
-verifyDataRows(response, rows(new JSONObject("{\"id\": 1}")));
+@Test
+public void testSelectObjectFieldOfArrayValuesItselfNoFieldTypeTolerance() throws Exception {
+updateClusterSettings(
+new ClusterSetting(PERSISTENT, Settings.Key.FIELD_TYPE_TOLERANCE.getKeyValue(), "false"));
+try {
+JSONObject response = new JSONObject(query("SELECT accounts FROM %s"));
+verifyDataRows(response, rows(new JSONObject("{\"id\":1}")));
+} finally {
+updateClusterSettings(
+new ClusterSetting(PERSISTENT, Settings.Key.FIELD_TYPE_TOLERANCE.getKeyValue(), "true"));
+}
}

@Test
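The new integration test sets `plugins.query.field_type_tolerance` to `false` and resets it in a `finally` block, so later tests still run with tolerance enabled. The same pattern can be factored into a helper. The sketch below uses a hypothetical in-memory settings map instead of the test framework's `updateClusterSettings`. It restores the previous value rather than a hard-coded `"true"`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical in-memory stand-in for cluster settings, used only to show the
 * override-then-restore pattern; not the OpenSearch test framework API.
 */
final class SettingOverrideSketch {
  private final Map<String, String> settings = new HashMap<>();

  SettingOverrideSketch() {
    // Default taken from FIELD_TYPE_TOLERANCE_SETTING in this commit.
    settings.put("plugins.query.field_type_tolerance", "true");
  }

  String get(String key) {
    return settings.get(key);
  }

  /** Runs body with key set to value, then restores the previous value even if body throws. */
  <T> T withSetting(String key, String value, Supplier<T> body) {
    String previous = settings.put(key, value);
    try {
      return body.get();
    } finally {
      if (previous == null) {
        settings.remove(key);
      } else {
        settings.put(key, previous);
      }
    }
  }

  public static void main(String[] args) {
    SettingOverrideSketch cluster = new SettingOverrideSketch();
    String during =
        cluster.withSetting(
            "plugins.query.field_type_tolerance",
            "false",
            () -> cluster.get("plugins.query.field_type_tolerance"));
    System.out.println(during);                                              // false
    System.out.println(cluster.get("plugins.query.field_type_tolerance")); // true
  }
}
```

Restoring the captured previous value keeps the helper correct even if the default changes later. The hard-coded reset in the test would silently diverge in that case.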
@@ -153,6 +153,7 @@ private Settings defaultSettings() {
new ImmutableMap.Builder<Key, Object>()
.put(Key.QUERY_SIZE_LIMIT, 200)
.put(Key.SQL_PAGINATION_API_SEARCH_AFTER, true)
+.put(Key.FIELD_TYPE_TOLERANCE, true)
.build();

@Override
@@ -423,7 +423,7 @@ public void test_nested_in_where_as_predicate_expression_with_multiple_condition
+ " nested(message.dayOfWeek) >= 4";
JSONObject result = executeJdbcRequest(query);
assertEquals(2, result.getInt("total"));
-verifyDataRows(result, rows("c", "ab", 4), rows("zz", "aa", 6));
+verifyDataRows(result, rows("c", "ab", 4), rows("zz", new JSONArray(List.of("aa", "bb")), 6));
}

@Test
@@ -167,6 +167,7 @@ private Settings defaultSettings() {
.put(Key.QUERY_SIZE_LIMIT, 200)
.put(Key.SQL_CURSOR_KEEP_ALIVE, TimeValue.timeValueMinutes(1))
.put(Key.SQL_PAGINATION_API_SEARCH_AFTER, true)
+.put(Key.FIELD_TYPE_TOLERANCE, true)
.build();

@Override
@@ -75,6 +75,9 @@ public class OpenSearchExprValueFactory {
/** The Mapping of Field and ExprType. */
private final Map<String, OpenSearchDataType> typeMapping;

+/** Whether to support nested value types (such as arrays) */
+private final boolean fieldTypeTolerance;
+
/**
* Extend existing mapping by new data without overwrite. Called from aggregation only {@see
* AggregationQueryBuilder#buildTypeMapping}.
@@ -143,8 +146,10 @@ public void extendTypeMapping(Map<String, OpenSearchDataType> typeMapping) {
.build();

/** Constructor of OpenSearchExprValueFactory. */
-public OpenSearchExprValueFactory(Map<String, OpenSearchDataType> typeMapping) {
+public OpenSearchExprValueFactory(
+Map<String, OpenSearchDataType> typeMapping, boolean fieldTypeTolerance) {
this.typeMapping = OpenSearchDataType.traverseAndFlatten(typeMapping);
+this.fieldTypeTolerance = fieldTypeTolerance;
}

/**
@@ -160,7 +165,7 @@ public ExprValue construct(String jsonString, boolean supportArrays) {
new OpenSearchJsonContent(OBJECT_MAPPER.readTree(jsonString)),
TOP_PATH,
Optional.of(STRUCT),
-supportArrays);
+fieldTypeTolerance || supportArrays);
} catch (JsonProcessingException e) {
throw new IllegalStateException(String.format("invalid json: %s.", jsonString), e);
}
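After this change, `construct` preserves arrays whenever the index-level `fieldTypeTolerance` flag is on, whatever the caller passes as `supportArrays`. With the flag off, it falls back to the earlier behavior in which, per the removed test comment, only the first element of an array is returned. The sketch below is a simplified, hypothetical model of that gating. It uses `List` in place of the plugin's JSON and `ExprValue` types, and it maps an empty array to `null`, which is an assumption.

```java
import java.util.List;

/**
 * Hypothetical model of how the constructor flag combines with the per-call
 * supportArrays argument; not the real OpenSearchExprValueFactory.
 */
final class ValueFactorySketch {
  private final boolean fieldTypeTolerance;

  ValueFactorySketch(boolean fieldTypeTolerance) {
    this.fieldTypeTolerance = fieldTypeTolerance;
  }

  /** Mirrors the diff: arrays are kept when either flag is true. */
  Object construct(Object rawValue, boolean supportArrays) {
    return parse(rawValue, fieldTypeTolerance || supportArrays);
  }

  private static Object parse(Object rawValue, boolean supportArrays) {
    if (!supportArrays && rawValue instanceof List) {
      List<?> values = (List<?>) rawValue;
      // Earlier behavior: only the first element survives.
      // Assumption: an empty array maps to null.
      return values.isEmpty() ? null : values.get(0);
    }
    return rawValue;
  }

  public static void main(String[] args) {
    List<Integer> accounts = List.of(1, 2);
    System.out.println(new ValueFactorySketch(true).construct(accounts, false));  // [1, 2]
    System.out.println(new ValueFactorySketch(false).construct(accounts, false)); // 1
    System.out.println(new ValueFactorySketch(false).construct(accounts, true));  // [1, 2]
  }
}
```

Passing `false` for tolerance, as `buildValueFactory` does in this commit, leaves the per-call `supportArrays` argument as the only switch.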
@@ -150,7 +150,9 @@ public OpenSearchQueryRequest(StreamInput in, OpenSearchStorageEngine engine) th
}

OpenSearchIndex index = (OpenSearchIndex) engine.getTable(null, indexName.toString());
-exprValueFactory = new OpenSearchExprValueFactory(index.getFieldOpenSearchTypes());
+exprValueFactory =
+new OpenSearchExprValueFactory(
+index.getFieldOpenSearchTypes(), index.isFieldTypeTolerance());
}

@Override
@@ -178,6 +178,8 @@ public OpenSearchScrollRequest(StreamInput in, OpenSearchStorageEngine engine)
includes = in.readStringList();
indexName = new IndexName(in);
OpenSearchIndex index = (OpenSearchIndex) engine.getTable(null, indexName.toString());
-exprValueFactory = new OpenSearchExprValueFactory(index.getFieldOpenSearchTypes());
+exprValueFactory =
+new OpenSearchExprValueFactory(
+index.getFieldOpenSearchTypes(), index.isFieldTypeTolerance());
}
}
@@ -229,6 +229,13 @@ public class OpenSearchSettings extends Settings {
Setting.Property.NodeScope,
Setting.Property.Dynamic);

+public static final Setting<?> FIELD_TYPE_TOLERANCE_SETTING =
+Setting.boolSetting(
+Key.FIELD_TYPE_TOLERANCE.getKeyValue(),
+true,
+Setting.Property.NodeScope,
+Setting.Property.Dynamic);
+
/** Construct OpenSearchSetting. The OpenSearchSetting must be singleton. */
@SuppressWarnings("unchecked")
public OpenSearchSettings(ClusterSettings clusterSettings) {
@@ -372,13 +379,19 @@ public OpenSearchSettings(ClusterSettings clusterSettings) {
clusterSettings,
Key.SESSION_INACTIVITY_TIMEOUT_MILLIS,
SESSION_INACTIVITY_TIMEOUT_MILLIS_SETTING,
-new Updater((Key.SESSION_INACTIVITY_TIMEOUT_MILLIS)));
+new Updater(Key.SESSION_INACTIVITY_TIMEOUT_MILLIS));
register(
settingBuilder,
clusterSettings,
Key.STREAMING_JOB_HOUSEKEEPER_INTERVAL,
STREAMING_JOB_HOUSEKEEPER_INTERVAL_SETTING,
-new Updater((Key.STREAMING_JOB_HOUSEKEEPER_INTERVAL)));
+new Updater(Key.STREAMING_JOB_HOUSEKEEPER_INTERVAL));
+register(
+settingBuilder,
+clusterSettings,
+Key.FIELD_TYPE_TOLERANCE,
+FIELD_TYPE_TOLERANCE_SETTING,
+new Updater(Key.FIELD_TYPE_TOLERANCE));
defaultSettings = settingBuilder.build();
}

@@ -455,6 +468,7 @@ public static List<Setting<?>> pluginSettings() {
.add(DATASOURCES_LIMIT_SETTING)
.add(SESSION_INACTIVITY_TIMEOUT_MILLIS_SETTING)
.add(STREAMING_JOB_HOUSEKEEPER_INTERVAL_SETTING)
+.add(FIELD_TYPE_TOLERANCE_SETTING)
.build();
}
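`FIELD_TYPE_TOLERANCE_SETTING` is registered as a `Dynamic` boolean setting with a default of `true`. `OpenSearchIndex.isFieldTypeTolerance()` reads the current value on each call, so a cluster-settings update applies to later requests without a node restart. The sketch below shows that register-and-update flow with a small, hypothetical stand-in class. It is not OpenSearch's `Setting` or `ClusterSettings` API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical stand-in for a dynamic boolean cluster setting; not OpenSearch's Setting API. */
final class DynamicBooleanSettingSketch {
  private final String key;
  private volatile boolean value;
  private final List<Consumer<Boolean>> updaters = new CopyOnWriteArrayList<>();

  DynamicBooleanSettingSketch(String key, boolean defaultValue) {
    this.key = key;
    this.value = defaultValue;
  }

  String key() {
    return key;
  }

  /** Readers call this per request, so they always observe the latest value. */
  boolean get() {
    return value;
  }

  /** Registers a callback, loosely playing the role of the Updater passed to register(...). */
  void addUpdater(Consumer<Boolean> updater) {
    updaters.add(updater);
  }

  /** Simulates a cluster-settings update: store the new value, then notify every updater. */
  void update(boolean newValue) {
    value = newValue;
    for (Consumer<Boolean> updater : updaters) {
      updater.accept(newValue);
    }
  }

  public static void main(String[] args) {
    DynamicBooleanSettingSketch tolerance =
        new DynamicBooleanSettingSketch("plugins.query.field_type_tolerance", true);
    tolerance.addUpdater(v -> System.out.println("updated to " + v));
    System.out.println(tolerance.get()); // true
    tolerance.update(false);             // prints "updated to false"
    System.out.println(tolerance.get()); // false
  }
}
```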

@@ -177,7 +177,12 @@ private OpenSearchExprValueFactory createExprValueFactory() {
Map<String, OpenSearchDataType> allFields = new HashMap<>();
getReservedFieldTypes().forEach((k, v) -> allFields.put(k, OpenSearchDataType.of(v)));
allFields.putAll(getFieldOpenSearchTypes());
-return new OpenSearchExprValueFactory(allFields);
+return new OpenSearchExprValueFactory(
+allFields, settings.getSettingValue(Settings.Key.FIELD_TYPE_TOLERANCE));
}

+public boolean isFieldTypeTolerance() {
+return settings.getSettingValue(Settings.Key.FIELD_TYPE_TOLERANCE);
+}
+
@VisibleForTesting
@@ -47,8 +47,6 @@ public class OpenSearchIndexScan extends TableScanOperator implements Serializab
/** Search response for current batch. */
private Iterator<ExprValue> iterator;

-private Settings pluginSettings;
-
/** Creates index scan based on a provided OpenSearchRequestBuilder. */
public OpenSearchIndexScan(
OpenSearchClient client, int maxResponseSize, OpenSearchRequest request) {
@@ -102,7 +102,7 @@ private OpenSearchExprValueFactory buildValueFactory(Set<ReferenceExpression> fi
Map<String, OpenSearchDataType> typeEnv =
fields.stream()
.collect(toMap(ReferenceExpression::getAttr, e -> OpenSearchDataType.of(e.type())));
-return new OpenSearchExprValueFactory(typeEnv);
+return new OpenSearchExprValueFactory(typeEnv, false);
}

private Environment<Expression, ExprValue> buildValueEnv(
@@ -297,7 +297,6 @@ void search() {
new SearchHits(
new SearchHit[] {searchHit}, new TotalHits(1L, TotalHits.Relation.EQUAL_TO), 1.0F));
when(searchHit.getSourceAsString()).thenReturn("{\"id\", 1}");
-when(searchHit.getInnerHits()).thenReturn(null);
when(factory.construct(any(), anyBoolean())).thenReturn(exprTupleValue);

// Mock second scroll request followed
@@ -286,7 +286,6 @@ void search() throws IOException {
new SearchHits(
new SearchHit[] {searchHit}, new TotalHits(1L, TotalHits.Relation.EQUAL_TO), 1.0F));
when(searchHit.getSourceAsString()).thenReturn("{\"id\", 1}");
-when(searchHit.getInnerHits()).thenReturn(null);
when(factory.construct(any(), anyBoolean())).thenReturn(exprTupleValue);

// Mock second scroll request followed