Description
What is the bug?
With the automatic type conversion introduced in #4599, PPL queries that contain certain type mismatches are rewritten to use the SAFE_CAST function.
When these queries are translated to Spark SQL via SparkSqlDialect, SAFE_CAST is emitted as-is in the generated SQL. Because Spark SQL does not provide a SAFE_CAST function, the resulting SQL is invalid and Spark rejects it (with a PARSE_SYNTAX_ERROR in the reproduction below).
How can one reproduce the bug?
This issue was first observed in the PPL unification PoC: opensearch-project/opensearch-spark#1281 (comment)
spark-sql (default)> search source=test_events;
@timestamp host packets message
2025-09-08 10:00:00 server1 60 {"category":1, "resource":"A"}
2025-09-08 10:01:00 server1 120 {"category":2, "resource":"B"}
2025-09-08 10:02:00 server1 60 {"category":3, "resource":"C"}
2025-09-08 10:02:30 server2 180 {"category":4, "resource":"D"}
spark-sql (default)> search source=test_events | spath input=message category | eval cat = abs(category);
[PARSE_SYNTAX_ERROR] Syntax error at or near 'AS'.(line 1, pos 153)
== SQL ==
SELECT `@timestamp`, `host`, `packets`, `message`, `JSON_EXTRACT`(`message`, 'category') `category`, ABS(SAFE_CAST(`JSON_EXTRACT`(`message`, 'category') AS DOUBLE)) `cat`
---------------------------------------------------------------------------------------------------------------------------------------------------------^^^
FROM `spark_catalog`.`default`.`test_events`
What is the expected behavior?
SAFE_CAST should be translated to an equivalent TRY_CAST function in Spark SQL, so that it behaves correctly and does not produce invalid SQL.
spark-sql (default)> SELECT TRY_CAST('123' AS INT);
TRY_CAST(123 AS INT)
123
spark-sql (default)> SELECT TRY_CAST('123abc' AS INT);
TRY_CAST(123abc AS INT)
NULL
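The expected translation can be illustrated with a small, self-contained sketch. This is a toy model and not the actual fix: `CastCall`, `Dialect`, `GenericDialect`, and `SparkLikeDialect` are hypothetical names invented for illustration, and none of them are Calcite or SparkSqlDialect APIs. It only shows the idea of remapping the function name when a cast-like call is unparsed for a Spark target.

```java
public class SafeCastUnparseSketch {

    /** Hypothetical stand-in for a cast-like call node: FUNCTION(operand AS type). */
    static final class CastCall {
        final String function;
        final String operandSql;
        final String targetType;

        CastCall(String function, String operandSql, String targetType) {
            this.function = function;
            this.operandSql = operandSql;
            this.targetType = targetType;
        }
    }

    /** Hypothetical dialect hook; this is NOT Calcite's SqlDialect interface. */
    interface Dialect {
        String unparse(CastCall call);
    }

    /** Default behavior: emit the function name unchanged. */
    static class GenericDialect implements Dialect {
        @Override
        public String unparse(CastCall call) {
            return call.function + "(" + call.operandSql + " AS " + call.targetType + ")";
        }
    }

    /** Spark-flavored behavior: emit TRY_CAST wherever SAFE_CAST would appear. */
    static class SparkLikeDialect extends GenericDialect {
        @Override
        public String unparse(CastCall call) {
            if ("SAFE_CAST".equalsIgnoreCase(call.function)) {
                // Same operand and target type, only the function name changes.
                return super.unparse(new CastCall("TRY_CAST", call.operandSql, call.targetType));
            }
            return super.unparse(call);
        }
    }

    public static void main(String[] args) {
        // Operand text copied from the generated SQL in the reproduction above.
        CastCall call = new CastCall(
                "SAFE_CAST", "`JSON_EXTRACT`(`message`, 'category')", "DOUBLE");
        System.out.println(new GenericDialect().unparse(call));
        System.out.println(new SparkLikeDialect().unparse(call));
    }
}
```

With the generic dialect the call is printed as SAFE_CAST(... AS DOUBLE), matching the failing SQL in the reproduction; with the Spark-like dialect the same call is printed as TRY_CAST(... AS DOUBLE), which is the form this issue asks for.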
What is your host/environment?
- OS: 3.4
Do you have any screenshots?
N/A
Do you have any additional context?
- In Spark SQL, CAST behaves somewhat like TRY_CAST only when ANSI mode is disabled; relying on this is not safe, so CAST should not be used to emulate SAFE_CAST semantics.
- Beyond this specific bug, there may be semantic discrepancies in which type conversions are considered valid between OpenSearch PPL and Spark, so we may need additional alignment around type conversion behavior.