
[BUG] SAFE_CAST function is not translated to Spark-compatible cast #4778

@dai-chen

Description


What is the bug?

With the automatic type conversion introduced in #4599, PPL queries that contain certain type mismatches are rewritten to use the SAFE_CAST function.

When these queries are translated to Spark SQL via SparkSqlDialect, SAFE_CAST is emitted as-is in the generated SQL. Since Spark SQL does not provide a SAFE_CAST function, the resulting SQL is invalid and fails at analysis time in Spark.
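To illustrate the missing translation, here is a minimal, hypothetical sketch of the rewrite that needs to happen (the actual fix belongs in the dialect's unparsing logic, not in string post-processing): every SAFE_CAST call must become a Spark-compatible TRY_CAST.

```python
import re

def to_spark_sql(sql: str) -> str:
    """Hypothetical post-processing sketch: rewrite SAFE_CAST(...) calls
    into Spark-compatible TRY_CAST(...). Shown only to illustrate the
    required translation; the real fix should live in the dialect."""
    return re.sub(r"\bSAFE_CAST\s*\(", "TRY_CAST(", sql)

print(to_spark_sql(
    "SELECT ABS(SAFE_CAST(`JSON_EXTRACT`(`message`, 'category') AS DOUBLE)) `cat` FROM t"
))
```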

How can one reproduce the bug?

This issue was first observed in the PPL unification PoC: opensearch-project/opensearch-spark#1281 (comment)

spark-sql (default)> search source=test_events;
@timestamp	host	packets	message
2025-09-08 10:00:00	server1	60	{"category":1, "resource":"A"}
2025-09-08 10:01:00	server1	120	{"category":2, "resource":"B"}
2025-09-08 10:02:00	server1	60	{"category":3, "resource":"C"}
2025-09-08 10:02:30	server2	180	{"category":4, "resource":"D"}

spark-sql (default)> search source=test_events | spath input=message category | eval cat = abs(category);
[PARSE_SYNTAX_ERROR] Syntax error at or near 'AS'.(line 1, pos 153)

== SQL ==
SELECT `@timestamp`, `host`, `packets`, `message`, `JSON_EXTRACT`(`message`, 'category') `category`, ABS(SAFE_CAST(`JSON_EXTRACT`(`message`, 'category') AS DOUBLE)) `cat`
---------------------------------------------------------------------------------------------------------------------------------------------------------^^^
FROM `spark_catalog`.`default`.`test_events`

What is the expected behavior?

SAFE_CAST should be translated to an equivalent TRY_CAST function in Spark SQL, so that it behaves correctly and does not produce invalid SQL.

spark-sql (default)> SELECT TRY_CAST('123' AS INT);
TRY_CAST(123 AS INT)
123

spark-sql (default)> SELECT TRY_CAST('123abc' AS INT);
TRY_CAST(123abc AS INT)
NULL
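The TRY_CAST behavior shown above (return NULL instead of failing on an invalid conversion) can be sketched in Python; this simplified stand-in assumes an integer target type only:

```python
def try_cast_int(value):
    """Mimic the shape of Spark's TRY_CAST(value AS INT): return None
    (SQL NULL) instead of raising when the conversion fails.
    Simplified sketch; not Spark's full cast semantics."""
    try:
        return int(str(value).strip())
    except (ValueError, TypeError):
        return None

print(try_cast_int("123"))     # 123
print(try_cast_int("123abc"))  # None
```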

What is your host/environment?

  • Spark: 3.4

Do you have any screenshots?
N/A

Do you have any additional context?

  • In Spark SQL, CAST only behaves like TRY_CAST when ANSI mode is disabled; relying on this is unsafe, so CAST should not be used to emulate SAFE_CAST semantics.
  • Beyond this specific bug, there may be semantic discrepancies in which type conversions are considered valid between OpenSearch PPL and Spark, so we may need additional alignment around type conversion behavior.

Metadata



Labels

PPL (Piped processing language), bug (Something isn't working), spark integration
