Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error parsing valid SQL with aliases #2867

Closed
andygrove opened this issue Jul 10, 2022 · 4 comments
Closed

Error parsing valid SQL with aliases #2867

andygrove opened this issue Jul 10, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@andygrove
Copy link
Member

Describe the bug
I am working on a SQL query fuzzer (see sqlfuzz crate) and I think it just found a bug in DataFusion.

The following SQL works fine in Spark but fails in DataFusion with ParserError("Expected ), found: __t404").

SELECT __t404.__c394, __t404.__c395, __t404.__c396, __t404.__c397
FROM ((
    (SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
        FROM (test1)) __t398
    INNER JOIN
    (SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
        FROM (test1)) __t403
    ON __t398.__c395 = __t403.__c401) __t404)
WHERE __t404.__c395 != __t404.__c396

Spark:

scala> spark.sql("""SELECT __t404.__c394, __t404.__c395, __t404.__c396, __t404.__c397
     | FROM ((
     |     (SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
     |         FROM (test1)) __t398
     |     INNER JOIN
     |     (SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
     |         FROM (test1)) __t403
     |     ON __t398.__c395 = __t403.__c401) __t404)
     | WHERE __t404.__c395 != __t404.__c396""").show
+------+------+------+------+
|__c394|__c395|__c396|__c397|
+------+------+------+------+
+------+------+------+------+

To Reproduce
Use sqlfuzz

Expected behavior
Query should work

Additional context
None

@andygrove andygrove added the bug Something isn't working label Jul 10, 2022
@waitingkuo
Copy link
Contributor

i tried it on sqlparser-rs, it doesn't work

use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;

fn main() {

    let dialect = GenericDialect {}; // or AnsiDialect

    let sql = "SELECT __t404.__c394, __t404.__c395, __t404.__c396, __t404.__c397
                FROM ((
                    (SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
                        FROM (test1)) __t398
                    INNER JOIN
                    (SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
                        FROM (test1)) __t403
                    ON __t398.__c395 = __t403.__c401) __t404)
                WHERE __t404.__c395 != __t404.__c396; ";



    let ast = Parser::parse_sql(&dialect, sql).unwrap();

    println!("AST: {:?}", ast);
} 
➜  sp git:(master) ✗ cargo run
   Compiling sp v0.1.0 (/Users/willy/willy/df-workspace/sp)
    Finished dev [unoptimized + debuginfo] target(s) in 0.30s
     Running `target/debug/sp`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParserError("Expected ), found: __t404")', src/main.rs:26:48
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

looks like the outer alias doesn't work. the following sql works

SELECT __c394, __c395, __c396, __c397 
FROM ((
    (SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
        FROM (test1)) __t398
    INNER JOIN
    (SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
        FROM (test1)) __t403
    ON __t398.__c395 = __t403.__c401) )
WHERE __c395 != __c396;
0 rows in set. Query took 0.004 seconds.

@waitingkuo
Copy link
Contributor

@andygrove i've made the pull request to sqlparser-rs to make nestedjoin support alias

I will update this https://github.com/apache/arrow-datafusion/blob/master/datafusion/sql/src/planner.rs#L731 after new sqlparser released

@waitingkuo
Copy link
Contributor

@andygrove the parser error fixed in #3072

now it raised another error

DataFusion CLI v10.0.0
0 rows in set. Query took 0.038 seconds.
SchemaError(FieldNotFound { qualifier: Some("__t404"), name: "__c395", valid_fields: Some(["__t398.__c394", "__t398.__c395", "__t398.__c396", "__t398.__c397", "__t403.__c399", "__t403.__c400", "__t403.__c401", "__t403.__c402"]) })

I feel like this might have the similar issue as #3073

@alamb
Copy link
Contributor

alamb commented Aug 22, 2024

This query seems to work great on datafusion 41.0.0

 andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ datafusion-cli
DataFusion CLI v41.0.0

> create or replace table test1(c0 int, c1 int, c2 int, c3 int);
0 row(s) fetched.
Elapsed 0.030 seconds.

> SELECT __t404.__c394, __t404.__c395, __t404.__c396, __t404.__c397
FROM ((
    (SELECT test1.c0 AS __c394, test1.c1 AS __c395, test1.c2 AS __c396, test1.c3 AS __c397
        FROM (test1)) __t398
    INNER JOIN
    (SELECT test1.c0 AS __c399, test1.c1 AS __c400, test1.c2 AS __c401, test1.c3 AS __c402
        FROM (test1)) __t403
    ON __t398.__c395 = __t403.__c401) __t404)
WHERE __t404.__c395 != __t404.__c396;
+--------+--------+--------+--------+
| __c394 | __c395 | __c396 | __c397 |
+--------+--------+--------+--------+
+--------+--------+--------+--------+
0 row(s) fetched.
Elapsed 0.046 seconds.

@alamb alamb closed this as completed Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants