Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support Substring(str [from int] [for int]) #1621

Merged
merged 1 commit into from
Jan 24, 2022

Conversation

ovr
Copy link
Contributor

@ovr ovr commented Jan 20, 2022

Which issue does this PR close?

I didn't create an issue for it.

Rationale for this change

DF supports substr function, but it doesn't support substring expression. In this PR I've implemented support for substring expression in planner.

What changes are included in this PR?

Are there any user-facing changes?

Nope.

Thanks

@github-actions github-actions bot added datafusion Changes in the datafusion crate sql SQL Planner labels Jan 20, 2022
@@ -499,6 +499,16 @@ async fn test_interval_expressions() -> Result<()> {
Ok(())
}

#[cfg(feature = "unicode_expressions")]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didnt find any helpers to assert that function return error, should I introduce a new one or did I miss something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not find one either.

You could potentially use the #[should_panic annotation. For example, something like (untested):

#[tokio::test]
#[should_panic(expected = "negative substring length not allowed")]
async fn test_substring_expr_bad() -> Result<()> {
    test_expression!("substring('alphabet' from -1 for 1)", "l");
}

A helper would be great

@xudong963
Copy link
Member

Thanks @ovr !

Lol, I also have just done exactly this for #175

Our idea is same, transfer Expr::Substring to Expr::Function

                let sql_expr = SQLExpr::Function(Function {
                    name: ObjectName(
                        [Ident {
                            value: "substr".to_string(),
                            quote_style: None,
                        }]
                        .to_vec(),
                    ),
                    args: [
                        FunctionArg::Unnamed(*expr.clone()),
                        FunctionArg::Unnamed(from_expr),
                        FunctionArg::Unnamed(for_expr),
                    ]
                    .to_vec(),
                    over: None,
                    distinct: false,
                });
                dbg!(schema.clone());
                self.sql_expr_to_logical_expr(&sql_expr, schema)

I'll help to review after getting up

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me @ovr -- thank you for the contribution!

Do you want to add a negative test to this PR, or perhaps as a follow on?

@@ -499,6 +499,16 @@ async fn test_interval_expressions() -> Result<()> {
Ok(())
}

#[cfg(feature = "unicode_expressions")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not find one either.

You could potentially use the #[should_panic annotation. For example, something like (untested):

#[tokio::test]
#[should_panic(expected = "negative substring length not allowed")]
async fn test_substring_expr_bad() -> Result<()> {
    test_expression!("substring('alphabet' from -1 for 1)", "l");
}

A helper would be great

@xudong963
Copy link
Member

xudong963 commented Jan 21, 2022

Hi, @ovr I do some case tests with Postgres in my local. The followings are some differences.

// postgres
postgres=# select * from t;
  a
-----
 123
(1 row)

postgres=#  select substring(a,0,3) from t;
 substring
-----------
 12
(1 row)

postgres=#  select substring(a,-1, 3) from t;
 substring
-----------
 1
(1 row)

// datafusion:pr1621
❯ select * from t;
+-----+
| c   |
+-----+
| 123 |
+-----+
1 row in set. Query took 0.009 seconds.

❯ select substring(c,0,3) from t;
+-------------------------------+
| substr(t.c,Int64(0),Int64(3)) |
+-------------------------------+
| 123                           |
+-------------------------------+

❯ select substring(c,-1,3) from t;
+--------------------------------+
| substr(t.c,Int64(-1),Int64(3)) |
+--------------------------------+
| 123                            |
+--------------------------------+
1 row in set. Query took 0.010 seconds.

BTW, It'll be better to integration testing to compare with postgres

@@ -499,6 +499,16 @@ async fn test_interval_expressions() -> Result<()> {
Ok(())
}

#[cfg(feature = "unicode_expressions")]
#[tokio::test]
async fn test_substring_expr() -> Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a test which from k (k <=0)

@alamb
Copy link
Contributor

alamb commented Jan 22, 2022

@ovr would you like to address the discrepancies that @xudong963 found with postgres?

@ovr
Copy link
Contributor Author

ovr commented Jan 24, 2022

Hi, @ovr I do some case tests with Postgres in my local. The followings are some differences.

Good catch!

@ovr would you like to address the discrepancies that @xudong963 found with postgres?

@alamb I've created an additional PR to fix that - #1660, it will be better to land it before this PR.

Thanks

Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense to me, I think this is a bug left over from Datafusion, not introduced by this PR. We can merge the ticket firstly. BTW, thanks for your fix in #1660 @ovr

@alamb alamb merged commit 618c1e8 into apache:master Jan 24, 2022
@alamb
Copy link
Contributor

alamb commented Jan 24, 2022

Thanks @ovr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants