Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-10559: [Rust][DataFusion] Split up logical_plan/mod.rs into sub modules #8639

Closed
wants to merge 9 commits into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Nov 11, 2020

The module has gotten fairly large and so refactoring it into smaller chunks will improve readability – as suggested by Jorge #8619 (review)

This PR just moves code around -- it is not intended to change any semantics

Reviewing it commit-by-commit might be helpful to see how each piece went

I can also break it up into a sequence of smaller PRs if that would help review

@alamb alamb force-pushed the alamb/ARROW-10559-split-up branch from aa4d569 to 5af4254 Compare November 11, 2020 16:08
use arrow::record_batch::RecordBatch;
use functions::{ReturnTypeFunction, ScalarFunctionImplementation, Signature};

mod builder;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module now pub use s all struct, enum and traits that were public in this module so no client code should need to change.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, @alamb , really improves org.

I went through each of the commits and left two comments.

Comment on lines -1268 to -1290
fn test_visitor() {
let schema = Schema::new(vec![]);
assert_eq!("[]", format!("{}", display_schema(&schema)));
}

#[test]
fn test_display_empty_schema() {
let schema = Schema::new(vec![]);
assert_eq!("[]", format!("{}", display_schema(&schema)));
}

#[test]
fn test_display_schema() {
let schema = Schema::new(vec![
Field::new("id", DataType::Int32, false),
Field::new("first_name", DataType::Utf8, true),
]);

assert_eq!(
"[id:Int32, first_name:Utf8;N]",
format!("{}", display_schema(&schema))
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one of the tests disapeared in this commit? (I count -3 +2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good 👁️

I removed

    fn test_visitor() {
        let schema = Schema::new(vec![]);
        assert_eq!("[]", format!("{}", display_schema(&schema)));
    }

Which is redundant, except for the name, with the test immediately below:

    #[test]
    fn test_display_empty_schema() {
        let schema = Schema::new(vec![]);
        assert_eq!("[]", format!("{}", display_schema(&schema)));
    }

Aka it was removing a copy / paste bug (that I added 😞 )

Comment on lines -550 to -579
/// Create an convenience function representing a unary scalar function
macro_rules! unary_math_expr {
($ENUM:ident, $FUNC:ident) => {
#[allow(missing_docs)]
pub fn $FUNC(e: Expr) -> Expr {
Expr::ScalarFunction {
fun: functions::BuiltinScalarFunction::$ENUM,
args: vec![e],
}
}
};
}

// generate methods for creating the supported unary math expressions
unary_math_expr!(Sqrt, sqrt);
unary_math_expr!(Sin, sin);
unary_math_expr!(Cos, cos);
unary_math_expr!(Tan, tan);
unary_math_expr!(Asin, asin);
unary_math_expr!(Acos, acos);
unary_math_expr!(Atan, atan);
unary_math_expr!(Floor, floor);
unary_math_expr!(Ceil, ceil);
unary_math_expr!(Round, round);
unary_math_expr!(Trunc, trunc);
unary_math_expr!(Abs, abs);
unary_math_expr!(Signum, signum);
unary_math_expr!(Exp, exp);
unary_math_expr!(Log, ln);
unary_math_expr!(Log2, log2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be removed? I though that they were needed to plan these expressions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler started complaining to me that they were not used so I removed them and the tests keep passing

I just tested with the repl and I seem to be able to plan the expressions just fine, so I guess the compiler was right from what I can see. Maybe I am missing something

(arrow_dev) alamb@MacBook-Pro:~/Software/arrow2/rust$ git status
On branch alamb/ARROW-10559-split-up
Your branch is up to date with 'alamb/alamb/ARROW-10559-split-up'.

nothing to commit, working tree clean
(arrow_dev) alamb@MacBook-Pro:~/Software/arrow2/rust$ cd datafusion/
(arrow_dev) alamb@MacBook-Pro:~/Software/arrow2/rust/datafusion$ echo "1" > /tmp/table.csv
(arrow_dev) alamb@MacBook-Pro:~/Software/arrow2/rust/datafusion$ echo "2" >> /tmp/table.csv
(arrow_dev) alamb@MacBook-Pro:~/Software/arrow2/rust/datafusion$ 
(arrow_dev) alamb@MacBook-Pro:~/Software/arrow2/rust/datafusion$ cargo run --bin repl
    Finished dev [unoptimized + debuginfo] target(s) in 0.13s
     Running `/Users/alamb/Software/arrow2/rust/target/debug/repl`
> 

CREATE EXTERNAL TABLE t(a int)
STORED AS CSV
LOCATION '/tmp/table.csv';

0 rows in set. Query took 0 seconds.
> select sqrt(a) from t;
+--------------------+
| sqrt(a)            |
+--------------------+
| 1                  |
| 1.4142135623730951 |
+--------------------+
2 row in set. Query took 0 seconds.
> 

🤔 So I guess they aren't needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that they are used when creating a statement via the DataFrame API. I.e. in examples/dataframe.rs, if we want to create an expression that is sqrt(col("c1")), we need to import sqrt from logical_plan. No?

I find it odd that the compiler is arguing that pub functions are not being used, though xD

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue is that I didn't then publically export them. I'll put the code back in.

It is probably bad that we didn't have tests that failed when they got removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code restored in ed05ee5 and publically used in 60cd9fe

@github-actions
Copy link

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

yordan-pavlov pushed a commit to yordan-pavlov/arrow that referenced this pull request Nov 14, 2020
… modules

The module has gotten fairly large and so refactoring it into smaller chunks will improve readability – as suggested by Jorge apache#8619 (review)

This PR just moves code around -- it is not intended to change any semantics

Reviewing it commit-by-commit might be helpful to see how each piece went

I can also break it up into a sequence of smaller PRs if that would help review

Closes apache#8639 from alamb/alamb/ARROW-10559-split-up

Authored-by: alamb <andrew@nerdnetworks.org>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
@alamb alamb deleted the alamb/ARROW-10559-split-up branch December 7, 2020 19:30
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
… modules

The module has gotten fairly large and so refactoring it into smaller chunks will improve readability – as suggested by Jorge apache#8619 (review)

This PR just moves code around -- it is not intended to change any semantics

Reviewing it commit-by-commit might be helpful to see how each piece went

I can also break it up into a sequence of smaller PRs if that would help review

Closes apache#8639 from alamb/alamb/ARROW-10559-split-up

Authored-by: alamb <andrew@nerdnetworks.org>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants