-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DataFrame reference to the user guide #3067
Conversation
@kmitchener WDYT? |
Codecov Report
@@ Coverage Diff @@
## master #3067 +/- ##
==========================================
+ Coverage 85.85% 85.93% +0.08%
==========================================
Files 289 289
Lines 51890 52118 +228
==========================================
+ Hits 44548 44790 +242
+ Misses 7342 7328 -14
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
Looks great! |
Co-authored-by: Loïc Sharma <737941+loic-sharma@users.noreply.github.com>
Co-authored-by: Loïc Sharma <737941+loic-sharma@users.noreply.github.com>
Yeah, this is nice and was completely missing from the docs before. Re the function listing, should we combine the dataframe functions/conditional expressions, etc with the SQL version? So one definition of the function with maybe 2 examples -- one for SQL, one for Dataframe. Although that consolidation can come later in a follow-on. |
…afusion into user-guide-dataframe
// TODO(kszucs): this seems buggy, unary_scalar_expr! is used for many | ||
// varying arity functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #3069 to track this
I'm not sure if we should combine them or not. Combining them is easier for the maintainers for sure, but having separate references for DataFrame and SQL might be easier for users? |
Here is the related PR to add SQL function documentation: #3090 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
This looks like a really nice improvment @andygrove . Thank you!
unary_scalar_expr!(Log10, log10); | ||
unary_scalar_expr!(Ln, ln); | ||
unary_scalar_expr!(NullIf, nullif); | ||
unary_scalar_expr!(Sqrt, sqrt, "square root of a number"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the names
under the License. | ||
--> | ||
|
||
# DataFrame API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it is important to mention somewhere that the computations are deferred until collect()
is called? Maybe that is common across other dataframe implementations and can be assumed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
col("a").gt(lit(5)).and(col("b").lt(lit(7))) | ||
``` | ||
|
||
## Identifiers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are not data frame specific, maybe we should put them into a different guide -- and reference that here
Something like docs/source/user-guide/expressions.md
perhaps
This is just a suggestion, I think it would be fine to do in a follow on PR or never
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will move into a separate page as a follow-on. How would you see users using expressions outside the context of DataFrame usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use them all the time in IOx when we need to make Exprs
-- for example to create LogicalPlans
/ use the LogicalPlanBuilder
. I also think they are useful for writing tests when people are writing extensions for DataFusion
| case | CASE expression. Example: `case(expr).when(expr, expr).when(expr, expr).otherwise(expr).end()`. | | ||
| nullif | Returns a null value if value1 equals value2; otherwise it returns value1. This can be used to perform the inverse operation of the `coalesce` expression. | | ||
|
||
## String Expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is ok to leave the Notes
column blank for now and fill it out going forward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe we can link to the content in docs/source/user-guide/sql/scalar_functions.md
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Benchmark runs are scheduled for baseline = 098f0b0 and contender = 1e44417. 1e44417 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #3066
Rationale for this change
We need to document what is available otherwise how will users know where to start?
What changes are included in this PR?
New section in user guide (see rendered version)
This is just a starting point and we'll need to continue improving and extending once this is merged.
Are there any user-facing changes?
No