DataFusion 52 release post #135
Co-authored-by: Matt Butrovich <mbutrovich@users.noreply.github.com>
Thanks @mbutrovich -- any additional context / suggestions you have on the sort merge join improvement would be most appreciated
(this is on my list, but I am struggling to find time to finish it -- hopefully after CIDR / Thursday)
…sion-site into site/datafusion_52
---
layout: post
title: Apache DataFusion 52.0.0 Released
date: 2026-01-08
According to:
- https://lists.apache.org/thread/gt29yg6wxzx82s87drwq1xb06yhs16y6
- https://crates.io/crates/datafusion/52.0.0
- date: 2026-01-08
+ date: 2026-01-12
Thanks -- I think in the past we have dated the blog posts based on when the post was released rather than when the software was 🤔
TODO: confirm the release date for 52.0.0 and update the front matter if needed.

[DataFusion 52.0.0]: https://crates.io/crates/datafusion/52.0.0
[DataFusion 51.0.0]: https://datafusion.apache.org/blog/2025/11/25/datafusion-51.0.0/
TODO
explained in the [Extending SQL in DataFusion Blog]. With this new API, you can
customize DataFusion to support almost any SQL syntax, such as the following
(which are not supported by default):
I feel that this is slightly misleading: it reads as if the RelationPlanner is what now allows extending expressions and types (and relations). Maybe something like:
In addition to the existing expression and types extension points, this new API now allows extending FROM clauses, leading DataFusion to support almost any SQL syntax, such as the following (which are not supported by default):
But reworded to be less of a run-on sentence...
This is a great call out. I reworded it in 615affd to this:
DataFusion now has an API for extending the SQL planner for relations, as
explained in the [Extending SQL in DataFusion Blog]. In addition to the existing
expression and types extension points, this new API now allows extending FROM
clauses. Using these APIs it is straightforward to provide SQL support for
almost any dialect, including vendor-specific syntax. Example use cases include:
[Apache Comet]: https://datafusion.apache.org/comet/
[mbutrovich]: https://github.com/mbutrovich

### Rewritten merge join
This section title looks very similar to the previous one. The start of the first sentence is also identical. Maybe a title that differentiates this section more from the previous one (e.g. "Optimised Output Handling of Merge Join") would be clearer.
Thank you -- I think this was the result of a bad merge conflict resolution (I had both the revised paragraph and the original). I removed the section in 1345bfb
nuno-faria
left a comment
I was looking at the changelog and this PR caught my attention: apache/datafusion#18644. Maybe it could be worth a mention as well.
This release also includes several additional caching improvements.

A new statistics cache for Parquet Metadata avoids repeatedly (re)calculating
nit: maybe "Parquet Metadata" -> "File Metadata"? Since there is also a separate cache for the Parquet metadata itself.
Good call -- Fixed in e9308d4
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
…sion-site into site/datafusion_52
Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
@nuno-faria another great call. I have added a section in 4e24b1f
Perhaps @2010YOUY01 can verify if I got the summary correct
[Variant shredding]: https://github.com/apache/datafusion/issues/16116
[PhysicalExprAdapter]: https://docs.rs/datafusion/52.0.0/datafusion/physical_expr_adapter/trait.PhysicalExprAdapter.html

### Sort Pushdown to Scans
LGTM
Co-authored-by: Yongting You <2010youy01@gmail.com>
…sion-site into site/datafusion_52

52.0.0 release datafusion#19691
52.0.0 (Dec 2025 / Jan 2026) datafusion#18566
This is a draft of the DataFusion 52 release post
See rendered preview: https://datafusion.staged.apache.org/blog/2026/01/08/datafusion-52.0.0/
This was initially created using coded. Commands below
Details
We are going to write a blog post for the DataFusion 52.0.0 release
We need to cover the major features in this release. If you are unsure of any content, please leave a "TODO" note in the text and we can fill it in
later.
Please start with a copy of the previous post as a starting point: content/blog/2025-11-25-datafusion-51.0.0.md and update as needed.
The changelog is here: https://github.com/xudong963/arrow-datafusion/blob/update_version/dev/changelog/52.0.0.md
The list of major features can be found in apache/datafusion#18566 under the section "Features to mention in the blog
(if they make it)". Only include the ones that made it into the release, with a checkmark.
Please
example where possible.