Contributor

alamb commented Jan 8, 2026

This is a draft of the DataFusion 52 release post

See rendered preview: https://datafusion.staged.apache.org/blog/2026/01/08/datafusion-52.0.0/

This was initially created using Codex. The commands are below.

Details

We are going to write a blog post for the DataFusion 52.0.0 release

We need to cover the major features in this release. If you are unsure of any content, please leave a "TODO" note in the text and we can fill it in
later.

Please start with a copy of the previous post as a starting point: content/blog/2025-11-25-datafusion-51.0.0.md and update as needed.

The changelog is here: https://github.com/xudong963/arrow-datafusion/blob/update_version/dev/changelog/52.0.0.md

The list of major features can be found in apache/datafusion#18566 under the section "Features to mention in the blog
(if they make it)". Only include the ones that made it into the release, with a checkmark.

Please

  • write a blog post
  • leave a section for performance chart which we can fill in later
  • include a section for each major feature, summarizing what it is and why it is important, and the related PRs. Please try to include a diagram or
    example where possible.

Co-authored-by: Matt Butrovich <mbutrovich@users.noreply.github.com>
Contributor Author

alamb commented Jan 20, 2026

Thanks @mbutrovich -- any additional context or suggestions you have on the sort-merge join improvement would be most appreciated

Contributor Author

alamb commented Jan 20, 2026

(This is on my list, but I am struggling to find time to finish it -- hopefully after CIDR / Thursday.)

alamb changed the title from "WIP: DataFusion 52 release post" to "DataFusion 52 release post" on Jan 23, 2026
Contributor Author

alamb commented Jan 23, 2026

FYI @2010YOUY01 @BlakeOrth @Dandandan @Jefffrey @LiaCastaneda @NGA-TRAN @Tim-53 @Yuvraj-cyborg @adriangb @alamb @alchemist51 @asolimando @bharath-techie @comphead @corasaurus-hex
@ethan-tyler @feniljain @gabotechs @geoffreyclaude @jdcasale @jizezhang @kosiew @martin-g @mbutrovich @milenkovicm @nuno-faria @pepijnve @rluvaton @theirix @timsaucer @zhuqi-lucas and @xudong963 as you are mentioned in this post

---
layout: post
title: Apache DataFusion 52.0.0 Released
date: 2026-01-08
Member

Contributor Author

Thanks -- I think in the past we have dated the blog posts based on when the post was released rather than when the software was 🤔

TODO: confirm the release date for 52.0.0 and update the front matter if needed.

[DataFusion 52.0.0]: https://crates.io/crates/datafusion/52.0.0
[DataFusion 51.0.0]: https://datafusion.apache.org/blog/2025/11/25/datafusion-51.0.0/
Member

TODO

Comment on lines 228 to 230
explained in the [Extending SQL in DataFusion Blog]. With this new API, you can
customize DataFusion to support almost any SQL syntax, such as the following
(which are not supported by default):
Contributor

geoffreyclaude commented Jan 23, 2026

I feel that this is slightly misleading: it reads as if the RelationPlanner is what now allows extending expressions and types (and relations). Maybe something like:

In addition to the existing expression and types extension points, this new API now allows extending FROM clauses, leading DataFusion to support almost any SQL syntax, such as the following (which are not supported by default):

But reworded to be less of a run-on sentence...

Contributor Author

This is a great call out. I reworded it in 615affd to this:

DataFusion now has an API for extending the SQL planner for relations, as
explained in the [Extending SQL in DataFusion Blog]. In addition to the existing
expression and types extension points, this new API now allows extending FROM
clauses. Using these APIs it is straightforward to provide SQL support for
almost any dialect, including vendor-specific syntax. Example use cases include:

[Apache Comet]: https://datafusion.apache.org/comet/
[mbutrovich]: https://github.com/mbutrovich

### Rewritten merge join
Contributor

This section title looks very similar to the previous one. The start of the first sentence is also identical. Maybe a title that differentiates this section more from the previous one (e.g. "Optimised Output Handling of Merge Join") would be clearer.

Contributor Author

alamb commented Jan 24, 2026

Thank you -- I think this was the result of a bad merge conflict resolution (I had both the revised paragraph and the original). I removed the section in 1345bfb

Contributor

nuno-faria left a comment

I was looking at the changelog and this PR caught my attention: apache/datafusion#18644. Maybe it could be worth a mention as well.


This release also includes several additional caching improvements.

A new statistics cache for Parquet Metadata avoids repeatedly (re)calculating
Contributor

nit: maybe "Parquet Metadata" -> "File Metadata"? Since there is also a separate cache for the Parquet metadata itself.

Contributor Author

Good call -- Fixed in e9308d4

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Contributor Author

alamb commented Jan 24, 2026

> I was looking at the changelog and this PR caught my attention: apache/datafusion#18644. Maybe it could be worth a mention as well.

@nuno-faria another great call. I have added a section in 4e24b1f

[Image: screenshot taken 2026-01-24, 7:54 AM]

Perhaps @2010YOUY01 can verify if I got the summary correct

[Variant shredding]: https://github.com/apache/datafusion/issues/16116
[PhysicalExprAdapter]: https://docs.rs/datafusion/52.0.0/datafusion/physical_expr_adapter/trait.PhysicalExprAdapter.html

### Sort Pushdown to Scans
Member

LGTM

Development

Successfully merging this pull request may close these issues.

Blog post for the DataFusion 52.0.0 release
