From 68abcfb57d7251c7ca52919b0c1e842b9be3144b Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Thu, 8 Jan 2026 17:41:52 -0500
Subject: [PATCH 01/26] Initial draft (coded with codex)

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 337 +++++++++++++++++++
 1 file changed, 337 insertions(+)
 create mode 100644 content/blog/2026-01-08-datafusion-52.0.0.md
diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
new file mode 100644
index 00000000..52ff7fa7
--- /dev/null
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -0,0 +1,337 @@
+---
+layout: post
+title: Apache DataFusion 52.0.0 Released
+date: 2026-01-08
+author: pmc
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+## Introduction
+
+We are proud to announce the release of [DataFusion 52.0.0]. This post highlights
+some of the major improvements since [DataFusion 51.0.0]. The complete list of
+changes is available in the [changelog]. Thanks to the [120 contributors] for
+making this release possible.
+
+TODO: confirm the release date for 52.0.0 and update the front matter if needed.
+
+[DataFusion 52.0.0]: https://crates.io/crates/datafusion/52.0.0
+[DataFusion 51.0.0]: https://datafusion.apache.org/blog/2025/11/25/datafusion-51.0.0/
+[changelog]: https://github.com/apache/datafusion/blob/branch-52/dev/changelog/52.0.0.md
+[120 contributors]: https://github.com/apache/datafusion/blob/branch-52/dev/changelog/52.0.0.md#credits
+
+## Performance Improvements 🚀
+
+We continue to make significant performance improvements in DataFusion, both in
+the core engine and in the Parquet reader. This release includes faster `CASE`
+expressions, better hash performance for string types, and continued string
+function optimizations.
+
+### Performance Chart (TODO)
+
+TODO: add the 52.0.0 performance chart and update the caption.
+
+<img
+src="/blog/images/datafusion-52.0.0/performance_over_time_clickbench.png"
+width="100%"
+class="img-responsive"
+alt="Performance over time"
+/>
+
+**Figure 1**: TODO: update caption for 52.0.0 benchmarking results.
+
+## Major Features ✨
+
+### Arrow IPC Stream file support
+
+DataFusion can now read Arrow IPC stream files ([#18457]). This expands
+interoperability with systems that emit Arrow streams directly, making it
+simpler to ingest Arrow-native data without conversion.
+
+Example (TODO: confirm exact syntax for IPC stream format selection):
+
+```sql
+-- TODO: confirm whether the format name is `arrow`, `ipc_stream`, or implicit.
+CREATE EXTERNAL TABLE ipc_events
+STORED AS ARROW
+LOCATION 's3://bucket/events.arrow';
+```
+
+Related PRs: [#18457]
+
+[#18457]: https://github.com/apache/datafusion/pull/18457
+
+### Faster `CASE` expression evaluation
+
+DataFusion 52 completes major work from the CASE performance epic ([#18075]).
+Lookup-table based evaluation avoids repeated expression evaluation and reduces
+branching overhead, accelerating common ETL patterns.
+
+Example:
+
+```sql
+SELECT
+  CASE
+    WHEN status IN ('NEW', 'READY', 'STAGED') THEN 'PENDING'
+    WHEN status IN ('DONE', 'COMPLETE') THEN 'FINISHED'
+    ELSE 'OTHER'
+  END AS status_bucket,
+  count(*)
+FROM jobs
+GROUP BY 1;
+```
+
+Related PRs: [#18183]
+
+[#18075]: https://github.com/apache/datafusion/issues/18075
+[#18183]: https://github.com/apache/datafusion/pull/18183
+
+### Extensible SQL planning with relation planner extensions
+
+DataFusion now supports relation planner extensions for custom SQL syntax and
+planning logic ([#17824], [#17843]). This lets downstream projects inject their
+own planning behavior without forking the SQL planner, which is critical for
+dialect extensions and custom table references.
+
+Diagram:
+
+```
+SQL text
+  |  (custom relation planner extension)
+  v
+Logical plan
+  |  (DataFusion optimizers)
+  v
+Physical plan
+```
+
+TODO: include a short Rust snippet showing how to register a relation planner
+extension once the final API example is confirmed.
+
+Related PRs: [#17843]
+
+[#17824]: https://github.com/apache/datafusion/issues/17824
+[#17843]: https://github.com/apache/datafusion/pull/17843
+
+### ListingTable object store usage improvements
+
+ListingTable improvements continue to reduce object store I/O and planning
+latency for partitioned datasets ([#17214]). DataFusion now normalizes partition
+and flat listings, enables a memory-bound list-files cache by default, and
+makes the cache prefix-aware for partition pruning.
+
+Diagram:
+
+```
+Object store LIST
+  |  (normalized listing + cache)
+  v
+Partitioned files
+  |  (planner)
+  v
+Execution plan
+```
+
+Related PRs: [#18146], [#18855], [#19366], [#19298], [#18971]
+
+[#17214]: https://github.com/apache/datafusion/issues/17214
+[#18146]: https://github.com/apache/datafusion/pull/18146
+[#18855]: https://github.com/apache/datafusion/pull/18855
+[#19366]: https://github.com/apache/datafusion/pull/19366
+[#19298]: https://github.com/apache/datafusion/pull/19298
+[#18971]: https://github.com/apache/datafusion/pull/18971
+
+### Statistics cache improvements
+
+The statistics cache has been improved to make pruning and planning more
+reliable in repeated workloads ([#19051]). DataFusion now exposes a
+`statistics_cache` function and improves cache memory behavior for listing
+workflows, making it easier to diagnose cache contents and reduce repeated I/O.
+
+Example (TODO: confirm the function signature and output schema):
+
+```sql
+-- TODO: confirm the function name and arguments.
+SELECT * FROM statistics_cache('my_table');
+```
+
+Related PRs: [#19054], [#18855], [#18971]
+
+[#19051]: https://github.com/apache/datafusion/issues/19051
+[#19054]: https://github.com/apache/datafusion/pull/19054
+
+### Pushdown expression evaluation via PhysicalExprAdapter
+
+DataFusion now pushes down expression evaluation into TableProviders using the
+PhysicalExprAdapter, replacing the older SchemaAdapter approach ([#14993],
+[#16800]). This enables richer pushdown (expressions and projections) and
+improves consistency between logical and physical planning.
+
+Diagram:
+
+```
+SQL filter/projection
+  |  (PhysicalExprAdapter)
+  v
+TableProvider pushdown
+  |  (scan)
+  v
+Reduced data
+```
+
+Related PRs: [#18998], [#19345]
+
+[#14993]: https://github.com/apache/datafusion/issues/14993
+[#16800]: https://github.com/apache/datafusion/issues/16800
+[#18998]: https://github.com/apache/datafusion/pull/18998
+[#19345]: https://github.com/apache/datafusion/pull/19345
+
+### Hash join build-side pushdown
+
+DataFusion can now push down build-side hash tables from HashJoinExec into scans
+([#17171]). When the build side is small, DataFusion converts the hash table to
+an `IN` list or hash lookup that can be evaluated during scans, reducing the
+join input size early.
+
+Example:
+
+```sql
+SELECT *
+FROM orders o
+JOIN small_dim d
+ON o.dim_id = d.id;
+```
+
+TODO: include a physical plan snippet that shows the pushdown filter once a
+canonical example is selected.
+
+Related PRs: [#18393]
+
+[#17171]: https://github.com/apache/datafusion/issues/17171
+[#18393]: https://github.com/apache/datafusion/pull/18393
+
+### Sort pushdown to sources
+
+DataFusion now supports sort pushdown into data sources, allowing scans to
+return sorted data or leverage reversed row groups when possible ([#10433],
+[#19064]). This reduces memory pressure and can eliminate explicit sort stages
+for partitioned or pre-sorted data.
+
+Example:
+
+```sql
+SELECT *
+FROM parquet_table
+ORDER BY event_time DESC;
+```
+
+Related PRs: [#19064]
+
+[#10433]: https://github.com/apache/datafusion/issues/10433
+[#19064]: https://github.com/apache/datafusion/pull/19064
+
+### DELETE/UPDATE hooks in TableProvider
+
+TableProvider now includes DELETE and UPDATE hooks, with MemTable providing the
+first implementation ([#19142]). This is an important step toward fully
+featured DML support and enables downstream storage engines to plug in their
+own mutation logic.
+
+Example:
+
+```sql
+DELETE FROM mem_table WHERE status = 'obsolete';
+```
+
+Related PRs: [#19142]
+
+[#19142]: https://github.com/apache/datafusion/pull/19142
+
+### CoalesceBatchesExec removal and integrated batch coalescing
+
+DataFusion continues the work to remove the standalone CoalesceBatchesExec
+operator ([#18779]). Batch coalescing is now integrated into multiple operators,
+reducing plan complexity and avoiding unnecessary batch materialization.
+
+Diagram:
+
+```
+Before:
+  Scan -> CoalesceBatches -> Filter -> CoalesceBatches -> Join
+
+After:
+  Scan -> Filter (coalesce inline) -> Join (coalesce inline)
+```
+
+Related PRs: [#18540], [#18604], [#18630], [#18972], [#19002], [#19342], [#19239]
+
+[#18779]: https://github.com/apache/datafusion/issues/18779
+[#18540]: https://github.com/apache/datafusion/pull/18540
+[#18604]: https://github.com/apache/datafusion/pull/18604
+[#18630]: https://github.com/apache/datafusion/pull/18630
+[#18972]: https://github.com/apache/datafusion/pull/18972
+[#19002]: https://github.com/apache/datafusion/pull/19002
+[#19342]: https://github.com/apache/datafusion/pull/19342
+[#19239]: https://github.com/apache/datafusion/pull/19239
+
+## Upgrade Guide and Changelog
+
+Upgrading to 52.0.0 should be straightforward for most users. Please review the
+[Upgrade Guide]
+for details on breaking changes and code snippets to help with the transition.
+For a comprehensive list of all changes, please refer to the [changelog].
+
+## About DataFusion
+
+[Apache DataFusion] is an extensible query engine, written in [Rust], that uses
+[Apache Arrow] as its in-memory format. DataFusion is used by developers to
+create new, fast, data-centric systems such as databases, dataframe libraries,
+and machine learning and streaming applications. While [DataFusion's primary
+design goal] is to accelerate the creation of other data-centric systems, it
+provides a reasonable experience directly out of the box as a [dataframe
+library], [Python library], and [command-line SQL tool].
+
+[apache datafusion]: https://datafusion.apache.org/
+[rust]: https://www.rust-lang.org/
+[apache arrow]: https://arrow.apache.org
+[DataFusion's primary design goal]: https://datafusion.apache.org/user-guide/introduction.html#project-goals
+[dataframe library]: https://datafusion.apache.org/user-guide/dataframe.html
+[python library]: https://datafusion.apache.org/python/
+[command-line SQL tool]: https://datafusion.apache.org/user-guide/cli/
+[Upgrade Guide]: https://datafusion.apache.org/library-user-guide/upgrading.html
+
+## How to Get Involved
+
+DataFusion is not a project built or driven by a single person, company, or
+foundation. Rather, our community of users and contributors works together to
+build a shared technology that none of us could have built alone.
+
+If you are interested in joining us, we would love to have you. You can try out
+DataFusion on some of your own data and projects and let us know how it goes,
+contribute suggestions, documentation, bug reports, or a PR with documentation,
+tests, or code. A list of open issues suitable for beginners is [here], and you
+can find out how to reach us on the [communication doc].
+
+[here]: https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22
+[communication doc]: https://datafusion.apache.org/contributor-guide/communication.html

From 1686945ee443f9aaec5c3d5e07697f5a9ee9da0e Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sat, 10 Jan 2026 14:55:49 -0500
Subject: [PATCH 02/26] updates

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 52ff7fa7..7fdf9016 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -270,9 +270,14 @@ Related PRs: [#19142]
 
 ### CoalesceBatchesExec removal and integrated batch coalescing
 
-DataFusion continues the work to remove the standalone CoalesceBatchesExec
-operator ([#18779]). Batch coalescing is now integrated into multiple operators,
-reducing plan complexity and avoiding unnecessary batch materialization.
+DataFusion continues the work from the CoalesceBatchesExec epic ([#18779]). The
+standalone `CoalesceBatchesExec` operator existed to ensure batches were large
+enough for vectorized execution, and it was inserted after filter-like
+operators such as `FilterExec`, `HashJoinExec`, and `RepartitionExec`. However,
+it also blocked other optimizations (like pushing limits through joins) and
+made optimizer rules more complex. This release integrates coalescing into the
+operators themselves and relies on Arrow's coalesce kernels, reducing plan
+complexity while keeping batch sizes efficient.
 
 Diagram:
 
@@ -285,6 +290,8 @@ After:
 ```
 
 Related PRs: [#18540], [#18604], [#18630], [#18972], [#19002], [#19342], [#19239]
+Thanks to [Tim-53], [Dandandan], [jizezhang], and [feniljain] for implementing
+this feature.
 
 [#18779]: https://github.com/apache/datafusion/issues/18779
 [#18540]: https://github.com/apache/datafusion/pull/18540
@@ -294,6 +301,10 @@ Related PRs: [#18540], [#18604], [#18630], [#18972], [#19002], [#19342], [#19239
 [#19002]: https://github.com/apache/datafusion/pull/19002
 [#19342]: https://github.com/apache/datafusion/pull/19342
 [#19239]: https://github.com/apache/datafusion/pull/19239
+[Tim-53]: https://github.com/Tim-53
+[Dandandan]: https://github.com/Dandandan
+[jizezhang]: https://github.com/jizezhang
+[feniljain]: https://github.com/feniljain
 
 ## Upgrade Guide and Changelog
 

From 3c7dd6a5e9715a015f0d0307ee57a4c387424b37 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sun, 18 Jan 2026 22:30:44 -0500
Subject: [PATCH 03/26] updates

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 195 ++++++++++++-------
 1 file changed, 120 insertions(+), 75 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 7fdf9016..5a7f720b 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -27,11 +27,9 @@ limitations under the License.
 
 [TOC]
 
-## Introduction
-
 We are proud to announce the release of [DataFusion 52.0.0]. This post highlights
 some of the major improvements since [DataFusion 51.0.0]. The complete list of
-changes is available in the [changelog]. Thanks to the [120 contributors] for
+changes is available in the [changelog]. Thanks to the [121 contributors] for
 making this release possible.
 
 TODO: confirm the release date for 52.0.0 and update the front matter if needed.
@@ -43,10 +41,12 @@ TODO: confirm the release date for 52.0.0 and update the front matter if needed.
 
 ## Performance Improvements 🚀
 
-We continue to make significant performance improvements in DataFusion, both in
-the core engine and in the Parquet reader. This release includes faster `CASE`
-expressions, better hash performance for string types, and continued string
-function optimizations.
+We continue to make significant performance improvements in DataFusion. This
+release includes faster `CASE` expressions (see below), a new SortMergeJoin,
+automatic caching of metadata, statistics, and listing results for ListingTable,
+improved hashing and grouping performance for string types, and string function
+optimizations.
+
 
 ### Performance Chart (TODO)
 
@@ -61,26 +61,6 @@ alt="Performance over time"
 
 **Figure 1**: TODO: update caption for 52.0.0 benchmarking results.
 
-## Major Features ✨
-
-### Arrow IPC Stream file support
-
-DataFusion can now read Arrow IPC stream files ([#18457]). This expands
-interoperability with systems that emit Arrow streams directly, making it
-simpler to ingest Arrow-native data without conversion.
-
-Example (TODO: confirm exact syntax for IPC stream format selection):
-
-```sql
--- TODO: confirm whether the format name is `arrow`, `ipc_stream`, or implicit.
-CREATE EXTERNAL TABLE ipc_events
-STORED AS ARROW
-LOCATION 's3://bucket/events.arrow';
-```
-
-Related PRs: [#18457]
-
-[#18457]: https://github.com/apache/datafusion/pull/18457
 
 ### Faster `CASE` expression evaluation
 
@@ -106,80 +86,145 @@ Related PRs: [#18183]
 
 [#18075]: https://github.com/apache/datafusion/issues/18075
 [#18183]: https://github.com/apache/datafusion/pull/18183
+[#18487]: https://github.com/apache/datafusion/issues/18487
+[#18875]: https://github.com/apache/datafusion/pull/18875
 
-### Extensible SQL planning with relation planner extensions
+### Rewritten merge join
 
-DataFusion now supports relation planner extensions for custom SQL syntax and
-planning logic ([#17824], [#17843]). This lets downstream projects inject their
-own planning behavior without forking the SQL planner, which is critical for
-dialect extensions and custom table references.
+DataFusion 52 includes a rewrite of the sort-merge join output buffering to
+avoid excessive `concat_batches` work and to use `BatchCoalescer` internally and
+for final output. This change targets pathological slowdowns like the reported
+LeftAnti join case in [#18487], which also affected Comet workloads that rely on
+SMJ. Benchmarks in [#18875] show dramatic gains for TPC-H Q21 (moving from
+minutes to milliseconds) while leaving most other queries unchanged or modestly
+faster, and the update is fully internal with no user-facing API changes.
 
-Diagram:
 
-```
-SQL text
-  |  (custom relation planner extension)
-  v
-Logical plan
-  |  (DataFusion optimizers)
-  v
-Physical plan
-```
+### Caching Improvements
 
-TODO: include a short Rust snippet showing how to register a relation planner
-extension once the final API example is confirmed.
+DataFusion also includes several additional caching improvements in this release.
 
-Related PRs: [#17843]
+First it includes a new statistics cache for Parquet Metadata that avoids repeatedly
+calculating statistics for Parquet backed files. This significantly improves
+planning time for certain queries. You can see the contents of the new cache using the
+[statistics_cache] function in the CLI:
 
-[#17824]: https://github.com/apache/datafusion/issues/17824
-[#17843]: https://github.com/apache/datafusion/pull/17843
+[statistics_cache]: https://datafusion.apache.org/user-guide/cli/functions.html#statistics-cache
 
-### ListingTable object store usage improvements
 
-ListingTable improvements continue to reduce object store I/O and planning
-latency for partitioned datasets ([#17214]). DataFusion now normalizes partition
-and flat listings, enables a memory-bound list-files cache by default, and
-makes the cache prefix-aware for partition pruning.
+```sql
+select * from statistics_cache();
++------------------+---------------------+-----------------+------------------------+---------+-----------------+-------------+--------------------+-----------------------+
+| path             | file_modified       | file_size_bytes | e_tag                  | version | num_rows        | num_columns | table_size_bytes   | statistics_size_bytes |
++------------------+---------------------+-----------------+------------------------+---------+-----------------+-------------+--------------------+-----------------------+
+| .../hits.parquet | 2022-06-25T22:22:22 | 14779976446     | 0-5e24d1ee16380-370f48 | NULL    | Exact(99997497) | 105         | Exact(36445943240) | 0                     |
++------------------+---------------------+-----------------+------------------------+---------+-----------------+-------------+--------------------+-----------------------+
+```
+Related PRs: [#18971], [#19054]
 
-Diagram:
+[#18971]: https://github.com/apache/datafusion/pull/18971
+[#19054]: https://github.com/apache/datafusion/pull/19054
 
-```
-Object store LIST
-  |  (normalized listing + cache)
-  v
-Partitioned files
-  |  (planner)
-  v
-Execution plan
+
+DataFusion and includes a memory-bound, prefix aware list-files cache by
+default. You can see the contents of the new cache using the [list_files_cache]
+function in the CLI:
+
+[list_files_cache]: https://datafusion.apache.org/user-guide/cli/functions.html#list-files-cache
+
+```sql
+create external table overturemaps
+stored as parquet
+location 's3://overturemaps-us-west-2/release/2025-12-17.0/theme=base/type=infrastructure';
+0 row(s) fetched.
+> select table, path, metadata_size_bytes, expires_in, unnest(metadata_list)['file_size_bytes'] as file_size_bytes, unnest(metadata_list)['e_tag'] as e_tag from list_files_cache() limit 10;
++--------------+-----------------------------------------------------+---------------------+-----------------------------------+-----------------+---------------------------------------+
+| table        | path                                                | metadata_size_bytes | expires_in                        | file_size_bytes | e_tag                                 |
++--------------+-----------------------------------------------------+---------------------+-----------------------------------+-----------------+---------------------------------------+
+| overturemaps | release/2025-12-17.0/theme=base/type=infrastructure | 2750                | 0 days 0 hours 0 mins 25.264 secs | 999055952       | "35fc8fbe8400960b54c66fbb408c48e8-60" |
+| overturemaps | release/2025-12-17.0/theme=base/type=infrastructure | 2750                | 0 days 0 hours 0 mins 25.264 secs | 975592768       | "8a16e10b722681cdc00242564b502965-59" |
+...
+| overturemaps | release/2025-12-17.0/theme=base/type=infrastructure | 2750                | 0 days 0 hours 0 mins 25.264 secs | 1016732378      | "6d70857a0473ed9ed3fc6e149814168b-61" |
+| overturemaps | release/2025-12-17.0/theme=base/type=infrastructure | 2750                | 0 days 0 hours 0 mins 25.264 secs | 991363784       | "c9cafb42fcbb413f851691c895dd7c2b-60" |
+| overturemaps | release/2025-12-17.0/theme=base/type=infrastructure | 2750                | 0 days 0 hours 0 mins 25.264 secs | 1032469715      | "7540252d0d67158297a67038a3365e0f-62" |
++--------------+-----------------------------------------------------+---------------------+-----------------------------------+-----------------+---------------------------------------+
 ```
 
-Related PRs: [#18146], [#18855], [#19366], [#19298], [#18971]
+Related PRs: [#18146], [#18855], [#19366], [#19298], 
 
-[#17214]: https://github.com/apache/datafusion/issues/17214
+[Epic #17214]: https://github.com/apache/datafusion/issues/17214
 [#18146]: https://github.com/apache/datafusion/pull/18146
 [#18855]: https://github.com/apache/datafusion/pull/18855
 [#19366]: https://github.com/apache/datafusion/pull/19366
 [#19298]: https://github.com/apache/datafusion/pull/19298
-[#18971]: https://github.com/apache/datafusion/pull/18971
 
-### Statistics cache improvements
 
-The statistics cache has been improved to make pruning and planning more
-reliable in repeated workloads ([#19051]). DataFusion now exposes a
-`statistics_cache` function and improves cache memory behavior for listing
-workflows, making it easier to diagnose cache contents and reduce repeated I/O.
 
-Example (TODO: confirm the function signature and output schema):
+## Major Features ✨
+
+### Arrow IPC Stream file support
+
+DataFusion can now read Arrow IPC stream files ([#18457]). This expands
+interoperability with systems that emit Arrow streams directly, making it
+simpler to ingest Arrow-native data without conversion. Thanks to [corasaurus-hex]
+for implementing this feature.
 
 ```sql
--- TODO: confirm the function name and arguments.
-SELECT * FROM statistics_cache('my_table');
+CREATE EXTERNAL TABLE ipc_events
+STORED AS ARROW
+LOCATION 's3://bucket/events.arrow';
 ```
 
-Related PRs: [#19054], [#18855], [#18971]
+Related PRs: [#18457]
 
-[#19051]: https://github.com/apache/datafusion/issues/19051
-[#19054]: https://github.com/apache/datafusion/pull/19054
+[#18457]: https://github.com/apache/datafusion/pull/18457
+[corasaurus-hex]: https://github.com/corasaurus-hex
+
+### Extensible SQL planning with relation planner extensions
+
+DataFusion now supports relation planner extensions for custom SQL syntax and
+planning logic ([#17824], [#17843]). This lets downstream projects inject their
+own planning behavior without forking the SQL planner. As explained in the
+[Extending SQL in DataFusion Blog], you can now customize DataFusion with
+support for almost any SQL syntax, such as:
+
+```sql
+-- Postgres-style JSON operators
+SELECT payload->'user'->>'id' FROM logs;
+-- MySQL-specific types
+SELECT DATETIME '2001-01-01 18:00:00';
+-- Statistical sampling
+SELECT * FROM sensor_data TABLESAMPLE BERNOULLI(10 PERCENT);
+```
+[Extending SQL in DataFusion Blog]: https://datafusion.apache.org/blog/2026/01/12/extending-sql/
+
+Thanks to [geoffreyclaude] for implementing relation planner extensions, and to
+[theirix], [alamb], [NGA-TRAN], and [gabotechs] for reviews and feedback that
+shaped the design.
+
+
+<figure>
+  <img src="/blog/images/extending-sql/architecture.svg" alt="DataFusion SQL processing pipeline: SQL String flows through Parser to AST, then SqlToRel (with Extension Planners) to LogicalPlan, then PhysicalPlanner to ExecutionPlan" width="100%" class="img-responsive">
+  <figcaption>
+    <b>Figure 1:</b> 
+        SQL processing pipeline with relation planner extensions from the 
+        <a href="https://datafusion.apache.org/blog/2026/01/12/extending-sql/">Extending SQL in DataFusion Blog</a>. 
+  </figcaption>
+</figure>
+
+
+
+
+
+Related PRs: [#17843]
+
+[#17824]: https://github.com/apache/datafusion/issues/17824
+[#17843]: https://github.com/apache/datafusion/pull/17843
+[geoffreyclaude]: https://github.com/geoffreyclaude
+[theirix]: https://github.com/theirix
+[alamb]: https://github.com/alamb
+[NGA-TRAN]: https://github.com/NGA-TRAN
+[gabotechs]: https://github.com/gabotechs
 
 ### Pushdown expression evaluation via PhysicalExprAdapter
 

From 2cc1f59b5d82c0984cd90035b816699759fc6b87 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Mon, 19 Jan 2026 13:48:26 -0500
Subject: [PATCH 04/26] Update sql planning

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 25 +++++++-------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 5a7f720b..2464d909 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -182,11 +182,10 @@ Related PRs: [#18457]
 
 ### Extensible SQL planning with relation planner extensions
 
-DataFusion now supports relation planner extensions for custom SQL syntax and
-planning logic ([#17824], [#17843]). This lets downstream projects inject their
-own planning behavior without forking the SQL planner. As explained in the
-[Extending SQL in DataFusion Blog], you can now customize DataFusion with
-support for almost any SQL syntax, such as:
+DataFusion now has an API for extending the SQL planner for relations. As
+explained in the [Extending SQL in DataFusion Blog], with this new API you can
+now customize DataFusion with support for almost any SQL syntax, such as:
+
 
 ```sql
 -- Postgres-style JSON operators
@@ -198,11 +197,6 @@ SELECT * FROM sensor_data TABLESAMPLE BERNOULLI(10 PERCENT);
 ```
 [Extending SQL in DataFusion Blog]: https://datafusion.apache.org/blog/2026/01/12/extending-sql/
 
-Thanks to [geoffreyclaude] for implementing relation planner extensions, and to
-[theirix], [alamb], [NGA-TRAN], and [gabotechs] for reviews and feedback that
-shaped the design.
-
-
 <figure>
   <img src="/blog/images/extending-sql/architecture.svg" alt="DataFusion SQL processing pipeline: SQL String flows through Parser to AST, then SqlToRel (with Extension Planners) to LogicalPlan, then PhysicalPlanner to ExecutionPlan" width="100%" class="img-responsive">
   <figcaption>
@@ -212,13 +206,10 @@ shaped the design.
   </figcaption>
 </figure>
 
+Thanks to [geoffreyclaude] for implementing relation planner extensions, and to
+[theirix], [alamb], [NGA-TRAN], and [gabotechs] for reviews and feedback on the
+design.  Related PRs: [#17843]
 
-
-
-
-Related PRs: [#17843]
-
-[#17824]: https://github.com/apache/datafusion/issues/17824
 [#17843]: https://github.com/apache/datafusion/pull/17843
 [geoffreyclaude]: https://github.com/geoffreyclaude
 [theirix]: https://github.com/theirix
@@ -226,7 +217,7 @@ Related PRs: [#17843]
 [NGA-TRAN]: https://github.com/NGA-TRAN
 [gabotechs]: https://github.com/gabotechs
 
-### Pushdown expression evaluation via PhysicalExprAdapter
+### Pushdown Expression Evaluation via PhysicalExprAdapter
 
 DataFusion now pushes down expression evaluation into TableProviders using the
 PhysicalExprAdapter, replacing the older SchemaAdapter approach ([#14993],

From ccc5d4296951810f48e133fe70948d34c4b4f9bd Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Tue, 20 Jan 2026 12:25:03 -0500
Subject: [PATCH 05/26] Apply suggestions from code review

Co-authored-by: Matt Butrovich <mbutrovich@users.noreply.github.com>
---
 content/blog/2026-01-08-datafusion-52.0.0.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 5a7f720b..51b4693d 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -42,7 +42,7 @@ TODO: confirm the release date for 52.0.0 and update the front matter if needed.
 ## Performance Improvements 🚀
 
 We continue to make significant performance improvements in DataFusion. This
-release includes faster `CASE` expressions (see below), a new SortMergeJoin,
+release includes faster `CASE` expressions (see below), SortMergeJoin buffering optimizations,
 automatic caching of metadata, statistics, and listing results for ListingTable,
 improved hashing and grouping performance for string types, and string function
 optimizations.
@@ -64,7 +64,7 @@ alt="Performance over time"
 
 ### Faster `CASE` expression evaluation
 
-DataFusion 52 completes major work from the CASE performance epic ([#18075]).
+DataFusion 52 completes major work from the `CASE` performance epic ([#18075]).
 Lookup-table based evaluation avoids repeated expression evaluation and reduces
 branching overhead, accelerating common ETL patterns.
 
@@ -91,7 +91,7 @@ Related PRs: [#18183]
 
 ### Rewritten merge join
 
-DataFusion 52 includes a rewrite of the sort-merge join output buffering to
+DataFusion 52 includes a rewrite of the sort-merge join (SMJ) output buffering to
 avoid excessive `concat_batches` work and to use `BatchCoalescer` internally and
 for final output. This change targets pathological slowdowns like the reported
 LeftAnti join case in [#18487], which also affected Comet workloads that rely on

From 81954c512cefe8a9f9a2b16d45e059b88e1832d5 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Thu, 22 Jan 2026 18:45:06 -0500
Subject: [PATCH 06/26] Updates

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 158 ++++++++++---------
 1 file changed, 80 insertions(+), 78 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 2464d909..acc12b6c 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -41,71 +41,54 @@ TODO: confirm the release date for 52.0.0 and update the front matter if needed.
 
 ## Performance Improvements 🚀
 
-We continue to make significant performance improvements in DataFusion. This
-release includes faster `CASE` expressions (see below), a new SortMergeJoin,
-automatic caching of metadata, statistics, and listing results for ListingTable,
-improved hashing and grouping performance for string types, and string function
-optimizations.
-
-
-### Performance Chart (TODO)
-
-TODO: add the 52.0.0 performance chart and update the caption.
-
-<img
-src="/blog/images/datafusion-52.0.0/performance_over_time_clickbench.png"
-width="100%"
-class="img-responsive"
-alt="Performance over time"
-/>
-
-**Figure 1**: TODO: update caption for 52.0.0 benchmarking results.
-
+We continue to make significant performance improvements in DataFusion as explained below.
 
 ### Faster `CASE` expression evaluation
 
-DataFusion 52 completes major work from the CASE performance epic ([#18075]).
-Lookup-table based evaluation avoids repeated expression evaluation and reduces
-branching overhead, accelerating common ETL patterns.
-
-Example:
+DataFusion 52 now has lookup-table based evaluation for certain `CASE`
+expressions to avoid repeated evaluation for accelerating common ETL patterns such
+as 
 
 ```sql
-SELECT
-  CASE
-    WHEN status IN ('NEW', 'READY', 'STAGED') THEN 'PENDING'
-    WHEN status IN ('DONE', 'COMPLETE') THEN 'FINISHED'
-    ELSE 'OTHER'
-  END AS status_bucket,
-  count(*)
-FROM jobs
-GROUP BY 1;
+CASE company
+    WHEN 1 THEN 'Apple'
+    WHEN 5 THEN 'Samsung'
+    WHEN 2 THEN 'Motorola'
+    WHEN 3 THEN 'LG'
+    ELSE 'Other'
+END
 ```
 
-Related PRs: [#18183]
+This is the final contains the final work in our CASE performance epic
+([#18075]). Related PRs [#18183]. Thanks to [rluvaton] and [pepijnve] for 
+the implementation.
+
+[rluvaton]: https://github.com/rluvaton
+[pepijnve]: https://github.com/pepijnve
+
 
 [#18075]: https://github.com/apache/datafusion/issues/18075
 [#18183]: https://github.com/apache/datafusion/pull/18183
-[#18487]: https://github.com/apache/datafusion/issues/18487
-[#18875]: https://github.com/apache/datafusion/pull/18875
 
-### Rewritten merge join
+### New Merge Join
 
-DataFusion 52 includes a rewrite of the sort-merge join output buffering to
-avoid excessive `concat_batches` work and to use `BatchCoalescer` internally and
-for final output. This change targets pathological slowdowns like the reported
-LeftAnti join case in [#18487], which also affected Comet workloads that rely on
-SMJ. Benchmarks in [#18875] show dramatic gains for TPC-H Q21 (moving from
-minutes to milliseconds) while leaving most other queries unchanged or modestly
-faster, and the update is fully internal with no user-facing API changes.
+DataFusion 52 includes a rewrite of the sort-merge join, leading to speedups of three orders of magnitude in some cases. 
+This change targets pathological slowdowns like the reported
+LeftAnti join case in [#18487], which also affected [Apache Comet] workloads. 
+Benchmarks in [#18875] show dramatic gains for TPC-H Q21 (moving from
+minutes to milliseconds) while leaving other queries unchanged or modestly
+faster.
 
+[#18487]: https://github.com/apache/datafusion/issues/18487
+[#18875]: https://github.com/apache/datafusion/pull/18875
+[Apache Comet]: https://datafusion.apache.org/comet/
 
 ### Caching Improvements
 
-DataFusion also includes several additional caching improvements in this release.
+This release also includes several additional caching improvements.
 
 First it includes a new statistics cache for Parquet Metadata that avoids repeatedly
-calculating statistics for Parquet backed files. This significantly improves
+(re) calculating statistics for Parquet backed files. This significantly improves
 planning time for certain queries. You can see the contents of the new cache using the
 [statistics_cache] function in the CLI:
 
@@ -126,9 +109,19 @@ Related PRs: [#18971], [#19054]
 [#19054]: https://github.com/apache/datafusion/pull/19054
 
 
-DataFusion and includes a memory-bound, prefix aware list-files cache by
-default. You can see the contents of the new cache using the [list_files_cache]
-function in the CLI:
+It also includes a prefix aware list-files cache by default which accelerates
+evaluating partition predicates for HIVE partitioned tables.
+
+```sql
+-- Read the hive partitioned dataset from Overture Maps (100s of Parquet files)
+CREATE EXTERNAL TABLE overturemaps
+STORED AS PARQUET LOCATION 's3://overturemaps-us-west-2/release/2025-12-17.0/';
+-- Find all files where the path contains `theme=base without requiring another LIST call
+select count(*) from overturemaps where theme='base';
+```
+
+You can see the
+contents of the new cache using the [list_files_cache] function in the CLI:
 
 [list_files_cache]: https://datafusion.apache.org/user-guide/cli/functions.html#list-files-cache
 
@@ -159,6 +152,34 @@ Related PRs: [#18146], [#18855], [#19366], [#19298],
 [#19298]: https://github.com/apache/datafusion/pull/19298
 
 
+### Filtering with Hash Join pushdown
+
+DataFusion can now push down build-side hash tables from HashJoinExec into scans
+([#17171]). This is sometimes referred to "sideways information passing" (TODO
+reference). The initial DataFusion implement supports pushdown when the build
+side is small . In this case, DataFusion converts the hash table to an `IN` list
+that is pushed down to TableProviders  reducing the join input size early.
+
+Example:
+
+```sql
+SELECT *
+FROM orders o
+JOIN small_dim d
+ON o.dim_id = d.id;
+```
+
+TODO: include a physical plan snippet that shows the pushdown filter once a
+canonical example is selected.
+
+Thanks to [adriangb] for implementing hash join pushdown.
+Related PRs: [#18393]
+
+[#17171]: https://github.com/apache/datafusion/issues/17171
+[#18393]: https://github.com/apache/datafusion/pull/18393
+[adriangb]: https://github.com/adriangb
+
+
 
 ## Major Features ✨
 
@@ -182,10 +203,10 @@ Related PRs: [#18457]
 
 ### Extensible SQL planning with relation planner extensions
 
-DataFusion now has an API for extending the SQL planner for relations. As
-explained in the [Extending SQL in DataFusion Blog], with this new API you can
-now customize DataFusion with support for almost any SQL syntax, such as:
-
+DataFusion now has an API for extending the SQL planner for relations, as
+explained in the [Extending SQL in DataFusion Blog]. With this new API, you can
+customize DataFusion to support almost any SQL syntax, such as the following
+(which are not supported by default) :
 
 ```sql
 -- Postgres-style JSON operators
@@ -236,6 +257,7 @@ TableProvider pushdown
 Reduced data
 ```
 
+Thanks to [adriangb] for implementing PhysicalExprAdapter pushdown support.
 Related PRs: [#18998], [#19345]
 
 [#14993]: https://github.com/apache/datafusion/issues/14993
@@ -243,30 +265,6 @@ Related PRs: [#18998], [#19345]
 [#18998]: https://github.com/apache/datafusion/pull/18998
 [#19345]: https://github.com/apache/datafusion/pull/19345
 
-### Hash join build-side pushdown
-
-DataFusion can now push down build-side hash tables from HashJoinExec into scans
-([#17171]). When the build side is small, DataFusion converts the hash table to
-an `IN` list or hash lookup that can be evaluated during scans, reducing the
-join input size early.
-
-Example:
-
-```sql
-SELECT *
-FROM orders o
-JOIN small_dim d
-ON o.dim_id = d.id;
-```
-
-TODO: include a physical plan snippet that shows the pushdown filter once a
-canonical example is selected.
-
-Related PRs: [#18393]
-
-[#17171]: https://github.com/apache/datafusion/issues/17171
-[#18393]: https://github.com/apache/datafusion/pull/18393
-
 ### Sort pushdown to sources
 
 DataFusion now supports sort pushdown into data sources, allowing scans to
@@ -282,10 +280,12 @@ FROM parquet_table
 ORDER BY event_time DESC;
 ```
 
+Thanks to [zhuqi-lucas] for implementing sort pushdown.
 Related PRs: [#19064]
 
 [#10433]: https://github.com/apache/datafusion/issues/10433
 [#19064]: https://github.com/apache/datafusion/pull/19064
+[zhuqi-lucas]: https://github.com/zhuqi-lucas
 
 ### DELETE/UPDATE hooks in TableProvider
 
@@ -300,9 +300,11 @@ Example:
 DELETE FROM mem_table WHERE status = 'obsolete';
 ```
 
+Thanks to [ethan-tyler] for implementing the TableProvider DELETE/UPDATE hooks.
 Related PRs: [#19142]
 
 [#19142]: https://github.com/apache/datafusion/pull/19142
+[ethan-tyler]: https://github.com/ethan-tyler
 
 ### CoalesceBatchesExec removal and integrated batch coalescing
 

From 781cd621f7e37d9ca89318e7b78cf598b576f485 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Thu, 22 Jan 2026 18:47:21 -0500
Subject: [PATCH 07/26] acknowledgments

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index acc12b6c..e895c052 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -78,10 +78,12 @@ LeftAnti join case in [#18487], which also affected [Apache Comet] workloads.
 Benchmarks in [#18875] show dramatic gains for TPC-H Q21 (moving from
 minutes to milliseconds) while leaving other queries unchanged or modestly
 faster.
+Thanks to [mbutrovich] for implementing the merge join rewrite.
 
 [#18487]: https://github.com/apache/datafusion/issues/18487
 [#18875]: https://github.com/apache/datafusion/pull/18875
 [Apache Comet]: https://datafusion.apache.org/comet/
+[mbutrovich]: https://github.com/mbutrovich
 
 ### Caching Improvements
 
@@ -103,10 +105,13 @@ select * from statistics_cache();
 | .../hits.parquet | 2022-06-25T22:22:22 | 14779976446     | 0-5e24d1ee16380-370f48 | NULL    | Exact(99997497) | 105         | Exact(36445943240) | 0                     |
 +------------------+---------------------+-----------------+------------------------+---------+-----------------+-------------+--------------------+-----------------------+
 ```
+Thanks to [bharath-techie] and [nuno-faria] for implementing the statistics cache.
 Related PRs: [#18971], [#19054]
 
 [#18971]: https://github.com/apache/datafusion/pull/18971
 [#19054]: https://github.com/apache/datafusion/pull/19054
+[bharath-techie]: https://github.com/bharath-techie
+[nuno-faria]: https://github.com/nuno-faria
 
 
 It also includes a prefix aware list-files cache by default which accelerates
@@ -143,6 +148,7 @@ location 's3://overturemaps-us-west-2/release/2025-12-17.0/theme=base/type=infra
 +--------------+-----------------------------------------------------+---------------------+-----------------------------------+-----------------+---------------------------------------+
 ```
 
+Thanks to [BlakeOrth] and [Yuvraj-cyborg] for implementing the list-files cache work.
 Related PRs: [#18146], [#18855], [#19366], [#19298], 
 
 [Epic #17214]: https://github.com/apache/datafusion/issues/17214
@@ -150,6 +156,8 @@ Related PRs: [#18146], [#18855], [#19366], [#19298],
 [#18855]: https://github.com/apache/datafusion/pull/18855
 [#19366]: https://github.com/apache/datafusion/pull/19366
 [#19298]: https://github.com/apache/datafusion/pull/19298
+[BlakeOrth]: https://github.com/BlakeOrth
+[Yuvraj-cyborg]: https://github.com/Yuvraj-cyborg
 
 
 ### Filtering with Hash Join pushdown
@@ -317,15 +325,6 @@ made optimizer rules more complex. This release integrates coalescing into the
 operators themselves and relies on Arrow's coalesce kernels, reducing plan
 complexity while keeping batch sizes efficient.
 
-Diagram:
-
-```
-Before:
-  Scan -> CoalesceBatches -> Filter -> CoalesceBatches -> Join
-
-After:
-  Scan -> Filter (coalesce inline) -> Join (coalesce inline)
-```
 
 Related PRs: [#18540], [#18604], [#18630], [#18972], [#19002], [#19342], [#19239]
 Thanks to [Tim-53], [Dandandan], [jizezhang], and [feniljain] for implementing

From 790b65859731309059a0d821d479c0029f99ba3d Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Thu, 22 Jan 2026 19:02:55 -0500
Subject: [PATCH 08/26] update

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 33 +++++++-------------
 1 file changed, 12 insertions(+), 21 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index e895c052..dbe2117b 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -160,35 +160,26 @@ Related PRs: [#18146], [#18855], [#19366], [#19298],
 [Yuvraj-cyborg]: https://github.com/Yuvraj-cyborg
 
 
-### Filtering with Hash Join pushdown
+### Improved Hash Join Filter Pushdown
 
-DataFusion can now push down build-side hash tables from HashJoinExec into scans
-([#17171]). This is sometimes referred to "sideways information passing" (TODO
-reference). The initial DataFusion implement supports pushdown when the build
-side is small . In this case, DataFusion converts the hash table to an `IN` list
-that is pushed down to TableProviders  reducing the join input size early.
+Starting in DataFusion 51, filtering information from `HashJoinExec` is passed
+dynamically to scans, as explained in the [Dynamic Filtering blog post] using a
+technique referred to as [Sideways Information Passing] in  Database research
+literature. The initial implementation passed min/max values for the join keys.
+DataFusion 52 extends the optimization ([#17171] / [#18393]) to use an `IN` list when the
+build size is small such as when the join is very selective. The `IN` list is
+pushed down to the probe side scan and is used to prune files, row groups, and
+individual rows.  Thanks to [adriangb] for implementing this feature.
 
-Example:
 
-```sql
-SELECT *
-FROM orders o
-JOIN small_dim d
-ON o.dim_id = d.id;
-```
-
-TODO: include a physical plan snippet that shows the pushdown filter once a
-canonical example is selected.
-
-Thanks to [adriangb] for implementing hash join pushdown.
-Related PRs: [#18393]
+[Sideways Information Passing]: https://dl.acm.org/doi/10.1109/ICDE.2008.4497486
+[Dynamic Filtering blog post]: https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/#hash-join-dynamic-filters
 
 [#17171]: https://github.com/apache/datafusion/issues/17171
 [#18393]: https://github.com/apache/datafusion/pull/18393
 [adriangb]: https://github.com/adriangb
 
 
-
 ## Major Features ✨
 
 ### Arrow IPC Stream file support
@@ -314,7 +305,7 @@ Related PRs: [#19142]
 [#19142]: https://github.com/apache/datafusion/pull/19142
 [ethan-tyler]: https://github.com/ethan-tyler
 
-### CoalesceBatchesExec removal and integrated batch coalescing
+### `CoalesceBatchesExec` Removed
 
 DataFusion continues the work from the CoalesceBatchesExec epic ([#18779]). The
 standalone `CoalesceBatchesExec` operator existed to ensure batches were large

From d63a4de797b1c1a7ef48be2eb593dd8cd9620545 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Thu, 22 Jan 2026 19:20:16 -0500
Subject: [PATCH 09/26] updates

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 111 +++++++++----------
 1 file changed, 52 insertions(+), 59 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index dbe2117b..11bf96a5 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -78,7 +78,8 @@ LeftAnti join case in [#18487], which also affected [Apache Comet] workloads.
 Benchmarks in [#18875] show dramatic gains for TPC-H Q21 (moving from
 minutes to milliseconds) while leaving other queries unchanged or modestly
 faster.
-Thanks to [mbutrovich] for implementing the merge join rewrite.
+Thanks to [mbutrovich] for implementing the merge join rewrite, with reviews
+from [Dandandan].
 
 [#18487]: https://github.com/apache/datafusion/issues/18487
 [#18875]: https://github.com/apache/datafusion/pull/18875
@@ -105,13 +106,16 @@ select * from statistics_cache();
 | .../hits.parquet | 2022-06-25T22:22:22 | 14779976446     | 0-5e24d1ee16380-370f48 | NULL    | Exact(99997497) | 105         | Exact(36445943240) | 0                     |
 +------------------+---------------------+-----------------+------------------------+---------+-----------------+-------------+--------------------+-----------------------+
 ```
-Thanks to [bharath-techie] and [nuno-faria] for implementing the statistics cache.
+Thanks to [bharath-techie] and [nuno-faria] for implementing the statistics cache,
+with reviews from [martin-g], [alamb], and [alchemist51].
 Related PRs: [#18971], [#19054]
 
 [#18971]: https://github.com/apache/datafusion/pull/18971
 [#19054]: https://github.com/apache/datafusion/pull/19054
 [bharath-techie]: https://github.com/bharath-techie
 [nuno-faria]: https://github.com/nuno-faria
+[martin-g]: https://github.com/martin-g
+[alchemist51]: https://github.com/alchemist51
 
 
 It also includes a prefix aware list-files cache by default which accelerates
@@ -148,7 +152,8 @@ location 's3://overturemaps-us-west-2/release/2025-12-17.0/theme=base/type=infra
 +--------------+-----------------------------------------------------+---------------------+-----------------------------------+-----------------+---------------------------------------+
 ```
 
-Thanks to [BlakeOrth] and [Yuvraj-cyborg] for implementing the list-files cache work.
+Thanks to [BlakeOrth] and [Yuvraj-cyborg] for implementing the list-files cache work,
+with reviews from [gabotechs], [alamb], [alchemist51], [martin-g], and [BlakeOrth].
 Related PRs: [#18146], [#18855], [#19366], [#19298], 
 
 [Epic #17214]: https://github.com/apache/datafusion/issues/17214
@@ -163,21 +168,25 @@ Related PRs: [#18146], [#18855], [#19366], [#19298],
 ### Improved Hash Join Filter Pushdown
 
 Starting in DataFusion 51, filtering information from `HashJoinExec` is passed
-dynamically to scans, as explained in the [Dynamic Filtering blog post] using a
+dynamically to scans, as explained in the [Dynamic Filtering Blog] using a
 technique referred to as [Sideways Information Passing] in  Database research
 literature. The initial implementation passed min/max values for the join keys.
 DataFusion 52 extends the optimization ([#17171] / [#18393]) to use an `IN` list when the
 build size is small such as when the join is very selective. The `IN` list is
 pushed down to the probe side scan and is used to prune files, row groups, and
-individual rows.  Thanks to [adriangb] for implementing this feature.
+individual rows.  Thanks to [adriangb] for implementing this feature, with
+reviews from [LiaCastaneda], [asolimando], [comphead], and [mbutrovich].
 
 
 [Sideways Information Passing]: https://dl.acm.org/doi/10.1109/ICDE.2008.4497486
-[Dynamic Filtering blog post]: https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/#hash-join-dynamic-filters
+[Dynamic Filtering blog]: https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/#hash-join-dynamic-filters
 
 [#17171]: https://github.com/apache/datafusion/issues/17171
 [#18393]: https://github.com/apache/datafusion/pull/18393
 [adriangb]: https://github.com/adriangb
+[LiaCastaneda]: https://github.com/LiaCastaneda
+[asolimando]: https://github.com/asolimando
+[comphead]: https://github.com/comphead
 
 
 ## Major Features ✨
@@ -199,8 +208,12 @@ Related PRs: [#18457]
 
 [#18457]: https://github.com/apache/datafusion/pull/18457
 [corasaurus-hex]: https://github.com/corasaurus-hex
+[Jefffrey]: https://github.com/Jefffrey
+[jdcasale]: https://github.com/jdcasale
+[2010YOUY01]: https://github.com/2010YOUY01
+[timsaucer]: https://github.com/timsaucer
 
-### Extensible SQL planning with relation planner extensions
+### More Extensible SQL Planning with `RelationPlanner`
 
 DataFusion now has an API for extending the SQL planner for relations, as
 explained in the [Extending SQL in DataFusion Blog]. With this new API, you can
@@ -217,15 +230,6 @@ SELECT * FROM sensor_data TABLESAMPLE BERNOULLI(10 PERCENT);
 ```
 [Extending SQL in DataFusion Blog]: https://datafusion.apache.org/blog/2026/01/12/extending-sql/
 
-<figure>
-  <img src="/blog/images/extending-sql/architecture.svg" alt="DataFusion SQL processing pipeline: SQL String flows through Parser to AST, then SqlToRel (with Extension Planners) to LogicalPlan, then PhysicalPlanner to ExecutionPlan" width="100%" class="img-responsive">
-  <figcaption>
-    <b>Figure 1:</b> 
-        SQL processing pipeline with relation planner extensions from the 
-        <a href="https://datafusion.apache.org/blog/2026/01/12/extending-sql/">Extending SQL in DataFusion Blog</a>. 
-  </figcaption>
-</figure>
-
 Thanks to [geoffreyclaude] for implementing relation planner extensions, and to
 [theirix], [alamb], [NGA-TRAN], and [gabotechs] for reviews and feedback on the
 design.  Related PRs: [#17843]
@@ -237,25 +241,13 @@ design.  Related PRs: [#17843]
 [NGA-TRAN]: https://github.com/NGA-TRAN
 [gabotechs]: https://github.com/gabotechs
 
-### Pushdown Expression Evaluation via PhysicalExprAdapter
+### Expression Evaluation Pushdown to Scans 
 
 DataFusion now pushes down expression evaluation into TableProviders using the
 PhysicalExprAdapter, replacing the older SchemaAdapter approach ([#14993],
 [#16800]). This enables richer pushdown (expressions and projections) and
 improves consistency between logical and physical planning.
 
-Diagram:
-
-```
-SQL filter/projection
-  |  (PhysicalExprAdapter)
-  v
-TableProvider pushdown
-  |  (scan)
-  v
-Reduced data
-```
-
 Thanks to [adriangb] for implementing PhysicalExprAdapter pushdown support.
 Related PRs: [#18998], [#19345]
 
@@ -263,35 +255,30 @@ Related PRs: [#18998], [#19345]
 [#16800]: https://github.com/apache/datafusion/issues/16800
 [#18998]: https://github.com/apache/datafusion/pull/18998
 [#19345]: https://github.com/apache/datafusion/pull/19345
+[kosiew]: https://github.com/kosiew
 
-### Sort pushdown to sources
-
-DataFusion now supports sort pushdown into data sources, allowing scans to
-return sorted data or leverage reversed row groups when possible ([#10433],
-[#19064]). This reduces memory pressure and can eliminate explicit sort stages
-for partitioned or pre-sorted data.
+### Sort Pushdown to Scans
 
-Example:
-
-```sql
-SELECT *
-FROM parquet_table
-ORDER BY event_time DESC;
-```
-
-Thanks to [zhuqi-lucas] for implementing sort pushdown.
-Related PRs: [#19064]
+DataFusion can now push sorts all the way data sources, allowing scans to return
+sorted data or leverage reversed row groups when possible ([#10433], [#19064]).
+This allows table provider implementations to take better advantage of existing sort 
+information such as to reorder files or row groups to satisfy `LIMIT` clauses more
+efficiently. Thanks to [zhuqi-lucas] for this feature.  
 
 [#10433]: https://github.com/apache/datafusion/issues/10433
 [#19064]: https://github.com/apache/datafusion/pull/19064
 [zhuqi-lucas]: https://github.com/zhuqi-lucas
 
-### DELETE/UPDATE hooks in TableProvider
+### `TableProvider` supports `DELETE` and `UPDATE` statements
+
+The [TableProvider] trait now includes hooks for `DELETE` and `UPDATE`
+statements and the basic MemTable implements them ([#19142]). This lets
+downstream implementations and storage engines plug in their own mutation logic.
+See [TableProvider::delete_from] and [TableProvider::update] for more details
 
-TableProvider now includes DELETE and UPDATE hooks, with MemTable providing the
-first implementation ([#19142]). This is an important step toward fully
-featured DML support and enables downstream storage engines to plug in their
-own mutation logic.
+[TableProvider]: https://docs.rs/datafusion/52.0.0/datafusion/datasource/trait.TableProvider.html
+[TableProvider::delete_from]: https://docs.rs/datafusion/52.0.0/datafusion/datasource/trait.TableProvider.html#method.delete_from
+[TableProvider::update]: https://docs.rs/datafusion/52.0.0/datafusion/datasource/trait.TableProvider.html#method.update
 
 Example:
 
@@ -299,7 +286,8 @@ Example:
 DELETE FROM mem_table WHERE status = 'obsolete';
 ```
 
-Thanks to [ethan-tyler] for implementing the TableProvider DELETE/UPDATE hooks.
+Thanks to [ethan-tyler] for implementing the TableProvider DELETE/UPDATE hooks,
+with reviews from [alamb] and [adriangb].
 Related PRs: [#19142]
 
 [#19142]: https://github.com/apache/datafusion/pull/19142
@@ -307,19 +295,21 @@ Related PRs: [#19142]
 
 ### `CoalesceBatchesExec` Removed
 
-DataFusion continues the work from the CoalesceBatchesExec epic ([#18779]). The
-standalone `CoalesceBatchesExec` operator existed to ensure batches were large
-enough for vectorized execution, and it was inserted after filter-like
+The standalone `CoalesceBatchesExec` operator existed to ensure batches were
+large enough for vectorized execution, and it was inserted after filter-like
 operators such as `FilterExec`, `HashJoinExec`, and `RepartitionExec`. However,
-it also blocked other optimizations (like pushing limits through joins) and
-made optimizer rules more complex. This release integrates coalescing into the
-operators themselves and relies on Arrow's coalesce kernels, reducing plan
-complexity while keeping batch sizes efficient.
+it also blocked other optimizations (like pushing limits through joins) and made
+optimizer rules more complex. In this release, we have completed integrating
+coalescing into the operators themselves ([#18779]) using Arrow's [coalesce kernel],
+reducing plan complexity while keeping batch sizes efficient. This also allows 
+additional focused optimization work in the kernel, such as [Dandandan]'s recent
+work with filtering in [arrow-rs/#8951].
 
 
 Related PRs: [#18540], [#18604], [#18630], [#18972], [#19002], [#19342], [#19239]
 Thanks to [Tim-53], [Dandandan], [jizezhang], and [feniljain] for implementing
-this feature.
+this feature, with reviews from [Jefffrey], [alamb], [martin-g],
+[geoffreyclaude], [milenkovicm], and [jizezhang].
 
 [#18779]: https://github.com/apache/datafusion/issues/18779
 [#18540]: https://github.com/apache/datafusion/pull/18540
@@ -333,6 +323,9 @@ this feature.
 [Dandandan]: https://github.com/Dandandan
 [jizezhang]: https://github.com/jizezhang
 [feniljain]: https://github.com/feniljain
+[milenkovicm]: https://github.com/milenkovicm
+[coalesce kernel]: https://docs.rs/arrow/57.2.0/arrow/compute/kernels/coalesce/
+[arrow-rs/#8951]: https://github.com/apache/arrow-rs/pull/8951
 
 ## Upgrade Guide and Changelog
 

From d38d99f153633ca3352c2c1db54d8c6111119e91 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Thu, 22 Jan 2026 19:25:15 -0500
Subject: [PATCH 10/26] update

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 32 +++++++++-----------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 11bf96a5..cd06c39c 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -43,11 +43,10 @@ TODO: confirm the release date for 52.0.0 and update the front matter if needed.
 
 We continue to make significant performance improvements in DataFusion as explained below.
 
-### Faster `CASE` expression evaluation
+### Faster `CASE` Expressions
 
-DataFusion 52 now has lookup-table based evaluation for certain `CASE`
-expressions to avoid repeated evaluation for accelerating common ETL patterns such
-as 
+DataFusion 52 has lookup-table based evaluation for certain `CASE` expressions
+to avoid repeated evaluation for accelerating common ETL patterns such as
 
 ```sql
 CASE company
@@ -59,9 +58,9 @@ CASE company
 END
 ```
 
-This is the final contains the final work in our CASE performance epic
-([#18075]). Related PRs [#18183]. Thanks to [rluvaton] and [pepijnve] for 
-the implementation.
+This is the final work in our `CASE` performance epic ([#18075]) which has
+improved `CASE` evaluation significantly. Related PRs [#18183]. Thanks to
+[rluvaton] and [pepijnve] for the implementation.
 
 [rluvaton]: https://github.com/rluvaton
 [pepijnve]: https://github.com/pepijnve
@@ -72,14 +71,12 @@ the implementation.
 
 ### New Merge Join
 
-DataFusion 52 includes a rewrite of the sort-merge join, leading to speedups of three orders of magnitude in some cases. 
-This change targets pathological slowdowns like the reported
-LeftAnti join case in [#18487], which also affected [Apache Comet] workloads. 
-Benchmarks in [#18875] show dramatic gains for TPC-H Q21 (moving from
-minutes to milliseconds) while leaving other queries unchanged or modestly
-faster.
-Thanks to [mbutrovich] for implementing the merge join rewrite, with reviews
-from [Dandandan].
+DataFusion 52 includes a rewrite of the sort-merge join operator, with speedups
+of three orders of magnitude in some pathological cases such the case in
+[#18487], which also affected [Apache Comet] workloads. Benchmarks in [#18875]
+show dramatic gains for TPC-H Q21 (minutes to milliseconds) while leaving other
+queries unchanged or modestly faster. Thanks to [mbutrovich] for the implementation
+and reviews from [Dandandan].
 
 [#18487]: https://github.com/apache/datafusion/issues/18487
 [#18875]: https://github.com/apache/datafusion/pull/18875
@@ -196,7 +193,8 @@ reviews from [LiaCastaneda], [asolimando], [comphead], and [mbutrovich].
 DataFusion can now read Arrow IPC stream files ([#18457]). This expands
 interoperability with systems that emit Arrow streams directly, making it
 simpler to ingest Arrow-native data without conversion. Thanks to [corasaurus-hex]
-for implementing this feature.
+for implementing this feature, with reviews from [martin-g], [Jefffrey],
+[jdcasale], [2010YOUY01], and [timsaucer].
 
 ```sql
 CREATE EXTERNAL TABLE ipc_events
@@ -329,7 +327,7 @@ this feature, with reviews from [Jefffrey], [alamb], [martin-g],
 
 ## Upgrade Guide and Changelog
 
-Upgrading to 52.0.0 should be straightforward for most users. Please review the
+As always, upgrading to 52.0.0 should be straightforward for most users. Please review the
 [Upgrade Guide]
 for details on breaking changes and code snippets to help with the transition.
 For a comprehensive list of all changes, please refer to the [changelog].

From b47c50dd8d927a65d8f0fbe505b84feec5c69c88 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Thu, 22 Jan 2026 19:27:23 -0500
Subject: [PATCH 11/26] typos

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 28 ++++++++++----------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index cd06c39c..7784ea90 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -37,7 +37,7 @@ TODO: confirm the release date for 52.0.0 and update the front matter if needed.
 [DataFusion 52.0.0]: https://crates.io/crates/datafusion/52.0.0
 [DataFusion 51.0.0]: https://datafusion.apache.org/blog/2025/11/25/datafusion-51.0.0/
 [changelog]: https://github.com/apache/datafusion/blob/branch-52/dev/changelog/52.0.0.md
-[120 contributors]: https://github.com/apache/datafusion/blob/branch-52/dev/changelog/52.0.0.md#credits
+[121 contributors]: https://github.com/apache/datafusion/blob/branch-52/dev/changelog/52.0.0.md#credits
 
 ## Performance Improvements 🚀
 
@@ -45,7 +45,7 @@ We continue to make significant performance improvements in DataFusion as explai
 
 ### Faster `CASE` Expressions
 
-DataFusion 52 has lookup-table based evaluation for certain `CASE` expressions
+DataFusion 52 has lookup-table-based evaluation for certain `CASE` expressions
 to avoid repeated evaluation for accelerating common ETL patterns such as
 
 ```sql
@@ -58,7 +58,7 @@ CASE company
 END
 ```
 
-This is the final work in our `CASE` performance epic ([#18075]) which has
+This is the final work in our `CASE` performance epic ([#18075]), which has
 improved `CASE` evaluation significantly. Related PRs [#18183]. Thanks to
 [rluvaton] and [pepijnve] for the implementation.
 
@@ -72,7 +72,7 @@ improved `CASE` evaluation significantly. Related PRs [#18183]. Thanks to
 ### New Merge Join
 
 DataFusion 52 includes a rewrite of the sort-merge join operator, with speedups
-of three orders of magnitude in some pathological cases such the case in
+of three orders of magnitude in some pathological cases such as the case in
 [#18487], which also affected [Apache Comet] workloads. Benchmarks in [#18875]
 show dramatic gains for TPC-H Q21 (minutes to milliseconds) while leaving other
 queries unchanged or modestly faster. Thanks to [mbutrovich] for the implementation
@@ -88,7 +88,7 @@ and reviews from [Dandandan].
 This release also includes several additional caching improvements.
 
 First it includes a new statistics cache for Parquet Metadata that avoids repeatedly
-(re) calculating statistics for Parquet backed files. This significantly improves
+(re)calculating statistics for Parquet backed files. This significantly improves
 planning time for certain queries. You can see the contents of the new cache using the
 [statistics_cache] function in the CLI:
 
@@ -115,8 +115,8 @@ Related PRs: [#18971], [#19054]
 [alchemist51]: https://github.com/alchemist51
 
 
-It also includes a prefix aware list-files cache by default which accelerates
-evaluating partition predicates for HIVE partitioned tables.
+It also includes a prefix-aware list-files cache by default which accelerates
+evaluating partition predicates for Hive partitioned tables.
 
 ```sql
 -- Read the hive partitioned dataset from Overture Maps (100s of Parquet files)
@@ -166,7 +166,7 @@ Related PRs: [#18146], [#18855], [#19366], [#19298],
 
 Starting in DataFusion 51, filtering information from `HashJoinExec` is passed
 dynamically to scans, as explained in the [Dynamic Filtering Blog] using a
-technique referred to as [Sideways Information Passing] in  Database research
+technique referred to as [Sideways Information Passing] in Database research
 literature. The initial implementation passed min/max values for the join keys.
 DataFusion 52 extends the optimization ([#17171] / [#18393]) to use an `IN` list when the
 build size is small such as when the join is very selective. The `IN` list is
@@ -216,7 +216,7 @@ Related PRs: [#18457]
 DataFusion now has an API for extending the SQL planner for relations, as
 explained in the [Extending SQL in DataFusion Blog]. With this new API, you can
 customize DataFusion to support almost any SQL syntax, such as the following
-(which are not supported by default) :
+(which are not supported by default):
 
 ```sql
 -- Postgres-style JSON operators
@@ -230,7 +230,7 @@ SELECT * FROM sensor_data TABLESAMPLE BERNOULLI(10 PERCENT);
 
 Thanks to [geoffreyclaude] for implementing relation planner extensions, and to
 [theirix], [alamb], [NGA-TRAN], and [gabotechs] for reviews and feedback on the
-design.  Related PRs: [#17843]
+design. Related PRs: [#17843]
 
 [#17843]: https://github.com/apache/datafusion/pull/17843
 [geoffreyclaude]: https://github.com/geoffreyclaude
@@ -239,7 +239,7 @@ design.  Related PRs: [#17843]
 [NGA-TRAN]: https://github.com/NGA-TRAN
 [gabotechs]: https://github.com/gabotechs
 
-### Expression Evaluation Pushdown to Scans 
+### Expression Evaluation Pushdown to Scans
 
 DataFusion now pushes down expression evaluation into TableProviders using the
 PhysicalExprAdapter, replacing the older SchemaAdapter approach ([#14993],
@@ -257,7 +257,7 @@ Related PRs: [#18998], [#19345]
 
 ### Sort Pushdown to Scans
 
-DataFusion can now push sorts all the way data sources, allowing scans to return
+DataFusion can now push sorts all the way to data sources, allowing scans to return
 sorted data or leverage reversed row groups when possible ([#10433], [#19064]).
 This allows table provider implementations to take better advantage of existing sort 
 information such as to reorder files or row groups to satisfy `LIMIT` clauses more
@@ -272,7 +272,7 @@ efficiently. Thanks to [zhuqi-lucas] for this feature.
 The [TableProvider] trait now includes hooks for `DELETE` and `UPDATE`
 statements and the basic MemTable implements them ([#19142]). This lets
 downstream implementations and storage engines plug in their own mutation logic.
-See [TableProvider::delete_from] and [TableProvider::update] for more details
+See [TableProvider::delete_from] and [TableProvider::update] for more details.
 
 [TableProvider]: https://docs.rs/datafusion/52.0.0/datafusion/datasource/trait.TableProvider.html
 [TableProvider::delete_from]: https://docs.rs/datafusion/52.0.0/datafusion/datasource/trait.TableProvider.html#method.delete_from
@@ -299,7 +299,7 @@ operators such as `FilterExec`, `HashJoinExec`, and `RepartitionExec`. However,
 it also blocked other optimizations (like pushing limits through joins) and made
 optimizer rules more complex. In this release, we have completed integrating
 coalescing into the operators themselves ([#18779]) using Arrow's [coalesce kernel],
-reducing plan complexity while keeping batch sizes efficient. This also allows 
+reducing plan complexity while keeping batch sizes efficient. This also allows
 additional focused optimization work in the kernel, such as [Dandandan]'s recent
 work with filtering in [arrow-rs/#8951].
 

From 1f5b91e431917eea054e3f3f8a66504dbc35c741 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Thu, 22 Jan 2026 19:36:53 -0500
Subject: [PATCH 12/26] refine

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 43 ++++++++++----------
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 7784ea90..3c888053 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -241,27 +241,27 @@ design. Related PRs: [#17843]
 
 ### Expression Evaluation Pushdown to Scans
 
-DataFusion now pushes down expression evaluation into TableProviders using the
-PhysicalExprAdapter, replacing the older SchemaAdapter approach ([#14993],
-[#16800]). This enables richer pushdown (expressions and projections) and
-improves consistency between logical and physical planning.
-
-Thanks to [adriangb] for implementing PhysicalExprAdapter pushdown support.
-Related PRs: [#18998], [#19345]
+DataFusion now pushes down expression evaluation into TableProviders using 
+[PhysicalExprAdapter], replacing the older SchemaAdapter approach ([#14993],
+[#16800]). This work means predicates and expressions can be customized for each
+individual file schema, opening additional optimization such as support for
+[Variant shredding]. Thanks to [adriangb] for implementing PhysicalExprAdapter
+and reworking pushdown to use it. Related PRs: [#18998], [#19345]
 
 [#14993]: https://github.com/apache/datafusion/issues/14993
 [#16800]: https://github.com/apache/datafusion/issues/16800
 [#18998]: https://github.com/apache/datafusion/pull/18998
 [#19345]: https://github.com/apache/datafusion/pull/19345
 [kosiew]: https://github.com/kosiew
+[Variant shredding]: https://github.com/apache/datafusion/issues/16116
+[PhysicalExprAdapter]: https://docs.rs/datafusion/52.0.0/datafusion/physical_expr_adapter/trait.PhysicalExprAdapter.html
 
 ### Sort Pushdown to Scans
 
-DataFusion can now push sorts all the way to data sources, allowing scans to return
-sorted data or leverage reversed row groups when possible ([#10433], [#19064]).
+DataFusion can now push sorts all the way to data sources ([#10433], [#19064]).
 This allows table provider implementations to take better advantage of existing sort 
 information such as to reorder files or row groups to satisfy `LIMIT` clauses more
-efficiently. Thanks to [zhuqi-lucas] for this feature.  
+efficiently. Thanks to [zhuqi-lucas] for this feature. 
 
 [#10433]: https://github.com/apache/datafusion/issues/10433
 [#19064]: https://github.com/apache/datafusion/pull/19064
@@ -284,9 +284,8 @@ Example:
 DELETE FROM mem_table WHERE status = 'obsolete';
 ```
 
-Thanks to [ethan-tyler] for implementing the TableProvider DELETE/UPDATE hooks,
-with reviews from [alamb] and [adriangb].
-Related PRs: [#19142]
+Thanks to [ethan-tyler] for the implementation and [alamb] and [adriangb] for
+reviews.
 
 [#19142]: https://github.com/apache/datafusion/pull/19142
 [ethan-tyler]: https://github.com/ethan-tyler
@@ -294,15 +293,15 @@ Related PRs: [#19142]
 ### `CoalesceBatchesExec` Removed
 
 The standalone `CoalesceBatchesExec` operator existed to ensure batches were
-large enough for vectorized execution, and it was inserted after filter-like
-operators such as `FilterExec`, `HashJoinExec`, and `RepartitionExec`. However,
-it also blocked other optimizations (like pushing limits through joins) and made
-optimizer rules more complex. In this release, we have completed integrating
-coalescing into the operators themselves ([#18779]) using Arrow's [coalesce kernel],
-reducing plan complexity while keeping batch sizes efficient. This also allows
-additional focused optimization work in the kernel, such as [Dandandan]'s recent
-work with filtering in [arrow-rs/#8951].
-
+large enough for subsequent vectorized execution, and was inserted after
+filter-like operators such as `FilterExec`, `HashJoinExec`, and
+`RepartitionExec`. However, using a separate operator also blocks other
+optimizations such as pushing `LIMIT` through joins and made optimizer rules
+more complex. In this release, we  integrated the coalescing into the operators
+themselves ([#18779]) using Arrow's [coalesce kernel]. This reduces plan
+complexity while keeping batch sizes efficient, and allows additional focused
+optimization work in the Arrow kernel, such as [Dandandan]'s recent work with
+filtering in [arrow-rs/#8951].
 
 Related PRs: [#18540], [#18604], [#18630], [#18972], [#19002], [#19342], [#19239]
 Thanks to [Tim-53], [Dandandan], [jizezhang], and [feniljain] for implementing

From 2823de5e3f3fa502c0b1975e09192f256002501e Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Thu, 22 Jan 2026 19:50:18 -0500
Subject: [PATCH 13/26] clean

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 23 ++++++++++----------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 6d394db9..c4dba3f4 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -80,10 +80,8 @@ the implementation and reviews from [Dandandan].
 
 [#18487]: https://github.com/apache/datafusion/issues/18487
 [#18875]: https://github.com/apache/datafusion/pull/18875
-<<<<<<< HEAD
 [Apache Comet]: https://datafusion.apache.org/comet/
 [mbutrovich]: https://github.com/mbutrovich
-=======
 
 ### Rewritten merge join
 
@@ -95,15 +93,14 @@ SMJ. Benchmarks in [#18875] show dramatic gains for TPC-H Q21 (moving from
 minutes to milliseconds) while leaving most other queries unchanged or modestly
 faster, and the update is fully internal with no user-facing API changes.
 
->>>>>>> ccc5d4296951810f48e133fe70948d34c4b4f9bd
 
 ### Caching Improvements
 
 This release also includes several additional caching improvements.
 
-First it includes a new statistics cache for Parquet Metadata that avoids repeatedly
-(re)calculating statistics for Parquet backed files. This significantly improves
-planning time for certain queries. You can see the contents of the new cache using the
+A new statistics cache for Parquet Metadata avoids repeatedly (re)calculating
+statistics for Parquet backed files. This significantly improves planning time
+for certain queries. You can see the contents of the new cache using the
 [statistics_cache] function in the CLI:
 
 [statistics_cache]: https://datafusion.apache.org/user-guide/cli/functions.html#statistics-cache
@@ -129,8 +126,8 @@ Related PRs: [#18971], [#19054]
 [alchemist51]: https://github.com/alchemist51
 
 
-It also includes a prefix-aware list-files cache by default which accelerates
-evaluating partition predicates for Hive partitioned tables.
+A prefix-aware list-files cache accelerates evaluating partition predicates for
+Hive partitioned tables.
 
 ```sql
 -- Read the hive partitioned dataset from Overture Maps (100s of Parquet files)
@@ -257,7 +254,7 @@ design. Related PRs: [#17843]
 
 DataFusion now pushes down expression evaluation into TableProviders using 
 [PhysicalExprAdapter], replacing the older SchemaAdapter approach ([#14993],
-[#16800]). This work means predicates and expressions can be customized for each
+[#16800]). Predicates and expressions can now be customized for each
 individual file schema, opening additional optimization such as support for
 [Variant shredding]. Thanks to [adriangb] for implementing PhysicalExprAdapter
 and reworking pushdown to use it. Related PRs: [#18998], [#19345]
@@ -272,14 +269,16 @@ and reworking pushdown to use it. Related PRs: [#18998], [#19345]
 
 ### Sort Pushdown to Scans
 
-DataFusion can now push sorts all the way to data sources ([#10433], [#19064]).
+DataFusion can now push sorts into data sources ([#10433], [#19064]).
 This allows table provider implementations to take better advantage of existing sort 
-information such as to reorder files or row groups to satisfy `LIMIT` clauses more
-efficiently. Thanks to [zhuqi-lucas] for this feature. 
+information based on the query pattern, such as to reorder files or row groups to 
+satisfy `LIMIT` clauses more
+efficiently. Thanks to [zhuqi-lucas] and [xudong963] for this feature. 
 
 [#10433]: https://github.com/apache/datafusion/issues/10433
 [#19064]: https://github.com/apache/datafusion/pull/19064
 [zhuqi-lucas]: https://github.com/zhuqi-lucas
+[xudong963]: https://github.com/xudong963
 
 ### `TableProvider` supports `DELETE` and `UPDATE` statements
 

From 1c9dadf5ee06acfd22ab4b779209f2299b1879c5 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sat, 24 Jan 2026 06:59:03 -0500
Subject: [PATCH 14/26] Update content/blog/2026-01-08-datafusion-52.0.0.md

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
---
 content/blog/2026-01-08-datafusion-52.0.0.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index c4dba3f4..f3c134e7 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -32,7 +32,6 @@ some of the major improvements since [DataFusion 51.0.0]. The complete list of
 changes is available in the [changelog]. Thanks to the [121 contributors] for
 making this release possible.
 
-TODO: confirm the release date for 52.0.0 and update the front matter if needed.
 
 [DataFusion 52.0.0]: https://crates.io/crates/datafusion/52.0.0
 [DataFusion 51.0.0]: https://datafusion.apache.org/blog/2025/11/25/datafusion-51.0.0/

From 615affddb5ee93070c0999893d9ae514b01b8d3b Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sat, 24 Jan 2026 07:02:52 -0500
Subject: [PATCH 15/26] Clarify RelationPlanner syntax

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index c4dba3f4..7f944c36 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -224,10 +224,14 @@ Related PRs: [#18457]
 
 ### More Extensible SQL Planning with `RelationPlanner`
 
+
+
 DataFusion now has an API for extending the SQL planner for relations, as
-explained in the [Extending SQL in DataFusion Blog]. With this new API, you can
-customize DataFusion to support almost any SQL syntax, such as the following
-(which are not supported by default):
+explained in the [Extending SQL in DataFusion Blog]. In addition to the existing
+expression and types extension points, this new API now allows extending `FROM`
+clauses. Using these APIs it is straightforward to provide SQL support for
+almost any dialect, including vendor-specific syntax. Example use cases include:
+
 
 ```sql
 -- Postgres-style JSON operators

From 1345bfba5d1d2e91f976b8c55052dc5735e244d5 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sat, 24 Jan 2026 07:05:20 -0500
Subject: [PATCH 16/26] remove extra section

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index bec481bc..37c7d5c5 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -82,16 +82,6 @@ the implementation and reviews from [Dandandan].
 [Apache Comet]: https://datafusion.apache.org/comet/
 [mbutrovich]: https://github.com/mbutrovich
 
-### Rewritten merge join
-
-DataFusion 52 includes a rewrite of the sort-merge join (SMJ) output buffering to
-avoid excessive `concat_batches` work and to use `BatchCoalescer` internally and
-for final output. This change targets pathological slowdowns like the reported
-LeftAnti join case in [#18487], which also affected Comet workloads that rely on
-SMJ. Benchmarks in [#18875] show dramatic gains for TPC-H Q21 (moving from
-minutes to milliseconds) while leaving most other queries unchanged or modestly
-faster, and the update is fully internal with no user-facing API changes.
-
 
 ### Caching Improvements
 
@@ -223,8 +213,6 @@ Related PRs: [#18457]
 
 ### More Extensible SQL Planning with `RelationPlanner`
 
-
-
 DataFusion now has an API for extending the SQL planner for relations, as
 explained in the [Extending SQL in DataFusion Blog]. In addition to the existing
 expression and types extension points, this new API now allows extending `FROM`

From 63b571c4948712d820f0ab330522f64f2bbcb2eb Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sat, 24 Jan 2026 07:06:26 -0500
Subject: [PATCH 17/26] Update content/blog/2026-01-08-datafusion-52.0.0.md

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
---
 content/blog/2026-01-08-datafusion-52.0.0.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 37c7d5c5..3e05f822 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -169,9 +169,12 @@ dynamically to scans, as explained in the [Dynamic Filtering Blog] using a
 technique referred to as [Sideways Information Passing] in Database research
 literature. The initial implementation passed min/max values for the join keys.
 DataFusion 52 extends the optimization ([#17171] / [#18393]) to use an `IN` list when the
-build size is small such as when the join is very selective. The `IN` list is
-pushed down to the probe side scan and is used to prune files, row groups, and
-individual rows.  Thanks to [adriangb] for implementing this feature, with
+build size is small such as when the join is very selective or a reference to the build side hash map when the build side is larger.
+These new expressions are pushed down to the probe side scan and is used to prune files, row groups, and
+individual rows.
+When the build side is small enough (<=20 rows but configurable) the pushed down filters can even participate in statistics pruning to avoid even reading the join keys from row groups that will not match.
+
+Thanks to [adriangb] for implementing this feature, with
 reviews from [LiaCastaneda], [asolimando], [comphead], and [mbutrovich].
 
 

From 7984011d33abeed667907a5b2565fef75e9d6243 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sat, 24 Jan 2026 07:14:30 -0500
Subject: [PATCH 18/26] Refine wording

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 3e05f822..89c9f7f9 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -168,18 +168,20 @@ Starting in DataFusion 51, filtering information from `HashJoinExec` is passed
 dynamically to scans, as explained in the [Dynamic Filtering Blog] using a
 technique referred to as [Sideways Information Passing] in Database research
 literature. The initial implementation passed min/max values for the join keys.
-DataFusion 52 extends the optimization ([#17171] / [#18393]) to use an `IN` list when the
-build size is small such as when the join is very selective or a reference to the build side hash map when the build side is larger.
-These new expressions are pushed down to the probe side scan and is used to prune files, row groups, and
+DataFusion 52 extends the optimization ([#17171] / [#18393]) to pass the contents 
+of the build side hash map.
+These filters are evaluated on the probe side scan to prune files, row groups, and
 individual rows.
-When the build side is small enough (<=20 rows but configurable) the pushed down filters can even participate in statistics pruning to avoid even reading the join keys from row groups that will not match.
-
+When the build side contains `20` or fewer rows (configurable) the contents of the hash map are 
+transformed to an `IN` expression and used for [statistics-based pruning] which
+can avoid reading entire files or row groups that contain no matching join keys.
 Thanks to [adriangb] for implementing this feature, with
 reviews from [LiaCastaneda], [asolimando], [comphead], and [mbutrovich].
 
 
 [Sideways Information Passing]: https://dl.acm.org/doi/10.1109/ICDE.2008.4497486
 [Dynamic Filtering blog]: https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/#hash-join-dynamic-filters
+[statistics-based pruning]: https://docs.rs/datafusion/latest/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html
 
 [#17171]: https://github.com/apache/datafusion/issues/17171
 [#18393]: https://github.com/apache/datafusion/pull/18393

From 66a9dcbaee8d58816480e5b19c4f8223108f5a18 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sat, 24 Jan 2026 07:14:43 -0500
Subject: [PATCH 19/26] reflow

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 89c9f7f9..8977dce1 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -168,15 +168,14 @@ Starting in DataFusion 51, filtering information from `HashJoinExec` is passed
 dynamically to scans, as explained in the [Dynamic Filtering Blog] using a
 technique referred to as [Sideways Information Passing] in Database research
 literature. The initial implementation passed min/max values for the join keys.
-DataFusion 52 extends the optimization ([#17171] / [#18393]) to pass the contents 
-of the build side hash map.
-These filters are evaluated on the probe side scan to prune files, row groups, and
-individual rows.
-When the build side contains `20` or fewer rows (configurable) the contents of the hash map are 
+DataFusion 52 extends the optimization ([#17171] / [#18393]) to pass the
+contents of the build side hash map. These filters are evaluated on the probe
+side scan to prune files, row groups, and individual rows. When the build side
+contains `20` or fewer rows (configurable) the contents of the hash map are
 transformed to an `IN` expression and used for [statistics-based pruning] which
 can avoid reading entire files or row groups that contain no matching join keys.
-Thanks to [adriangb] for implementing this feature, with
-reviews from [LiaCastaneda], [asolimando], [comphead], and [mbutrovich].
+Thanks to [adriangb] for implementing this feature, with reviews from
+[LiaCastaneda], [asolimando], [comphead], and [mbutrovich].
 
 
 [Sideways Information Passing]: https://dl.acm.org/doi/10.1109/ICDE.2008.4497486

From e9308d49687e89fc216f02ed4d255623c0eed5de Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sat, 24 Jan 2026 07:16:39 -0500
Subject: [PATCH 20/26] Metadata cache is general

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 8977dce1..95961f2a 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -87,8 +87,8 @@ the implementation and reviews from [Dandandan].
 
 This release also includes several additional caching improvements.
 
-A new statistics cache for Parquet Metadata avoids repeatedly (re)calculating
-statistics for Parquet backed files. This significantly improves planning time
+A new statistics cache for File Metadata avoids repeatedly (re)calculating
+statistics for files. This significantly improves planning time
 for certain queries. You can see the contents of the new cache using the
 [statistics_cache] function in the CLI:
 

From 4e24b1ff4317ebd0d436e84ef3a5553cf66f8814 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sat, 24 Jan 2026 07:53:35 -0500
Subject: [PATCH 21/26] Add section on Min/max dynamic filters

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 31 ++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 95961f2a..d3048f90 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -68,6 +68,37 @@ improved `CASE` evaluation significantly. Related PRs [#18183]. Thanks to
 [#18075]: https://github.com/apache/datafusion/issues/18075
 [#18183]: https://github.com/apache/datafusion/pull/18183
 
+### `MIN`/`MAX` Aggregate Dynamic Filters
+
+DataFusion now creates dynamic filters for queries with `MIN`/`MAX` aggregates
+that have filters, but no `GROUP BY`. These dynamic filters are used during scan
+to prune files and rows as tighter bounds are discovered during execution, as
+explained in the [Dynamic Filtering blog]. For example, the following query:
+
+```sql
+SELECT min(l_shipdate)
+FROM lineitem
+WHERE l_returnflag = 'R';
+```
+
+Is now executed like this  
+```sql
+SELECT min(l_shipdate)
+FROM lineitem
+--  '__current_min' is updated dynamically during execution
+WHERE l_returnflag = 'R' AND l_shipdate > __current_min;
+```
+
+
+Thanks to [2010YOUY01] for implementing this feature, with reviews from
+[martin-g], [adriangb], and [LiaCastaneda]. Related PRs: [#18644]
+
+[#18644]: https://github.com/apache/datafusion/pull/18644
+[2010YOUY01]: https://github.com/2010YOUY01
+[martin-g]: https://github.com/martin-g
+[adriangb]: https://github.com/adriangb
+[LiaCastaneda]: https://github.com/LiaCastaneda
+
 ### New Merge Join
 
 DataFusion 52 includes a rewrite of the sort-merge join (SMJ) operator, with

From d62dfab29dd514e8738dfc046198b30d17a761ba Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sun, 25 Jan 2026 08:56:42 -0500
Subject: [PATCH 22/26] Update content/blog/2026-01-08-datafusion-52.0.0.md

Co-authored-by: Yongting You <2010youy01@gmail.com>
---
 content/blog/2026-01-08-datafusion-52.0.0.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index d3048f90..7371fbae 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -86,7 +86,7 @@ Is now executed like this
 SELECT min(l_shipdate)
 FROM lineitem
 --  '__current_min' is updated dynamically during execution
-WHERE l_returnflag = 'R' AND l_shipdate > __current_min;
+WHERE l_returnflag = 'R' AND l_shipdate < __current_min;
 ```
 
 

From 1a77f710c5dc37fd3d576272b54074837a6d295b Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sun, 25 Jan 2026 08:57:25 -0500
Subject: [PATCH 23/26] whitespace

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index d3048f90..b73670fe 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -89,7 +89,6 @@ FROM lineitem
 WHERE l_returnflag = 'R' AND l_shipdate > __current_min;
 ```
 
-
 Thanks to [2010YOUY01] for implementing this feature, with reviews from
 [martin-g], [adriangb], and [LiaCastaneda]. Related PRs: [#18644]
 

From 37b12dd14255b8302a0325144d60378c17689794 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sun, 25 Jan 2026 09:08:48 -0500
Subject: [PATCH 24/26] Improve sort pushdown description

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index a5e31fc3..33c1aceb 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -295,10 +295,15 @@ and reworking pushdown to use it. Related PRs: [#18998], [#19345]
 ### Sort Pushdown to Scans
 
 DataFusion can now push sorts into data sources ([#10433], [#19064]).
-This allows table provider implementations to take better advantage of existing sort 
-information based on the query pattern, such as to reorder files or row groups to 
-satisfy `LIMIT` clauses more
-efficiently. Thanks to [zhuqi-lucas] and [xudong963] for this feature. 
+This allows table provider implementations to optimize based on
+sort knowledge for certain query patterns. For example, the provided Parquet
+datasource now reverses the scan order of row groups and files when queried in
+for the opposite of the file's natural sort (e.g. `DESC` when the files are sorted `ASC`). 
+This reversal, combined with dynamic filtering allows TopK queries with `LIMIT` 
+on pre-sorted data to find the requested rows very quickly, pruning more files and row groups
+without even scanning them. We have seen a ~30x performance improvement on
+benchmark queries with pre-sorted data.
+Thanks to [zhuqi-lucas] and [xudong963] for this feature. 
 
 [#10433]: https://github.com/apache/datafusion/issues/10433
 [#19064]: https://github.com/apache/datafusion/pull/19064

From 91f71f3c420000f481595db1d7208d8ddb7e2eb9 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sun, 25 Jan 2026 09:12:14 -0500
Subject: [PATCH 25/26] capitalizaton

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 33c1aceb..5781071a 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -73,7 +73,7 @@ improved `CASE` evaluation significantly. Related PRs [#18183]. Thanks to
 DataFusion now creates dynamic filters for queries with `MIN`/`MAX` aggregates
 that have filters, but no `GROUP BY`. These dynamic filters are used during scan
 to prune files and rows as tighter bounds are discovered during execution, as
-explained in the [Dynamic Filtering blog]. For example, the following query:
+explained in the [Dynamic Filtering Blog]. For example, the following query:
 
 ```sql
 SELECT min(l_shipdate)
@@ -209,7 +209,7 @@ Thanks to [adriangb] for implementing this feature, with reviews from
 
 
 [Sideways Information Passing]: https://dl.acm.org/doi/10.1109/ICDE.2008.4497486
-[Dynamic Filtering blog]: https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/#hash-join-dynamic-filters
+[Dynamic Filtering Blog]: https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/#hash-join-dynamic-filters
 [statistics-based pruning]: https://docs.rs/datafusion/latest/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html
 
 [#17171]: https://github.com/apache/datafusion/issues/17171

From 040b85c199c9226242d67ca4ae119b2d1f7b2922 Mon Sep 17 00:00:00 2001
From: Andrew Lamb <andrew@nerdnetworks.org>
Date: Sun, 25 Jan 2026 09:16:37 -0500
Subject: [PATCH 26/26] wordsmith

---
 content/blog/2026-01-08-datafusion-52.0.0.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/content/blog/2026-01-08-datafusion-52.0.0.md b/content/blog/2026-01-08-datafusion-52.0.0.md
index 5781071a..bb2d7d0f 100644
--- a/content/blog/2026-01-08-datafusion-52.0.0.md
+++ b/content/blog/2026-01-08-datafusion-52.0.0.md
@@ -297,13 +297,14 @@ and reworking pushdown to use it. Related PRs: [#18998], [#19345]
 DataFusion can now push sorts into data sources ([#10433], [#19064]).
 This allows table provider implementations to optimize based on
 sort knowledge for certain query patterns. For example, the provided Parquet
-datasource now reverses the scan order of row groups and files when queried in
-for the opposite of the file's natural sort (e.g. `DESC` when the files are sorted `ASC`). 
-This reversal, combined with dynamic filtering allows TopK queries with `LIMIT` 
+data source now reverses the scan order of row groups and files when queried
+for the opposite of the file's natural sort (e.g. `DESC` when the files are sorted `ASC`).
+This reversal, combined with dynamic filtering, allows top-K queries with `LIMIT`
 on pre-sorted data to find the requested rows very quickly, pruning more files and row groups
 without even scanning them. We have seen a ~30x performance improvement on
 benchmark queries with pre-sorted data.
-Thanks to [zhuqi-lucas] and [xudong963] for this feature. 
+Thanks to [zhuqi-lucas] and [xudong963] for this feature, with reviews from
+[martin-g], [adriangb], and [alamb].
 
 [#10433]: https://github.com/apache/datafusion/issues/10433
 [#19064]: https://github.com/apache/datafusion/pull/19064