docs/source/user-guide/arrow-introduction.md
Lines changed: 60 additions & 10 deletions
@@ -17,7 +17,7 @@
 under the License.
 -->

-# Introduction to Apache Arrow
+# Gentle Arrow Introduction

 ```{contents}
 :local:
@@ -46,7 +46,7 @@ Traditional Row Storage: Arrow Columnar Storage:
 (read entire rows) (process entire columns at once)
 ```

-## `RecordBatch`
+## `RecordBatch`

 Arrow's standard unit for packaging data is the **[`RecordBatch`]**.

@@ -65,10 +65,16 @@ This design allows DataFusion to process streams of row-based chunks while gaini

 DataFusion processes queries as pull-based pipelines where operators request batches from their inputs. This streaming approach enables early result production, bounds memory usage (spilling to disk only when necessary), and naturally supports parallel execution across multiple CPU cores.

+For example, given the following query:
+
+```sql
+SELECT name FROM 'data.parquet' WHERE id > 10
+```
+
+The DataFusion Pipeline looks like this:
+
 ```text
-A user's query: SELECT name FROM 'data.parquet' WHERE id > 10
@@ -81,7 +87,7 @@ In this pipeline, [`RecordBatch`]es are the "packages" of columnar data that flo

 ## Creating `ArrayRef` and `RecordBatch`es

-Sometimes you need to create Arrow data programmatically rather than reading from files.
+Sometimes you need to create Arrow data programmatically rather than reading from files.

 The first thing needed is creating an Arrow Array, for each column. [arrow-rs] provides array builders and `From` impls to create arrays from Rust vectors.

@@ -126,21 +132,66 @@ use arrow_schema::{DataType, Field, Schema};