ARROW-4466: [Rust] [DataFusion] Add support for Parquet data source
I'm sure I'll need some guidance on this one from @sunchao or @liurenjie1024, but I am keen to get Parquet support added for primitive types so that I can use DataFusion and Arrow in production at some point.

Author: Andy Grove <andygrove73@gmail.com>
Author: Neville Dipale <nevilledips@gmail.com>
Author: Andy Grove <andygrove@users.noreply.github.com>

Closes apache#3851 from andygrove/ARROW-4466 and squashes the following commits:

3158529 <Andy Grove> add test for reading small batches
549c829 <Andy Grove> Remove hard-coded batch size, fix nits
8d2df06 <Andy Grove> move schema projection function from arrow into datafusion
204db83 <Andy Grove> fix timestamp nano issue
73aa934 <Andy Grove> Remove println from test
25d34ac <Andy Grove> Make INT32/64/96 handling consistent with C++ implementation
9b1308f <Andy Grove> clean up handling of INT96 and DATE/TIME/TIMESTAMP types in schema converter
1ec815b <Andy Grove> Clean up imports
023dc25 <Andy Grove> Merge pull request #2 from nevi-me/ARROW-4466
02b2ed3 <Neville Dipale> fix int96 conversion to read timestamps correctly
2aeea24 <Andy Grove> remove println from tests
9d3047a <Andy Grove> code cleanup
639e13e <Andy Grove> null handling for int96
1503855 <Andy Grove> handle nulls for binary data
80cf303 <Andy Grove> add date support
5a3368c <Andy Grove> Remove unnecessary slice, fix null handling
306d07a <Neville Dipale> fmt
3c711a5 <Neville Dipale> immediately allocate vec
e6cbbaa <Neville Dipale> replace read_column! macro with generic
607a29f <Andy Grove> return result if there are null values
e8aa784 <Andy Grove> revert temp debug change to error messages
6457c36 <Andy Grove> use parquet::reader::schema::parquet_to_arrow_schema
c56510e <Andy Grove> projection takes slice instead of vec
7e1a98f <Andy Grove> remove println and unwrap
dddb7d7 <Andy Grove> update to use partition-aware changes from master
157512e <Andy Grove> Remove invalid TODO comment
debb2fb <Andy Grove> code cleanup
6c3b7e2 <Andy Grove> add support for all primitive parquet types
b4981ed <Andy Grove> implement more parquet column types and tests
5ce3086 <Andy Grove> revert to columnar reads
c3f71d7 <Andy Grove> add integration test
aea9f8a <Andy Grove> convert to use row iter
f46e6f7 <Andy Grove> save
eaddafb <Andy Grove> save
322fc87 <Andy Grove> add test for reading strings from parquet
3a412b1 <Andy Grove> first parquet test passes
ff3e5b7 <Andy Grove> test
10710a2 <Andy Grove> Parquet datasource
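
Several of the commits above (02b2ed3, 639e13e, 25d34ac) deal with Parquet's legacy INT96 timestamp encoding. As a rough illustration of what such a conversion involves, and not the code from this commit, here is a minimal sketch. It assumes the commonly used layout: 8 little-endian bytes of nanoseconds within the day followed by 4 little-endian bytes holding the Julian day number. The function name `int96_to_timestamp_nanos` is hypothetical.

```rust
/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
/// Nanoseconds in one day.
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Hypothetical helper: converts a 12-byte INT96 value (assumed layout:
/// 8 bytes nanoseconds-of-day, then 4 bytes Julian day, both little-endian)
/// to nanoseconds since the Unix epoch. Returns `None` on i64 overflow.
fn int96_to_timestamp_nanos(bytes: [u8; 12]) -> Option<i64> {
    let mut nanos = [0u8; 8];
    nanos.copy_from_slice(&bytes[0..8]);
    let nanos_of_day = i64::from_le_bytes(nanos);

    let mut day = [0u8; 4];
    day.copy_from_slice(&bytes[8..12]);
    let julian_day = i64::from(i32::from_le_bytes(day));

    (julian_day - JULIAN_DAY_OF_EPOCH)
        .checked_mul(NANOS_PER_DAY)?
        .checked_add(nanos_of_day)
}

fn main() {
    // One day and one second after the epoch.
    let mut raw = [0u8; 12];
    raw[0..8].copy_from_slice(&1_000_000_000i64.to_le_bytes());
    raw[8..12].copy_from_slice(&2_440_589i32.to_le_bytes());
    println!("{:?}", int96_to_timestamp_nanos(raw));
}
```

Null handling, which some of the commits also address, is left out here; a reader would typically apply a conversion like this only to slots whose definition level marks them as non-null.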
andygrove committed Mar 15, 2019
1 parent 359ba42 commit e3df5b7
Showing 5 changed files with 667 additions and 12 deletions.
1 change: 1 addition & 0 deletions rust/datafusion/src/datasource/mod.rs
@@ -18,6 +18,7 @@
 pub mod csv;
 pub mod datasource;
 pub mod memory;
+pub mod parquet;
 
 pub use self::csv::{CsvBatchIterator, CsvFile};
 pub use self::datasource::{RecordBatchIterator, ScanResult, Table};