feat: arrow based table state and checkpoint handling #1837

Closed
wants to merge 28 commits into from
28 commits
8d15958
feat: add protocol checker
roeap Nov 5, 2023
bec2b8f
feat: append-only table feature
roeap Nov 5, 2023
2e9acf4
fix: avoid allocations when checking protocol support
roeap Nov 6, 2023
dd36307
fix: reader version in concurrent writer tests
roeap Nov 10, 2023
0bc5a17
chore: add expressions from kernel
roeap Nov 9, 2023
c8e8be8
chore: add expression evaluator from kernel
roeap Nov 9, 2023
365faf5
chore: cleanup expression handler
roeap Nov 9, 2023
d5006c5
fix: docs links
roeap Nov 10, 2023
f62ccb2
feat: add path abstraction to handle object store paths and urls
roeap Nov 7, 2023
c6649d3
fix: typos in kernel types and string conversion from reference
roeap Nov 8, 2023
37b1520
feat: parse log files with arrow json
roeap Nov 8, 2023
e08f030
feat: snapshot
roeap Nov 9, 2023
e761ace
feat: read commit entries
roeap Nov 10, 2023
b3cee5f
feat: add field schema definitions
roeap Nov 10, 2023
0093df0
chore: cleanup hidden files
roeap Nov 10, 2023
61a21f4
fix: update field schemas to match protocol specs
roeap Nov 10, 2023
1c9304e
feat: create known arrow schemas from kernel types
roeap Nov 11, 2023
05a4d3d
refactor: isolate kernel code that uses arrow
roeap Nov 11, 2023
a184b1e
chore: implement Snapshot for current table state
roeap Nov 11, 2023
e22dd63
chore: add v2 checkpoint config key
roeap Nov 11, 2023
4f6caa1
fix: revert setting utc by default
roeap Nov 11, 2023
d65fc79
fix: define snapshot trait outside of arrow module
roeap Nov 11, 2023
4a1c86f
refactor: split up datafusion module
roeap Nov 11, 2023
8b780a8
refactor: consolidate pruning statistics implementations
roeap Nov 11, 2023
e668509
refactor: simplify state api surface
roeap Nov 11, 2023
148ace3
refactor: move physical schema function
roeap Nov 11, 2023
300416b
refactor: simplify current table state
roeap Nov 12, 2023
28467d7
chore: simplify names
roeap Nov 14, 2023
2 changes: 2 additions & 0 deletions Cargo.toml
@@ -20,9 +20,11 @@ debug = "line-tables-only"
 [workspace.dependencies]
 # arrow
 arrow = { version = "47" }
+arrow-arith = { version = "47" }
 arrow-array = { version = "47" }
 arrow-buffer = { version = "47" }
 arrow-cast = { version = "47" }
+arrow-json = { version = "47" }
 arrow-ord = { version = "47" }
 arrow-row = { version = "47" }
 arrow-schema = { version = "47" }
6 changes: 5 additions & 1 deletion crates/deltalake-core/Cargo.toml
@@ -20,9 +20,11 @@ features = ["azure", "datafusion", "gcs", "glue", "hdfs", "json", "python", "s3"
 [dependencies]
 # arrow
 arrow = { workspace = true, optional = true }
+arrow-arith = { workspace = true, optional = true }
 arrow-array = { workspace = true, optional = true }
 arrow-buffer = { workspace = true, optional = true }
 arrow-cast = { workspace = true, optional = true }
+arrow-json = { workspace = true, optional = true }
 arrow-ord = { workspace = true, optional = true }
 arrow-row = { workspace = true, optional = true }
 arrow-schema = { workspace = true, optional = true, features = ["serde"] }
@@ -110,7 +112,6 @@ reqwest = { version = "0.11.18", default-features = false, features = [
 
 # Datafusion
 dashmap = { version = "5", optional = true }
-
 sqlparser = { version = "0.38", optional = true }
 
 # NOTE dependencies only for integration tests
@@ -130,13 +131,16 @@ tempfile = "3"
 tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
 utime = "0.3"
 hyper = { version = "0.14", features = ["server"] }
+criterion = "0.5"
 
 [features]
 azure = ["object_store/azure"]
 arrow = [
 "dep:arrow",
+"arrow-arith",
 "arrow-array",
 "arrow-cast",
+"arrow-json",
 "arrow-ord",
 "arrow-row",
 "arrow-schema",
4 changes: 2 additions & 2 deletions crates/deltalake-core/benches/read_checkpoint.rs
@@ -1,6 +1,6 @@
 use criterion::{criterion_group, criterion_main, Criterion};
-use deltalake::table::state::DeltaTableState;
-use deltalake::DeltaTableConfig;
+use deltalake_core::table::state::DeltaTableState;
+use deltalake_core::DeltaTableConfig;
 use std::fs::File;
 use std::io::Read;
 
2 changes: 1 addition & 1 deletion crates/deltalake-core/src/delta_datafusion/expr.rs
@@ -516,7 +516,7 @@ mod test {
 &arrow_schema::DataType::Utf8,
 &table
 .state
-.input_schema()
+.arrow_schema(false)
 .unwrap()
 .as_ref()
 .to_owned()