Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to new data passing API #109

Merged
merged 113 commits into from
Mar 4, 2024
Merged
Show file tree
Hide file tree
Changes from 53 commits
Commits
Show all changes
113 commits
Select commit Hold shift + click to select a range
6c9efe1
rename table client -> engine client
nicklan Jan 24, 2024
ec08901
add engine_data and trait method
nicklan Jan 24, 2024
f6c1072
checkpoint, working parsing metadata/protocol
nicklan Jan 25, 2024
0643945
checkpoint, can parse metadata in snapshot
nicklan Jan 26, 2024
53c8850
make scan tests pass (with some todos)
nicklan Jan 27, 2024
32a28a9
All internal tests pass
nicklan Jan 27, 2024
d7930e6
clean up parsing a bit
nicklan Jan 27, 2024
0e54c24
remove old comment
nicklan Jan 27, 2024
438a1e2
add todo comment
nicklan Jan 27, 2024
5ab0021
New better way to do list/map
nicklan Jan 29, 2024
ab89728
put back old test
nicklan Jan 29, 2024
e182e3d
add dv support
nicklan Jan 30, 2024
6b3efe3
start to think about DVs
nicklan Jan 30, 2024
fd80a97
make dv.rs tests work
nicklan Jan 31, 2024
4bf5b47
fmt and small fix
nicklan Jan 31, 2024
304da21
add some tests back
nicklan Jan 31, 2024
a7393fe
fix "list_from"
nicklan Jan 31, 2024
54a0c1a
remove todo
nicklan Jan 31, 2024
54a0d7f
allow default client to use EngineData
nicklan Feb 1, 2024
a8e8920
use some unsafe magic to make more tests pass
nicklan Feb 1, 2024
33923bb
fmt
nicklan Feb 1, 2024
82cff1b
some clippy guided cleanup
nicklan Feb 1, 2024
dc6e963
switch to selection vec
nicklan Feb 3, 2024
1cff896
make parse_json not refrence record batch anymore
nicklan Feb 5, 2024
e6d5eb5
almost have arrow out of lib.rs
nicklan Feb 5, 2024
28031fa
add back md test
nicklan Feb 6, 2024
4294020
put back dataskippingfilter (but it still uses arrow)
nicklan Feb 6, 2024
ef0e79c
make comment more clear
nicklan Feb 6, 2024
b6b560c
doc updates
nicklan Feb 6, 2024
ae12b63
add back parquet reader test
nicklan Feb 6, 2024
f6cdd00
remove commented code
nicklan Feb 6, 2024
f4c8507
put back default client json tests
nicklan Feb 6, 2024
4d3631a
remove unsafe (woo!)
nicklan Feb 6, 2024
95c75ce
minor cleanup
nicklan Feb 6, 2024
3902fcb
rename to try_from_engine_data
nicklan Feb 6, 2024
200a8ac
all arrow out of lib :)
nicklan Feb 7, 2024
3ac637f
move simple_client to a feature, make it default
nicklan Feb 7, 2024
49253b7
add materialize for map_item
nicklan Feb 7, 2024
9a286e6
mostly remove old action parsing code and make maps work right
nicklan Feb 7, 2024
8b0a60f
fmt + clippy
nicklan Feb 7, 2024
15f0122
test_read_files
nicklan Feb 8, 2024
bdf540b
Apply suggestions from code review
nicklan Feb 12, 2024
5905ed2
address some minor comments
nicklan Feb 14, 2024
1f83540
switch to add_string
nicklan Feb 14, 2024
5f3e6d4
switch to using as_list
nicklan Feb 14, 2024
9aa7d36
address more comments:
nicklan Feb 14, 2024
2fb11f5
cleaner min_file_name
nicklan Feb 14, 2024
9834460
clean up `list_from` a bit + fmt
nicklan Feb 14, 2024
cff7b98
Update kernel/src/simple_client/json.rs
nicklan Feb 14, 2024
eb0f3c6
Apply suggestions from code review
nicklan Feb 14, 2024
7f8ab66
fixups for review merge
nicklan Feb 14, 2024
01bcd9a
address comment re itereator chain
nicklan Feb 14, 2024
e5f0116
fix bug
nicklan Feb 14, 2024
93e346b
Apply suggestions from code review
nicklan Feb 14, 2024
d1f5531
validate magic in DV
nicklan Feb 14, 2024
7198c90
get_json_filename
nicklan Feb 14, 2024
ab64aec
only default-client needs tokio
nicklan Feb 14, 2024
1194be2
rename res_arry -> res_array
nicklan Feb 14, 2024
c288f2e
Update kernel/src/simple_client/parquet.rs
nicklan Feb 14, 2024
ef33863
fix lints
nicklan Feb 14, 2024
5b47c93
address comments
nicklan Feb 14, 2024
23b7196
Initial bit of extract_into
nicklan Feb 15, 2024
65a21cf
doc comments and fmt
nicklan Feb 15, 2024
ff85d93
fully switch to extract_into
nicklan Feb 15, 2024
5fea8d3
fmt
nicklan Feb 15, 2024
423853e
initial work on row-getter data passing
nicklan Feb 16, 2024
e2913ff
add get_data_item.rs
nicklan Feb 16, 2024
b30bb2c
cleanup extract, support all previous types, better errors
nicklan Feb 16, 2024
baf1997
extract references to Maps/Lists
nicklan Feb 16, 2024
f96314c
cleanup extract
nicklan Feb 16, 2024
89f920d
switch to new data passing style
nicklan Feb 17, 2024
46b7e1f
reformat a bit
nicklan Feb 17, 2024
3536132
only getters need to implement ExtractInto
nicklan Feb 17, 2024
190c05c
make GetDataItem macro for impls
nicklan Feb 17, 2024
1a6b444
remove trivial casts warning, address minor comments
nicklan Feb 20, 2024
0dbb5ed
get rid of DataItem
nicklan Feb 21, 2024
582e8be
break once we've found p&m
nicklan Feb 21, 2024
ab88f14
extract error handling
nicklan Feb 21, 2024
4458c9a
add doc comment
nicklan Feb 21, 2024
ca37359
fold Extractor into EngineData trait
nicklan Feb 21, 2024
a6ed21e
add materialize for list
nicklan Feb 21, 2024
cee7d16
fmt
nicklan Feb 21, 2024
7ff8589
make magic constant a `const`
nicklan Feb 21, 2024
72dcb54
use try_collect
nicklan Feb 21, 2024
b9601a7
remove commented code
nicklan Feb 21, 2024
8d78cf7
comment updates
nicklan Feb 21, 2024
67fe622
error improvements
nicklan Feb 21, 2024
3fab759
fix comment
nicklan Feb 21, 2024
ace56fa
better macro format
nicklan Feb 21, 2024
d7afb3b
impl TypedGetData for Vec and Map
nicklan Feb 21, 2024
db8d28d
return Options for try_new_from_data
nicklan Feb 21, 2024
2d57456
created_time is optional
nicklan Feb 21, 2024
22d85ca
stats are String
nicklan Feb 21, 2024
3f13937
doc comment and fmt
nicklan Feb 21, 2024
619bffd
refactor action defs and parsing. remove ActionType enum
nicklan Feb 21, 2024
a72eac6
add todo
nicklan Feb 21, 2024
977e251
add missing files
nicklan Feb 21, 2024
6750a2a
add comment
nicklan Feb 21, 2024
bd9e17b
GIANT Merge branch 'main' into new-data-passing-api
nicklan Feb 22, 2024
a2dedf5
checkpoint, actually compiles. needs expressions to be put back
nicklan Feb 23, 2024
f5d80c3
woo, all tests working.
nicklan Feb 23, 2024
0a9cd11
add is_empty to make clippy happy
nicklan Feb 23, 2024
b268b4e
final clippy fixes
nicklan Feb 23, 2024
aab7fb8
import style
nicklan Feb 23, 2024
8d397e6
put back test that merge inexplicably removed
nicklan Feb 23, 2024
a2cb373
use error constructors where strings are static
nicklan Feb 23, 2024
1a19d19
missed one
nicklan Feb 23, 2024
c130692
impl From for SimpleData -> RecordBatch
nicklan Feb 23, 2024
fdd927a
bunch of doc fixes
nicklan Feb 23, 2024
b4699c0
small comment changes
nicklan Feb 26, 2024
1c922c9
fmt
nicklan Feb 26, 2024
4a14694
Update kernel/src/snapshot.rs
nicklan Mar 4, 2024
14d0aff
fix typo
nicklan Mar 4, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion acceptance/src/lib.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
//! Helpers to validate implementaions of TableClients
//! Helpers to validate implementaions of EngineClients

pub mod meta;
pub use meta::*;
10 changes: 5 additions & 5 deletions acceptance/src/meta.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ use serde::{Deserialize, Serialize};
use url::Url;

use deltakernel::snapshot::Snapshot;
use deltakernel::{Error, Table, TableClient, Version};
use deltakernel::{EngineClient, Error, Table, Version};

#[derive(Debug, thiserror::Error)]
pub enum AssertionError {
Expand Down Expand Up @@ -96,17 +96,17 @@ impl TestCaseInfo {
Ok(())
}

pub async fn assert_metadata(&self, table_client: Arc<dyn TableClient>) -> TestResult<()> {
let table_client = table_client.as_ref();
pub async fn assert_metadata(&self, engine_client: Arc<dyn EngineClient>) -> TestResult<()> {
let engine_client = engine_client.as_ref();
let table = Table::new(self.table_root()?);

let (latest, versions) = self.versions().await?;

let snapshot = table.snapshot(table_client, None)?;
let snapshot = table.snapshot(engine_client, None)?;
self.assert_snapshot_meta(&latest, &snapshot)?;

for table_version in versions {
let snapshot = table.snapshot(table_client, Some(table_version.version))?;
let snapshot = table.snapshot(engine_client, Some(table_version.version))?;
self.assert_snapshot_meta(&table_version, &snapshot)?;
}

Expand Down
4 changes: 2 additions & 2 deletions acceptance/tests/dat_reader.rs
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ fn reader_test(path: &Path) -> datatest_stable::Result<()> {
.block_on(async {
let case = read_dat_case(root_dir).unwrap();
let table_root = case.table_root().unwrap();
let table_client = Arc::new(
let engine_client = Arc::new(
DefaultTableClient::try_new(
&table_root,
std::iter::empty::<(&str, &str)>(),
Expand All @@ -27,7 +27,7 @@ fn reader_test(path: &Path) -> datatest_stable::Result<()> {
.unwrap(),
);

case.assert_metadata(table_client.clone()).await.unwrap();
case.assert_metadata(engine_client.clone()).await.unwrap();
});
Ok(())
}
Expand Down
4 changes: 2 additions & 2 deletions acceptance/tests/other.rs
Original file line number Diff line number Diff line change
Expand Up @@ -38,10 +38,10 @@ async fn test_read_table_with_checkpoint() {
))
.unwrap();
let location = url::Url::from_directory_path(path).unwrap();
let table_client = Arc::new(
let engine_client = Arc::new(
DefaultTableClient::try_new(&location, HashMap::<String, String>::new()).unwrap(),
);
let snapshot = Snapshot::try_new(location, table_client, None)
let snapshot = Snapshot::try_new(location, engine_client, None)
.await
.unwrap();

Expand Down
15 changes: 7 additions & 8 deletions kernel/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -39,26 +39,25 @@ visibility = "0.1.0"
# Used in default client
futures = { version = "0.3", optional = true }
object_store = { version = "^0.8.0", optional = true }
parquet = { version = "^49.0", optional = true, features = [
"async",
"object_store",
] }
# Used in default and simple client
parquet = { version = "^49.0", optional = true }

# optionally used with default client (though not required)
tokio = { version = "1", optional = true, features = ["rt-multi-thread"] }

[features]
default = ["default-client"]
default-client = ["chrono", "futures", "object_store", "parquet"]
arrow-conversion = []
ryan-johnson-databricks marked this conversation as resolved.
Show resolved Hide resolved
default = ["simple-client"]
default-client = ["arrow-conversion", "chrono", "futures", "object_store", "parquet/async", "parquet/object_store"]
ryan-johnson-databricks marked this conversation as resolved.
Show resolved Hide resolved
developer-visibility = []
simple-client = ["arrow-conversion", "parquet"]

[dev-dependencies]
arrow = { version = "^49.0", features = ["json", "prettyprint"] }
deltakernel = { path = ".", features = ["tokio"] }
deltakernel = { path = ".", features = ["tokio", "default-client"] }
ryan-johnson-databricks marked this conversation as resolved.
Show resolved Hide resolved
test-log = { version = "0.2", default-features = false, features = ["trace"] }
tempfile = "3"
test-case = { version = "3.1.0" }
tokio = { version = "1" }
tracing-subscriber = { version = "0.3", default-features = false, features = [
"env-filter",
"fmt",
Expand Down
10 changes: 5 additions & 5 deletions kernel/examples/dump-table/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -86,21 +86,21 @@ fn main() {
println!("Invalid url");
return;
};
let table_client = DefaultTableClient::try_new(
let engine_client = DefaultTableClient::try_new(
&url,
HashMap::<String, String>::new(),
Arc::new(TokioBackgroundExecutor::new()),
);
let Ok(table_client) = table_client else {
let Ok(engine_client) = engine_client else {
println!(
"Failed to construct table client: {}",
table_client.err().unwrap()
engine_client.err().unwrap()
);
return;
};

let table = Table::new(url);
let snapshot = table.snapshot(&table_client, None);
let snapshot = table.snapshot(&engine_client, None);
let Ok(snapshot) = snapshot else {
println!(
"Failed to construct latest snapshot: {}",
Expand All @@ -127,7 +127,7 @@ fn main() {
}
table.set_header(header_names);

for batch in scan.execute(&table_client).unwrap() {
for batch in scan.execute(&engine_client).unwrap() {
for row in 0..batch.num_rows() {
let table_row =
(0..batch.num_columns()).map(|col| extract_value(batch.column(col), row));
Expand Down
12 changes: 6 additions & 6 deletions kernel/examples/inspect-table/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -54,21 +54,21 @@ fn main() {
println!("Invalid url");
return;
};
let table_client = DefaultTableClient::try_new(
let engine_client = DefaultTableClient::try_new(
&url,
HashMap::<String, String>::new(),
Arc::new(TokioBackgroundExecutor::new()),
);
let Ok(table_client) = table_client else {
let Ok(engine_client) = engine_client else {
println!(
"Failed to construct table client: {}",
table_client.err().unwrap()
engine_client.err().unwrap()
);
return;
};

let table = Table::new(url);
let snapshot = table.snapshot(&table_client, None);
let snapshot = table.snapshot(&engine_client, None);
let Ok(snapshot) = snapshot else {
println!(
"Failed to construct latest snapshot: {}",
Expand All @@ -91,7 +91,7 @@ fn main() {
use deltakernel::Add;
let scan = ScanBuilder::new(snapshot).build();
let files: Vec<Add> = scan
.files(&table_client)
.files(&engine_client)
.unwrap()
.map(|r| r.unwrap())
.collect();
Expand All @@ -116,7 +116,7 @@ fn main() {

let batches = snapshot
._log_segment()
.replay(&table_client, read_schema, None);
.replay(&engine_client, read_schema, None);

let batch_vec = batches
.unwrap()
Expand Down
Loading