Commit f70b9c8

Merge remote-tracking branch 'upstream/main' into issue_14909
2 parents c56e9a5 + 4d2e06f commit f70b9c8

58 files changed: 6084 additions and 5607 deletions

Cargo.lock

Lines changed: 40 additions & 40 deletions

Cargo.toml

Lines changed: 3 additions & 3 deletions
@@ -80,12 +80,12 @@ version = "45.0.0"
 ahash = { version = "0.8", default-features = false, features = [
     "runtime-rng",
 ] }
-arrow = { version = "54.2.0", features = [
+arrow = { version = "54.2.1", features = [
     "prettyprint",
     "chrono-tz",
 ] }
 arrow-buffer = { version = "54.1.0", default-features = false }
-arrow-flight = { version = "54.2.0", features = [
+arrow-flight = { version = "54.2.1", features = [
     "flight-sql-experimental",
 ] }
 arrow-ipc = { version = "54.2.0", default-features = false, features = [
@@ -137,7 +137,7 @@ itertools = "0.14"
 log = "^0.4"
 object_store = { version = "0.11.0", default-features = false }
 parking_lot = "0.12"
-parquet = { version = "54.2.0", default-features = false, features = [
+parquet = { version = "54.2.1", default-features = false, features = [
     "arrow",
     "async",
     "object_store",

benchmarks/README.md

Lines changed: 44 additions & 0 deletions
@@ -513,5 +513,49 @@ For example, to run query 1 with the small data generated above:
 cargo run --release --bin dfbench -- h2o --path ./benchmarks/data/h2o/G1_1e7_1e7_100_0.csv --query 1
 ```
 
+## h2o benchmarks for join
+
+### Generate data for h2o benchmarks
+There are three options for generating data for h2o benchmarks: `small`, `medium`, and `big`. The data is generated in the `data` directory.
+
+1. Generate small data (4 table files, the largest is 1e7 rows)
+```bash
+./bench.sh data h2o_small_join
+```
+
+
+2. Generate medium data (4 table files, the largest is 1e8 rows)
+```bash
+./bench.sh data h2o_medium_join
+```
+
+3. Generate large data (4 table files, the largest is 1e9 rows)
+```bash
+./bench.sh data h2o_big_join
+```
+
+### Run h2o benchmarks
+There are three options for running h2o benchmarks: `small`, `medium`, and `big`.
+1. Run small data benchmark
+```bash
+./bench.sh run h2o_small_join
+```
+
+2. Run medium data benchmark
+```bash
+./bench.sh run h2o_medium_join
+```
+
+3. Run large data benchmark
+```bash
+./bench.sh run h2o_big_join
+```
+
+4. Run a specific query with specific join data paths; the data paths consist of 4 table files.
+
+For example, to run query 1 with the small data generated above:
+```bash
+cargo run --release --bin dfbench -- h2o --join-paths ./benchmarks/data/h2o/J1_1e7_NA_0.csv,./benchmarks/data/h2o/J1_1e7_1e1_0.csv,./benchmarks/data/h2o/J1_1e7_1e4_0.csv,./benchmarks/data/h2o/J1_1e7_1e7_NA.csv --queries-path ./benchmarks/queries/h2o/join.sql --query 1
+```
 [1]: http://www.tpc.org/tpch/
 [2]: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
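
The `--join-paths` value added in the README hunk is a single comma-separated list of the four join tables, which is easy to mistype by hand. A minimal sketch, assuming the small data set has already been generated under `./benchmarks/data/h2o` with the file names shown in the README example, assembles that value in a loop. The names `DATA_DIR` and `JOIN_PATHS` are illustrative only and are not part of `bench.sh` or `dfbench`:

```bash
# Hypothetical helper: build the comma-separated --join-paths value.
# DATA_DIR is an assumption taken from the README example paths.
DATA_DIR="./benchmarks/data/h2o"

JOIN_PATHS=""
for t in J1_1e7_NA_0.csv J1_1e7_1e1_0.csv J1_1e7_1e4_0.csv J1_1e7_1e7_NA.csv; do
  # Warn, but do not abort, if a table file has not been generated yet.
  [ -f "$DATA_DIR/$t" ] || echo "warning: missing $DATA_DIR/$t" >&2
  # Prepend a comma only when JOIN_PATHS already holds an entry.
  JOIN_PATHS="${JOIN_PATHS:+$JOIN_PATHS,}$DATA_DIR/$t"
done

echo "$JOIN_PATHS"
```

The resulting string could then be passed as `--join-paths "$JOIN_PATHS"` in the `cargo run` command from the README example.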
