revert to current readme
jeffbrennan committed Nov 14, 2023
1 parent bbe9fa1 commit 63471de
Showing 1 changed file with 0 additions and 22 deletions.
22 changes: 0 additions & 22 deletions README.md
@@ -114,25 +114,3 @@ Here's how to create a Parquet file with fake data:
df = farsante.pandas_df([person.full_name, person.email, address.city, address.state, datetime.datetime], 3)
df.to_parquet('./tmp/fake_data.parquet', index=False)
```

## h2o data generation

[h2o](https://github.com/h2oai/db-benchmark) is a popular benchmark dataset for measuring data transformation performance. The `main.rs` file in [h2o-data-rust](h2o-data-rust/) generates h2o data.

### arguments

flag | description
:--- | :---
`--n` | number of rows
`--k` | number of groups
`--nas` | percentage of missing values
`--seed` | the seed to use

Here is a command to create a 1 million row dataset with 100 groups, 10% missing values, and a seed of 42.

```bash
cd h2o-data-rust
cargo run -- --n 1000000 --k 100 --nas 10 --seed 42
```

A CSV file named `G_1e6_1e2_10.csv` will be created in the `h2o-data-rust` directory.
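
If you drive the generator from Python, the following is a minimal sketch of building the same command and guessing the output file name. The helper names (`h2o_command`, `short_sci`, `expected_csv_name`) are hypothetical and not part of farsante. The naming pattern is an assumption inferred only from the single example above (`G_1e6_1e2_10.csv`), so check it against the generator's actual output.

```python
from typing import List


def h2o_command(n: int, k: int, nas: int, seed: int) -> List[str]:
    """Build the argument list for the cargo invocation shown above."""
    return [
        "cargo", "run", "--",
        "--n", str(n),
        "--k", str(k),
        "--nas", str(nas),
        "--seed", str(seed),
    ]


def short_sci(value: int) -> str:
    """Render a single digit followed by zeros (1000000) as '1e6'.

    Any other value is returned as plain digits. This formatting rule is
    an assumption based on the one file name shown in the README.
    """
    digits = str(value)
    exponent = len(digits) - 1
    if exponent > 0 and set(digits[1:]) == {"0"}:
        return f"{digits[0]}e{exponent}"
    return digits


def expected_csv_name(n: int, k: int, nas: int) -> str:
    """Guess the output file name (assumed pattern: G_<n>_<k>_<nas>.csv)."""
    return f"G_{short_sci(n)}_{short_sci(k)}_{nas}.csv"


if __name__ == "__main__":
    # Only prints the command and the guessed file name; nothing is executed.
    print(" ".join(h2o_command(1_000_000, 100, 10, 42)))
    print(expected_csv_name(1_000_000, 100, 10))
```

To actually run the generator, the list can be passed to `subprocess.run(..., cwd="h2o-data-rust", check=True)`, assuming `cargo` is installed and the directory exists.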
