From 63471dec2ae4708b096dc9b9196d364dbb82817f Mon Sep 17 00:00:00 2001
From: jeffbrennan
Date: Tue, 14 Nov 2023 18:37:22 -0500
Subject: [PATCH] revert to current readme

---
 README.md | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/README.md b/README.md
index 7296846..e2c268e 100644
--- a/README.md
+++ b/README.md
@@ -114,25 +114,3 @@ Here's how to create a Parquet file with fake data:
 df = farsante.pandas_df([person.full_name, person.email, address.city, address.state, datetime.datetime], 3)
 df.to_parquet('./tmp/fake_data.parquet', index=False)
 ```
-
-## h2o data generation
-
-[h2o](https://github.com/h2oai/db-benchmark) is a popular dataset to benchmark data transformation performance. The main.rs file in [h2o-data-rust](h2o-data-rust/) can be used to generate h2o data.
-
-### arguments
-
-flag | description
-:--- | :---
---n | number of rows
---k | number of groups
---nas | percentage of missing values
---seed | the seed to use
-
-Here is a command to create a 1 million row dataset with 100 groups, 10% missing values, and a seed of 42.
-
-```bash
-cd h2o-data-rust
-cargo run -- --n 1000000 --k 100 --nas 10 --seed 42
-```
-
-A .csv file named G_1e6_1e2_10.csv will be created in the h2o-data-rust directory.