Add file based data generation options to db_bench #10395

missa-prime · 2022-07-20T21:40:13Z

Currently, db_bench populates a data store's values with randomly generated strings. This change adds two new options for generating data:

File Random - Populates data store values with strings drawn at random from a user-provided file.
File Direct - Populates data store values with strings read sequentially from a file, wrapping around to the initial read position after reaching the end of the file.
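
To make the difference between the two file-based modes concrete, here is a minimal Python sketch. It is not the db_bench implementation (db_bench is C++ and the PR's code is not shown here). The class name `CorpusValueGenerator` and its methods are hypothetical, and it assumes that File Random picks a window at a random offset and that both modes wrap at the end of the file.

```python
import random


class CorpusValueGenerator:
    """Hypothetical helper for illustration; not part of db_bench."""

    def __init__(self, corpus: bytes, seed: int = 0):
        if not corpus:
            raise ValueError("corpus must not be empty")
        self.corpus = corpus
        self.rng = random.Random(seed)
        self.pos = 0  # next read position for the sequential mode

    def _window(self, start: int, size: int) -> bytes:
        # Read `size` bytes beginning at `start`, wrapping past the end.
        n = len(self.corpus)
        return bytes(self.corpus[(start + i) % n] for i in range(size))

    def file_random(self, size: int) -> bytes:
        # Assumed behavior: pick a random offset and return the window there.
        return self._window(self.rng.randrange(len(self.corpus)), size)

    def file_direct(self, size: int) -> bytes:
        # Return the next `size` bytes in file order, then advance and wrap.
        value = self._window(self.pos, size)
        self.pos = (self.pos + size) % len(self.corpus)
        return value


gen = CorpusValueGenerator(b"abcdefghij")
first = gen.file_direct(4)
second = gen.file_direct(4)
third = gen.file_direct(4)  # crosses the end of the corpus and wraps
```

With the ten-byte corpus above, the three sequential reads return `b"abcd"`, `b"efgh"`, and `b"ijab"`, which shows the wrap-around described for File Direct.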

In the case of File Random, the actual compression ratio will deviate further from the target compression ratio than it does under the original randomization scheme. This is because any window of characters in a file will likely be more repetitive, and therefore more compressible, than a randomly generated string of the same size. In the case of File Direct, the target compression ratio will be ignored entirely because of the need to remain as close to the original file contents as possible.
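
The compressibility gap can be illustrated with a small Python sketch. It uses `zlib` from the standard library, a deliberately repetitive stand-in corpus, and fully random bytes as the comparison value. These are all assumptions made for illustration: db_bench's own generator and compression settings are not reproduced here, and a real corpus file would show a smaller gap than this stand-in.

```python
import random
import zlib


def compressed_fraction(data: bytes) -> float:
    # Compressed size divided by original size; lower means more compressible.
    return len(zlib.compress(data)) / len(data)


rng = random.Random(42)
size = 4096

# Stand-in for a text corpus; a real run would read a user-provided file.
corpus = b"the quick brown fox jumps over the lazy dog. " * 200
start = rng.randrange(len(corpus) - size)
file_window = corpus[start:start + size]

# Comparison value: uniformly random bytes of the same size.
random_value = bytes(rng.randrange(256) for _ in range(size))

window_fraction = compressed_fraction(file_window)
random_fraction = compressed_fraction(random_value)
```

Here `window_fraction` comes out well below `random_fraction`, which is the effect the paragraph above describes: a window taken from a file tends to compress better than a random string of the same length.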

In addition to the two new options above, a third option called Pure Random is provided. It is simply the original randomization scheme and is set as the default.

ajkr · 2023-07-21T17:12:48Z

I assume this branch is being updated multiple times daily by automation, as I don't see anything changing. Will close this to save CI resources.

missa-prime · 2023-07-24T15:40:13Z

@ajkr Yes, this was being synced roughly once per day. Is it safe to merge changes into the main branch?

facebook-github-bot added the CLA Signed label Jul 20, 2022

missa-prime force-pushed the user/missa-prime/corpus branch from c9725ca to dc4fae6 Compare July 23, 2022 07:36

missa-prime marked this pull request as ready for review July 23, 2022 08:33

missa-prime force-pushed the user/missa-prime/corpus branch from dc4fae6 to 241e0f3 Compare August 5, 2022 07:35

missa-prime force-pushed the user/missa-prime/corpus branch 8 times, most recently from 64a0f6b to 8653ab5 Compare August 26, 2022 18:34

missa-prime force-pushed the user/missa-prime/corpus branch 2 times, most recently from 1ec2cc6 to d4e2174 Compare September 7, 2022 04:21

missa-prime force-pushed the user/missa-prime/corpus branch from faa10f5 to be15d7b Compare September 14, 2022 19:37

missa-prime closed this Sep 14, 2022

missa-prime deleted the user/missa-prime/corpus branch September 14, 2022 20:28

missa-prime restored the user/missa-prime/corpus branch September 14, 2022 20:44

missa-prime reopened this Sep 14, 2022

missa-prime force-pushed the user/missa-prime/corpus branch 9 times, most recently from ea97966 to bb734d7 Compare September 21, 2022 17:27

missa-prime force-pushed the user/missa-prime/corpus branch 2 times, most recently from 0f13d92 to 37a15b2 Compare September 26, 2022 16:31

missa-prime force-pushed the user/missa-prime/corpus branch 6 times, most recently from deab23f to 8b3a840 Compare June 14, 2023 16:08

missa-prime force-pushed the user/missa-prime/corpus branch 5 times, most recently from 689b1b0 to 38e3e23 Compare June 22, 2023 16:15

missa-prime force-pushed the user/missa-prime/corpus branch 4 times, most recently from 28b0d06 to 241806b Compare June 29, 2023 17:00

missa-prime force-pushed the user/missa-prime/corpus branch 3 times, most recently from fc74cb5 to 14b0363 Compare July 6, 2023 15:34

missa-prime force-pushed the user/missa-prime/corpus branch 5 times, most recently from f0c7533 to 6471537 Compare July 15, 2023 23:11

missa-prime force-pushed the user/missa-prime/corpus branch 2 times, most recently from 59edb5f to ca70174 Compare July 20, 2023 15:35

Add options to generate db_bench data from user provided corpus file

268c002

missa-prime force-pushed the user/missa-prime/corpus branch from ca70174 to 268c002 Compare July 21, 2023 17:00

ajkr closed this Jul 21, 2023
