Build dataset using COPY instead of multi-row inserts for TPROC-C #301
NOTE: this PR includes #292, to keep it readable and to make merging them
both easier in the future. Adding comments to the diff is probably easier to do
on this equivalent PR, which has a different base branch, on the Citus repo: citusdata#3
The fastest way to bulk load data into Postgres is to use COPY. This PR
changes the TPROC-C dataset build to use COPY instead of multi-row inserts. As a
test I built a dataset for 1000 warehouses using 100 virtual users: without COPY
this took 100 minutes, with COPY it took only 42 minutes, so the dataset build
time was reduced by ~58%.
Docs on the usage of COPY in the Tcl Postgres library can be found here:
http://pgtclng.sourceforge.net/pgtcldocs/pgtcl-example-copy.html
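For reference, a minimal sketch of the COPY flow in pgtcl, roughly following the linked example. The connection string, table, column names, and row values here are placeholders for illustration and are not taken from the actual PR, which applies this pattern to the TPROC-C schema build:

```tcl
package require Pgtcl

# Placeholder connection settings.
set lda [pg_connect -conninfo "host=localhost dbname=tpcc user=postgres"]

# Switch the connection into COPY IN mode.
set result [pg_exec $lda "COPY item (i_id, i_name, i_price) FROM STDIN"]
if {[pg_result $result -status] ne "PGRES_COPY_IN"} {
    set err [pg_result $result -error]
    pg_result $result -clear
    error "COPY failed to start: $err"
}
pg_result $result -clear

# While in COPY IN mode the connection handle acts as a Tcl channel:
# write one tab-separated line per row instead of issuing INSERT statements.
for {set i 1} {$i <= 100000} {incr i} {
    puts $lda "$i\titem-$i\t[expr {double($i % 10000) / 100.0}]"
}

# A line containing only "\." terminates the COPY and returns the
# connection to normal query mode.
puts $lda "\\."

pg_disconnect $lda
```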
NOTE: A similar improvement could probably be made for Postgres in TPROC-H.