Speed comparison with DataPusher #25

Closed
davidread opened this issue Nov 10, 2017 · 0 comments

davidread commented Nov 10, 2017

Summary

Express Loader (ckanext-xloader) loads the data at 11.4 times the speed of DataPusher (206s versus 2350s).

Test conditions:

  • Load of Boston 311 dataset (1033882 rows, 475MB)
  • Run locally on a MacBook Pro (i7, 2013 model)

Stats with ckanext-xloader

12s - retrieve the file (over HTTP) from local FileStore
23s - convert to UTF8
21s - copy CSV file into PostgreSQL table (one COPY command)
160s - create search index

Total: 206s

At this point the full data is made available to the user.
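The single-COPY step can be sketched roughly as follows. This is a minimal illustration and not the ckanext-xloader source: the helper names `quote_ident`, `build_copy_sql` and `copy_csv` are hypothetical, `conn` is assumed to be a psycopg2 connection, the target table is assumed to exist already, and the CSV is assumed to be UTF-8 with a single-line header row.

```python
import csv
from typing import Sequence


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_copy_sql(table: str, columns: Sequence[str]) -> str:
    """Build one COPY statement that reads CSV data from the client."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return (
        f"COPY {quote_ident(table)} ({cols}) "
        "FROM STDIN WITH (FORMAT csv, HEADER true, ENCODING 'UTF8')"
    )


def copy_csv(conn, table: str, csv_path: str) -> None:
    """Stream a whole CSV file into `table` with a single COPY command.

    Assumption: `conn` is a psycopg2 connection (its cursors provide
    copy_expert) and the header row contains no embedded newlines.
    """
    with open(csv_path, "rb") as f:
        header = f.readline().decode("utf-8")
        columns = next(csv.reader([header]))  # column names from the header
        f.seek(0)  # rewind; HEADER true makes COPY skip the first line
        with conn.cursor() as cur:
            cur.copy_expert(build_copy_sql(table, columns), f)
    conn.commit()
```

The point of the sketch is that the file is streamed to the server in one statement, instead of being parsed row by row on the client and sent as many separate INSERTs.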

Afterwards, the column indexes are generated, which only speed up common queries. This takes a further 1262s. We exclude it from the load time because it is purely an optimization.
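Deferring index creation until after the bulk load might look something like the sketch below. It only builds the SQL statements; the helper names and the column names in the usage line are hypothetical, and the actual set of indexes ckanext-xloader creates is not specified here.

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_index_statements(table: str, columns: Iterable[str]) -> List[str]:
    """Return one CREATE INDEX statement per column.

    Intended to be run after the data is loaded, so that rows are not
    indexed one by one while the load is in progress.
    """
    return [
        f"CREATE INDEX ON {quote_ident(table)} ({quote_ident(col)})"
        for col in columns
    ]


# Hypothetical column names, for illustration only.
statements = build_index_statements("boston_311", ["case_status", "open_dt"])
```

Each statement would then be executed on the database once the data is already available to users.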

Stats with DataPusher

12s - retrieve the file (over HTTP) from local FileStore
2338s - convert to UTF8 and then to JSON, set up PostgreSQL indexes to be generated during load, load JSON into table (4000 INSERT statements)

Total: 2350s
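The headline ratio can be reproduced from the two reported totals. The sketch below uses only figures from this report (206s, 2350s and 1033882 rows); the function names are illustrative. Note that the per-stage ckanext-xloader figures as listed add up to 216s, so the calculation relies on the reported totals rather than on the stage breakdown.

```python
XLOADER_TOTAL_S = 206      # reported ckanext-xloader total
DATAPUSHER_TOTAL_S = 2350  # reported DataPusher total
ROWS = 1033882             # rows in the Boston 311 dataset


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the fast run is than the slow run."""
    return slow_seconds / fast_seconds


def rows_per_second(rows: int, seconds: float) -> float:
    """Average load throughput in rows per second."""
    return rows / seconds


ratio = speedup(DATAPUSHER_TOTAL_S, XLOADER_TOTAL_S)
xloader_rate = rows_per_second(ROWS, XLOADER_TOTAL_S)
datapusher_rate = rows_per_second(ROWS, DATAPUSHER_TOTAL_S)

print(f"speedup: {ratio:.1f}x")
print(f"ckanext-xloader: {xloader_rate:.0f} rows/s")
print(f"DataPusher: {datapusher_rate:.0f} rows/s")
```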

amercader pushed a commit that referenced this issue Nov 30, 2022
[QOL-7596] use 'six' instead of assuming features will exist