refactor s3 submodule to minimize resource usage #569

Merged on Dec 27, 2020 (7 commits)

mpenkov (Collaborator) commented Dec 18, 2020

Creating sessions and resources costs time and memory. If possible, smart_open should avoid creating resources by itself, and allow the user to specify them up front.
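The idea of building an expensive object once and handing it to every subsequent call can be sketched as follows. This is a minimal illustration and not the code from this pull request: `reuse`, `make_resource`, and `get_resource` are hypothetical names, and `make_resource` is a stand-in that only counts how often it is called.

```python
import functools


def reuse(factory):
    """Return a zero-argument callable that runs `factory` once and caches the result."""
    return functools.lru_cache(maxsize=None)(factory)


# Hypothetical stand-in for an expensive constructor such as
# boto3.resource('s3'); it records how many times it has been called.
calls = {"count": 0}


def make_resource():
    calls["count"] += 1
    return object()


get_resource = reuse(make_resource)

# Asking for the resource many times pays the construction cost only once.
resources = [get_resource() for _ in range(1000)]
print(calls["count"])
```

With boto3 installed, `make_resource` could instead return `boto3.resource('s3')`, and the shared object would then be passed to smart_open up front. The exact parameter name for doing so depends on the smart_open version and is not shown on this page, so treat it as something to look up in the project's documentation.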

Here are some CPU-time benchmark results:

$ time python benchmark/read_s3.py < benchmark/urls.txt
real    1m20.786s
user    0m8.619s
sys     0m0.894s

$ time python benchmark/read_s3.py create_session < benchmark/urls.txt
real    1m45.826s
user    0m4.554s
sys     0m0.149s

$ time python benchmark/read_s3.py create_resource < benchmark/urls.txt
real    0m22.046s
user    0m1.474s
sys     0m0.065s

$ time python benchmark/read_s3.py create_session_and_resource < benchmark/urls.txt
real    0m21.086s
user    0m1.496s
sys     0m0.073s

There are memory benefits as well, but I did not benchmark them because the CPU benchmarks were compelling enough.
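To compare the runs at a glance, the `real` values reported above can be converted to seconds and divided into the baseline. This is a small sketch for the arithmetic only: `to_seconds` is a hypothetical helper, and the label `default` for the run without an argument is an assumption.

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` value such as '1m20.786s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") times copied from the benchmark output above.
# "default" is an assumed label for the run with no mode argument.
real = {
    "default": "1m20.786s",
    "create_session": "1m45.826s",
    "create_resource": "0m22.046s",
    "create_session_and_resource": "0m21.086s",
}

baseline = to_seconds(real["default"])
speedup = {mode: baseline / to_seconds(stamp) for mode, stamp in real.items()}

for mode, factor in speedup.items():
    print(f"{mode}: {factor:.2f}x")
```

On these numbers, the two runs that create the resource up front finish in a bit more than a quarter of the baseline wall-clock time, while the `create_session` run alone has a ratio below 1, meaning it took longer than the baseline.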
mpenkov and others added 3 commits December 19, 2020 08:08
Co-authored-by: Radim Řehůřek <radimrehurek@seznam.cz>
mpenkov merged commit 74afb2a into develop Dec 27, 2020
mpenkov deleted the s3_refactor branch December 27, 2020 08:23