-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrapper around smart_open to cache transport_params for every call #335
Wrapper around smart_open to cache transport_params for every call #335
Conversation
There is no nice way how to pass transport_params to the reader and writer The user has to call SmartOpen.setup separately
b623b3b
to
c75832d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks! Let’s extend the doc a bit and LGTM
lhotse/utils.py
Outdated
client = session.client('s3') | ||
from lhotse.utils import SmartOpen | ||
|
||
if not slow: # switch between 44s vs 18.9 Wall time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to be nit-picking -- this 44s vs 18.9s seems very specific to your use-case, could we remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also -- can you format it like:
Example::
>>> import boto3
>>> session = boto3.Session()
...
This will render well with sphinx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you pls check that it is ok now?
Or just tell me how to generate (and check) the docs myself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The auto generated docs for this PR are here https://lhotse--335.org.readthedocs.build/en/335/
RTD used to add it as a CI test result in GH but the integration broke when I moved Lhotse to lhotse-speech organization… and I couldn’t fix it.
Unfortunately it seems that the API reference doesn’t list lhotse.utils so you won’t see it there anyway.. I haven’t gotten to re-working this part of the docs yet.
Looking good, thanks! |
Motivation
The
smart_open.open
function when used with default value setups all necessary parameters to access e.g. S3 clouduri
.However, it typically means that it creates a session e.g.
boto3.session
withclient('s3')
for AWS S3 access.Creating such a session for every call is slow.
If we reused a single session for AWS S3 access for 1000 calls the Wall time was reduced from 44s to 18.9s.
This PR introduces SmartOpen singleton class
which caches transport_params and thus e.g. the
client('s3')
In addition, it imports smart_open.open only once and does not import it in every call.