-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reuse S3 session #622
Comments
Hey! If it's not too much of a hassle, mind submitting a PR with your proposed change? I'd be happy to review |
Sure! I made a fix for this that worked earlier, but will need to clean it up a bit before submitting. Will take a look somewhere next week |
Perfect, thank you @wouterzwerink! Feel free to tag me when the PR is up. |
@wouterzwerink Hey, just wanted to follow up on this, mind submitting a quick PR if/when you have some time? Thanks!! |
I am interested in this issue (actually we need it for potential performance improvement). I think the problem is in which level we want to keep a boto3 seesion. Maybe keep one seesion for each stream? If so, I suppose to create an s3 client in stream and reuse it when trigger |
@huxuan Are you seeing any performance degradation with the current approach? If yes, by how much? |
I have not done that yet, maybe I can implement a draft version for comparsion. |
@huxuan I ended up abandoning this after increasing the shard size, which made the S3 overhead negligible. Perhaps that will work for you as well? |
Thanks for the response. We saved feature vectors in the data, so the sample size is relatively large (about 12 MB per sample). We are already using 200 MB as the Any comments are welcome. |
🚀 Feature Request
Currently when I use S3 with an IAM role, I see StreamingDataset fetch new credentials for every shard:
There is a never ending stream of credential logs after this
That's quite inefficient, getting credentials from IAM roles is not that fast. Would be nicer to reuse credentials until they expire
Motivation
Faster is better!
[Optional] Implementation
I think it would work to just reuse the S3 Session object per thread
Additional context
The text was updated successfully, but these errors were encountered: