-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Defer S3 Client instantiation until needed #2079
Defer S3 Client instantiation until needed #2079
Conversation
5185212
to
ba04570
Compare
luigi/contrib/s3.py
Outdated
options.update(kwargs) | ||
options = dict(self._options) | ||
|
||
if getattr(self, '_s3', None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use self._s3 = None in the constructor, so that this check will be simply if self._s3: return self._s3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure - do you prefer setting in __init__
to setting _s3 = None
in the class definition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally would prefer to initialize instance variables in __init__
, but it is not a strong opinion.
luigi/contrib/s3.py
Outdated
def s3(self, value): | ||
self._s3 = value | ||
|
||
@s3.deleter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to provide a deleter, since no code in luigi deletes a member from an instance.
|
||
@s3.setter | ||
def s3(self, value): | ||
self._s3 = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can as well remove the setter, since the s3
member is not assigned to anywhere in the code (as far as I can see). With this change, I am okay with the changes, but I am not a core developer and not using the s3 stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to leave the setter in because it makes it easier to test out / mock interactions here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that is fine with me. Personally I keep my good-to-have-in-the-future code in my local repo only.
@jtratner Could you check on your build failures? Thanks! |
@dlstadther - to me - they're all entirely unrelated. I'll try merging in latest master and see if it works this time. |
Previously the S3 Client was instantiated in the `__init__` method of S3Target, which meant that if you had to, say, traverse a large task graph, quickly creating and throwing away targets, you'd end up incurring a lot of wasted connections and slowness. Now we create the client only when we want to make a remote request. This does not change any of the semantics of how the target is instantiated (still can override `s3` property and the `Key` property), except that making an STS connection is also deferred until the moment it's actually used.
786e5e9
to
e2694cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did @kalvdans concern get addressed? If so let's mege. :)
Thanks for reviewing @kalvdans!
…On Fri, Apr 28, 2017 at 10:53 AM kalvdans ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In luigi/contrib/s3.py
<#2079 (comment)>:
> for key in ['aws_access_key_id', 'aws_secret_access_key', 'aws_role_session_name', 'aws_role_arn']:
if key in options:
options.pop(key)
-
- self.s3 = boto.s3.connection.S3Connection(aws_access_key_id,
- aws_secret_access_key,
- security_token=aws_session_token,
- **options)
- self.Key = Key
+ self._s3 = boto.s3.connection.S3Connection(aws_access_key_id,
+ aws_secret_access_key,
+ security_token=aws_session_token,
+ **options)
+ return self._s3
+
+ @s3.setter
+ def s3(self, value):
+ self._s3 = value
Ok, that is fine with me. Personally I keep my good-to-have-in-the-future
code in my local repo only.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2079 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABhjq56ONaPVfWOO6PUwe6H6ZpoQzCQ4ks5r0iepgaJpZM4Mr6BP>
.
|
Thanks @jtratner ! |
Description
Delays instantiation of the S3Client until we need to contact S3, speeding the
output()
method and allowing nested task graphs to use fewer connections. Fixes #2048.Motivation and Context
Previously the S3 Client was instantiated in the
__init__
method ofS3Target, which meant that if you wanted to traverse a large task
graph using
output()
, quickly creating and throwing away targets, you'd end upincurring a lot of wasted connections and slowness.
Now we create the client only when we want to make a remote request.
This does not change any of the semantics of how the target is
instantiated (still can override
s3
property and theKey
property),except that making an STS connection is also deferred until the moment
it's actually used.
Have you tested this? If so, how?
I tested this with our workflows (which use Openstack Swift as an S3Target) and they continued to work. I also assume unit tests cover this code as well. (had some issues running the test suite locally). I also checked that connections were not being generated solely from instantiating S3Targets.