Defer S3 Client instantiation until needed #2079

Merged: 3 commits, Apr 29, 2017


jtratner (Contributor)

Description

Delays instantiation of the S3Client until we need to contact S3, speeding up the output() method and allowing nested task graphs to use fewer connections. Fixes #2048.

Motivation and Context

Previously the S3 Client was instantiated in the `__init__` method of
S3Target, which meant that if you wanted to traverse a large task
graph using output(), quickly creating and throwing away targets, you'd end up
opening many wasted connections and slowing everything down.

Now we create the client only when we want to make a remote request.
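A minimal sketch of the difference, counting how many connections each approach opens when many targets are created and discarded. `FakeConnection`, `EagerTarget`, `LazyTarget`, and `count_opened` are hypothetical stand-ins, not luigi's actual `S3Target`/`S3Client` code.

```python
class FakeConnection:
    """Hypothetical stand-in for an S3 connection; counts how many are opened."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1

    def key_exists(self, path):
        return True  # placeholder for a real remote request


class EagerTarget:
    """Opens a connection as soon as the target is constructed (old behavior)."""

    def __init__(self, path):
        self.path = path
        self._conn = FakeConnection()

    def exists(self):
        return self._conn.key_exists(self.path)


class LazyTarget:
    """Defers opening a connection until a remote request is made."""

    def __init__(self, path):
        self.path = path
        self._conn = None

    @property
    def conn(self):
        # Create the connection on first access, then reuse it.
        if self._conn is None:
            self._conn = FakeConnection()
        return self._conn

    def exists(self):
        return self.conn.key_exists(self.path)


def count_opened(target_cls, n):
    """Construct n targets, as walking output() over a task graph would."""
    FakeConnection.opened = 0
    targets = [target_cls("s3://bucket/key-%d" % i) for i in range(n)]
    return FakeConnection.opened, targets
```

With this sketch, `count_opened(EagerTarget, 1000)` reports 1000 connections, while `count_opened(LazyTarget, 1000)` reports 0; a lazy target opens exactly one connection the first time `exists()` is called.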

This does not change any of the semantics of how the target is
instantiated (you can still override the `s3` property and the `Key` property),
except that making an STS connection is also deferred until the moment
it's actually used.
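The same deferral can cover credential setup. In the sketch below, `assume_role` and `connect` are hypothetical injected callables (not boto or luigi APIs) standing in for the STS call and the connection constructor. Both run only inside the lazy property, and the options dict is copied before role-related keys are popped, in the spirit of `options = dict(self._options)` in the diff, so the stored configuration is never mutated.

```python
class LazyS3Client:
    """Sketch of a client that resolves credentials and connects on first use.

    `assume_role` and `connect` are hypothetical callables injected for
    illustration; they are not boto or luigi APIs.
    """

    def __init__(self, assume_role, connect, **options):
        self._assume_role = assume_role
        self._connect = connect
        self._options = options
        self._s3 = None  # nothing remote happens at construction time

    @property
    def s3(self):
        if self._s3 is not None:
            return self._s3
        # Work on a copy so the stored options are never mutated.
        options = dict(self._options)
        role_arn = options.pop("aws_role_arn", None)
        session_name = options.pop("aws_role_session_name", None)
        if role_arn and session_name:
            # The (potentially slow) STS round trip happens only here.
            options.update(self._assume_role(role_arn, session_name))
        self._s3 = self._connect(**options)
        return self._s3
```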

Have you tested this? If so, how?

I tested this with our workflows (which use OpenStack Swift as an S3Target) and they continued to work. I also assume the unit tests cover this code (I had some issues running the test suite locally). In addition, I checked that connections were not being opened solely by instantiating S3Targets.
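One way to check that constructing targets opens no connections is a unit test that patches the connection hook. This is a sketch against a hypothetical `Target` class with a hypothetical `_connect` method, not luigi's actual test suite; it uses `unittest.mock.patch.object`.

```python
import unittest
from unittest import mock


class Target:
    """Hypothetical target that defers calling its connection hook."""

    def __init__(self, path):
        self.path = path
        self._s3 = None

    def _connect(self):
        # Hypothetical hook that would open the real S3 connection.
        raise RuntimeError("tests should never open a real connection")

    @property
    def s3(self):
        if self._s3 is None:
            self._s3 = self._connect()
        return self._s3


class NoEagerConnectionTest(unittest.TestCase):
    def test_construction_opens_no_connection(self):
        with mock.patch.object(Target, "_connect") as connect:
            targets = [Target("s3://bucket/key-%d" % i) for i in range(100)]
            connect.assert_not_called()  # building targets alone opened nothing
            conn = targets[0].s3  # first remote access triggers the hook
            connect.assert_called_once_with()
            self.assertIs(conn, connect.return_value)
```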

jtratner force-pushed the delayed-s3-client-instantiation branch from 5185212 to ba04570 on March 28, 2017 17:00
        options.update(kwargs)
        options = dict(self._options)

        if getattr(self, '_s3', None):
Contributor:

Please set `self._s3 = None` in the constructor, so that this check can simply be `if self._s3: return self._s3`.

Contributor (author):

Sure. Do you prefer setting it in `__init__` over setting `_s3 = None` in the class definition?

kalvdans (Contributor), Mar 28, 2017:

I personally would prefer to initialize instance variables in __init__, but it is not a strong opinion.
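Both placements make the `if self._s3:` check safe, because attribute lookup falls back to the class when the instance has no value of its own. The sketch below uses hypothetical class names to show the difference: with a class-level default, `_s3` is absent from the instance's `__dict__` until assigned, and assigning on an instance shadows the class value rather than changing it.

```python
class ClassDefault:
    # Class-level default: instances fall back to this until they assign their own.
    _s3 = None


class InitDefault:
    def __init__(self):
        # Instance-level default, set explicitly in the constructor.
        self._s3 = None


def cached_or_none(obj):
    """Mirror of the suggested check: `if self._s3: return self._s3`."""
    if obj._s3:
        return obj._s3
    return None
```

Since `None` is immutable, the class-level default carries no shared-state risk here; initializing in `__init__` mainly keeps all per-instance state visible in one place.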

    def s3(self, value):
        self._s3 = value

    @s3.deleter
Contributor:

No need to provide a deleter, since no code in luigi deletes a member from an instance.
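Leaving the deleter out is safe because Python's built-in `property` simply refuses deletion when no deleter is defined: `del obj.s3` raises `AttributeError`. A minimal sketch with a hypothetical `Client` class that has only a getter and setter:

```python
class Client:
    """Hypothetical client exposing `s3` with a getter and setter but no deleter."""

    def __init__(self):
        self._s3 = None

    @property
    def s3(self):
        return self._s3

    @s3.setter
    def s3(self, value):
        self._s3 = value
```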


    @s3.setter
    def s3(self, value):
        self._s3 = value
Contributor:

You could also remove the setter, since the `s3` attribute is not assigned anywhere in the code (as far as I can see). With that change I am okay with the changes, but I am not a core developer and don't use the S3 code.

Contributor (author):

I'd like to keep the setter because it makes it easier to test and mock interactions here.
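The setter makes this kind of test straightforward: assign a mock to `s3` and the lazy property never tries to connect. A sketch with a hypothetical `S3Client` and a hypothetical `exists` helper (its `get_bucket`/`get_key` calls are modeled loosely on boto 2), using `unittest.mock.MagicMock` as the fake connection:

```python
from unittest import mock


class S3Client:
    """Hypothetical lazy client; `_connect` stands in for the real connection code."""

    def __init__(self):
        self._s3 = None

    def _connect(self):
        raise RuntimeError("would open a real S3 connection")

    @property
    def s3(self):
        if self._s3 is None:
            self._s3 = self._connect()
        return self._s3

    @s3.setter
    def s3(self, value):
        # Lets tests inject a fake connection without touching the network.
        self._s3 = value

    def exists(self, bucket, key):
        # Hypothetical helper: delegates the lookup to the connection object.
        return self.s3.get_bucket(bucket).get_key(key) is not None
```

A test can then do `client.s3 = mock.MagicMock()` and assert on the calls recorded by the mock, without any network access.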

Contributor:

OK, that is fine with me. Personally, I keep my nice-to-have-in-the-future code only in my local repo.

dlstadther (Collaborator):

@jtratner Could you check on your build failures? Thanks!

jtratner (Contributor, author):

@dlstadther As far as I can tell, they're all unrelated to this change. I'll try merging in the latest master and see if it passes this time.

jtratner force-pushed the delayed-s3-client-instantiation branch from 786e5e9 to e2694cc on April 28, 2017 07:01
Tarrasch (Contributor) left a comment:

Did @kalvdans's concern get addressed? If so, let's merge. :)

jtratner (Contributor, author) commented via email on Apr 28, 2017.

dlstadther merged commit 1147e0f into spotify:master on Apr 29, 2017
dlstadther (Collaborator):

Thanks @jtratner !

This was referenced Jun 29, 2022
4 participants