Clients are never GC'd #805

Closed
magcius opened this issue Feb 23, 2016 · 11 comments

@magcius

magcius commented Feb 23, 2016

def __del__(obj):
    print "deleting", obj

class foo(object):
    pass

def main():
    import botocore.session
    session = botocore.session.get_session()
    session.__class__.__del__ = __del__
    client = session.create_client('s3', region_name='us-west-2')
    client.__class__.__del__ = __del__

    thing = foo()
    thing.__class__.__del__ = __del__

print "before"
main()
print "after"

Prints:

before
deleting <botocore.session.Session object at 0x7fd23c7ec550>
deleting <__main__.foo object at 0x7fd23c7a4110>
after

This is causing major memory leaks in our applications that create lots of clients.

@jamesls
Member

jamesls commented Feb 26, 2016

Taking a look. We have specific tests in place to ensure we don't leak memory when creating clients: https://github.com/boto/botocore/blob/develop/tests/functional/leak/test_resource_leaks.py#L39-L65

One of the things I've noticed from your code is that you're assigning a __del__ to a client, which in this scenario will prevent the object from being gc'd because of the cycles created within a client. This isn't specific to botocore; it applies to any Python class. For example:

import gc


class ObjectWithDel(object):
    def __init__(self):
        pass

    def __del__(self):
        # This method will never be called because we'll create a cyclic reference.
        print "__del__ called for:", id(self)


class ObjectWithNoDel(object):
    pass


def test():
    a = ObjectWithDel()
    a.a = a

    b = ObjectWithNoDel()
    b.b = b
    print "created a, id(a) ->", id(a), "type:", type(a)
    print "created b, id(b) ->", id(b), "type:", type(b)


test()
gc.collect()
print gc.garbage


$ python test.py
created a, id(a) -> 4353206544 type: <class '__main__.ObjectWithDel'>
created b, id(b) -> 4353206608 type: <class '__main__.ObjectWithNoDel'>
[<__main__.ObjectWithDel object at 0x10378a910>]

However, real botocore clients don't actually implement a __del__, so this behavior isn't something you'll see in real code.
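One way to observe collection without attaching a __del__ (which, on Python versions before 3.4, makes cyclic garbage uncollectable) is a weak reference with a callback. The following is a minimal sketch written for Python 3; `Node` and `collected_after_gc` are hypothetical names used for illustration and are not part of botocore.

```python
import gc
import weakref


class Node(object):
    """Stand-in for any object that ends up in a reference cycle."""

    def __init__(self):
        self.me = self  # deliberate self-reference, so refcounting alone cannot free it


def collected_after_gc():
    """Create a cyclic object, drop it, and report whether the collector reclaimed it."""
    events = []
    node = Node()
    # The callback runs when the referent is reclaimed. Unlike __del__,
    # a weak reference does not change how the object is collected.
    ref = weakref.ref(node, lambda _ref: events.append("collected"))
    del node        # the cycle keeps the object alive at this point
    gc.collect()    # the cycle collector reclaims it
    return ref() is None, events


if __name__ == "__main__":
    print(collected_after_gc())
```

The same idea can be pointed at a real client by creating the weak reference right after the client is constructed, assuming the client class supports weak references.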

Let me run a few more tests to confirm.

@jamesls
Member

jamesls commented Feb 26, 2016

Here's another test I ran:

#!/usr/bin/env python

import gc
import botocore.session

session = botocore.session.get_session()

print "Normal case, using standard clients"
def test():
    clients = []
    for i in xrange(10):
        c = session.create_client('s3')
        clients.append(c)
test()
gc.collect()
print "Uncollected:", gc.garbage

Running this I get:

$ python t.py
Normal case, using standard clients
Uncollected: []

So I'm not able to create a scenario where clients don't get gc'd. If you could share a repro using unmodified clients that leak memory, I can investigate further.

@magcius
Author

magcius commented Feb 26, 2016

Hm, I'm sure that gc.collect does indeed make it work, but I'm not happy relying on the cycle collector. I think you should try to let normal reference counting free these objects, because the cycle collector isn't guaranteed to run, ever. That said, some objects are indeed being leaked:

#!/usr/bin/env python

import gc
import botocore.session

session = botocore.session.get_session()

print "Normal case, using standard clients"
print "Before", len([c for c in gc.get_objects() if 'botocore' in repr(type(c))])

def test():
    clients = []
    for i in xrange(10):
        c = session.create_client('s3')
        clients.append(c)
test()
gc.collect()
print "Uncollected:", gc.garbage
print "After", len([c for c in gc.get_objects() if 'botocore' in repr(type(c))])

Prints:

$ python botoleak.py 
Normal case, using standard clients
Before 272
Uncollected: []
After 296
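The distinction drawn in this comment between reference counting and the cycle collector can be seen in isolation. This is a small CPython-specific sketch, written for Python 3, in which `Plain`, `Cyclic` and `freed_without_collector` are hypothetical names: with the collector disabled, an object with no cycle is freed as soon as its last reference is dropped, while an object that refers to itself is not.

```python
import gc
import weakref


class Plain(object):
    """No reference cycle: refcounting alone can free it."""


class Cyclic(object):
    """Holds a reference to itself: only the cycle collector can free it."""

    def __init__(self):
        self.me = self


def freed_without_collector(factory):
    """Return True if dropping the last name frees the object by refcount alone."""
    was_enabled = gc.isenabled()
    gc.disable()  # make sure the cycle collector cannot run during the check
    try:
        obj = factory()
        ref = weakref.ref(obj)
        del obj
        freed = ref() is None
    finally:
        if was_enabled:
            gc.enable()
    gc.collect()  # clean up whatever the check left behind
    return freed


if __name__ == "__main__":
    print("Plain freed by refcount:", freed_without_collector(Plain))
    print("Cyclic freed by refcount:", freed_without_collector(Cyclic))
```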

@magcius
Author

magcius commented Feb 26, 2016

I will also continue to investigate our code here -- I have a feeling it's related to threading. We do try to use a separate session / client per-thread.

@jamesls
Member

jamesls commented Feb 27, 2016

Interestingly, when I run your original code, I actually see the client being deleted, but not the session:

$ python /tmp/repro.py
before
deleting <botocore.client.S3 object at 0x1042e3ad0>
deleting <__main__.foo object at 0x10425bd10>
after

When I print out the differences in the before and after from your latest code snippet, I don't see any clients:

Code:

#!/usr/bin/env python

import gc
import botocore.session

session = botocore.session.get_session()

print "Normal case, using standard clients"
before = [c for c in gc.get_objects() if 'botocore' in repr(type(c))]
print "Before", len(before)

def test():
    clients = []
    for i in xrange(10):
        c = session.create_client('s3')
        clients.append(c)
test()
gc.collect()
print "Uncollected:", gc.garbage
after = [c for c in gc.get_objects() if 'botocore' in repr(type(c))]
for i in after:
    if i not in before:
        print i
print "After", len(after)

Normal case, using standard clients
Before 280
Uncollected: []
<botocore.loaders.Loader object at 0x108566450>
<botocore.loaders.JSONFileLoader object at 0x108566850>
<botocore.credentials.CredentialResolver object at 0x108566ad0>
<botocore.credentials.Credentials object at 0x108566b50>
<botocore.regions.EndpointResolver object at 0x108566b10>
<botocore.credentials.InstanceMetadataProvider object at 0x108566a90>
<botocore.credentials.BotoProvider object at 0x108566a10>
<botocore.credentials.OriginalEC2Provider object at 0x1085669d0>
<botocore.credentials.ConfigProvider object at 0x108566990>
<botocore.credentials.SharedCredentialProvider object at 0x108566950>
<botocore.credentials.AssumeRoleProvider object at 0x108566910>
<botocore.credentials.EnvProvider object at 0x1085668d0>
<botocore.utils.InstanceMetadataFetcher object at 0x108566a50>
NodeList(first=[], middle=[<botocore.retryhandler.RetryHandler object at 0x1087fe3d0>], last=[])
<botocore.retryhandler.RetryHandler object at 0x1087fe3d0>
<botocore.retryhandler.MaxAttemptsDecorator object at 0x1087fe390>
<botocore.retryhandler.MultiChecker object at 0x1087fe350>
<botocore.retryhandler.ServiceErrorCodeChecker object at 0x1087fe310>
<botocore.retryhandler.ServiceErrorCodeChecker object at 0x1087fe2d0>
<botocore.retryhandler.HTTPStatusCodeChecker object at 0x1087fe290>
<botocore.retryhandler.ServiceErrorCodeChecker object at 0x1087fe250>
<botocore.retryhandler.ServiceErrorCodeChecker object at 0x1087fe210>
<botocore.retryhandler.HTTPStatusCodeChecker object at 0x1087fe1d0>
<botocore.retryhandler.HTTPStatusCodeChecker object at 0x1087fe190>
<botocore.retryhandler.HTTPStatusCodeChecker object at 0x1087fe150>
<botocore.retryhandler.ExceptionRaiser object at 0x1087fe110>
After 306

The objects above are expected: the session creates them lazily as needed and then reuses them across multiple clients, which is why they aren't in the "before" list.
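The lazy-creation pattern described here can be sketched as follows. This is not botocore's actual implementation; `LazyComponents` and `FakeSession` are hypothetical names, and the sketch only illustrates why the object count rises once, on first use, rather than once per client.

```python
class LazyComponents(object):
    """Hypothetical registry: builds each component on first use, then reuses it."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def register(self, name, factory):
        # Store only the factory; nothing is built yet.
        self._factories[name] = factory

    def get(self, name):
        # Build on first request, then return the cached instance.
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]


class FakeSession(object):
    """Hypothetical session that shares its components across clients."""

    def __init__(self):
        self.components = LazyComponents()
        self.components.register("credential_resolver", object)  # placeholder factory

    def create_client(self):
        # Every client receives the same shared component instance.
        return {"resolver": self.components.get("credential_resolver")}


if __name__ == "__main__":
    session = FakeSession()
    first = session.create_client()
    second = session.create_client()
    print(first["resolver"] is second["resolver"])
```

Under this pattern, a "before" snapshot taken ahead of the first client will not contain the shared components, even though they are owned by the session and are not leaked.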

@magcius
Author

magcius commented Feb 27, 2016

Interesting. I think this might also be related to the fact that we instantiate these clients across different threads. When we inspect the heap, we definitely see lots of things from botocore. I'll see if I can come up with a simple testcase here.

@jamesls
Member

jamesls commented Jun 24, 2016

Closing due to inactivity. If you were able to find any additional info, let us know and I'll reopen and take another look.

@jamesls jamesls closed this as completed Jun 24, 2016
@magcius
Author

magcius commented Jun 24, 2016

Thanks. We're still having issues, but I haven't gotten around to doing another investigation. It might be due to our use of threads, though.

@jbvsmo

jbvsmo commented Aug 17, 2018

I am noticing severe memory leaks in an application that I am porting from the original boto to boto3.

I use a similar one-Session-per-thread pattern, as described here, and the leak started as soon as I implemented it.

Instead of fire-and-forget threads that end at some point, this program reuses threads from a pool of 100 threads that start and die with the main thread.

So each of those long-lived threads has a boto3 Session attached to it (which I believe holds a botocore session object), and after a few hours, memory usage that is usually below 300MB is over 3GB and growing.

I think this is similar enough to the issue above, but as I'm not using botocore directly, it is hard to say. Can I have some guidance? I see there is a pull request for removing circular references, but it has not been merged.
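For reference, a one-session-per-thread setup of the kind described above might look like the sketch below. `StandInSession`, `get_session` and `distinct_sessions` are hypothetical names; the stand-in class replaces a real session class so that the example runs without boto3 installed, and this is not necessarily how the application in question is written.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


class StandInSession(object):
    """Placeholder for a real session class (assumed for illustration)."""


def get_session(factory=StandInSession):
    """Return this thread's session, creating it on first use."""
    session = getattr(_local, "session", None)
    if session is None:
        session = factory()
        _local.session = session
    return session


def distinct_sessions(tasks=50, workers=4):
    """Run many tasks on a small pool and count how many sessions were created."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sessions = list(pool.map(lambda _task: get_session(), range(tasks)))
    # The returned objects stay alive in the list, so identity comparison is safe.
    return len(set(sessions))


if __name__ == "__main__":
    print("sessions created:", distinct_sessions())
```

With long-lived pool threads, the number of sessions is bounded by the number of threads, so steady growth over hours would point at something allocated per request rather than per thread.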

@nicolefinnie

We hit the very same problem when we call boto.client() and client.get_object() in multiple threads, e.g. from a ThreadPoolExecutor. Memory keeps going up and doesn't get garbage collected, even when we try to trigger collection manually. With multiple threads, it seems some unreachable objects are generated and cannot be collected. We don't hit this problem if we call the client() and get_object() APIs in a single thread; however, we need our application to work asynchronously, so that is not an option. We ended up working around the problem by not using boto.
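When memory grows like this and manual collection does not help, comparing two `tracemalloc` snapshots is one way to see which source lines account for the growth. This is a minimal sketch for Python 3.4 or later; `top_growth` and `demo` are hypothetical helper names, and the list of byte strings only simulates retained memory.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the source lines whose allocated size changed most between snapshots."""
    return after.compare_to(before, "lineno")[:limit]


def demo():
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        # Simulated growth: roughly 1 MB that stays referenced.
        retained = [bytes(1000) for _ in range(1000)]
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    return top_growth(before, after), len(retained)


if __name__ == "__main__":
    stats, _count = demo()
    for stat in stats:
        print(stat)
```

In a real application, the two snapshots would be taken some minutes apart while the worker threads are running.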

@arpitsabharwal

Hi @nicolefinnie, I am also facing a similar issue in my application. Can you please suggest the alternatives to the boto3 library that you used?
