
Memory leak #2047

Open · itachaaa opened this issue May 14, 2020 · 12 comments
Assignees: swetashre
Labels: bug This issue is a confirmed bug. · p2 This is a standard priority issue

Comments

@itachaaa

Please fill out the sections below to help us address your issue.

What issue did you see?
When I call AWS APIs at high volume and the client cannot reach the target region due to network problems, an EndpointConnectionError is thrown. Over time the memory used by my process keeps growing; the largest I have observed so far is 6 GB. Inspecting with gc and pyrasite shows that gc.garbage is [], and the data type occupying the most memory is str or unicode. The unicode content is the documentation text for DescribeInstancesRequest from service-2.json.
Library versions: boto3 1.12.24, botocore 1.15.24, urllib3 1.21.1
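
For reference, here is a minimal sketch of this kind of inspection using the standard library's tracemalloc in place of pyrasite (the workload is whatever triggers the errors; the top-10 cutoff is arbitrary):

import gc
import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames per allocation site

# ... run the workload that triggers EndpointConnectionError ...

gc.collect()
print('gc.garbage:', gc.garbage)  # [] here, so no uncollectable cycles

# Rank allocation sites by total size; in this case the largest entries
# were str/unicode objects holding service-2.json documentation text.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)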

Steps to reproduce
If you have a runnable example, please include it as a snippet or link to a repository/gist for larger code examples.

Debug logs
Full stack trace by adding

import botocore.session
botocore.session.Session().set_debug_logger('')

to your code.
[screenshot: debug log output]

@itachaaa itachaaa added guidance Question that needs advice or information. needs-triage This issue or PR still needs to be triaged. labels May 14, 2020
@itachaaa
Author

itachaaa commented May 14, 2020

The way I get the AWS client:

import boto3

# ACCESS and SECRET are credential placeholders defined elsewhere.

class AwsClient(object):

    def __init__(self, region='eu-central-1', server_name='ec2'):
        self.region = region
        self.server_name = server_name

    @property
    def client(self):
        return boto3.client(self.server_name,
                            region_name=self.region,
                            aws_access_key_id=ACCESS,
                            aws_secret_access_key=SECRET)

I use 2 processes and many green threads to make requests, and there is only one boto3.session.Session instance and one botocore.session.Session per process.

@swetashre
Contributor

@itachaaa - Thank you for your post. It is recommended to create a resource instance for each thread/process in a multithreaded or multiprocess application, rather than sharing a single instance among the threads/processes.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html?highlight=multithreading#multithreading-multiprocessing
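
For example, a minimal sketch of per-thread client creation (thread_client is an illustrative helper, not a boto3 API):

import threading

import boto3

# One session per thread; clients created from that session are reused
# within the thread instead of being shared across threads.
_local = threading.local()

def thread_client(service_name):
    if not hasattr(_local, 'session'):
        _local.session = boto3.session.Session()
    return _local.session.client(service_name)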

Are you creating the boto3 session from a botocore session? Can you please provide the exact code sample that is resulting in the memory leak?

@swetashre swetashre self-assigned this May 14, 2020
@swetashre swetashre added response-requested Waiting on additional info and feedback. and removed needs-triage This issue or PR still needs to be triaged. labels May 14, 2020
@itachaaa
Author

Thanks for the reply.
I use boto3.client() to get an instance and make requests.
There is one process in my program, but with many coroutines rather than multiple threads, so there is one session per process.

@itachaaa
Author

The memory only increases when lots of exceptions are being thrown; otherwise it does not.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. label May 15, 2020
@swetashre
Contributor

@itachaaa - Thanks for the reply. Is it possible for you to provide a code sample so that I can try to reproduce the issue? Without looking at the code it is difficult for me to find the exact cause.
Please also make sure you are triggering garbage collection, since you are using multiple coroutines.
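
For example, a sketch of forcing collection between batches of greenthreads (the batch size of 1000 is illustrative):

import gc

pool = GreenPool(10000)
for i in range(100000):
    pool.spawn_n(get_data)
    if i % 1000 == 999:
        pool.waitall()  # let in-flight greenthreads finish
        gc.collect()    # then force a collection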

@swetashre swetashre added the response-requested Waiting on additional info and feedback. label May 15, 2020
@itachaaa
Author

itachaaa commented May 18, 2020

Here is my test code (four files shown together):

# test.py
import time

from eventlet.greenpool import GreenPool
from memory_profiler import profile  # used by the commented-out @profile decorators
from instance import InstanceResource

manager = InstanceResource()

# @profile
def get_data():
    try:
        # list_resource() is defined on Resource (not shown in the snippets below)
        instances = manager.list_resource()
    except Exception as e:
        print(e)


# @profile
def loop_call():
    pool = GreenPool(10000)
    times = 0
    for i in range(100000):
        pool.spawn_n(get_data)
        times += 1
        print(times)

    time.sleep(10)

if __name__ == '__main__':
    loop_call()

# instance.py
from common import Resource


class InstanceResource(Resource):
    action = 'describe_instances'
    create_action = 'run_instances'

    # entity = 'Instances'

    @staticmethod
    def get_filters():
        params = {
            'Filters': [
                # {'Name': 'instance-id', 'Values': ['i-07a20968066e2ad87', ]}
            ],
        }
        return params

# common.py
from client import AwsClient


class Resource(object):
    action = None  # query by default
    create_action = None
    update_action = None
    delete_action = None
    entity = None

    def __init__(self, region='eu-central-1', server_name='ec2'):
        self._init(region=region, server_name=server_name)

    def _init(self, region='eu-central-1', server_name='ec2'):
        client = AwsClient(region=region, server_name=server_name)
        self.client = client.client
        self.resource = client.resource  # AwsClient also exposes a resource property (not shown)

    @staticmethod
    def get_filters():
        """
        Get the filter parameters for the GET interface.
        :return:
        """

# client.py
import boto3
from botocore.config import Config

# ACCESS and SECRET are credential placeholders defined elsewhere.

class AwsClient(object):

    def __init__(self, region='eu-central-1', server_name='ec2'):
        self.region = region
        self.server_name = server_name
        self.config = Config(retries=dict(max_attempts=2), connect_timeout=5, read_timeout=5)

    @property
    def client(self):
        return boto3.client(self.server_name,
                            region_name=self.region,
                            aws_access_key_id=ACCESS,
                            aws_secret_access_key=SECRET,
                            config=self.config)

Let it throw lots of errors such as EndpointConnectionError, and the memory will continue to increase.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. label May 18, 2020
@swetashre swetashre added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels May 20, 2020
@swetashre
Contributor

@itachaaa - Thank you for providing the sample code. Marking this as a bug. I am able to reproduce the issue with this script:

import os

import boto3
import psutil
import matplotlib.pyplot as pp
from botocore.config import Config
from eventlet.greenpool import GreenPool

used = []

def get_data():
    client = boto3.client('ec2', config=Config(retries={'max_attempts': 0}, connect_timeout=5, read_timeout=5))
    client.describe_instances()

pool = GreenPool(10000)
process = psutil.Process(os.getpid())
for i in range(100000):
    memory = process.memory_info().rss / 1024 / 1024  # resident set size in MiB
    used.append(memory)
    pool.spawn_n(get_data)

pp.plot(used)
pp.show()

[plot: process memory usage over the run (test_memory_leak_0_retry)]

@swetashre swetashre added bug This issue is a confirmed bug. and removed guidance Question that needs advice or information. labels May 21, 2020
@itachaaa
Author

itachaaa commented May 22, 2020

When you reproduced this, did you also have an unreliable network environment that sometimes threw exceptions? At the time, my logs frequently showed network-related errors such as EndpointConnectionError and ReadTimeoutError, and I don't know whether that is a contributing factor.

@willbengtson

I am also tracking down a memory leak. tracemalloc pointed me to https://github.com/boto/botocore/blob/develop/botocore/client.py#L322

When running a Flask application and looping through gc.garbage after a gc.collect(), I am left with boto docs. I am currently using boto3 and creating clients as client = boto3.client('sts'), for example.
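
A minimal sketch of that kind of check (gc.DEBUG_SAVEALL makes the collector keep unreachable objects in gc.garbage so they can be inspected; the top-10 cutoff is arbitrary):

import gc

gc.set_debug(gc.DEBUG_SAVEALL)  # keep unreachable objects in gc.garbage
gc.collect()

# Print the largest leftover strings; in my case these were boto doc strings.
strings = [o for o in gc.garbage if isinstance(o, str)]
for s in sorted(strings, key=len, reverse=True)[:10]:
    print(len(s), repr(s[:80]))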

Running in Lambda, the following is a graph of memory from a CloudWatch metrics filter:

[graph: Lambda memory usage from CloudWatch]

@rl-ilasic

Is there an update or workaround for this problem?

@mwek

mwek commented Sep 19, 2022

Creating one session per thread and reusing it across the ThreadPool mitigated the issue for us. Snippet for aiobotocore below:

import threading
from aiobotocore.session import AioSession, get_session

# NOTE: botocore has a memory leak in Session objects. Recommended workaround is to cache the session object locally per thread.
# See https://github.com/boto/botocore/issues/2047
_aio_session_cache = threading.local()


def _cached_session() -> AioSession:
    if not hasattr(_aio_session_cache, "session"):
        _aio_session_cache.session = get_session()
    return _aio_session_cache.session
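
A hypothetical usage sketch for the helper above (the S3 call is only an example):

import asyncio

async def list_buckets():
    # Reuse the thread-local session, but create a short-lived client per task.
    async with _cached_session().create_client('s3') as client:
        resp = await client.list_buckets()
        print([b['Name'] for b in resp['Buckets']])

asyncio.run(list_buckets())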

@ericman93

ericman93 commented Dec 8, 2022

I ran my FastAPI app, and tracemalloc showed that python3.8/json/decoder.py is leaking ~100 MB every 15 minutes.
Looking into the tracebacks for that file, I see that they all originate in botocore:

File \"/opt/venv/lib/python3.8/site-packages/botocore/session.py\", line 787
  return self._internal_components.get_component(name)
File \"/opt/venv/lib/python3.8/site-packages/botocore/session.py\", line 1081
  self._components[name] = factory()
File \"/opt/venv/lib/python3.8/site-packages/botocore/session.py\", line 188
  endpoints = loader.load_data('endpoints')
File \"/opt/venv/lib/python3.8/site-packages/botocore/loaders.py\", line 142
  data = func(self, *args, **kwargs)
File \"/opt/venv/lib/python3.8/site-packages/botocore/loaders.py\", line 454
  found = self.file_loader.load_file(possible_path)
File \"/opt/venv/lib/python3.8/site-packages/botocore/loaders.py\", line 194
  data = self._load_file(file_path + ext, open_method)
File \"/opt/venv/lib/python3.8/site-packages/botocore/loaders.py\", line 181
  return json.loads(payload, object_pairs_hook=OrderedDict)
File \"/usr/local/lib/python3.8/json/__init__.py\", line 370
  return cls(**kw).decode(s)
File \"/usr/local/lib/python3.8/json/decoder.py\", line 337
  obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File \"/usr/local/lib/python3.8/json/decoder.py\", line 353
  obj, end = self.scan_once(s, idx)

and

File \"/opt/venv/lib/python3.8/site-packages/botocore/client.py\", line 202
  json_model = self._loader.load_service_model(
File \"/opt/venv/lib/python3.8/site-packages/botocore/loaders.py\", line 142
  data = func(self, *args, **kwargs)
File \"/opt/venv/lib/python3.8/site-packages/botocore/loaders.py\", line 417
  model = self.load_data(full_path)
File \"/opt/venv/lib/python3.8/site-packages/botocore/loaders.py\", line 142
  data = func(self, *args, **kwargs)
File \"/opt/venv/lib/python3.8/site-packages/botocore/loaders.py\", line 454
  found = self.file_loader.load_file(possible_path)
File \"/opt/venv/lib/python3.8/site-packages/botocore/loaders.py\", line 194
  data = self._load_file(file_path + ext, open_method)
File \"/opt/venv/lib/python3.8/site-packages/botocore/loaders.py\", line 181
  return json.loads(payload, object_pairs_hook=OrderedDict)
File \"/usr/local/lib/python3.8/json/__init__.py\", line 370
  return cls(**kw).decode(s)
File \"/usr/local/lib/python3.8/json/decoder.py\", line 337
  obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File \"/usr/local/lib/python3.8/json/decoder.py\", line 353
  obj, end = self.scan_once(s, idx)

I'm using botocore==1.27.59 and can't upgrade it, since aiobotocore is pinned to ^1.27.
I'll try to update my Python version.
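
Until an upgrade is possible, one common mitigation (a sketch, not a botocore API; cached_client is an illustrative name) is to cache clients per service so the service-model JSON is parsed once per process rather than per request:

import functools

import boto3

@functools.lru_cache(maxsize=None)
def cached_client(service_name, region_name='us-east-1'):
    # One client per (service, region): the service-2.json model is loaded
    # and JSON-decoded only on the first call.
    return boto3.client(service_name, region_name=region_name)

# e.g. cached_client('sts').get_caller_identity()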
