High Memory Usage #283

Open
mic4ael opened this issue Oct 30, 2024 · 0 comments

mic4ael commented Oct 30, 2024

Code Version

2.1.2

Expected Behavior

The memory used by pyff is properly freed up after a request finishes.

Current Behavior

Each request that ends in a 500 HTTP error results in a memory increase of roughly 300 MB.

Possible Solution

To alleviate the issue, the parsed tree needs to be cleared explicitly, as shown in the diff below.

diff --git i/src/pyff/api.py w/src/pyff/api.py
index 1050efb..2f17438 100644
--- i/src/pyff/api.py
+++ w/src/pyff/api.py
@@ -4,6 +4,7 @@ from datetime import datetime, timedelta
 from json import dumps
 from typing import Any, Dict, Generator, Iterable, List, Mapping, Optional, Tuple

+import lxml.etree
 import pkg_resources
 import pyramid.httpexceptions as exc
 import pytz
@@ -297,12 +298,22 @@ def process_handler(request: Request) -> Response:
     except ResourceException as ex:
         import traceback

+        if isinstance(r, lxml.etree._ElementTree):
+            r = r.getroot()
+        if isinstance(r, lxml.etree._Element):
+            r.clear()
+
         log.debug(traceback.format_exc())
         log.warning(f'Exception from processing pipeline: {ex}')
         raise exc.exception_response(409)
     except BaseException as ex:
         import traceback

+        if isinstance(r, lxml.etree._ElementTree):
+            r = r.getroot()
+        if isinstance(r, lxml.etree._Element):
+            r.clear()
+
         log.debug(traceback.format_exc())
         log.error(f'Exception from processing pipeline: {ex}')
         raise exc.exception_response(500)
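
For context, the following standalone sketch (not pyff code; the element count is made up for illustration) shows why clearing helps: lxml keeps the whole libxml2 document alive while any of its nodes is referenced from Python, and clear() drops the references held through the root.

import lxml.etree

# Build a large tree; lxml keeps the underlying libxml2 document alive
# as long as any Python proxy references one of its nodes.
root = lxml.etree.Element('root')
for _ in range(100_000):
    lxml.etree.SubElement(root, 'entry').text = 'x' * 100

# clear() removes all subelements, text and attributes from the element,
# so the subtree can be reclaimed once nothing else references it.
root.clear()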

Steps to Reproduce

In our case, the XML files stored under tmp/dynamic total 50MB, which seems to lead to the high memory usage since pyff parses them into a Python representation using lxml. Each request results in an increase of roughly 300MB of memory, which is never freed.

To reproduce the issue use the following pipeline file:

- when update:
  - load:
      - tmp/dynamic
      - tmp/static
- when request:
  - select:
  - pipe:
      - when accept application/samlmetadata+xml application/xml:
          - first
          - finalize:
              cacheDuration: PT12H
              validUntil: P10D
          - sign:
              key: tmp/default.key
              cert: tmp/default.crt
          - emit application/samlmetadata+xml
          - break
      - when accept application/json:
          - discojson
          - emit application/json
          - break

Run pyff with caching disabled:

PYFF_CACHING_ENABLED=False pyffd -f --frequency=1200 --loglevel=INFO -H 0.0.0.0 -P 8080 --pid_file $PWD/tmp/pyff.pid --dir=$PWD/tmp/ $PWD/tmp/mdx.fd

And run the following:

for i in $(seq 1 20); do
    http --print hH 0.0.0.0:8080 'Accept: text/plain'
done
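
To watch the growth directly, here is a small measurement sketch (assuming the third-party psutil and requests packages are installed, and using the PID file written by the pyffd command above):

import time

import psutil
import requests

# Attach to the running pyffd process via its PID file.
pid = int(open('tmp/pyff.pid').read())
proc = psutil.Process(pid)

for i in range(20):
    # Each request is expected to end in an HTTP error given the setup above.
    requests.get('http://0.0.0.0:8080', headers={'Accept': 'text/plain'})
    time.sleep(0.5)
    print(f'after request {i + 1}: {proc.memory_info().rss / 2**20:.0f} MB')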

High memory consumption is most likely related to lxml not freeing up the memory properly.
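
One way to narrow this down (a sketch to run inside the server process, e.g. from a debugger; it assumes nothing about pyff internals) is to count the lxml element proxies that the Python garbage collector can see. If the count stays high after a failed request, Python-level references are keeping the trees alive rather than lxml itself.

import gc

import lxml.etree

# Only elements with a live Python proxy show up here; a persistently
# high count points at lingering Python references to the parsed trees.
proxies = [o for o in gc.get_objects() if isinstance(o, lxml.etree._Element)]
print(f'live lxml element proxies: {len(proxies)}')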
