Open
Description
Code Version
2.1.2
Expected Behavior
The memory used by pyff is properly freed up after a request finishes.
Current Behavior
Each request that leads to a 500 HTTP error increases memory usage by roughly 300MB, and that memory is never released.
Possible Solution
To alleviate the issue, the parsed tree needs to be cleared explicitly, as shown in the diff below.
diff --git i/src/pyff/api.py w/src/pyff/api.py
index 1050efb..2f17438 100644
--- i/src/pyff/api.py
+++ w/src/pyff/api.py
@@ -4,6 +4,7 @@ from datetime import datetime, timedelta
 from json import dumps
 from typing import Any, Dict, Generator, Iterable, List, Mapping, Optional, Tuple
+import lxml.etree
 import pkg_resources
 import pyramid.httpexceptions as exc
 import pytz
@@ -297,12 +298,18 @@ def process_handler(request: Request) -> Response:
     except ResourceException as ex:
         import traceback
+        if isinstance(r, (lxml.etree._Element, lxml.etree._ElementTree)):
+            r.clear()
+
         log.debug(traceback.format_exc())
         log.warning(f'Exception from processing pipeline: {ex}')
         raise exc.exception_response(409)
     except BaseException as ex:
         import traceback
+        if isinstance(r, (lxml.etree._Element, lxml.etree._ElementTree)):
+            r.clear()
+
         log.debug(traceback.format_exc())
         log.error(f'Exception from processing pipeline: {ex}')
         raise exc.exception_response(500)
Steps to Reproduce
In our case, the XML files stored under tmp/dynamic total 50MB. That seems to drive the high memory usage, since pyff parses them into in-memory trees using lxml. Each request increases memory usage by roughly 300MB, and that memory is not freed afterwards.
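One way to put numbers on the growth described above is to sample the resident set size of the pyff process between requests. The following is only a sketch under stated assumptions: it is Linux-only (it reads `/proc/<pid>/status`), the helper names `parse_vm_rss_kb`, `rss_kb`, `request_status` and `watch_memory` are made up for illustration, and you have to supply the pid of the process that actually serves requests, which may be a worker process rather than the pid written to the pid file.

```python
import re
import urllib.error
import urllib.request
from pathlib import Path


def parse_vm_rss_kb(status_text: str) -> int:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS not found in status text")
    return int(match.group(1))


def rss_kb(pid: int) -> int:
    """Read the current resident set size of a process (Linux only)."""
    return parse_vm_rss_kb(Path(f"/proc/{pid}/status").read_text())


def request_status(url: str) -> int:
    """Send one request with 'Accept: text/plain' and return the HTTP status."""
    req = urllib.request.Request(url, headers={"Accept": "text/plain"})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; we only want the code.
        return err.code


def watch_memory(pid: int, url: str = "http://0.0.0.0:8080/", count: int = 20) -> None:
    """Print the RSS of `pid` after each of `count` requests to `url`."""
    baseline = rss_kb(pid)
    print(f"baseline: {baseline / 1024:.0f} MB")
    for i in range(1, count + 1):
        status = request_status(url)
        current = rss_kb(pid)
        delta_mb = (current - baseline) / 1024
        print(f"request {i}: HTTP {status}, RSS {current / 1024:.0f} MB ({delta_mb:+.0f} MB)")
```

Calling `watch_memory(pid)` against a running instance should make it visible whether the RSS keeps climbing with every failing request or levels off.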
To reproduce the issue use the following pipeline file:
- when update:
  - load:
      - tmp/dynamic
      - tmp/static
- when request:
  - select:
  - pipe:
      - when accept application/samlmetadata+xml application/xml:
          - first
          - finalize:
              cacheDuration: PT12H
              validUntil: P10D
          - sign:
              key: tmp/default.key
              cert: tmp/default.crt
          - emit application/samlmetadata+xml
          - break
      - when accept application/json:
          - discojson
          - emit application/json
          - break
Run pyff with caching disabled:
PYFF_CACHING_ENABLED=False pyffd -f --frequency=1200 --loglevel=INFO -H 0.0.0.0 -P 8080 --pid_file $PWD/tmp/pyff.pid --dir=$PWD/tmp/ $PWD/tmp/mdx.fd
And run the following:
for i in $(seq 1 20);
do
  http --print hH 0.0.0.0:8080 'Accept: text/plain'
done
High memory consumption is most likely related to lxml not freeing up the memory properly.
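A note on the workaround in the diff: in lxml, `clear()` is a method of `_Element`, while `_ElementTree` does not provide one, so calling `r.clear()` on a tree object would raise `AttributeError` inside the exception handler. A possible way around that is a small helper that unwraps a tree to its root first. This is a sketch only; `release_tree` is a hypothetical name, not part of pyff or lxml, and whether clearing the tree is the right long-term fix is a separate question.

```python
from typing import Any

from lxml import etree


def release_tree(obj: Any) -> bool:
    """Clear an lxml element or tree in place.

    Returns True if something was cleared, False for any other object
    (including None), so it is safe to call on an arbitrary pipeline result.
    """
    if isinstance(obj, etree._ElementTree):
        # _ElementTree has no clear(); work on its root element instead.
        obj = obj.getroot()  # None for an empty tree
    if isinstance(obj, etree._Element):
        # Drops all children, attributes, text and tail of the element.
        obj.clear()
        return True
    return False
```

Because the helper accepts `None`, the handler could initialise `r = None` before the `try` block and call `release_tree(r)` in both `except` branches without first checking whether the pipeline got far enough to assign a result.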