## Description

### Code Version

2.1.2

### Expected Behavior

The memory used by pyff is properly freed after a request finishes.

### Current Behavior

Each request that leads to an HTTP 500 error increases memory usage by roughly 300MB, and that memory is never released.
### Possible Solution

To alleviate the issue, the parsed tree needs to be cleared explicitly, as shown in the diff below.
```diff
diff --git i/src/pyff/api.py w/src/pyff/api.py
index 1050efb..2f17438 100644
--- i/src/pyff/api.py
+++ w/src/pyff/api.py
@@ -4,6 +4,7 @@ from datetime import datetime, timedelta
 from json import dumps
 from typing import Any, Dict, Generator, Iterable, List, Mapping, Optional, Tuple
 
+import lxml.etree
 import pkg_resources
 import pyramid.httpexceptions as exc
 import pytz
@@ -297,12 +298,18 @@ def process_handler(request: Request) -> Response:
     except ResourceException as ex:
         import traceback
 
+        if isinstance(r, (lxml.etree._Element, lxml.etree._ElementTree)):
+            r.clear()
+
         log.debug(traceback.format_exc())
         log.warning(f'Exception from processing pipeline: {ex}')
         raise exc.exception_response(409)
     except BaseException as ex:
         import traceback
 
+        if isinstance(r, (lxml.etree._Element, lxml.etree._ElementTree)):
+            r.clear()
+
         log.debug(traceback.format_exc())
         log.error(f'Exception from processing pipeline: {ex}')
         raise exc.exception_response(500)
```
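The pattern in the diff can be sketched in isolation: clear the parsed tree on the error path before surfacing the error, so the element references are dropped instead of lingering. This is a minimal sketch using the stdlib `xml.etree` as a stand-in for lxml; `handle_request` and the status strings are hypothetical, not pyff API.

```python
import xml.etree.ElementTree as ET


def handle_request(xml_payload: str) -> str:
    """Parse a payload; on a pipeline error, clear the tree before failing."""
    r = None
    try:
        r = ET.fromstring(xml_payload)  # parsed tree held in r
        if len(r) == 0:
            raise ValueError("empty document")  # stand-in for a pipeline error
        return "200 OK"
    except ValueError:
        # Free the tree's children explicitly before returning the error,
        # mirroring the r.clear() calls added on the exception paths.
        if isinstance(r, ET.Element):
            r.clear()
        return "500 Internal Server Error"


print(handle_request("<root><child/></root>"))  # 200 OK
print(handle_request("<root/>"))                # 500 Internal Server Error
```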
## Steps to Reproduce

The XML files stored under `tmp/dynamic` total about 50MB in our case, which leads to high memory usage because pyff parses them into an in-memory representation using lxml. Each request then results in roughly a 300MB memory increase that is never freed.
To reproduce the issue, use the following pipeline file:

```yaml
- when update:
    - load:
        - tmp/dynamic
        - tmp/static
- when request:
    - select:
    - pipe:
        - when accept application/samlmetadata+xml application/xml:
            - first
            - finalize:
                cacheDuration: PT12H
                validUntil: P10D
            - sign:
                key: tmp/default.key
                cert: tmp/default.crt
            - emit application/samlmetadata+xml
            - break
        - when accept application/json:
            - discojson
            - emit application/json
            - break
```
Run pyff with caching disabled:

```sh
PYFF_CACHING_ENABLED=False pyffd -f --frequency=1200 --loglevel=INFO -H 0.0.0.0 -P 8080 --pid_file $PWD/tmp/pyff.pid --dir=$PWD/tmp/ $PWD/tmp/mdx.fd
```
And run the following:

```sh
for i in `seq 1 20`; do
    http --print hH 0.0.0.0:8080 'Accept: text/plain'
done
```
The high memory consumption is most likely related to lxml not freeing the memory properly.
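Why `clear()` helps: once a tree's children are detached, the element objects have no remaining references and CPython's reference counting can reclaim them immediately. The sketch below demonstrates this with the stdlib `xml.etree` (lxml's `Element.clear()` has the same effect on its own trees); the document size is an arbitrary illustration, not the 50MB case from this report.

```python
import tracemalloc
import xml.etree.ElementTree as ET

# Build a document large enough that the parsed tree dominates traced memory.
xml_blob = "<root>" + "<e a='x'>text</e>" * 50_000 + "</root>"

tracemalloc.start()
root = ET.fromstring(xml_blob)
held, _ = tracemalloc.get_traced_memory()        # memory while the tree is alive

root.clear()                                     # drop references to all children
after_clear, _ = tracemalloc.get_traced_memory() # memory once children are freed
tracemalloc.stop()

print(f"held={held} after_clear={after_clear}")
```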