High Memory Usage #283

## Code Version

2.1.2

## Expected Behavior

The memory used by `pyff` is properly freed up after a request finishes.

## Current Behavior

Each request that ends in an HTTP 500 error increases memory usage by roughly 300MB, and that memory is not freed afterwards.

## Possible Solution


To alleviate the issue, the parsed tree needs to be cleared explicitly in the exception handlers, as shown in the diff below.

```diff
diff --git i/src/pyff/api.py w/src/pyff/api.py
index 1050efb..2f17438 100644
--- i/src/pyff/api.py
+++ w/src/pyff/api.py
@@ -4,6 +4,7 @@ from datetime import datetime, timedelta
 from json import dumps
 from typing import Any, Dict, Generator, Iterable, List, Mapping, Optional, Tuple

+import lxml.etree
 import pkg_resources
 import pyramid.httpexceptions as exc
 import pytz
@@ -297,12 +298,18 @@ def process_handler(request: Request) -> Response:
     except ResourceException as ex:
         import traceback

+        if isinstance(r, (lxml.etree._Element, lxml.etree._ElementTree)):
+            r.clear()
+
         log.debug(traceback.format_exc())
         log.warning(f'Exception from processing pipeline: {ex}')
         raise exc.exception_response(409)
     except BaseException as ex:
         import traceback

+        if isinstance(r, (lxml.etree._Element, lxml.etree._ElementTree)):
+            r.clear()
+
         log.debug(traceback.format_exc())
         log.error(f'Exception from processing pipeline: {ex}')
         raise exc.exception_response(500)
```
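The cleanup in the diff could also be factored into one small helper, sketched below. This is only an illustration, not code from `pyff`: the name `clear_parsed_tree` is hypothetical, and the standard library's `xml.etree.ElementTree` stands in for `lxml.etree` because both expose `getroot()` on tree wrappers and `clear()` on elements. As far as I can tell, tree wrappers do not offer `clear()` themselves, so the helper clears the root element instead. It also tolerates `None`, which would help if `r` were initialized to `None` before the `try` block (an assumption about the surrounding handler).

```python
import xml.etree.ElementTree as ET  # stdlib stand-in; lxml.etree offers the same getroot()/clear() calls


def clear_parsed_tree(obj):
    """Best-effort release of a parsed XML tree. Returns True if an element was cleared.

    Hypothetical helper: accepts an element, a tree wrapper, or anything else
    (None, str, dict, ...), and only touches objects that look like XML elements.
    """
    if obj is None:
        return False
    # Tree wrappers expose getroot(); elements are used directly.
    getroot = getattr(obj, "getroot", None)
    root = getroot() if callable(getroot) else obj
    # Require a .tag attribute so unrelated containers (dicts, lists) are left alone.
    if root is None or not hasattr(root, "tag"):
        return False
    root.clear()  # drops children, attributes, text and tail of the root element
    return True


if __name__ == "__main__":
    tree = ET.ElementTree(ET.fromstring("<md><entity id='a'/><entity id='b'/></md>"))
    print(clear_parsed_tree(tree), len(tree.getroot()))
```

Both `except` branches could then call `clear_parsed_tree(r)` instead of repeating the `isinstance` check.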

## Steps to Reproduce


In our case the XML files stored under `tmp/dynamic` total 50MB, which seems to drive the high memory usage because `pyff` parses them into an in-memory Python representation using `lxml`. Each request increases memory usage by roughly 300MB, and that memory is not freed afterwards.

To reproduce the issue, use the following pipeline file:

```yaml
- when update:
  - load:
      - tmp/dynamic
      - tmp/static
- when request:
  - select:
  - pipe:
      - when accept application/samlmetadata+xml application/xml:
          - first
          - finalize:
              cacheDuration: PT12H
              validUntil: P10D
          - sign:
              key: tmp/default.key
              cert: tmp/default.crt
          - emit application/samlmetadata+xml
          - break
      - when accept application/json:
          - discojson
          - emit application/json
          - break
```

Run `pyff` with caching disabled:

```bash
PYFF_CACHING_ENABLED=False pyffd -f --frequency=1200 --loglevel=INFO -H 0.0.0.0 -P 8080 --pid_file $PWD/tmp/pyff.pid --dir=$PWD/tmp/ $PWD/tmp/mdx.fd
```

Then send 20 requests with an `Accept` header that none of the `when accept` branches in the pipeline handle:

```bash
for i in $(seq 1 20); do
  http --print hH 0.0.0.0:8080 'Accept: text/plain'
done
```
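To put numbers on the growth, the daemon's resident set size can be sampled after each request. The following is a minimal sketch, assuming a Linux host (it reads `/proc/<pid>/status`) and the PID file path from the `pyffd` command above. The function names are hypothetical and not part of `pyff`.

```python
from pathlib import Path


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:    307200 kB"; the second field is the number.
            return int(line.split()[1])
    return None


def read_rss_kb(pid_file="tmp/pyff.pid"):
    """Read the daemon's PID from pid_file and return its current RSS in kB (Linux only)."""
    pid = int(Path(pid_file).read_text().strip())
    return parse_vm_rss_kb(Path(f"/proc/{pid}/status").read_text())


def rss_deltas_kb(samples):
    """Growth in kB between consecutive RSS samples."""
    return [after - before for before, after in zip(samples, samples[1:])]
```

Calling `read_rss_kb()` once before the loop and once after each request, then passing the collected values to `rss_deltas_kb`, gives the per-request increase.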

The high memory consumption is most likely caused by `lxml` not releasing the memory held by the parsed tree.
