Investigate possible memory leak issue using jsonschema.validate() #2826

Closed
kevin-bates opened this issue Jul 11, 2022 · 1 comment · Fixed by #2829

@kevin-bates
Member

In last week's Jupyter Server meeting it was brought to our attention that the jsonschema package can exhibit a memory leak when the validate method is repeatedly called to validate instances against a given schema. The leak isn't necessarily in the validation of the instance itself, but rather in the acquisition of the validator used to perform the validation. The validate method's documentation even states:

"If you know you have a valid schema already, especially if you intend to validate multiple instances with the same schema, you likely would prefer using the Validator.validate method directly on a specific validator (e.g. Draft7Validator.validate)."

Since Elyra validates all instances on retrieval (and save) and is intended to remain running for long periods of time, we should weigh how this impacts Elyra against the cost of addressing the issue.
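
For context, the two call patterns the documentation contrasts look roughly like this (the schema and instance below are illustrative placeholders, not Elyra's):

import jsonschema
from jsonschema import Draft7Validator

schema = {"type": "object", "properties": {"name": {"type": "string"}}}
instance = {"name": "example"}

# Per-call form: validate() checks the schema and constructs a new
# validator object on every invocation.
jsonschema.validate(instance, schema)

# Reusable form: build the validator once, then validate any number of
# instances against the same schema.
validator = Draft7Validator(schema)
validator.validate(instance)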

@kevin-bates
Member Author

I wrote a simple unit test to exercise the validate() method on the MetadataManager:

def test_manager_validation_performance():
    import os
    import psutil

    metadata_mgr = MetadataManager(schemaspace=METADATA_TEST_SCHEMASPACE)
    metadata_dict = {**valid_metadata_json}
    metadata = Metadata.from_dict(METADATA_TEST_SCHEMASPACE_ID, metadata_dict)

    process = psutil.Process(os.getpid())
    # Warm up so one-time allocations don't skew the measurement
    metadata_mgr.validate("perf_test", metadata)

    memory_start = process.memory_info()
    for _ in range(10000):
        metadata_mgr.validate("perf_test", metadata)

    memory_end = process.memory_info()
    # RSS delta across the 10,000 validations, in kilobytes
    diff = (memory_end.rss - memory_start.rss) / 1024
    print(f"Memory used: {diff:,} kb, Memory start: {memory_start.rss/1024/1024:,.3f} mb, "
          f"Memory end: {memory_end.rss/1024/1024:,.3f} mb.")

MetadataManager.validate() essentially gets the schema referenced by the instance, then calls jsonschema.validate() (roughly sketched after the results below). Here are results from two runs...

================ 1 passed, 10001 warnings in 103.34s (0:01:43) =================
Memory used: 1,860.0 kb, Memory start: 109.980 mb, Memory end: 111.797 mb.

================ 1 passed, 10001 warnings in 101.90s (0:01:41) =================
Memory used: 1,672.0 kb, Memory start: 109.609 mb, Memory end: 111.242 mb.
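
Roughly paraphrased, the current path looks like this (the schema lookup helper and to_dict() call are illustrative, not Elyra's exact code):

import jsonschema

def validate(self, name, metadata):
    # Fetch the schema referenced by this instance (hypothetical helper),
    # then defer to module-level jsonschema.validate(), which re-checks
    # the schema and builds a new validator on every call.
    schema = self._get_schema(metadata.schema_name)  # hypothetical
    jsonschema.validate(metadata.to_dict(), schema)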

When MetadataManager.validate() is updated to use a validator directly (Draft7Validator(schema, format_checker=draft7_format_checker).validate(metadata_dict)), the results become:

============================== 1 passed in 16.58s ==============================
Memory used: 92.0 kb, Memory start: 107.562 mb, Memory end: 107.652 mb.

============================== 1 passed in 16.09s ==============================
Memory used: 52.0 kb, Memory start: 108.203 mb, Memory end: 108.254 mb.

indicating a 20-30X decrease in memory usage and a 6X increase in performance.

I think we can improve things further by having the SchemaManager create and cache a validator instance for each schema, avoiding the construction of a new validator on every validation request.
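
As a rough sketch of that caching idea (the class and method names here are hypothetical, not the actual SchemaManager API):

from jsonschema import Draft7Validator, draft7_format_checker

class ValidatorCache:
    def __init__(self):
        self._validators = {}  # schema name -> cached Draft7Validator

    def get_validator(self, schema_name, schema):
        # Construct the validator once per schema; subsequent requests
        # reuse the cached instance, avoiding per-call construction.
        validator = self._validators.get(schema_name)
        if validator is None:
            validator = Draft7Validator(schema, format_checker=draft7_format_checker)
            self._validators[schema_name] = validator
        return validator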

Because Elyra is a long-lived application, and because these changes (so far) appear to be straightforward, I believe we should make them.
