The doc example for lazily printing error messages with iter_errors() uses Python's sorted. Calling sorted forces the whole generator to be evaluated in memory, which defeats the benefit of lazy evaluation if you only want to print the first N errors. The snippet below consumes an additional 2.5 GB of memory on my machine once the errors are sorted.
I would recommend removing the sorted call so users do not naively assume they are still getting the benefit of lazy evaluation. When error counts are small the memory usage is not noticeable, which leaves a dangerous blind spot if error counts increase unexpectedly in the future (e.g. for data that is nowhere near the schema spec). A sketch of a lazier alternative follows the snippet.
import jsonschema
import psutil

# Create a schema for a nested array and 1M mock examples that violate it
schema = {
    'type': 'array',
    'items': {
        'type': 'array',
        'minItems': 3,
        'maxItems': 3,
        'additionalItems': False,
        'items': {'type': 'integer'}}}
data = [{'a': 'b'} for _ in range(1000000)]

# Track memory throughout validation and error printing process
mem = {}
max_errors = 5

mem['pre_val'] = psutil.virtual_memory().used
validator = jsonschema.Draft7Validator(schema)
mem['post_val'] = psutil.virtual_memory().used

errors = validator.iter_errors(data)
mem['pre_iter'] = psutil.virtual_memory().used
for i, e in enumerate(errors):
    print(e.message)
    if i + 1 >= max_errors:
        break
mem['post_iter'] = psutil.virtual_memory().used

mem['pre_sort'] = psutil.virtual_memory().used
errors_sort = sorted(errors, key=lambda e: e.path)
mem['post_sort'] = psutil.virtual_memory().used

# Summarize usage
print(f'{len(errors_sort)} errors')
for k, v in mem.items():
    print(f'{k}: {v / 1000000:,.2f} MB')
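For what it's worth, here is a minimal sketch (not from jsonschema's docs; the toy schema, data, and max_errors value are placeholders) of two patterns that keep memory bounded: itertools.islice stops after the first N errors without draining the generator, and heapq.nsmallest still walks every error when an ordering is wanted but never holds more than N of them at once.

import heapq
import itertools

import jsonschema

# Placeholder schema and data for illustration only
schema = {'type': 'array', 'items': {'type': 'integer'}}
data = ['not an integer'] * 1000000
max_errors = 5

validator = jsonschema.Draft7Validator(schema)

# Lazy: print only the first N errors; the rest are never generated
for error in itertools.islice(validator.iter_errors(data), max_errors):
    print(error.message)

# Bounded "sort": keeps at most N errors in memory at any time, although it
# still iterates over every error the generator produces
for error in heapq.nsmallest(max_errors,
                             validator.iter_errors(data),
                             key=lambda e: tuple(e.path)):
    print(error.message)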