_PyPegen_is_memoized() has a complexity of O(n) #93289

@vstinner

Description

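Using Linux perf, I noticed that the Python parser spends a significant amount of time in the _PyPegen_is_memoized() function, which iterates over a linked list of memo entries to find a value, so each lookup is O(n) in the number of entries memoized on the current token: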

int  // bool
_PyPegen_is_memoized(Parser *p, int type, void *pres)
{
    if (p->mark == p->fill) {
        if (_PyPegen_fill_token(p) < 0) {
            p->error_indicator = 1;
            return -1;
        }
    }

    Token *t = p->tokens[p->mark];

    for (Memo *m = t->memo; m != NULL; m = m->next) {
        if (m->type == type) {
#if defined(PY_DEBUG)
            if (0 <= type && type < NSTATISTICS) {
                long count = m->mark - p->mark;
                // A memoized negative result counts for one.
                if (count <= 0) {
                    count = 1;
                }
                memo_statistics[type] += count;
            }
#endif
            p->mark = m->mark;
            *(void **)(pres) = m->node;
            return 1;
        }
    }
    return 0;
}
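
To make the cost of that loop concrete, here is a minimal Python model of the lookup. It is a sketch only: the Memo class and the helpers lookup_linked() and build_list() are hypothetical names for illustration, and it assumes new entries are pushed at the head of the per-token list. It counts how many nodes are visited before a hit or a miss.

```python
class Memo:
    """One memo entry: rule type, cached node, end mark, link to the next entry."""

    def __init__(self, type_, node, mark, next_=None):
        self.type = type_
        self.node = node
        self.mark = mark
        self.next = next_


def lookup_linked(head, type_):
    """Walk the list like the C for-loop; return (entry or None, nodes visited)."""
    steps = 0
    m = head
    while m is not None:
        steps += 1
        if m.type == type_:
            return m, steps
        m = m.next
    return None, steps


def build_list(n):
    """Build a list of n entries with types 0..n-1, pushing each at the head
    (assumption: this mirrors how entries are added to a token's memo list)."""
    head = None
    for t in range(n):
        head = Memo(t, f"node{t}", t, head)
    return head


head = build_list(50)
newest, newest_steps = lookup_linked(head, 49)   # most recently added type
oldest, oldest_steps = lookup_linked(head, 0)    # first added type, now at the tail
missing, miss_steps = lookup_linked(head, 999)   # type that was never memoized

print(newest_steps, oldest_steps, miss_steps)
```

In this model, a miss always walks the whole list, and a hit on the oldest entry does too, which is the linear behaviour the issue title refers to.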

script.py:

import tokenize

# wc -l $(find -name "*.py")|sort -n
large_files = """
    5259 ./Tools/clinic/clinic.py
    5553 ./Lib/test/test_email/test_email.py
    5577 ./Lib/test/test_argparse.py
    5659 ./Lib/test/test_logging.py
    5806 ./Lib/test/test_descr.py
    5878 ./Lib/test/test_decimal.py
    5993 ./Lib/test/_test_multiprocessing.py
    6425 ./Lib/_pydecimal.py
    6626 ./Lib/test/datetimetester.py
    6664 ./Lib/test/test_socket.py
    7325 ./Lib/test/test_typing.py
   15534 ./Lib/pydoc_data/topics.py
"""
large_files = [line.split()[-1] for line in large_files.splitlines() if line.strip()]

files = []
for filename in large_files:
    with tokenize.open(filename) as fp:
        content = fp.read()
    files.append((filename, content))

for loops in range(5):
    for filename, content in files:
        print(filename)
        compile(content, filename, "exec")

Linux perf reports that, overall, Python spent about 7% of its runtime in this function:

$ perf record ./python script.py
$ perf report
Samples: 10K of event 'cycles:u', Event count (approx.): 7308792716
Overhead  Command  Shared Object         Symbol
   7,35%  python   python                [.] _PyPegen_is_memoized
   5,18%  python   python                [.] assemble
   3,80%  python   python                [.] _PyPegen_expect_token
   3,53%  python   python                [.] unicodekeys_lookup_unicode
   3,06%  python   python                [.] tok_get
   2,59%  python   python                [.] _PyPegen_update_memo
   2,27%  python   python                [.] _Py_dict_lookup
   2,19%  python   python                [.] _PyObject_Free
(...)
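
As a rough cross-check that does not need perf, the wall-clock time of compile() can be measured directly from Python. This is a sketch under assumptions: time_compile() is a hypothetical helper, and the source is a synthetic module generated in memory rather than the files listed in script.py. It measures the whole compile() call (tokenizer, parser and compiler), so it cannot attribute time to _PyPegen_is_memoized() alone.

```python
import time


def time_compile(source, filename="<generated>", repeat=3):
    """Return the best wall-clock time in seconds of compile() over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        compile(source, filename, "exec")
        best = min(best, time.perf_counter() - start)
    return best


# Synthetic stand-in for a large file: 2000 simple assignment statements.
source = "\n".join(f"x{i} = {i} + {i}" for i in range(2000))
elapsed = time_compile(source)
print(f"compile() best of 3: {elapsed:.4f}s")
```

Comparing this number before and after a change to the memo lookup gives a coarse end-to-end signal; perf remains the tool for per-function attribution.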

Metadata

    Labels

    performance (Performance or resource usage), type-bug (An unexpected behavior, bug, or error)
