Closed
Labels: performance (Performance or resource usage), type-bug (An unexpected behavior, bug, or error)
Description
Using Linux perf, I noticed that the Python parser spends a significant amount of time in the _PyPegen_is_memoized() function, which iterates over a linked list to find a value:
int // bool
_PyPegen_is_memoized(Parser *p, int type, void *pres)
{
    if (p->mark == p->fill) {
        if (_PyPegen_fill_token(p) < 0) {
            p->error_indicator = 1;
            return -1;
        }
    }

    Token *t = p->tokens[p->mark];

    for (Memo *m = t->memo; m != NULL; m = m->next) {
        if (m->type == type) {
#if defined(PY_DEBUG)
            if (0 <= type && type < NSTATISTICS) {
                long count = m->mark - p->mark;
                // A memoized negative result counts for one.
                if (count <= 0) {
                    count = 1;
                }
                memo_statistics[type] += count;
            }
#endif
            p->mark = m->mark;
            *(void **)(pres) = m->node;
            return 1;
        }
    }
    return 0;
}
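To make the cost of that loop concrete, here is a minimal Python model of a per-token memo list. It is only a sketch: the `Memo`, `Token`, `insert_memo` and `is_memoized` names are illustrative stand-ins, not the parser's real API, and it assumes new entries are prepended to the list. The `steps` counter is added purely to show how many nodes a lookup visits.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Memo:
    type: int                      # rule identifier
    node: Any                      # cached parse result
    mark: int                      # token position after the rule was tried
    next: Optional["Memo"] = None  # next entry in the singly linked list


@dataclass
class Token:
    memo: Optional[Memo] = None    # head of this token's memo list


def insert_memo(token, type, node, mark):
    """Prepend an entry (assumption: the real parser also prepends)."""
    token.memo = Memo(type, node, mark, token.memo)


def is_memoized(token, type):
    """Linear scan of the list; returns (found, node, mark, steps)."""
    steps = 0
    m = token.memo
    while m is not None:
        steps += 1
        if m.type == type:
            return True, m.node, m.mark, steps
        m = m.next
    return False, None, None, steps


# Memoize 100 hypothetical rule types on a single token.
token = Token()
for rule_type in range(100):
    insert_memo(token, rule_type, f"node{rule_type}", rule_type + 1)

found_new = is_memoized(token, 99)    # most recently inserted entry
found_old = is_memoized(token, 0)     # oldest entry, at the tail
missing = is_memoized(token, 1000)    # never memoized: full scan

print(found_new[3], found_old[3], missing[3])
```

In this model a miss always walks the whole list, so the cost of each lookup grows with the number of rules already memoized at that token position.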
script.py:
import tokenize

# wc -l $(find -name "*.py")|sort -n
large_files = """
5259 ./Tools/clinic/clinic.py
5553 ./Lib/test/test_email/test_email.py
5577 ./Lib/test/test_argparse.py
5659 ./Lib/test/test_logging.py
5806 ./Lib/test/test_descr.py
5878 ./Lib/test/test_decimal.py
5993 ./Lib/test/_test_multiprocessing.py
6425 ./Lib/_pydecimal.py
6626 ./Lib/test/datetimetester.py
6664 ./Lib/test/test_socket.py
7325 ./Lib/test/test_typing.py
15534 ./Lib/pydoc_data/topics.py
"""

large_files = [line.split()[-1] for line in large_files.splitlines() if line.strip()]

files = []
for filename in large_files:
    with tokenize.open(filename) as fp:
        content = fp.read()
    files.append((filename, content))

for loops in range(5):
    for filename, content in files:
        print(filename)
        compile(content, filename, "exec")
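The script above reads its files by relative path, so it has to be run from the root of a CPython source checkout. For a quick check without one, a rough alternative is to time `compile()` on generated source. This is only a sketch: `make_source` and `time_compile` are hypothetical helpers, and the synthetic code is not representative of the real files listed above.

```python
import time


def make_source(n_functions):
    """Build a synthetic module made of many small functions."""
    parts = []
    for i in range(n_functions):
        parts.append(
            f"def f{i}(a, b):\n"
            f"    return (a + b) * {i} if a else [x for x in range(b)]\n"
        )
    return "\n".join(parts)


def time_compile(source, filename="<synthetic>", loops=5):
    """Return the best wall-clock time of `loops` compile() calls."""
    best = float("inf")
    for _ in range(loops):
        start = time.perf_counter()
        compile(source, filename, "exec")
        best = min(best, time.perf_counter() - start)
    return best


source = make_source(2000)
elapsed = time_compile(source)
print(f"compiled {len(source.splitlines())} lines in {elapsed:.3f}s (best of 5)")
```

This measures the whole compile pipeline (tokenizer, parser, compiler), so it cannot attribute time to a single C function the way perf does.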
Linux perf reports that, overall, Python spent about 7% of its runtime in this function:
$ perf record ./python script.py
$ perf report
Samples: 10K of event 'cycles:u', Event count (approx.): 7308792716
Overhead  Command  Shared Object  Symbol
   7,35%  python   python         [.] _PyPegen_is_memoized
   5,18%  python   python         [.] assemble
   3,80%  python   python         [.] _PyPegen_expect_token
   3,53%  python   python         [.] unicodekeys_lookup_unicode
   3,06%  python   python         [.] tok_get
   2,59%  python   python         [.] _PyPegen_update_memo
   2,27%  python   python         [.] _Py_dict_lookup
   2,19%  python   python         [.] _PyObject_Free
(...)