Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate slow formatting #6016

Closed
Tracked by #6203
konstin opened this issue Jul 23, 2023 · 6 comments
Closed
Tracked by #6203

Investigate slow formatting #6016

konstin opened this issue Jul 23, 2023 · 6 comments
Labels
formatter Related to the formatter performance Potential performance improvement

Comments

@konstin
Copy link
Member

konstin commented Jul 23, 2023

Some files are much slower to format than others, i've posted the top 20 below (for context: i can format 294k files in 290s on my 8 threads laptop, so roughly an average of 8ms per file, the real median is likely a good bit lower). The exact timings and order are inaccurate since we they come from the ecosystem check which runs in parallel, but no files should be that slow, especially since the files aren't that large. We need to format these files with a profiler attached (see CONTRIBUTING.md for instructions), find the bottleneck and optimize the relevant function.

$ cat target/formatter-ecosystem-errors.txt | grep "Slow formatting" | sd "Slow formatting .*/checkouts/(.*): Formatting the file took (.*)" '$2 $1' | sort -h -r | head -n 20
2157ms angr:angr/angr/procedures/definitions/win32_fwpuclnt.py
1430ms angr:angr/angr/procedures/definitions/win32_rpcrt4.py
1052ms kevoreilly:CAPEv2/tests/utils_pretty_print_funcs_data.py
965ms angr:angr/angr/procedures/definitions/win32_propsys.py
614ms angr:angr/angr/procedures/definitions/win32_dnsapi.py
553ms angr:angr/angr/procedures/definitions/win32_crypt32.py
531ms angr:angr/angr/procedures/definitions/win32_kernel32.py
491ms sympy:sympy/sympy/physics/quantum/tests/test_spin.py
446ms debian-calibre:calibre/src/calibre/ebooks/unihandecode/jacodepoints.py
435ms debian-calibre:calibre/src/calibre/ebooks/unihandecode/zhcodepoints.py
435ms debian-calibre:calibre/src/calibre/ebooks/unihandecode/krcodepoints.py
431ms pwndbg:pwndbg/pwndbg/lib/functions.py
378ms angr:angr/angr/procedures/definitions/win32_icu.py
368ms angr:angr/angr/procedures/definitions/win32_oleaut32.py
365ms angr:angr/angr/procedures/definitions/win32_wldap32.py
353ms angr:angr/angr/procedures/definitions/win32_advapi32.py
339ms kovidgoyal:calibre/src/calibre/ebooks/unihandecode/vncodepoints.py
335ms kovidgoyal:calibre/src/calibre/ebooks/unihandecode/jacodepoints.py
324ms angr:angr/angr/procedures/definitions/win32_user32.py
322ms kovidgoyal:calibre/src/calibre/ebooks/unihandecode/krcodepoints.py
Path Source file size Formatted file size
angr:angr/angr/procedures/definitions/win32_fwpuclnt.py 4.1M 35M
angr:angr/angr/procedures/definitions/win32_rpcrt4.py 2.0M 8.5M
kevoreilly:CAPEv2/tests/utils_pretty_print_funcs_data.py 8.9M 11M
angr:angr/angr/procedures/definitions/win32_propsys.py 1.2M 5.4M
angr:angr/angr/procedures/definitions/win32_dnsapi.py 833KB 3.2M
angr:angr/angr/procedures/definitions/win32_crypt32.py 794K 4.0M
angr:angr/angr/procedures/definitions/win32_kernel32.py 727K 1.3M
sympy:sympy/sympy/physics/quantum/tests/test_spin.py 337K 453K
debian-calibre:calibre/src/calibre/ebooks/unihandecode/jacodepoints.py 397K 1010K
debian-calibre:calibre/src/calibre/ebooks/unihandecode/zhcodepoints.py 397K 1010K
debian-calibre:calibre/src/calibre/ebooks/unihandecode/krcodepoints.py 397K 1010K
pwndbg:pwndbg/pwndbg/lib/functions.py ???? ????
angr:angr/angr/procedures/definitions/win32_icu.py 641K 1.5M
angr:angr/angr/procedures/definitions/win32_oleaut32.py 566K 2.2M
angr:angr/angr/procedures/definitions/win32_wldap32.py 496K 1.1M
angr:angr/angr/procedures/definitions/win32_advapi32.py 556K 1.2M
kovidgoyal:calibre/src/calibre/ebooks/unihandecode/vncodepoints.py 397K 1010K
kovidgoyal:calibre/src/calibre/ebooks/unihandecode/jacodepoints.py 397K 1010K
angr:angr/angr/procedures/definitions/win32_user32.py 397K 726K
kovidgoyal:calibre/src/calibre/ebooks/unihandecode/krcodepoints.py 397K 1010K
@konstin konstin added performance Potential performance improvement formatter Related to the formatter labels Jul 23, 2023
@MichaReiser
Copy link
Member

We may need to take the line count after formatting. angr:angr/angr/procedures/definitions/win32_fwpuclnt.py has 334'645 after formatting... 35% of the time is spent inside the parser (after lexer refactor), another 30% is spent inside printing (yeah, printing 35mb takes a while). Profile

The same is probably true for most other files in the angr repository (win32_rpcrt4 has 129'402 lines after formatting) but also seems true for other files (utils_pretty_print_funcs_data has 216'887 after formatting). It is probably still worth to investigate file by file to make sure there are no real slow files.

@konstin
Copy link
Member Author

konstin commented Jul 25, 2023

good point, do you want to edit the OP with counts or should i change the script to sort differently? we could also use a file size filter or a millisecond per kilobyte formatted score.

@MichaReiser
Copy link
Member

MichaReiser commented Jul 28, 2023

good point, do you want to edit the OP with counts or should i change the script to sort differently? we could also use a file size filter or a millisecond per kilobyte formatted score.

Sorry for the late reply. I think including the file size would be a good metric. Ideally, we would use the file size after formatting because it makes the file easier to compare. But that's probably somewhat hard.

Edit: I started updating a few

@MichaReiser
Copy link
Member

I updated the table with the original and formatted file size. I'm not too concerned about the many MB files. But we may want to take a look why some of the <= 4MB file take so long to format.

@MichaReiser MichaReiser added this to the Formatter: Beta milestone Jul 31, 2023
@MichaReiser MichaReiser removed this from the Formatter: Beta milestone Aug 22, 2023
@MichaReiser
Copy link
Member

@konstin I remember that you changed the diffing algorithm. Do you know if this is still a problem or if we can close the issue?

@konstin
Copy link
Member Author

konstin commented Sep 8, 2023

Let's close this. A pass over the slowest files would be nice but not pressing given the current performance on the test projects

@konstin konstin closed this as completed Sep 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter performance Potential performance improvement
Projects
None yet
Development

No branches or pull requests

2 participants