-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement more detailed self profiling #58085
Conversation
r? @oli-obk (rust_highfive has picked a reviewer for you, use r? to override) |
Does this change the JSON emitted in a backwards-incompatible way? If so, we might want to start emitting a version of some kind (probably just an integer) which we can increment on backwards incompatible changes to aid perf. |
No, currently the additional per-query data isn't being reported in the JSON at all. When it is, I expect we can do so in a backwards-compatible way. |
@michaelwoerister I also fixed the reported times to be self-time as we discussed at All Hands. |
Looks great! |
📌 Commit fd566f087deea557abcf6eb29587c4e890de902b has been approved by |
☔ The latest upstream changes (presumably #58254) made this pull request unmergeable. Please resolve the merge conflicts. |
Timing data and cache hits/misses are now recorded at the query level. This allows us to show detailed per query information such as total time for each query. To see detailed query information in the summary pass the `-Z verbose` flag. For example: ``` rustc -Z self-profile -Z verbose hello_world.rs ```
fd566f0
to
584081a
Compare
Rebased @bors r+ |
📌 Commit 584081a has been approved by |
This also fixes #54141 |
⌛ Testing commit 584081a with merge 703dbe0abe66c8c349fdd0dd59ac61207eaba23d... |
💔 Test failed - checks-travis |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Implement more detailed self profiling Timing data and cache hits/misses are now recorded at the query level. This allows us to show detailed per query information such as total time for each query. To see detailed query information in the summary pass the `-Z verbose` flag. For example: ``` rustc -Z self-profile -Z verbose hello_world.rs ``` results in something like: ```md Self profiling results: | Phase | Time (ms) | Time (%) | Queries | Hits (%) | ----------------------------------------- | -------------- | -------- | -------------- | -------- | Other | 177 | 54.97 | 8094 | 45.47 | - {time spent not running queries} | 113 | 35.09 | 0 | 0.00 | - const_eval | 16 | 4.97 | 26 | 11.54 | - type_of | 9 | 2.80 | 627 | 27.75 | - const_eval_raw | 8 | 2.48 | 22 | 0.00 | - adt_def | 7 | 2.17 | 381 | 11.55 | - visible_parent_map | 7 | 2.17 | 99 | 98.99 | - item_attrs | 6 | 1.86 | 698 | 50.14 | - item_children | 5 | 1.55 | 2815 | 0.00 | - adt_dtorck_constraint | 4 | 1.24 | 2 | 0.00 | - adt_destructor | 2 | 0.62 | 15 | 86.67 | TypeChecking | 53 | 16.46 | 2834 | 79.89 | - trait_impls_of | 9 | 2.80 | 65 | 86.15 | - evaluate_obligation | 7 | 2.17 | 80 | 2.50 | - const_is_rvalue_promotable_to_static | 6 | 1.86 | 1 | 0.00 | - is_copy_raw | 6 | 1.86 | 29 | 58.62 | - rvalue_promotable_map | 6 | 1.86 | 2 | 50.00 | - {time spent not running queries} | 6 | 1.86 | 0 | 0.00 | - typeck_item_bodies | 5 | 1.55 | 1 | 0.00 | - typeck_tables_of | 5 | 1.55 | 19 | 94.74 | - dropck_outlives | 2 | 0.62 | 1 | 0.00 | - layout_raw | 1 | 0.31 | 668 | 87.87 | Linking | 48 | 14.91 | 43 | 46.51 | - {time spent not running queries} | 48 | 14.91 | 0 | 0.00 | Codegen | 29 | 9.01 | 420 | 61.90 | - {time spent not running queries} | 16 | 4.97 | 0 | 0.00 | - collect_and_partition_mono_items | 11 | 3.42 | 13 | 92.31 | - mir_const | 1 | 0.31 | 1 | 0.00 | - mir_validated | 1 | 0.31 | 3 | 66.67 | Expansion | 14 | 4.35 | 0 | 0.00 | - {time spent not running queries} | 14 | 4.35 | 0 | 0.00 | BorrowChecking | 1 | 0.31 | 12 | 41.67 | - borrowck | 1 | 0.31 | 2 | 50.00 | Parsing | 0 | 0.00 | 0 | 0.00 Optimization level: No Incremental: off ``` <details> <summary>Rendered</summary> Self profiling results: | Phase | Time (ms) | Time (%) | Queries | Hits (%) | ----------------------------------------- | -------------- | -------- | -------------- | -------- | **Other** | **177** | **54.97** | **8094** | **45.47** | - {time spent not running queries} | 113 | 35.09 | 0 | 0.00 | - const_eval | 16 | 4.97 | 26 | 11.54 | - type_of | 9 | 2.80 | 627 | 27.75 | - const_eval_raw | 8 | 2.48 | 22 | 0.00 | - adt_def | 7 | 2.17 | 381 | 11.55 | - visible_parent_map | 7 | 2.17 | 99 | 98.99 | - item_attrs | 6 | 1.86 | 698 | 50.14 | - item_children | 5 | 1.55 | 2815 | 0.00 | - adt_dtorck_constraint | 4 | 1.24 | 2 | 0.00 | - adt_destructor | 2 | 0.62 | 15 | 86.67 | TypeChecking | 53 | 16.46 | 2834 | 79.89 | - trait_impls_of | 9 | 2.80 | 65 | 86.15 | - evaluate_obligation | 7 | 2.17 | 80 | 2.50 | - const_is_rvalue_promotable_to_static | 6 | 1.86 | 1 | 0.00 | - is_copy_raw | 6 | 1.86 | 29 | 58.62 | - rvalue_promotable_map | 6 | 1.86 | 2 | 50.00 | - {time spent not running queries} | 6 | 1.86 | 0 | 0.00 | - typeck_item_bodies | 5 | 1.55 | 1 | 0.00 | - typeck_tables_of | 5 | 1.55 | 19 | 94.74 | - dropck_outlives | 2 | 0.62 | 1 | 0.00 | - layout_raw | 1 | 0.31 | 668 | 87.87 | Linking | 48 | 14.91 | 43 | 46.51 | - {time spent not running queries} | 48 | 14.91 | 0 | 0.00 | Codegen | 29 | 9.01 | 420 | 61.90 | - {time spent not running queries} | 16 | 4.97 | 0 | 0.00 | - collect_and_partition_mono_items | 11 | 3.42 | 13 | 92.31 | - mir_const | 1 | 0.31 | 1 | 0.00 | - mir_validated | 1 | 0.31 | 3 | 66.67 | Expansion | 14 | 4.35 | 0 | 0.00 | - {time spent not running queries} | 14 | 4.35 | 0 | 0.00 | BorrowChecking | 1 | 0.31 | 12 | 41.67 | - borrowck | 1 | 0.31 | 2 | 50.00 | Parsing | 0 | 0.00 | 0 | 0.00 Optimization level: No Incremental: off </details> cc @nikomatsakis @michaelwoerister @Zoxc Fixes #54141
☀️ Test successful - checks-travis, status-appveyor |
I think this broke parts of |
This new code appears to have a drastic performance overhead. E.g. running
Lots of BTreeMap stuff, and lots of copying. |
Timing data and cache hits/misses are now recorded at the query level.
This allows us to show detailed per query information such as total time
for each query.
To see detailed query information in the summary pass the
-Z verbose
flag. For example:
results in something like:
Rendered
Self profiling results:
Optimization level: No
Incremental: off
Fixes #54141