Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump tracing crate dependency #75391

Closed
wants to merge 3 commits into from
Closed

Conversation

oli-obk
Copy link
Contributor

@oli-obk oli-obk commented Aug 11, 2020

r? @ghost

cc @hawkw

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 11, 2020

Testing for potential speed improvements as per #74726 (comment)

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Aug 11, 2020

⌛ Trying commit b466292 with merge b7c1f382b4b695dcd40edc5432ef9b9c6db0dba8...

@bors
Copy link
Contributor

bors commented Aug 11, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: b7c1f382b4b695dcd40edc5432ef9b9c6db0dba8 (b7c1f382b4b695dcd40edc5432ef9b9c6db0dba8)

@rust-timer
Copy link
Collaborator

Queued b7c1f382b4b695dcd40edc5432ef9b9c6db0dba8 with parent 441fd22, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (b7c1f382b4b695dcd40edc5432ef9b9c6db0dba8): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 11, 2020

huh... this regressed rustdoc? It seems to have had no effect on any of the other tests

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 11, 2020

cc @rust-lang/rustdoc are rustdoc tests on the perfbot reliable? I've never seen them before

@Manishearth
Copy link
Member

They're new (got added a month ago iirc), and I don't know how reliable they are

cc @jyn514

@hawkw
Copy link
Contributor

hawkw commented Aug 11, 2020

huh... this regressed rustdoc?

that's pretty strange! i have no idea if 1% is meaningful in these tests though.

It seems to have had no effect on any of the other tests

i'm assuming these are run with a release build, which statically disables all tracing, right? i didn't expect a performance improvement in that case --- i expected it in the case where tracing was not statically disabled. the hope, specifically, was that performance would be improved enough that it wouldn't be necessary to disable tracing at compile time, so that it can be enabled in release builds just by setting RUSTC_LOG without recompiling, which @Mark-Simulacrum expressed interest in in #74726 (comment).

@jyn514
Copy link
Member

jyn514 commented Aug 11, 2020

Rustdoc perf is extremely variable so I wouldn't block on a 1% regression. In particular until we stop loading all extern crates on startup I don't think it makes sense to have the rest of the compiler be more careful than we are ourselves: https://github.com/rust-lang/rust/blob/master/src/librustdoc/core.rs#L423

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 12, 2020

i expected it in the case where tracing was not statically disabled. the hope, specifically, was that performance would be improved enough that it wouldn't be necessary to disable tracing at compile time, so that it can be enabled in release builds just by setting RUSTC_LOG without recompiling

oh heh, oops

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Aug 12, 2020

⌛ Trying commit 7d63e11 with merge dfcb3809f32e44b06f33c0781af9bf5bb42b20a3...

@bors
Copy link
Contributor

bors commented Aug 12, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: dfcb3809f32e44b06f33c0781af9bf5bb42b20a3 (dfcb3809f32e44b06f33c0781af9bf5bb42b20a3)

@rust-timer
Copy link
Collaborator

Queued dfcb3809f32e44b06f33c0781af9bf5bb42b20a3 with parent 840dbe7, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (dfcb3809f32e44b06f33c0781af9bf5bb42b20a3): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 12, 2020

@Mark-Simulacrum we're losing up to 10% on perf if we enable trace! even with debug-assertions off. We could start debugging them on a case by case basis to find ones that cause optimizations to bail out, because I don't think it's because of the actual invocations themselves.

@hawkw since one of the things we do if no RUSTC_LOG is enabled is to no register a subscriber at all, I don't think any additional fast paths you added will make it faster, as it's already a single atomic bool access.

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 12, 2020

trying with just allowing debug! and keeping trace! completely cfg-ed out

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Aug 12, 2020

⌛ Trying commit 9697841 with merge 05ccae7f970661c4e9dd14263a7232a881077a8f...

@bors
Copy link
Contributor

bors commented Aug 12, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: 05ccae7f970661c4e9dd14263a7232a881077a8f (05ccae7f970661c4e9dd14263a7232a881077a8f)

@rust-timer
Copy link
Collaborator

Queued 05ccae7f970661c4e9dd14263a7232a881077a8f with parent 4745cbe, future comparison URL.

@jyn514
Copy link
Member

jyn514 commented Aug 12, 2020

the hope, specifically, was that performance would be improved enough that it wouldn't be necessary to disable tracing at compile time, so that it can be enabled in release builds just by setting RUSTC_LOG without recompiling

Is it possible to do this for debug! but not for trace! since the performance hit for trace is too high? e.g. rustdoc does silly things with trace like printing all documentation and DefIds to stdout, but the debug logs are much more reasonable (and I could move some of them to trace! as well without a problem).

@jyn514
Copy link
Member

jyn514 commented Aug 12, 2020

Having RUSTC_LOG=debug work by the default was the main motivation for debug = true in rust-lang/compiler-team#326 (which didn't make it in for performance reasons) so I'm very interested in getting this to work :)

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (05ccae7f970661c4e9dd14263a7232a881077a8f): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@jyn514
Copy link
Member

jyn514 commented Aug 12, 2020

Even debug! has a pretty hefty hit :(

@oli-obk
Copy link
Contributor Author

oli-obk commented Aug 12, 2020

Yea I don't think we can do anything here without really digging into all the sites where logging is used

@oli-obk oli-obk closed this Aug 12, 2020
@Mark-Simulacrum
Copy link
Member

Mark-Simulacrum commented Aug 12, 2020

Here is a cachegrind diff on wf-projection-stress Check/Full between the debug! try and the parent. My guess is that the extra load is causing a vectorization or similar failure

--------------------------------------------------------------------------------
Ir
--------------------------------------------------------------------------------
488,637,125  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir           file:function
--------------------------------------------------------------------------------
115,583,666  ???:<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty
 72,854,689  ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::TypeFolder>::fold_ty
 47,512,130  ???:rustc_infer::infer::at::At::sub_exp
 45,945,600  ???:rustc_infer::infer::region_constraints::RegionConstraintCollector::rollback_to
 43,219,576  ???:rustc_infer::infer::region_constraints::RegionConstraintCollector::start_snapshot
 33,203,704  ???:rustc_infer::infer::InferCtxt::commit_if_ok
 30,939,291  ???:rustc_infer::infer::InferCtxt::start_snapshot
 29,405,184  ???:rustc_infer::infer::InferCtxt::rollback_to
 28,227,544  ???:rustc_infer::infer::InferCtxt::probe
-22,882,155  ???:<core::iter::adapters::Copied<I> as core::iter::traits::iterator::Iterator>::try_fold
-17,317,321  ???:rustc_trait_selection::traits::select::SelectionContext::match_projection
 14,711,112  ???:rustc_infer::infer::undo_log::<impl rustc_infer::infer::InferCtxtInner>::rollback_to
 11,867,495  ???:rustc_infer::infer::higher_ranked::<impl rustc_infer::infer::combine::CombineFields>::higher_ranked_sub
 11,248,672  ???:<rustc_infer::infer::equate::Equate as rustc_middle::ty::relate::TypeRelation>::tys
  6,987,960  ???:<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::structural_impls::PredicateVisitor>::visit_predicate
  6,353,522  ???:rustc_middle::ty::context::TyCtxt::features
  6,330,920  ???:rustc_infer::infer::higher_ranked::<impl rustc_infer::infer::InferCtxt>::replace_bound_vars_with_placeholders
 -5,434,025  /rustc/4745cbe83e0b3299bfe7f7f305b975c3c09f92db//library/core/src/str/mod.rs:core::str::from_utf8

@Mark-Simulacrum
Copy link
Member

The trace diff looks basically identical FWIW, all of the cost seems to be in the debug! calls here at least for wf-projection-stress.

@jyn514
Copy link
Member

jyn514 commented Aug 12, 2020

Could those debug! calls be turned into trace!?

@Mark-Simulacrum
Copy link
Member

Yeah, so this is the old vs. new disassembly (presumably also the old one is inlined all over the place, but still present in un-inlined form) for the type folding:

000000000325f230 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty>:
 325f230:       mov    eax,DWORD PTR [rsi+0x18]
 325f233:       test   DWORD PTR [rdi],eax
 325f235:       setne  al
 325f238:       ret
00000000035dfe30 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty>:
 35dfe30:	                push   rbp
 35dfe31:	                push   r15
 35dfe33:	                push   r14
 35dfe35:	                push   r13
 35dfe37:	                push   r12
 35dfe39:	                push   rbx
 35dfe3a:	                sub    rsp,0xf8
 35dfe41:	                mov    rbp,rdi
 35dfe44:	                mov    QWORD PTR [rsp+0x8],rsi
 35dfe49:	                mov    rax,QWORD PTR [rip+0x14ffeb8]        # 4adfd08 <tracing_core::metadata::MAX_LEVEL@@Base-0x1d770>
 35dfe50:	                mov    rax,QWORD PTR [rax]
 35dfe53:	                add    rax,0xfffffffffffffffe
 35dfe57:	                cmp    rax,0x4
 35dfe5b:	/-------------- jb     35e003f <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0x20f>
 35dfe61:	|               lea    rbx,[rip+0x151bae8]        # 4afb950 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty::CALLSITE>
 35dfe68:	|               mov    QWORD PTR [rsp+0xa0],rbx
 35dfe70:	|               mov    rax,QWORD PTR [rip+0x151bae9]        # 4afb960 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty::CALLSITE+0x10>
 35dfe77:	|               cmp    rax,0x3
 35dfe7b:	|           /-- jne    35dfea4 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0x74>
 35dfe7d:	|           |   mov    QWORD PTR [rsp+0x10],rbx
 35dfe82:	|           |   mov    rax,QWORD PTR [rip+0x151bac7]        # 4afb950 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty::CALLSITE>
 35dfe89:	|           |   test   rax,0xfffffffffffffffd
 35dfe8f:	|  /--------|-- je     35dfeea <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0xba>
 35dfe91:	|  |     /--|-> lea    rdi,[rsp+0x10]
 35dfe96:	|  |     |  |   call   33a9060 <tracing_core::dispatcher::get_default>
 35dfe9b:	|  |     |  |   test   al,al
 35dfe9d:	|  |  /--|--|-- jne    35dfef4 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0xc4>
 35dfe9f:	+--|--|--|--|-- jmp    35e003f <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0x20f>
 35dfea4:	|  |  |  |  \-> lea    rax,[rsp+0xa0]
 35dfeac:	|  |  |  |      mov    QWORD PTR [rsp+0x70],rax
 35dfeb1:	|  |  |  |      lea    rax,[rsp+0x70]
 35dfeb6:	|  |  |  |      mov    QWORD PTR [rsp+0x10],rax
 35dfebb:	|  |  |  |      lea    rdi,[rip+0x151ba9e]        # 4afb960 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty::CALLSITE+0x10>
 35dfec2:	|  |  |  |      lea    rcx,[rip+0x1482d9f]        # 4a62c68 <anon.265d2ff7bf1138c8a9d5475f9b93bdc0.107.llvm.161654895804039685>
 35dfec9:	|  |  |  |      lea    rdx,[rsp+0x10]
 35dfece:	|  |  |  |      xor    esi,esi
 35dfed0:	|  |  |  |      call   QWORD PTR [rip+0x14fec42]        # 4adeb18 <std::sync::once::Once::call_inner>
 35dfed6:	|  |  |  |      mov    QWORD PTR [rsp+0x10],rbx
 35dfedb:	|  |  |  |      mov    rax,QWORD PTR [rip+0x151ba6e]        # 4afb950 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty::CALLSITE>
 35dfee2:	|  |  |  |      test   rax,0xfffffffffffffffd
 35dfee8:	|  |  |  \----- jne    35dfe91 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0x61>
 35dfeea:	|  \--|-------> cmp    rax,0x2
 35dfeee:	+-----|-------- jne    35e003f <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0x20f>
 35dfef4:	|     \-------> mov    r14,QWORD PTR [rip+0x151ba5d]        # 4afb958 <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty::CALLSITE+0x8>
 35dfefb:	|               mov    rbx,QWORD PTR [rip+0x15030ce]        # 4ae2fd0 <tracing_core::metadata::Metadata::fields@@Base+0x1129fa0>
 35dff02:	|               mov    rdi,r14
 35dff05:	|               call   rbx
 35dff07:	|               lea    r13,[rsp+0xa0]
 35dff0f:	|               mov    rdi,r13
 35dff12:	|               mov    rsi,rax
 35dff15:	|               call   QWORD PTR [rip+0x15060e5]        # 4ae6000 <tracing_core::field::FieldSet::iter@@Base+0x112d6d0>
 35dff1b:	|               mov    rdi,r14
 35dff1e:	|               call   rbx
 35dff20:	|               mov    r15,rax
 35dff23:	|               lea    r12,[rsp+0x10]
 35dff28:	|               mov    rdi,r12
 35dff2b:	|               mov    rsi,r13
 35dff2e:	|               call   QWORD PTR [rip+0x1506e6c]        # 4ae6da0 <<tracing_core::field::Iter as core::iter::traits::iterator::Iterator>::next@@Base+0x112e340>
 35dff34:	|               cmp    QWORD PTR [rsp+0x18],0x0
 35dff3a:	|           /-- je     35e005f <<rustc_middle::ty::fold::HasTypeFlagsVisitor as rustc_middle::ty::fold::TypeVisitor>::visit_ty+0x22f>
 35dff40:	|           |   mov    rax,QWORD PTR [rsp+0x30]
 35dff45:	|           |   mov    QWORD PTR [rsp+0xf0],rax
 35dff4d:	|           |   movups xmm0,XMMWORD PTR [rsp+0x10]
 35dff52:	|           |   movups xmm1,XMMWORD PTR [rsp+0x20]
 35dff57:	|           |   movaps XMMWORD PTR [rsp+0xe0],xmm1
 35dff5f:	|           |   movaps XMMWORD PTR [rsp+0xd0],xmm0
 35dff67:	|           |   mov    rax,QWORD PTR [rsp+0x8]
 35dff6c:	|           |   add    rax,0x18
 35dff70:	|           |   lea    rcx,[rsp+0x8]
 35dff75:	|           |   mov    QWORD PTR [rsp+0x10],rcx
 35dff7a:	|           |   mov    rcx,QWORD PTR [rip+0x14fe0cf]        # 4ade050 <rustc_middle::ty::structural_impls::<impl core::fmt::Debug for &rustc_middle::ty::TyS>::fmt@@Base+0x162b9a0>
 35dff81:	|           |   mov    QWORD PTR [rsp+0x18],rcx
 35dff86:	|           |   mov    QWORD PTR [rsp+0x20],rax
 35dff8b:	|           |   mov    rax,QWORD PTR [rip+0x14ff136]        # 4adf0c8 <<rustc_middle::ty::TypeFlags as core::fmt::Debug>::fmt@@Base+0x15ebd28>
 35dff92:	|           |   mov    QWORD PTR [rsp+0x28],rax
 35dff97:	|           |   mov    QWORD PTR [rsp+0x30],rbp
 35dff9c:	|           |   mov    QWORD PTR [rsp+0x38],rax
 35dffa1:	|           |   lea    rax,[rip+0x14c1120]        # 4aa10c8 <anon.5586f39a61421f7cfcd1ecd8d73ab9b6.1061.llvm.5764158716530063693+0x78>
 35dffa8:	|           |   mov    QWORD PTR [rsp+0x70],rax
 35dffad:	|           |   mov    QWORD PTR [rsp+0x78],0x3
 35dffb6:	|           |   mov    QWORD PTR [rsp+0x80],0x0
 35dffc2:	|           |   mov    QWORD PTR [rsp+0x90],r12
 35dffca:	|           |   mov    QWORD PTR [rsp+0x98],0x3
 35dffd6:	|           |   lea    rax,[rsp+0xd0]
 35dffde:	|           |   mov    QWORD PTR [rsp+0x58],rax
 35dffe3:	|           |   lea    rax,[rsp+0x70]
 35dffe8:	|           |   mov    QWORD PTR [rsp+0x60],rax
 35dffed:	|           |   lea    rax,[rip+0x14bf86c]        # 4a9f860 <anon.5586f39a61421f7cfcd1ecd8d73ab9b6.741.llvm.5764158716530063693+0xf8>
 35dfff4:	|           |   mov    QWORD PTR [rsp+0x68],rax
 35dfff9:	|           |   lea    rax,[rsp+0x58]
 35dfffe:	|           |   mov    QWORD PTR [rsp+0x40],rax
 35e0003:	|           |   mov    QWORD PTR [rsp+0x48],0x1
 35e000c:	|           |   mov    QWORD PTR [rsp+0x50],r15
 35e0011:	|           |   lea    rax,[rsp+0x40]
 35e0016:	|           |   mov    QWORD PTR [rsp+0xa0],rax
 35e001e:	|           |   mov    QWORD PTR [rsp+0xa8],r14
 35e0026:	|           |   mov    QWORD PTR [rsp+0xb0],0x1
 35e0032:	|           |   lea    rdi,[rsp+0xa0]
 35e003a:	|           |   call   33a9200 <tracing_core::dispatcher::get_default>
 35e003f:	\-----------|-> mov    rax,QWORD PTR [rsp+0x8]
 35e0044:	            |   mov    eax,DWORD PTR [rax+0x18]
 35e0047:	            |   test   DWORD PTR [rbp+0x0],eax
 35e004a:	            |   setne  al
 35e004d:	            |   add    rsp,0xf8
 35e0054:	            |   pop    rbx
 35e0055:	            |   pop    r12
 35e0057:	            |   pop    r13
 35e0059:	            |   pop    r14
 35e005b:	            |   pop    r15
 35e005d:	            |   pop    rbp
 35e005e:	            |   ret    
 35e005f:	            \-> lea    rdi,[rip+0x602226]        # 3be228c <str.0+0x13c>
 35e0066:	                lea    rdx,[rip+0x14c1043]        # 4aa10b0 <anon.5586f39a61421f7cfcd1ecd8d73ab9b6.1061.llvm.5764158716530063693+0x60>
 35e006d:	                mov    esi,0x22
 35e0072:	                call   QWORD PTR [rip+0x150a790]        # 4aea808 <core::option::expect_failed>
 35e0078:	                ud2    

@hawkw
Copy link
Contributor

hawkw commented Aug 12, 2020

@Mark-Simulacrum oof, that is a pretty hairy disassembly. I suspect there are additional upstream changes we can make as a follow-on from tokio-rs/tracing#869 to simplify the per-callsite generated code some more, which should hopefully make it a little friendlier to rustc optimization.

As a side note, what disassembler are you using? That output is quite nice :)

@Mark-Simulacrum
Copy link
Member

Just regular objdump (GNU binutils version 2.34), with --visualize-jumps --no-show-raw-insn.

@hawkw
Copy link
Contributor

hawkw commented Aug 12, 2020

Ah, --visualize-jumps is the flag I wanted, TIL. Thanks!

@oli-obk oli-obk deleted the tracing_bump branch March 16, 2021 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants