Use undef for (some) partially-uninit constants #94130

erikdesjardins · 2022-02-18T21:16:07Z

There needs to be some limit to avoid perf regressions on large arrays
with undef in each element (see comment in the code).

Fixes: #84565
Original PR: #83698

Depends on LLVM 14: #93577

There needs to be some limit to avoid perf regressions on large arrays with undef in each element (see comment in the code).

lqd · 2022-02-18T21:44:04Z

@bors try @rust-timer queue

rust-timer · 2022-02-18T21:44:05Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-02-18T21:44:11Z

⌛ Trying commit b7e5597 with merge 796a3796f114911f2d4b98ea97c8ff28a6f37634...

RalfJung · 2022-02-18T22:13:34Z

I know next to nothing about how LLVM represents consts and when or how that might cause slowdowns, and this doesn't even touch the CTFE engine, so I don't think I am a suitable reviewer for this.

erikdesjardins · 2022-02-18T22:18:01Z

r? @nagisa

nagisa · 2022-02-18T22:18:51Z

cc @nikic

nikic · 2022-02-18T22:32:02Z

I wonder if the right heuristic here isn't related to size, but rather whether the constant would otherwise be zeroinitializer? If it isn't, then the value needs to be made explicit anyway, and it shouldn't matter whether the explicit value is undef or some other value.

bors · 2022-02-18T23:13:19Z

☀️ Try build successful - checks-actions
Build commit: 796a3796f114911f2d4b98ea97c8ff28a6f37634 (796a3796f114911f2d4b98ea97c8ff28a6f37634)

rust-timer · 2022-02-18T23:13:21Z

Queued 796a3796f114911f2d4b98ea97c8ff28a6f37634 with parent b8c56fa, future comparison URL.

erikdesjardins · 2022-02-18T23:42:08Z

I wonder if the right heuristic here isn't related to size, but rather whether the constant would otherwise be zeroinitializer

I think that makes some sense* -- but I do want to emit undef for some values that would otherwise be zeroinitializer.

e.g. a type like

struct Foo {
    init: bool,
    data: MaybeUninit<SomeLargeStruct>,
}
const FOO: Foo = Foo { init: false, data: MaybeUninit::uninit() };

could be zeroinitializer, but would benefit from being partially undef since only one byte is initialized.
(i.e. so memcpys of it could be shrunk to just one byte--doesn't look like LLVM currently does this, but it could be done)

Combining the two heuristics is an option of course (using undef for consts that are either nonzero or < size limit), although at that point I'm not sure it would be worth it.

Another option: limit the number of fields in the anonymous struct that we generate, i.e., the number of contiguous chunks of all-init or all-uninit bytes. I think this would be a decently close match to the actual cost of generating the const expression. It would also let us handle (potentially) profitable cases like my struct Foo example regardless of how large they are. (Hmm, I think I've convinced myself that this is better than size, at least.)

* I do think it's still not fully accurate though--it ought to be more expensive to generate { [1 x i8] c"1", [1 x i8] undef, [1 x i8] c"2", ... } than { [1000 x i8] c"..." }, since for the former we have to allocate ~~1000~~500 const strings vs. just 1 large one for the latter.

erikdesjardins · 2022-02-19T06:19:27Z

(Hmm, I think I've convinced myself that this is better than size, at least.)

Switched to limiting the number of chunks. Let me know if you'd prefer something else.

rust-timer · 2022-02-19T10:47:10Z

Finished benchmarking commit (796a3796f114911f2d4b98ea97c8ff28a6f37634): comparison url.

Summary: This benchmark run did not return any relevant results. 5 results were found to be statistically significant but too small to be relevant.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

nikic · 2022-02-19T15:23:27Z

Yeah, using the number of chunks makes a lot of sense to me.

compiler/rustc_session/src/options.rs

nikic · 2022-02-19T15:39:33Z

@bors r+

RalfJung · 2022-02-20T16:42:45Z

@bors r=nikic

bors · 2022-02-20T16:42:47Z

📌 Commit 067f628 has been approved by nikic

RalfJung · 2022-02-20T16:55:11Z

@bors r-

nikic · 2022-02-20T21:22:19Z

@bors r+

bors · 2022-02-20T21:22:20Z

📌 Commit 5bf8303 has been approved by nikic

bors · 2022-02-24T16:30:33Z

⌛ Testing commit 5bf8303 with merge 3450b91e88c8e5343cffeead37603a63a8f904a9...

bors · 2022-02-24T16:34:14Z

💔 Test failed - checks-actions

rust-log-analyzer · 2022-02-24T16:37:11Z

The job aarch64-gnu failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

nikic · 2022-02-24T21:08:49Z

@bors retry spurious network error

bors · 2022-02-25T05:44:36Z

⌛ Testing commit 5bf8303 with merge ece55d4...

bors · 2022-02-25T08:09:19Z

☀️ Test successful - checks-actions
Approved by: nikic
Pushing ece55d4 to master...

rust-timer · 2022-02-25T09:38:51Z

Finished benchmarking commit (ece55d4): comparison url.

Summary: This benchmark run shows 3 relevant improvements 🎉 but 6 relevant regressions 😿 to instruction counts.

Arithmetic mean of relevant regressions: 0.7%
Arithmetic mean of relevant improvements: -1.9%
Arithmetic mean of all relevant changes: -0.1%
Largest improvement in instruction counts: -2.9% on full builds of ctfe-stress-4 opt
Largest regression in instruction counts: 0.9% on incr-unchanged builds of externs opt

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression

erikdesjardins · 2022-02-25T20:06:51Z

Perf changes:

ctfe-stress-4 opt full improvement:

LLVM does less work. Judging from the involved functions, this is likely because the generated bitcode is much smaller, since we emit just undef for large parts of some constant(s) instead of a large string.

--------------------------------------------------------------------------------
Ir             
--------------------------------------------------------------------------------
-1,234,509,749  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir             file:function
--------------------------------------------------------------------------------
 -293,601,221  ???:(anonymous namespace)::ModuleBitcodeWriter::writeConstants(unsigned int, unsigned int, bool)
 -270,535,020  ???:void llvm::BitstreamWriter::EmitRecordWithAbbrevImpl<unsigned long>(unsigned int, llvm::ArrayRef<unsigned long>, llvm::StringRef, llvm::Optional<unsigned int>)
 -255,855,613  ???:llvm::BitstreamCursor::readRecord(unsigned int, llvm::SmallVectorImpl<unsigned long>&, llvm::StringRef*)
 -184,944,003  ???:llvm::SHA1::hashBlock()
 -117,440,526  ???:llvm::Type::getPrimitiveSizeInBits() const
  -87,031,635  ???:llvm::ConstantDataArray::getString(llvm::LLVMContext&, llvm::StringRef, bool)
  -66,059,028  ???:(anonymous namespace)::BitcodeReader::parseConstants()
   17,301,570  ???:llvm::raw_svector_ostream::write_impl(char const*, unsigned long)
   15,204,381  ???:llvm::raw_ostream::write(char const*, unsigned long)
  -15,006,677  ./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_sse2_unaligned_erms
    9,831,661  ???:<rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::run
    9,437,220  ???:llvm::raw_ostream::flush_tied_then_write(char const*, unsigned long)
   -7,077,468  ???:llvm::SHA1::update(llvm::ArrayRef<unsigned char>)
    5,244,443  ???:llvm::MCAssembler::writeSectionData(llvm::raw_ostream&, llvm::MCSection const*, llvm::MCAsmLayout const&) const
    5,240,004  ./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:memcpy@GLIBC_2.2.5
    2,753,552  ???:<rustc_middle::mir::interpret::allocation::InitMask>::find_bit
   -1,310,727  ???:<rustc_const_eval::const_eval::machine::CompileTimeInterpreter as rustc_const_eval::interpret::machine::Machine>::find_mir_or_eval_fn

ucd debug full regression:

LLVM does more work, likely because we generate constants with more fields (which is expected).

Functions like <&mut alloc::vec::Vec<ena::unify::VarValue<rustc_type_ir::TyVid>> as core::convert::AsRef<[ena::unify::VarValue<rustc_type_ir::TyVid>]>>::as_ref (this is impl<T> AsRef<T> for &mut T) should never show up in diffs since they're trivial (and Vec<T> as AsRef<[T]> is also pretty simple). I'll open a PR adding #[inline] to those to see if it causes improvements more generally.

Edit: opened #94372

--------------------------------------------------------------------------------
Ir          
--------------------------------------------------------------------------------
112,870,638  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir           file:function
--------------------------------------------------------------------------------
-19,111,572  ???:<&mut alloc::vec::Vec<ena::unify::VarValue<rustc_type_ir::TyVid>> as core::convert::AsRef<[ena::unify::VarValue<rustc_type_ir::TyVid>]>>::as_ref
 15,031,469  ???:llvm::DataLayout::getAlignment(llvm::Type*, bool) const
 14,773,437  ???:llvm::MCAssembler::writeSectionData(llvm::raw_ostream&, llvm::MCSection const*, llvm::MCAsmLayout const&) const
  6,930,326  ???:llvm::ConstantDataArray::getString(llvm::LLVMContext&, llvm::StringRef, bool)
  6,135,110  ???:llvm::SmallPtrSetImplBase::insert_imp_big(void const*)
  5,341,507  ???:llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const
  5,186,440  ???:llvm::MCAsmLayout::layoutFragment(llvm::MCFragment*)
 -4,139,756  ???:<rustc_infer::infer::sub::Sub as rustc_middle::ty::relate::TypeRelation>::tys
  4,015,730  ???:llvm::MCAssembler::layout(llvm::MCAsmLayout&)
  3,861,034  ???:<rustc_const_eval::interpret::validity::ValidityVisitor<rustc_const_eval::const_eval::machine::CompileTimeInterpreter> as rustc_const_eval::interpret::visitor::ValueVisitor<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::walk_value
  3,765,544  ???:llvm::MCObjectStreamer::emitFill(llvm::MCExpr const&, unsigned long, llvm::SMLoc)
  3,615,753  obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/build/tikv-jemalloc-sys-93177ab130254195/out/build/src/arena.c:arena_dalloc_bin_locked_impl
  3,447,968  ???:llvm::MCObjectStreamer::getOrCreateDataFragment(llvm::MCSubtargetInfo const*)
...

all the externs regressions look like:

--------------------------------------------------------------------------------
Ir         
--------------------------------------------------------------------------------
13,448,267  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir          file:function
--------------------------------------------------------------------------------
13,504,500  ???:rustc_middle::ty::context::tls::TLV::__getit
  -145,666  ???:<rustc_query_system::dep_graph::graph::CurrentDepGraph<rustc_middle::dep_graph::dep_node::DepKind>>::intern_node
   -66,004  ???:<rustc_middle::dep_graph::dep_node::DepKind as rustc_query_system::dep_graph::DepKind>::read_deps::<<rustc_query_system::dep_graph::graph::DepGraph<rustc_middle::dep_graph::dep_node::DepKind>>::read_index::{closure
    27,018  ./nptl/../nptl/pthread_mutex_trylock.c:pthread_mutex_trylock
   -20,467  obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/build/tikv-jemalloc-sys-93177ab130254195/out/build/src/extent.c:_rjem_je_extent_heap_remove_first
    18,004  ???:<rustc_ast::ast::NestedMetaItem>::from_tokens::<rustc_ast::tokenstream::Cursor>
   -15,157  obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/build/tikv-jemalloc-sys-93177ab130254195/out/build/src/base.c:base_alloc_impl
    14,724  obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/build/tikv-jemalloc-sys-93177ab130254195/out/build/include/jemalloc/internal/hash.h:extent_try_coalesce_impl
    14,315  obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/build/tikv-jemalloc-sys-93177ab130254195/out/build/include/jemalloc/internal/hash.h:extent_lock_from_addr
    14,220  ./nptl/pthread_mutex_unlock.c:__pthread_mutex_unlock_usercnt

Judging from callgrind (omitted from this comment), this is a __getit call that was inlined before this change and isn't afterwards.
__getit is already #[inline], I'll open a PR to try making it #[inline(always)] to see if that is worthwhile.

Edit: opened #94373

Use undef for partially-uninit constants up to 1024 bytes

b7e5597

There needs to be some limit to avoid perf regressions on large arrays with undef in each element (see comment in the code).

rust-highfive assigned RalfJung Feb 18, 2022

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Feb 18, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 18, 2022

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 18, 2022

rust-highfive assigned nagisa and unassigned RalfJung Feb 18, 2022

erikdesjardins changed the title ~~Use undef for partially-uninit constants up to 1024 bytes~~ Use undef for (some) partially-uninit constants Feb 19, 2022

This comment has been minimized.

Sign in to view

switch to limiting the number of init/uninit chunks

d5769e9

erikdesjardins force-pushed the partially branch from 8ae018f to d5769e9 Compare February 19, 2022 06:29

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 19, 2022

nikic reviewed Feb 19, 2022

View reviewed changes

compiler/rustc_session/src/options.rs Outdated Show resolved Hide resolved

reduce default uninit_const_chunk_threshold to 16 (from 256)

c2e84fa

erikdesjardins force-pushed the partially branch from adbf133 to c2e84fa Compare February 19, 2022 15:36

This comment has been minimized.

Sign in to view

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 20, 2022

limit tests to llvm 14+

5bf8303

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 20, 2022

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 24, 2022

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 24, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 25, 2022

bors merged commit ece55d4 into rust-lang:master Feb 25, 2022

rustbot added this to the 1.61.0 milestone Feb 25, 2022

rustbot added the perf-regression Performance regression. label Feb 25, 2022

erikdesjardins deleted the partially branch February 25, 2022 19:14

erikdesjardins mentioned this pull request Feb 26, 2022

rustc_errors: let DiagnosticBuilder::emit return a "guarantee of emission". #93368

Merged

Use undef for (some) partially-uninit constants #94130

Use undef for (some) partially-uninit constants #94130

Uh oh!

Conversation

erikdesjardins commented Feb 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lqd commented Feb 18, 2022

Uh oh!

rust-timer commented Feb 18, 2022

Uh oh!

bors commented Feb 18, 2022

Uh oh!

RalfJung commented Feb 18, 2022

Uh oh!

erikdesjardins commented Feb 18, 2022

Uh oh!

nagisa commented Feb 18, 2022

Uh oh!

nikic commented Feb 18, 2022

Uh oh!

bors commented Feb 18, 2022

Uh oh!

rust-timer commented Feb 18, 2022

Uh oh!

erikdesjardins commented Feb 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

erikdesjardins commented Feb 19, 2022

Uh oh!

This comment has been minimized.

rust-timer commented Feb 19, 2022

Uh oh!

nikic commented Feb 19, 2022

Uh oh!

Uh oh!

nikic commented Feb 19, 2022

Uh oh!

RalfJung commented Feb 20, 2022

Uh oh!

bors commented Feb 20, 2022

Uh oh!

This comment has been minimized.

RalfJung commented Feb 20, 2022

Uh oh!

nikic commented Feb 20, 2022

Uh oh!

bors commented Feb 20, 2022

Uh oh!

bors commented Feb 24, 2022

Uh oh!

bors commented Feb 24, 2022

Uh oh!

rust-log-analyzer commented Feb 24, 2022

Uh oh!

nikic commented Feb 24, 2022

Uh oh!

bors commented Feb 25, 2022

Uh oh!

bors commented Feb 25, 2022

Uh oh!

rust-timer commented Feb 25, 2022

Uh oh!

erikdesjardins commented Feb 25, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

erikdesjardins commented Feb 18, 2022 •

edited

Loading

erikdesjardins commented Feb 18, 2022 •

edited

Loading

erikdesjardins commented Feb 25, 2022 •

edited

Loading