-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve codegen of fmt_num to delete unreachable panic #122770
improve codegen of fmt_num to delete unreachable panic #122770
Conversation
it seems LLVM doesn't realize that `curr` is always decremented at least once in either loop formatting characters of the input string by their appropriate radix, and so the later `&buf[curr..]` generates a check for out-of-bounds access and panic. this is unreachable in reality as even for `x == T::zero()` we'll produce at least the character `Self::digit(T::zero())` for at least one character output, and `curr` will always be at least one below `buf.len()`. adjust `fmt_int` to make this fact more obvious to the compiler, which fortunately (or unfortunately) results in a measurable performance improvement for workloads heavy on formatting integers.
rustbot has assigned @workingjubilee. Use |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
…ion, r=<try> improve codegen of fmt_num to delete unreachable panic it seems LLVM doesn't realize that `curr` is always decremented at least once in either loop formatting characters of the input string by their appropriate radix, and so the later `&buf[curr..]` generates a check for out-of-bounds access and panic. this is unreachable in reality as even for `x == T::zero()` we'll produce at least the character `Self::digit(T::zero())`, yielding at least one character output, and `curr` will always be at least one below `buf.len()`. adjust `fmt_int` to make this fact more obvious to the compiler, which fortunately (or unfortunately) results in a measurable performance improvement for workloads heavy on formatting integers. in the program i'd noticed this in, you can see the `cmp $0x80,%rdi; ja 7c` here, which branches to a slice index fail helper: <img width="660" alt="before" src="https://github.com/rust-lang/rust/assets/4615790/ac482d54-21f8-494b-9c83-4beadc3ca0ef"> where after this change the function is broadly similar, but smaller, with one fewer registers updated in each pass through the loop in addition the never-taken `cmp/ja` being gone: <img width="646" alt="after" src="https://github.com/rust-lang/rust/assets/4615790/1bee1d76-b674-43ec-9b21-4587364563aa"> this represents a ~2-3% difference in runtime in my [admittedly comically i32-formatting-bound](https://github.com/athre0z/disas-bench/blob/master/bench/yaxpeax/src/main.rs#L58-L67) use case (printing x86 instructions, including i32 displacements and immediates) as measured on a ryzen 9 3950x. the impact on `<impl LowerHex for i8>::fmt` is both more dramatic and less impactful: it continues to have a loop that is evaluated at most twice, though the compiler doesn't know that to unroll it. the generated code there is identical to the impl for `i32`. there, the smaller loop body has less effect on runtime, and removing the never-taken slice bounds check is offset by whatever address recalculation is happening with the `lea/add/neg` at the end of the loop. it behaves about the same before and after. --- i went through a few tries at getting llvm to understand the bounds check isn't necessary, but i should mention the _best_ i'd seen here was actually from the existing `fmt_int` with a diff like ```diff if x == zero { // No more digits left to accumulate. break; }; } } + + if curr >= buf.len() { + unsafe { core::hint::unreachable_unchecked(); } + } let buf = &buf[curr..]; ``` posting a random PR to `rust-lang/rust` to do that without a really really compelling reason seemed a bit absurd, so i tried to work that into something that seems more palatable at a glance. but if you're interested, that certainly produced better (x86_64) code through LLVM. in that case with `buf.iter_mut().rev()` as the iterator, `<impl LowerHex for i8>::fmt` actually unrolls into something like ``` put_char(x & 0xf); let mut len = 1; if x > 0xf { put_char((x >> 4) & 0xf); len = 2; } pad_integral(buf[buf.len() - len..]); ``` it's pretty cool! `<impl LowerHex for i32>::fmt` also was slightly better. that all resulted in closer to an 6% difference in my use case. --- i have not looked at formatters other than LowerHex/UpperHex with this change, though i'd be a bit shocked if any were _worse_. (i have absolutely _no_ idea how you'd regression test this, but that might be just my not knowing what the right tool for that would be in rust-lang/rust. i'm of half a mind that this is small and fiddly enough to not be worth landing lest it quietly regress in the future anyway. but i didn't want to discard the idea without at least offering it upstream here)
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
@iximeow this can in fact be regression-tested! we'd use a codegen (llvm level) or assembly (x86-64 level) test. they use FileCheck to verify that a certain pattern is fulfilled, and can be parameterized with a bit of fussing with their DSL and regex so that e.g. it doesn't matter exactly which |
Finished benchmarking commit (baf578c): comparison URL. Overall result: ❌ regressions - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 669.372s -> 670.355s (0.15%) |
are there codegen tests for how rustc compiles |
🤨 or given that i expected this at worst to have no effect, maybe i should double-check how |
@iximeow That performance suite, and the remark, is about compiler tests, primarily. There's a runtime perf suite, but it showed no change. It's not surprising to me that rustc's happy path instruction counts aren't greatly affected by this change (this branch, after all, should not be hit), and the two tests it is affected on are fairly "noisy" tests and also reflect synthetic stress tests.
It will be under An example that is relevant is this, with its |
@iximeow the perf bot's default remarks are also based on instructions executed. note that the wall time is reduced: this, combined with the (slight) binary size reduction, in generated artifacts (admittedly, the only non-synthetic mark here is helloworld): suggests to me that this may in fact be resulting in smaller code sizes with better cache layout. also, I'd be happy to accept the |
i added for tests like this is a problem for the new
where the assertion i want to write is really that but in this .ll all we get is the function declaration,
i was hoping that maybe by making the crate type sufficiently statically linked that i could get a .ll with this function filled in, but i don't see how. at least then writing a correct test would be reduced to "write a i couldn't see similar prior art in other forms of (i also added benches for formatting |
@iximeow Oh, no, you're right, this is tricky to test. Hrm. |
@saethlin do we have a sufficient combination of commands such that with enough
we can make this codegen-testable anyway? |
That is true and I really do not like that approach. Playing games with the inliner rules tends to have undesired effects; for example it's possible if In addition, codegen tests are built against whatever sysroot happens to be specified by the settings in I recently added a test based on using The usual LTO at home flags are:
Full LLVM optimizations, the LTO at home flag, enable MIR inlining in all possible cases then crank up the thresholds to an absurd level. (be aware that just applying such a high MIR inlining threshold to arbitrary crates will cause absurd memory usage, even my beefy desktop cannot compile But this doesn't help in part because of what I said above: The sysroot has already been compiled with a different set of flags that make LTO at home fail to apply. (as an optimization to compile time grumble grumble) #115306 we only encode MIR which is deduced to be useful to other crates when compiling the current crate. The compilation of |
An assembly test with |
this empty commit reproduces a github comment that describes the work on commits from this point back to, roughly, 1.2.2. since many commits between these two points are interesting in the context of performance optimization (especially uarch-relevant tweaks), many WIP commits are preserved. as a result there is no clear squash merge, and this commit will be the next best thing. on Rust 1.68.0 and a Xeon E3-1230 V2, relative changes are measured roughly as: starting at ed4f238: - non-fmt ns/decode: 15ns - non-fmt instructions/decode: 94.6 - non-fmt IPC: 1.71 - fmt ns/decode+display: 91ns - fmt instructions/decode+display: 683.8 - fmt IPC: 2.035 ending at 6a5ea10 - non-fmt ns/decode: 15ns - non-fmt instructions/decode: 94.6 - non-fmt IPC: 1.71 - fmt ns/decode+display: 47ns - fmt instructions/decode+display: 329.6 - fmt IPC: 1.898 for an overall ~50% reduction in runtimes to display instructions. writing into InstructionTextBuffer reduces overhead another ~10%. -- original message follows -- this is where much of iximeow/yaxpeax-arch#7 originated. `std::fmt` as a primary writing mechanism has.. some limitations: * rust-lang/rust#92993 (comment) * llvm/llvm-project#87440 * rust-lang/rust#122770 and some more interesting more fundamental limitations - writing to a `T: fmt::Write` means implementations don't know if it's possible to write bytes in reverse order (useful for printing digits) or if it's OK to write too many bytes and then only advance `len` by the correct amount (useful for copying variable-length-but-short strings like register names). these are both perfectly fine to a `String` or `Vec`, less fine to do to a file descriptor like stdout. at the same time, `Colorize` and traits depending on it are very broken, for reasons described in yaxpeax-arch. so, this adapts `yaxpeax-x86` to use the new `DisplaySink` type for writing, with optimizations where appropriate and output spans for certain kinds of tokens - registers, integers, opcodes, etc. it's not a perfect replacement for Colorize-to-ANSI-supporting-outputs but it's more flexible and i think can be made right. along the way this completes the move of `safer_unchecked` out to yaxpeax-arch (ty @5225225 it's still so useful), cleans up some docs, and comes with a few new test cases. because of the major version bump of yaxpeax-arch, and because this removes most functionality of the Colorize impl - it prints the correct words, just without coloring - this is itself a major version bump to 2.0.0. yay! this in turn is a good point to change the `Opcode` enums from being tuple-like to struct-like, and i've done so in 1b8019d. full notes in CHANGELOG ofc. this is notes for myself when i'm trying to remember any of this in two years :)
We discussed this in a T-libs meeting. Given the difficulty of writing a codegen test that properly inlines everything we're ok with merging this without a test. AIUI the test currently isn't working as intended? If so it should be removed from the PR and someone wants to add a working codegen test that can be done in a followup PR. |
this test was included for demonstrative and discussion purposes but does not test what its name alleges - in fact it does not test much of value at all! as mentioned in this comment, rust-lang#122770 (comment) , writing a correct test for this codegen outcome is difficult (partially because the public interface to std::fmt intentionally includes an #[inline(never)] function!), so to test this correctly will require more than i can offer in 122770.
correct, i'd included it in the hopes it was close to being useful (alas!) - i've removed it but kept the benches for u8 formatting as those are informative. |
Ah, thank you! Sorry about the everything, @bors r+ rollup=never |
☀️ Test successful - checks-actions |
Finished benchmarking commit (22bcb81): comparison URL. Overall result: ❌ regressions - no action needed@rustbot label: -perf-regression Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (primary -2.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary -3.3%, secondary 1.8%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResults (primary 0.0%, secondary 0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 786.257s -> 787.179s (0.12%) |
78fc550 Auto merge of rust-lang#133247 - GuillaumeGomez:reduce-integer-display-impl, r=workingjubilee db5c2c6 Rollup merge of rust-lang#132982 - suaviloquence:2-doc-changed-alloc-methods, r=Mark-Simulacrum 117ad4f Rollup merge of rust-lang#132533 - SUPERCILEX:patch-4, r=Mark-Simulacrum e2aa7c1 fix `Allocator` method names in `alloc` free function docs 6b141ee Rollup merge of rust-lang#133298 - n0toose:remove-dir-all-but-not-paths, r=Noratrieb e3691db Rollup merge of rust-lang#133260 - compiler-errors:deref, r=fee1-dead 895f290 Rollup merge of rust-lang#132730 - joboet:after_main_sync, r=Noratrieb 6ffa455 Rollup merge of rust-lang#133389 - eduardosm:stabilize-const_float_methods, r=RalfJung f413935 Rollup merge of rust-lang#133301 - GuillaumeGomez:add-example-wrapping-neg, r=workingjubilee 6112cfd Auto merge of rust-lang#132611 - compiler-errors:async-prelude, r=ibraheemdev 23a5a0e Auto merge of rust-lang#132597 - lukas-code:btree-plug-leak, r=jhpratt f0b0942 Constify Deref and DerefMut d05e8e8 Auto merge of rust-lang#133379 - jieyouxu:rollup-00jxo71, r=jieyouxu 641c1ae Stabilize `const_float_methods` 256c54d Auto merge of rust-lang#133377 - jieyouxu:rollup-n536hzq, r=jieyouxu dff533f Improve code by using `unsigned_abs` a850f7c Rollup merge of rust-lang#133237 - fee1-dead-contrib:constadd, r=compiler-errors 99741dd Rollup merge of rust-lang#133332 - bjoernager:const-array-as-mut-slice, r=jhpratt 9a152e2 Rollup merge of rust-lang#131505 - madsmtm:darwin_user_temp_dir, r=dtolnay a12c838 Auto merge of rust-lang#132994 - clubby789:cc-bisect, r=Kobzol 6548ad8 Auto merge of rust-lang#133360 - compiler-errors:rollup-a2o38tq, r=compiler-errors a4f797e Rollup merge of rust-lang#133264 - lolbinarycat:os-string-truncate, r=joboet a939801 Auto merge of rust-lang#132329 - compiler-errors:fn-and-destruct, r=lcnr 30aa6db Add code example for `wrapping_neg` method for signed integers bc77567 Deduplicate checking drop terminator 6f3ec5c Gate const drop behind const_destruct feature, and fix const_precise_live_drops post-drop-elaboration check fb6f0c2 Auto merge of rust-lang#133339 - jieyouxu:rollup-gav0nvr, r=jieyouxu c792ef3 Rollup merge of rust-lang#133337 - ColinFinck:thread-scoped-fix-typo, r=joboet cfed1c6 Rollup merge of rust-lang#133330 - RalfJung:close, r=the8472 e26edf0 Rollup merge of rust-lang#133313 - thesummer:fix-arc4random, r=cuviper 90a85ef Rollup merge of rust-lang#133288 - bjoernager:const-array-each-ref, r=jhpratt 4e6f154 Rollup merge of rust-lang#133238 - heiher:loong-stdarch-rexport, r=Amanieu 23a1b31 Auto merge of rust-lang#130867 - michirakara:steps_between, r=dtolnay 9693572 Fix typo in `std::thread::Scope::spawn` documentation. b4a5067 Mark '<[T; N]>::as_mut_slice' as 'const'; b6b40ef library: update comment around close() 6ce7e79 Don't try to use confstr in Miri 40d6e2c Auto merge of rust-lang#129238 - umgefahren:stabilize-ipv6-unique-local, r=dtolnay 276c0fc distinguish overflow and unimplemented in Step::steps_between 8be952b Use arc4random of libc for RTEMS target 4583dde Mention that std::fs::remove_dir_all fails on files 4f6ca37 Mark and implement 'each_ref' and 'each_mut' in '[T; N]' as const; ec220b6 constify `Add` 3c558bf Rollup merge of rust-lang#131736 - hoodmane:emscripten-wasm-bigint, r=workingjubilee 38d4c11 implement OsString::truncate 4fd2c8d Rollup merge of rust-lang#133226 - compiler-errors:opt-in-pointer-like, r=lcnr 3f03a0f Rollup merge of rust-lang#130800 - bjoernager:const-mut-cursor, r=joshtriplett eea7e23 Rollup merge of rust-lang#129838 - Ayush1325:uefi-process-args, r=joboet 8b4995a Make PointerLike opt-in as a trait f74b38a Reduce integer `Display` implementation size 2f179d1 Stabilize const_pin_2 b2dc297 re-export `is_loongarch_feature_detected` e26c298 Rollup merge of rust-lang#132732 - gavincrawford:as_ptr_attribute, r=Urgau d6ee9db Rollup merge of rust-lang#133183 - n0toose:improve-remove-dir-docs, r=joboet 40735d3 Rollup merge of rust-lang#125405 - m-ou-se:thread-add-spawn-hook, r=WaffleLapkin 6c20348 Rollup merge of rust-lang#123947 - zopsicle:vec_deque-Iter-as_slices, r=Amanieu 2089cb3 Update doc comments for spawn hook. c02090d Address review comments. 79bffa9 Fix tracking issue. 3eff64c Add tracking issue. 15bac4f Use Send + Sync for spawn hooks. a42af06 Add thread Builder::no_hooks(). 49ac15b Update thread spawn hooks. 2cc4b2e Use add_spawn_hook for libtest's output capturing. 24a0765 Add std::thread::add_spawn_hook. 50ac725 Correct comments concerning updated dangling pointer lint cdf5486 Auto merge of rust-lang#133205 - matthiaskrgr:rollup-xhhhp5u, r=matthiaskrgr 543667a Rollup merge of rust-lang#133200 - RalfJung:miri-rwlock-test, r=tgross35 7430eb4 ignore an occasionally-failing test in Miri 607b493 Rollup merge of rust-lang#133182 - RalfJung:const-panic-inline, r=tgross35 e6cd122 Rollup merge of rust-lang#132758 - nnethercote:improve-get_key_value-docs, r=cuviper a3c9597 Mention std::fs::remove_dir_all in std::fs::remove_dir bd5c142 Bump `stdarch` to the latest master e84f865 const_panic: inline in bootstrap builds to avoid f16/f128 crashes 05fecb9 std: allow after-main use of synchronization primitives c1beb25 Auto merge of rust-lang#133160 - jhpratt:rollup-wzj9q15, r=jhpratt ce80c9f Rollup merge of rust-lang#133145 - kornelski:static-mutex, r=traviscross f385ac2 Auto merge of rust-lang#128219 - connortsui20:rwlock-downgrade, r=tgross35 86151ab rename rustc_const_stable_intrinsic -> rustc_intrinsic_const_stable_indirect a33f889 Improve `{BTreeMap,HashMap}::get_key_value` docs. 15e6fc0 Document alternatives to `static mut` 1cd1dd7 Auto merge of rust-lang#120370 - x17jiri:likely_unlikely_fix, r=saethlin e475f40 Likely unlikely fix ddcabfe Rollup merge of rust-lang#133126 - ohno418:fix-String-doc, r=jhpratt e4eff6a Rollup merge of rust-lang#133116 - RalfJung:const-null-ptr, r=dtolnay 16e6d20 alloc: fix `String`'s doc e4fb962 clean up const stability around UB checks ee78601 stabilize const_ptr_is_null 1e4a9ee Rollup merge of rust-lang#132449 - RalfJung:is_val_statically_known, r=compiler-errors 1dfe94c Rollup merge of rust-lang#131717 - tgross35:stabilize-const_atomic_from_ptr, r=RalfJung 70326e8 reduce threads in downgrade test d58e4f2 fix `DOWNGRADED` bit unpreserved 5d68316 fix memory ordering bug + bad test 0604b8f add safety comments for queue implementation 00255e6 add `downgrade` to `queue` implementation 40256c6 modify queue implementation documentation f804164 add `downgrade` to `futex` implementation 572aded add simple `downgrade` implementations 48bcf09 add `downgrade` method onto `RwLockWriteGuard` 5416aef add `RwLock` `downgrade` tests 4010980 Rollup merge of rust-lang#133050 - tgross35:inline-f16-f128, r=saethlin 2ee4159 Rollup merge of rust-lang#133048 - cyrgani:ptr-doc-update, r=Amanieu e1448de Rollup merge of rust-lang#133019 - sorairolake:add-missing-period-and-colon, r=tgross35 b1d31d2 Rollup merge of rust-lang#132984 - sunshowers:pipe2, r=tgross35 8cef1ef Rollup merge of rust-lang#132977 - cberner:fix_solaris, r=tgross35 daa9c43 Rollup merge of rust-lang#132790 - aDotInTheVoid:ioslice-asslice-rides-again, r=cuviper cdb5ff5 Pass `f16` and `f128` by value in `const_assert!` 60ef479 use `&raw` in `{read, write}_unaligned` documentation d2983ff Auto merge of rust-lang#132709 - programmerjake:optimize-charto_digit, r=joshtriplett 918cc8d Rollup merge of rust-lang#133027 - no1wudi:master, r=jhpratt 25f5512 Auto merge of rust-lang#133026 - workingjubilee:rollup-q8ig6ah, r=workingjubilee d8de2cc Fix a copy-paste issue in the NuttX raw type definition c06bb34 Rollup merge of rust-lang#133008 - onur-ozkan:update-outdated-comment, r=jieyouxu 8eaea39 Rollup merge of rust-lang#133004 - cuviper:unrecover-btree, r=ibraheemdev 81a191a Rollup merge of rust-lang#133003 - zachs18:clonetouninit-dyn-compat-u8, r=dtolnay e3e5e35 Rollup merge of rust-lang#132907 - BLANKatGITHUB:intrinsic, r=saethlin f57853b Rollup merge of rust-lang#131304 - RalfJung:float-core, r=tgross35 7bc0436 Auto merge of rust-lang#122770 - iximeow:ixi/int-formatting-optimization, r=workingjubilee ce2e318 docs: Fix missing colon in methods for primitive types 1870e92 docs: Fix missing period in methods for integer types 6439774 Auto merge of rust-lang#133006 - matthiaskrgr:rollup-dz6oiq5, r=matthiaskrgr 98dad0b update outdated comment about test-float-parse 520d4fd Rollup merge of rust-lang#126046 - davidzeng0:mixed_integer_ops_unsigned_sub, r=Amanieu e3c425b Auto merge of rust-lang#132662 - RalfJung:const-panic-inlining, r=tgross35 c4b77cf Update core CloneToUninit tests d4e21f5 btree: simplify the backdoor between set and map 5d61cf9 Bump `cc` 44f376b Fix compilation error on Solaris due to flock usage 75609d6 Auto merge of rust-lang#132556 - clubby789:cargo-update, r=Mark-Simulacrum 5ba28a4 Run `cargo update` and update licenses 0820004 const_panic: don't wrap it in a separate function d30e2c0 [illumos] use pipe2 to create anonymous pipes 7e12686 Auto merge of rust-lang#132883 - LaihoE:vectorized_is_sorted, r=thomcc 02e32d7 Auto merge of rust-lang#132972 - matthiaskrgr:rollup-456osr7, r=matthiaskrgr 157eb1c Rollup merge of rust-lang#132970 - tyilo:nonzero-u-div-ceil-issue, r=tgross35 03e52a5 Rollup merge of rust-lang#132966 - RalfJung:const_option_ext, r=jhpratt 2f615a1 Rollup merge of rust-lang#132948 - RalfJung:const_unicode_case_lookup, r=Noratrieb f00e091 Rollup merge of rust-lang#132851 - chansuke:update-comment, r=thomcc 6560098 Auto merge of rust-lang#132870 - Noratrieb:inline-int-parsing, r=tgross35 a0c0c40 Add tracking issue number to unsigned_nonzero_div_ceil feature c229666 Make `CloneToUninit` dyn-compatible 6ab50dd stabilize const_option_ext 27fe6c7 Rollup merge of rust-lang#132541 - RalfJung:const-stable-extern-crate, r=compiler-errors 7fafe99 stabilize const_unicode_case_lookup c5ed625 Stabilize `Ipv6Addr::is_unique_local` and `Ipv6Addr::is_unicast_link_local` e0452c9 adds new declaration to codegen 33fa870 Auto merge of rust-lang#132943 - matthiaskrgr:rollup-164l3ej, r=matthiaskrgr 7f12f02 Rollup merge of rust-lang#132914 - rcorre:cell-grammar, r=tgross35 300a266 Rollup merge of rust-lang#132895 - scottmcm:generalize-nonnull-from-raw-parts, r=ibraheemdev a461cf9 remove no-longer-needed abs_private 170e993 allow rustc_private feature in force-unstable-if-unmarked crates 4a20245 Rollup merge of rust-lang#132929 - cuviper:check-alloc_zeroed, r=tgross35 992bbf7 Rollup merge of rust-lang#132869 - lolbinarycat:library-fix-too_long_first_doc_paragraph, r=tgross35 e3925fa Rollup merge of rust-lang#132847 - RalfJung:addr-dont-expose, r=Mark-Simulacrum 327a0d7 Auto merge of rust-lang#132919 - matthiaskrgr:rollup-ogghyvp, r=matthiaskrgr 67c3c9f Check for null in the `alloc_zeroed` example 068537a new intrinsic declaration b689951 new intrinsic declaration 16fa12e Rollup merge of rust-lang#132144 - adetaylor:receiver-trait-itself, r=wesleywiser 54f699d Rollup merge of rust-lang#120077 - SUPERCILEX:set-entry, r=Amanieu e541a4f Update dangling pointer tests 7707584 Tag relevant functions with #[rustc_as_ptr] attribute b541c5a Auto merge of rust-lang#132902 - matthiaskrgr:rollup-43qgg3t, r=matthiaskrgr 2d676d4 Update grammar in std::cell docs. 7325f33 Emscripten: link with -sWASM_BIGINT 1c482c9 Rollup merge of rust-lang#130999 - cberner:flock_pr, r=joboet 4dd2270 Auto merge of rust-lang#127589 - notriddle:notriddle/search-sem-3, r=GuillaumeGomez 0af64b6 Generalize `NonNull::from_raw_parts` per ACP362 2fd9ac4 vectorize slice::is_sorted 737521c `#[inline]` integer parsing functions b9be1dd split up the first paragraph of doc comments for better summaries f9063ff Update the doc comment of `ASCII_CASE_MASK` 57c7b80 elem_offset / subslice_range: use addr() instead of 'as usize' d19aa69 Rollup merge of rust-lang#132136 - RalfJung:target-feature-abi-compat, r=Mark-Simulacrum 6b0bd5a honor rustc_const_stable_indirect in non-staged_api crate with -Zforce-unstable-if-unmarked 070baf4 Add as_slice/into_slice for IoSlice/IoSliceMut. 978a553 Rollup merge of rust-lang#132778 - lolbinarycat:io-Error-into_inner-docs, r=cuviper 6d54bfe update io::Error::into_inner to acknowlage io::Error::other 7c0a90c Address review comments ac66068 Update library/std/src/sys/pal/windows/fs.rs d90f866 Auto merge of rust-lang#132717 - RalfJung:rustc_safe_intrinsic, r=compiler-errors f2bf9e6 remove support for rustc_safe_intrinsic attribute; use rustc_intrinsic functions instead 2391b4b Rollup merge of rust-lang#132738 - cuviper:channel-heap-init, r=ibraheemdev 086cfef mark is_val_statically_known intrinsic as stably const-callable dffc5e7 Rollup merge of rust-lang#132696 - fortanix:raoul/rte-235-fix_fmodl_missing_symbol_issue, r=tgross35 f14fc56 Rollup merge of rust-lang#132639 - RalfJung:intrinsics, r=workingjubilee,Amanieu 6d63012 Initialize channel `Block`s directly on the heap 7ff251b core: move intrinsics.rs into intrinsics folder 6244f48 Auto merge of rust-lang#132714 - mati865:update-memchr, r=tgross35 a2eaef7 Rollup merge of rust-lang#132715 - tabokie:fix-lazy-lock-doc, r=Noratrieb 6a77b21 Rollup merge of rust-lang#132665 - tyilo:nonzero-u-div-ceil, r=joboet 79d2063 Separate f128 `%` operation to deal with missing `fmodl` symbol 8022523 Auto merge of rust-lang#132705 - kornelski:inline-repeat, r=tgross35 df9f5db fix lazylock comment 7a82eb5 Auto merge of rust-lang#131888 - ChrisDenton:deopt, r=ibraheemdev 75b9ce3 unpin and update memchr 4d1c7d9 optimize char::to_digit and assert radix is at least 2 95bff3e Inline str::repeat 52c2a45 Rollup merge of rust-lang#132617 - uellenberg:fix-rendered-doc, r=cuviper 28f7e7b Auto merge of rust-lang#131721 - okaneco:const_eq_ignore_ascii_case, r=m-ou-se 41b7e5f Auto merge of rust-lang#132500 - RalfJung:char-is-whitespace-const, r=jhpratt 4ed08bd Add new unstable feature `const_eq_ignore_ascii_case` f4e9fe4 Auto merge of rust-lang#132664 - matthiaskrgr:rollup-i27nr7i, r=matthiaskrgr afc66fe Change some code blocks to quotes in rendered std doc 2e63cbd Rollup merge of rust-lang#131261 - clarfonthey:unsafe-cell-from-mut, r=m-ou-se ab6f663 Auto merge of rust-lang#132661 - matthiaskrgr:rollup-npytbl6, r=matthiaskrgr 8b165db Implement div_ceil for NonZero<unsigned> 6bc1b1b Rollup merge of rust-lang#132571 - RalfJung:const_eval_select_macro, r=oli-obk c12f4d1 Rollup merge of rust-lang#132473 - ZhekaS:core_fmt_radix_no_panic, r=joboet bbb9275 Rollup merge of rust-lang#132153 - bjoernager:const-char-encode-utf16, r=dtolnay 919de70 add const_eval_select macro to reduce redundancy 538f5b4 Rollup merge of rust-lang#132609 - NotWearingPants:patch-1, r=Amanieu 86c6f27 Rollup merge of rust-lang#132606 - eduardosm:char-slice-str-pattern-doc, r=tgross35 4660d7e most const intrinsics don't need an explicit rustc_const_unstable any more 8eb30fe add new rustc_const_stable_intrinsic attribute for const-stable intrinsics 792d164 convert all const-callable intrinsics into the new form (without extern block) fad7d68 docs: fix grammar in doc comment at unix/process.rs 92bb779 Improve example of `impl Pattern for &[char]` 553bb18 Add AsyncFn* to to the prelude in all editions 2ae24bf Fixed typo, rebased 47f60d7 Updated SAFETY comment to address underflow 581aa8d Replace checked slice indexing by unchecked to support panic-free code c5a0f6c Rollup merge of rust-lang#132579 - RalfJung:rustc-std-workspace-crates, r=Amanieu 9cdbf39 btree: don't leak value if destructor of key panics 4caff13 Stabilise 'const_char_encode_utf16'; 84fae7e Auto merge of rust-lang#132586 - workingjubilee:rollup-qrmn49a, r=workingjubilee 95b4127 update rustc-std-workspace crates 082b98d Rollup merge of rust-lang#132423 - RalfJung:const-eval-align-offset, r=dtolnay 3b40634 Auto merge of rust-lang#132434 - tgross35:f128-tests, r=workingjubilee 5dea8b2 Enable `f128` tests on all non-buggy platforms 🎉 2bb8ea3 Auto merge of rust-lang#132581 - workingjubilee:rollup-4wj318p, r=workingjubilee 83bd286 Update `compiler_builtins` to 0.1.138 and pin it 699702f Rollup merge of rust-lang#132563 - frectonz:master, r=Amanieu 4390c35 Auto merge of rust-lang#123723 - madsmtm:apple-std-os, r=dtolnay 1e8ed90 Auto merge of rust-lang#132479 - compiler-errors:fx-feat-yeet, r=fee1-dead 9a3b7c0 Rename the FIXMEs, remove a few that dont matter anymore ed4f110 Auto merge of rust-lang#132542 - RalfJung:const_panic, r=tgross35 d8bca01 remove const-support for align_offset 76b866c Modify `NonZero` documentation to reference the underlying integer type 9e57964 Rollup merge of rust-lang#132511 - RalfJung:const_arguments_as_str, r=dtolnay bfeeb74 Rollup merge of rust-lang#132503 - RalfJung:const-hash-map, r=Amanieu a42fc21 Rollup merge of rust-lang#132499 - RalfJung:unicode_data.rs, r=tgross35 0278cab Rollup merge of rust-lang#132393 - zedddie16:issue-131865-fix, r=tgross35 714115a Rollup merge of rust-lang#131377 - rick-de-water:nonzero-exp, r=dtolnay 9789c54 Rollup merge of rust-lang#129329 - eduardosm:rc-from-mut-slice, r=dtolnay ff9178b add const_panic macro to make it easier to fall back to non-formatting panic in const 9ef483b stabilize const_arguments_as_str 4c6593f Auto merge of rust-lang#132458 - RalfJung:rustc-const-unstable, r=Amanieu 81b20e0 Rustdoc: added brief colon explanation 73d9f4d Add Set entry API e883a60 Add BorrowedBuf::into_filled{,_mut} methods to allow returning buffer with original lifetime 261c5b9 remove const_hash feature leftovers d515da6 const_with_hasher test: actually construct a usable HashMap 11dc6c3 make char::is_whitespace unstably const 1a481fd unicode_data.rs: show command for generating file 3a5b026 get rid of a whole bunch of unnecessary rustc_const_unstable attributes 2e24b7f remove no-longer-needed attribute ffbcba0 add missing safety comments 768d0cd adjust test gating for f16/f128 6335056 float types: move copysign, abs, signum to libcore c353337 rustdoc-search: simplify rules for generics and type params 9d10ab7 Implement `From<&mut {slice}>` for `Box/Rc/Arc<{slice}>` 34329c0 Stabilize `const_atomic_from_ptr` a2e1edf Arbitrary self types v2: (unused) Receiver trait 2d26681 ABI compatibility: remove section on target features f1c9904 Support lock() and lock_shared() on async IO Files 7f6af4d Revert using `HEAP` static in Windows alloc 541bda1 Implement file_lock feature d7a7b0a uefi: process: Add args support 14aef3d Use with_capacity(0) because we're reading the capacity later on 5b16abe Prefer `target_vendor = "apple"` on confstr bc63981 use `confstr(_CS_DARWIN_USER_TEMP_DIR, ...)` as a `TMPDIR` fallback on darwin f8dc879 Add LowerExp and UpperExp implementations 50afc52 Stabilize UnsafeCell::from_mut aa74e93 Mark 'get_mut' and 'set_position' in 'std::io::Cursor' as const; c370665 Make `std::os::darwin` public 797c249 Implement `mixed_integer_ops_unsigned_sub` ff1212e Add vec_deque::Iter::as_slices and friends e938dea try adding a test that LowerHex and friends don't panic, but it doesn't work c6d2bb7 improve codegen of fmt_num to delete unreachable panic git-subtree-dir: library git-subtree-split: 78fc550
it seems LLVM doesn't realize that
curr
is always decremented at least once in either loop formatting characters of the input string by their appropriate radix, and so the later&buf[curr..]
generates a check for out-of-bounds access and panic. this is unreachable in reality as even forx == T::zero()
we'll produce at least the characterSelf::digit(T::zero())
, yielding at least one character output, andcurr
will always be at least one belowbuf.len()
.adjust
fmt_int
to make this fact more obvious to the compiler, which fortunately (or unfortunately) results in a measurable performance improvement for workloads heavy on formatting integers.in the program i'd noticed this in, you can see the
cmp $0x80,%rdi; ja 7c
here, which branches to a slice index fail helper:where after this change the function is broadly similar, but smaller, with one fewer registers updated in each pass through the loop in addition the never-taken
cmp/ja
being gone:this represents a ~2-3% difference in runtime in my admittedly comically i32-formatting-bound use case (printing x86 instructions, including i32 displacements and immediates) as measured on a ryzen 9 3950x.
the impact on
<impl LowerHex for i8>::fmt
is both more dramatic and less impactful: it continues to have a loop that is evaluated at most twice, though the compiler doesn't know that to unroll it. the generated code there is identical to the impl fori32
. there, the smaller loop body has less effect on runtime, and removing the never-taken slice bounds check is offset by whatever address recalculation is happening with thelea/add/neg
at the end of the loop. it behaves about the same before and after.i initially measured slightly better outcomes using
unreachable_unchecked()
here instead, but that was hacking on std and rebuilding with-Z build-std
on an older rustc (nightly 5b377cece, 2023-06-30). it does not yield better outcomes now, so i see no reason to proceed with that approach at all.initial notes about that, seemingly irrelevant on modern rustc
i went through a few tries at getting llvm to understand the bounds check isn't necessary, but i should mention the _best_ i'd seen here was actually from the existing `fmt_int` with a diff like ```diff if x == zero { // No more digits left to accumulate. break; }; } } + + if curr >= buf.len() { + unsafe { core::hint::unreachable_unchecked(); } + } let buf = &buf[curr..]; ```posting a random PR to
rust-lang/rust
to do that without a really really compelling reason seemed a bit absurd, so i tried to work that into something that seems more palatable at a glance. but if you're interested, that certainly produced better (x86_64) code through LLVM. in that case withbuf.iter_mut().rev()
as the iterator,<impl LowerHex for i8>::fmt
actually unrolls into something likeit's pretty cool!
<impl LowerHex for i32>::fmt
also was slightly better. that all resulted in closer to an 6% difference in my use case.i have not looked at formatters other than LowerHex/UpperHex with this change, though i'd be a bit shocked if any were worse.
(i have absolutely no idea how you'd regression test this, but that might be just my not knowing what the right tool for that would be in rust-lang/rust. i'm of half a mind that this is small and fiddly enough to not be worth landing lest it quietly regress in the future anyway. but i didn't want to discard the idea without at least offering it upstream here)