Polonius investigation #4

CohenArthur · 2022-03-15T17:23:24Z

No description provided.

With many thanks to H.J. for doing all the hard work, this patch resolves two P1 regressions; PR target/106933 and PR target/106959. Although superficially similar, the i386 backend's two scalar-to-vector (STV) passes perform their transformations in importantly different ways. The original pass converting SImode and DImode operations to V4SImode or V2DImode operations is "soft", allowing values to be maintained in both integer and vector hard registers. The newer pass converting TImode operations to V1TImode is "hard" (all or nothing) that converts all uses of a pseudo to vector form. To implement this it invokes powerful ju-ju calling SET_MODE on a reg_rtx, which due to RTL sharing, often updates this pseudo's mode everywhere in the RTL chain. Hence, TImode STV can only be performed when all uses of a pseudo are convertible to V1TImode form. To ensure this the STV passes currently use data-flow analysis to inspect all DEFs and USEs in a chain. This works fine for chains that are in the usual single assignment form, but the occurrence of uninitialized variables, or multiple assignments that split a pseudo's usage into several independent chains (lifetimes) can lead to situations where some but not all of a pseudo's occurrences need to be updated. This is safe for the SImode/DImode pass, but leads to the above bugs during the TImode pass. My one minor tweak to HJ's patch from comment #4 of bugzilla PR106959 is to only perform the new single_def_chain_p check for TImode STV; it turns out that STV of SImode/DImode min/max operates safely on multiple-def chains, and prohibiting this leads to testsuite regressions. We don't (yet) support V1TImode min/max, so this idiom isn't an issue during the TImode STV pass. For the record, the two alternate possible fixes are (i) make the TImode STV pass "soft", by eliminating use of SET_MODE, instead using replace_rtx with a new pseudo, or (ii) merging "chains" so that multiple DFA chains/lifetimes are considered a single STV chain. 2022-12-23 H.J. Lu <hjl.tools@gmail.com> Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/106933 PR target/106959 * config/i386/i386-features.cc (single_def_chain_p): New predicate function to check that a pseudo's use-def chain is in SSA form. (timode_scalar_to_vector_candidate_p): Check that TImode regs that are SET_DEST or SET_SRC of an insn match/are single_def_chain_p. gcc/testsuite/ChangeLog PR target/106933 PR target/106959 * gcc.target/i386/pr106933-1.c: New test case. * gcc.target/i386/pr106933-2.c: Likewise. * gcc.target/i386/pr106959-1.c: Likewise. * gcc.target/i386/pr106959-2.c: Likewise. * gcc.target/i386/pr106959-3.c: Likewise.

The aarch64 ISA specification allows a left shift amount to be applied after extension in the range of 0 to 4 (encoded in the imm3 field). This is true for at least the following instructions: * ADD (extend register) * ADDS (extended register) * SUB (extended register) The result of this patch can be seen, when compiling the following code: uint64_t myadd(uint64_t a, uint64_t b) { return a+(((uint8_t)b)<<4); } Without the patch the following sequence will be generated: 0000000000000000 <myadd>: 0: d37c1c21 ubfiz x1, x1, #4, #8 4: 8b000020 add x0, x1, x0 8: d65f03c0 ret With the patch the ubfiz will be merged into the add instruction: 0000000000000000 <myadd>: 0: 8b211000 add x0, x0, w1, uxtb #4 4: d65f03c0 ret gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_uxt_size): fix an off-by-one in checking the permissible shift-amount.

This patch is my proposed solution to PR rtl-optimization/91865. Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is possible for combine's make_compound_operation to unintentionally generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be matched by the backend. For the new test case: const int table[2] = {1, 2}; int foo (char i) { return table[i]; } compiling with -O2 -mlarge on msp430 we currently see: Trying 2 -> 7: 2: r25:HI=zero_extend(R12:QI) REG_DEAD R12:QI 7: r28:PSI=sign_extend(r25:HI)#0 REG_DEAD r25:HI Failed to match this instruction: (set (reg:PSI 28 [ iD.1772 ]) (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ])))) which results in the following code: foo: AND #0xff, R12 RLAM.A #4, R12 { RRAM.A #4, R12 RLAM.A #1, R12 MOVX.W table(R12), R12 RETA With this patch, we now see: Trying 2 -> 7: 2: r25:HI=zero_extend(R12:QI) REG_DEAD R12:QI 7: r28:PSI=sign_extend(r25:HI)#0 REG_DEAD r25:HI Successfully matched this instruction: (set (reg:PSI 28 [ iD.1772 ]) (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing combination of insns 2 and 7 original costs 4 + 8 = 12 replacement cost 8 foo: MOV.B R12, R12 RLAM.A #1, R12 MOVX.W table(R12), R12 RETA 2023-10-26 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog PR rtl-optimization/91865 * combine.cc (make_compound_operation): Avoid creating a ZERO_EXTEND of a ZERO_EXTEND. gcc/testsuite/ChangeLog PR rtl-optimization/91865 * gcc.target/msp430/pr91865.c: New test case.

Here we have template<class T> auto is_throwable(T t) -> decltype(throw t, true) { ... } where we didn't properly mark 't' as IMPLICIT_RVALUE_P, which caused the wrong overload to have been chosen. Jason figured out it's because we don't correctly implement [expr.prim.id.unqual]#4.2, which post-P2266 says that an id-expression is move-eligible if "the id-expression (possibly parenthesized) is the operand of a throw-expression, and names an implicitly movable entity that belongs to a scope that does not contain the compound-statement of the innermost lambda-expression, try-block, or function-try-block (if any) whose compound-statement or ctor-initializer contains the throw-expression." I worked out that it's trying to say that given struct X { X(); X(const X&); X(X&&) = delete; }; the following should fail: the scope of the throw is an sk_try, and it's also x's scope S, and S "does not contain the compound-statement of the *try-block" so x is move-eligible, so we move, so we fail. void f () try { X x; throw x; // use of deleted function } catch (...) { } Whereas here: void g (X x) try { throw x; } catch (...) { } the throw is again in an sk_try, but x's scope is an sk_function_parms which *does* contain the {} of the *try-block, so x is not move-eligible, so we don't move, so we use X(const X&), and the code is fine. The current code also doesn't seem to handle void h (X x) { void z (decltype(throw x, true)); } where there's no enclosing lambda or sk_try so we should move. I'm not doing anything about lambdas because we shouldn't reach the code at the end of the function: the DECL_HAS_VALUE_EXPR_P check shouldn't let us go further. PR c++/113789 PR c++/113853 gcc/cp/ChangeLog: * typeck.cc (treat_lvalue_as_rvalue_p): Update code to better reflect [expr.prim.id.unqual]#4.2. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/sfinae69.C: Remove dg-bogus. * g++.dg/cpp0x/sfinae70.C: New test. * g++.dg/cpp0x/sfinae71.C: New test. * g++.dg/cpp0x/sfinae72.C: New test. * g++.dg/cpp2a/implicit-move4.C: New test.

…o_debug_section [PR116614] cat abc.C #define A(n) struct T##n {} t##n; #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9) #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9) #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9) #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9) E(1) E(2) E(3) int main () { return 0; } ./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c ./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2 (not included in testsuite as it takes a while to compile) FAILs with lto-wrapper: fatal error: Too many copied sections: Operation not supported compilation terminated. /usr/bin/ld: error: lto-wrapper failed collect2: error: ld returned 1 exit status The following patch fixes that. Most of the 64K+ section support for reading and writing was already there years ago (and especially reading used quite often already) and a further bug fixed in it in the PR104617 fix. Yet, the fix isn't solely about removing the if (new_i - 1 >= SHN_LORESERVE) { *err = ENOTSUP; return "Too many copied sections"; } 5 lines, the missing part was that the function only handled reading of the .symtab_shndx section but not copying/updating of it. If the result has less than 64K-epsilon sections, that actually wasn't needed, but e.g. with -fdebug-types-section one can exceed that pretty easily (reported to us on WebKitGtk build on ppc64le). Updating the section is slightly more complicated, because it basically needs to be done in lock step with updating the .symtab section, if one doesn't need to use SHN_XINDEX in there, the section should (or should be updated to) contain SHN_UNDEF entry, otherwise needs to have whatever would be overwise stored but couldn't fit. But repeating due to that all the symtab decisions what to discard and how to rewrite it would be ugly. So, the patch instead emits the .symtab_shndx section (or sections) last and prepares the content during the .symtab processing and in a second pass when going just through .symtab_shndx sections just uses the saved content. 2024-09-07 Jakub Jelinek <jakub@redhat.com> PR lto/116614 * simple-object-elf.c (SHN_COMMON): Align comment with neighbouring comments. (SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for consistency. (simple_object_elf_find_sections): Formatting fixes. (simple_object_elf_fetch_attributes): Likewise. (simple_object_elf_attributes_merge): Likewise. (simple_object_elf_start_write): Likewise. (simple_object_elf_write_ehdr): Likewise. (simple_object_elf_write_shdr): Likewise. (simple_object_elf_write_to_file): Likewise. (simple_object_elf_copy_lto_debug_section): Likewise. Don't fail for new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy over .symtab_shndx sections, though emit those last and compute their section content when processing associated .symtab sections. Handle simple_object_internal_read failure even in the .symtab_shndx reading case.

Whenever C1 and C2 are integer constants, X is of a wrapping type, and cmp is a relational operator, the expression X +- C1 cmp C2 can be simplified in the following cases: (a) If cmp is <= and C2 -+ C1 == +INF(1), we can transform the initial comparison in the following way: X +- C1 <= C2 -INF <= X +- C1 <= C2 (add left hand side which holds for any X, C1) -INF -+ C1 <= X <= C2 -+ C1 (add -+C1 to all 3 expressions) -INF -+ C1 <= X <= +INF (due to (1)) -INF -+ C1 <= X (eliminate the right hand side since it holds for any X) (b) By analogy, if cmp if >= and C2 -+ C1 == -INF(1), use the following sequence of transformations: X +- C1 >= C2 +INF >= X +- C1 >= C2 (add left hand side which holds for any X, C1) +INF -+ C1 >= X >= C2 -+ C1 (add -+C1 to all 3 expressions) +INF -+ C1 >= X >= -INF (due to (1)) +INF -+ C1 >= X (eliminate the right hand side since it holds for any X) (c) The > and < cases are negations of (a) and (b), respectively. This transformation allows to occasionally save add / sub instructions, for instance the expression 3 + (uint32_t)f() < 2 compiles to cmn w0, #4 cset w0, ls instead of add w0, w0, 3 cmp w0, 2 cset w0, ls on aarch64. Testcases that go together with this patch have been split into two separate files, one containing testcases for unsigned variables and the other for wrapping signed ones (and thus compiled with -fwrapv). Additionally, one aarch64 test has been adjusted since the patch has caused the generated code to change from cmn w0, #2 csinc w0, w1, wzr, cc (x < -2) to cmn w0, #3 csinc w0, w1, wzr, cs (x <= -3) This patch has been bootstrapped and regtested on aarch64, x86_64, and i386, and additionally regtested on riscv32. gcc/ChangeLog: PR tree-optimization/116024 * match.pd: New transformation around integer comparison. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr116024-2.c: New test. * gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto. * gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Adjust.

CohenArthur force-pushed the polonius-investigation branch from b87aa4a to 898d299 Compare April 26, 2022 14:27

CohenArthur force-pushed the polonius-investigation branch from 898d299 to f911f03 Compare May 18, 2022 15:05

CohenArthur force-pushed the polonius-investigation branch from f911f03 to 442a225 Compare July 11, 2022 14:04

CohenArthur force-pushed the polonius-investigation branch from 442a225 to 7422d38 Compare July 29, 2022 14:03

CohenArthur force-pushed the polonius-investigation branch 2 times, most recently from 2a796cc to dad0d47 Compare August 29, 2022 15:08

CohenArthur force-pushed the polonius-investigation branch 2 times, most recently from 9e0a229 to 30cd88a Compare October 5, 2022 07:59

CohenArthur and others added 22 commits October 5, 2022 10:02

polonius: scrape: Add testsuite results for rustc with polonius

aaadc73

polonius: scrape: Fix formatting

32b83b3

polonius: scrape: Fix formatting of testsuite diffs

eea6049

polonius: Add base for gccrs-polonius ffi layer

4649399

polonius: wip: Add base for borrow-checker

542bae2

polonius: ffi: Fmt

adb3a84

polonius: ffi: Add logging to var_used_at()

fae0614

polonius: ffi: Interface with var_used_at

9d200a9

borrow-checker: Add more visitors

193e8e3

make: fixup

60f0617

polonius: Keep track of the resolver inside the track

6282b97

polonius: fixup

85318a1

polonius: Lookup variable usage properly

5c56570

borrow-checker: Add base

f64aae3

borrowck: Rework architecture and cleanup

5d2680f

borrowck: Do more in the borrow checker

a216c90

polonius: Add important note about goal implementation

e8f1b68

polonius: f: Remove visitors for IdentifierExpr

d4ba3dd

ffi-polonius: Add safety comment on polonius_borrow_var

29dc523

poloinus: Wip documentation

1580126

ffi-polonius: Fix safety documentation

eb7afcc

ffi-polonius: Add logging using env_logger

30cd88a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Polonius investigation #4

Polonius investigation #4

CohenArthur commented Mar 15, 2022

Polonius investigation #4

Are you sure you want to change the base?

Polonius investigation #4

Conversation

CohenArthur commented Mar 15, 2022