Merge from my "master" branch #3
With -fno-vect-cost-model, RVV vectorizes this case using load_lanes with stride = 5 instead of SLP. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr45752.c: Adapt dump check for targets that support load_lanes with stride = 5.
Update in v2
* Remove emit helper functions.
* Take expand_binop instead.

Original log:

This patch refines the code gen for bswap16. We get a VEC_PERM_EXPR after RTL expand when invoking __builtin_bswap. It generates about 9 instructions in the loop, as below, no matter whether it is bswap16, bswap32 or bswap64.

  .L2:
  1 vle16.v v4,0(a0)
  2 vmv.v.x v2,a7
  3 vand.vv v2,v6,v2
  4 slli a2,a5,1
  5 vrgatherei16.vv v1,v4,v2
  6 sub a4,a4,a5
  7 vse16.v v1,0(a3)
  8 add a0,a0,a2
  9 add a3,a3,a2
    bne a4,zero,.L2

For bswap16, however, there is an even simpler code gen with only 7 instructions in the loop, as below.

  .L5:
  1 vle8.v v2,0(a5)
  2 addi a5,a5,32
  3 vsrl.vi v4,v2,8
  4 vsll.vi v2,v2,8
  5 vor.vv v4,v4,v2
  6 vse8.v v4,0(a4)
  7 addi a4,a4,32
    bne a5,a6,.L5

Unfortunately, the same approach makes the insn count in the loop grow to 13 and 24 for bswap32 and bswap64 respectively. Thus, we refine the code gen for bswap16 only, and leave both bswap32 and bswap64 as is.

gcc/ChangeLog:
* config/riscv/riscv-v.cc (shuffle_bswap_pattern): New func impl for shuffle bswap.
(expand_vec_perm_const_1): Add handling for shuffle bswap pattern.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker.
* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
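The shift/shift/or sequence in the 7-instruction loop corresponds to a simple scalar identity. A minimal sketch in C, assuming GCC (for __builtin_bswap16); the function names bswap16_shift and bswap16_array are hypothetical and are not taken from the patch or its tests:

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value with two shifts and an OR,
   the scalar counterpart of the vsrl.vi / vsll.vi / vor.vv sequence.  */
static inline uint16_t
bswap16_shift (uint16_t x)
{
  return (uint16_t) ((x >> 8) | (x << 8));
}

/* Element-wise loop of the kind an auto-vectorizer would see.
   The name and signature are illustrative only.  */
void
bswap16_array (uint16_t *restrict out, const uint16_t *restrict in, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = __builtin_bswap16 (in[i]);
}
```

Both functions compute the same result per element; the first spells out the operation that the shorter loop performs per vector lane.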
The following improves basic TBAA for access paths formed by C++ abstraction where we are able to combine a path from an address-taking operation with a path based on that access using a pun to avoid memory access semantics on the address-taking part. The trick is to identify the point the semantic memory access path starts which allows us to use the alias set of the outermost access instead of only that of the base of this path. PR tree-optimization/111715 * alias.cc (reference_alias_ptr_type_1): When we have a type-punning ref at the base search for the access path part that's still semantically valid. * gcc.dg/tree-ssa/ssa-fre-102.c: New testcase.
When generating CFI directives for the store-pair instruction, if we add two parallel REG_FRAME_RELATED_EXPR expr_lists like (expr_list:REG_FRAME_RELATED_EXPR (set (mem/c:DI (plus:DI (reg/f:DI 2 sp) (const_int 8 [0x8])) [1 S8 A64]) (reg:DI 1 ra)) (expr_list:REG_FRAME_RELATED_EXPR (set (mem/c:DI (reg/f:DI 2 sp) [1 S8 A64]) (reg:DI 8 s0)) only the first expr_list will be recognized by the dwarf2out_frame_debug function. So, here we generate a SEQUENCE expression of REG_FRAME_RELATED_EXPR, which includes two sub-expressions of RTX_FRAME_RELATED_P. Then the dwarf2out_frame_debug_expr function will iterate through all the sub-expressions and generate the corresponding CFI directives. gcc/ * config/riscv/thead.cc (th_mempair_save_regs): Fix missing CFI directives for store-pair instruction. gcc/testsuite/ * gcc.target/riscv/xtheadmempair-4.c: New test.
This patch fixes the following FAILs in regressions: FAIL: gcc.dg/vect/slp-perm-11.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1 FAIL: gcc.dg/vect/slp-perm-11.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1 FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum" FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized "Invalid sum" FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum" FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized "Invalid sum" FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum" FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized "Invalid sum" FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects scan-tree-dump-not optimized "Invalid sum" FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-not optimized "Invalid sum" Previously, I removed the movmisalign pattern to fix the execution FAILs in this commit: gcc-mirror@f7bff24 I thought that RVV did not allow misaligned accesses, so I removed that pattern. However, after further investigation, re-reading the RVV ISA and experimenting on SPIKE, I realized I was wrong. RVV ISA reference: https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints "If an element accessed by a vector memory instruction is not naturally aligned to the size of the element, either the element is transferred successfully or an address misaligned exception is raised on that element." So the RVV ISA does permit misaligned vector loads/stores.
I experimented and confirmed this on SPIKE:

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out
bbl loader
z 0000000000000000 ra 0000000000010158 sp 0000003ffffffb40 gp 0000000000012c48
tp 0000000000000000 t0 00000000000110da t1 000000000000000f t2 0000000000000000
s0 0000000000013460 s1 0000000000000000 a0 0000000000012ef5 a1 0000000000012018
a2 0000000000012a71 a3 000000000000000d a4 0000000000000004 a5 0000000000012a71
a6 0000000000012a71 a7 0000000000012018 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc 0000000000010258 va/inst 00000000020660a7 sr 8000000200006620
Store/AMO access fault!

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64 a.out
bbl loader

We can see that SPIKE passes the previously *FAILED* execution tests when --misaligned is specified. So, to honor the RVV ISA spec, we should add the movmisalign pattern back, based on the investigation above, since it improves multiple vectorization tests and fixes dump FAILs. This patch adds TARGET_VECTOR_MISALIGN_SUPPORTED to decide whether we support the misalign pattern for VLA modes (it is enabled by default).
Consider this following case: struct s { unsigned i : 31; char a : 4; }; #define N 32 #define ELT0 {0x7FFFFFFFUL, 0} #define ELT1 {0x7FFFFFFFUL, 1} #define ELT2 {0x7FFFFFFFUL, 2} #define ELT3 {0x7FFFFFFFUL, 3} #define RES 48 struct s A[N] = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; int __attribute__ ((noipa)) f(struct s *ptr, unsigned n) { int res = 0; for (int i = 0; i < n; ++i) res += ptr[i].a; return res; } -O3 -S -fno-vect-cost-model (default strict-align): f: mv a4,a0 beq a1,zero,.L9 addiw a5,a1,-1 li a3,14 vsetivli zero,16,e64,m8,ta,ma bleu a5,a3,.L3 andi a5,a0,127 bne a5,zero,.L3 srliw a3,a1,4 slli a3,a3,7 li a0,15 slli a0,a0,32 add a3,a3,a4 mv a5,a4 li a2,32 vmv.v.x v16,a0 vsetvli zero,zero,e32,m4,ta,ma vmv.v.i v4,0 .L4: vsetvli zero,zero,e64,m8,ta,ma vle64.v v8,0(a5) addi a5,a5,128 vand.vv v8,v8,v16 vsetvli zero,zero,e32,m4,ta,ma vnsrl.wx v8,v8,a2 vadd.vv v4,v4,v8 bne a5,a3,.L4 li a3,0 andi a5,a1,15 vmv.s.x v1,a3 andi a3,a1,-16 vredsum.vs v1,v4,v1 vmv.x.s a0,v1 mv a2,a0 beq a5,zero,.L15 slli a5,a3,3 add a5,a4,a5 lw a0,4(a5) andi a0,a0,15 addiw a4,a3,1 addw a0,a0,a2 bgeu a4,a1,.L15 lw a2,12(a5) andi a2,a2,15 addiw a4,a3,2 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,20(a5) andi a2,a2,15 addiw a4,a3,3 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,28(a5) andi a2,a2,15 addiw a4,a3,4 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,36(a5) andi a2,a2,15 addiw a4,a3,5 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,44(a5) andi a2,a2,15 addiw a4,a3,6 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,52(a5) andi a2,a2,15 addiw a4,a3,7 addw a0,a2,a0 bgeu a4,a1,.L15 lw a4,60(a5) andi a4,a4,15 addw a4,a4,a0 addiw a2,a3,8 mv a0,a4 bgeu a2,a1,.L15 lw a0,68(a5) andi a0,a0,15 addiw a2,a3,9 addw a0,a0,a4 bgeu a2,a1,.L15 lw a2,76(a5) andi a2,a2,15 addiw a4,a3,10 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,84(a5) andi a2,a2,15 addiw a4,a3,11 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,92(a5) andi 
a2,a2,15 addiw a4,a3,12 addw a0,a2,a0 bgeu a4,a1,.L15 lw a2,100(a5) andi a2,a2,15 addiw a4,a3,13 addw a0,a2,a0 bgeu a4,a1,.L15 lw a4,108(a5) andi a4,a4,15 addiw a3,a3,14 addw a0,a4,a0 bgeu a3,a1,.L15 lw a5,116(a5) andi a5,a5,15 addw a0,a5,a0 ret .L9: li a0,0 .L15: ret .L3: mv a5,a4 slli a4,a1,32 srli a1,a4,29 add a1,a5,a1 li a0,0 .L7: lw a4,4(a5) andi a4,a4,15 addi a5,a5,8 addw a0,a4,a0 bne a5,a1,.L7 ret -O3 -S -mno-strict-align -fno-vect-cost-model: f: beq a1,zero,.L4 slli a1,a1,32 li a5,15 vsetvli a4,zero,e64,m1,ta,ma slli a5,a5,32 srli a1,a1,32 li a6,32 vmv.v.x v3,a5 vsetvli zero,zero,e32,mf2,ta,ma vmv.v.i v2,0 .L3: vsetvli a5,a1,e64,m1,ta,ma vle64.v v1,0(a0) vsetvli a3,zero,e64,m1,ta,ma slli a2,a5,3 vand.vv v1,v1,v3 sub a1,a1,a5 vsetvli zero,zero,e32,mf2,ta,ma add a0,a0,a2 vnsrl.wx v1,v1,a6 vsetvli zero,a5,e32,mf2,tu,ma vadd.vv v2,v2,v1 bne a1,zero,.L3 li a5,0 vsetvli a3,zero,e32,mf2,ta,ma vmv.s.x v1,a5 vredsum.vs v2,v2,v1 vmv.x.s a0,v2 ret .L4: li a0,0 ret We can see it improves this case codegen a lot. gcc/ChangeLog: * config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED): New macro. * config/riscv/riscv.cc (riscv_support_vector_misalignment): Depend on movmisalign pattern. * config/riscv/vector.md (movmisalign<mode>): New pattern.
This adds a pipeline description for a generic out-of-order core. Latency and units are not based on any real processor but more or less educated guesses what such a processor would look like. In order to account for latency scaling by LMUL != 1, sched_adjust_cost is implemented. It will scale an instruction's latency by its LMUL so an LMUL == 8 instruction will take 8 times the number of cycles the same instruction with LMUL == 1 would take. As this potentially causes very high latencies which, in turn, might lead to scheduling anomalies and a higher number of vsetvls emitted this feature is only enabled when specifying -madjust-lmul-cost. Additionally, in order to easily recognize pre-RA vsetvls this patch introduces an insn type vsetvl_pre which is used in sched_adjust_cost. In the future we might also want a latency adjustment similar to lmul for reductions, i.e. make the latency dependent on the type and its number of units. gcc/ChangeLog: * config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter. * config/riscv/riscv-opts.h (enum riscv_microarchitecture_type): Add generic_ooo. * config/riscv/riscv.cc (riscv_sched_adjust_cost): Implement scheduler hook. (TARGET_SCHED_ADJUST_COST): Define. * config/riscv/riscv.md (no,yes"): Include generic-ooo.md * config/riscv/riscv.opt: Add -madjust-lmul-cost. * config/riscv/generic-ooo.md: New file. * config/riscv/vector.md: Add vsetvl_pre.
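The LMUL scaling described in this commit message can be illustrated with a small model. This is a hedged sketch only: scaled_latency and its parameters are hypothetical names, it is not the code of riscv_sched_adjust_cost, and leaving fractional LMUL unscaled is an assumption made here for simplicity.

```c
#include <stdbool.h>

/* Hypothetical model of the described behaviour: when the adjustment
   is enabled (as with -madjust-lmul-cost), an instruction's latency is
   multiplied by its LMUL.  Only integer LMUL values (1, 2, 4, 8) are
   modelled; anything <= 1 is returned unscaled (an assumption).  */
int
scaled_latency (int base_latency, int lmul, bool adjust_lmul_cost)
{
  if (!adjust_lmul_cost || lmul <= 1)
    return base_latency;
  return base_latency * lmul;
}
```

With a base latency of 3 cycles, an LMUL == 8 instruction would be modelled as 24 cycles when the adjustment is on and 3 cycles when it is off.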
RVV vectorizes these 2 cases in the same way as ARM SVE. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-23.c: Add RVV like ARM SVE. * gcc.dg/vect/slp-perm-10.c: Ditto.
RVV vectorizes this case with stride8 load_lanes. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-reduc-4.c: Adapt test for stride8 load_lanes.
This case is vectorized by stride8 load_lanes. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-12a.c: Adapt for stride 8 load_lanes.
These cases are vectorized by vec_load_lanes with stride = 8 instead of SLP with -fno-vect-cost-model. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr97832-2.c: Adapt dump check for targets that support load_lanes with stride = 8. * gcc.dg/vect/pr97832-3.c: Ditto. * gcc.dg/vect/pr97832-4.c: Ditto.
RVV vectorizes it with stride5 load_lanes. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-perm-4.c: Adapt test for stride5 load_lanes.
It turns out we didn't need this, as there are no unordered relations managed by the oracle. * gimple-range-gori.cc (gori_compute::compute_operand1_range): Do not call get_identity_relation. (gori_compute::compute_operand2_range): Ditto. * value-relation.cc (get_identity_relation): Remove. * value-relation.h (get_identity_relation): Remove prototype.
A floating point equivalence may not properly reflect both signs of zero, so be pessimistic and ensure both signs are included. PR tree-optimization/111694 gcc/ * gimple-range-cache.cc (ranger_cache::fill_block_cache): Adjust equivalence range. * value-relation.cc (adjust_equivalence_range): New. * value-relation.h (adjust_equivalence_range): New prototype. gcc/testsuite/ * gcc.dg/pr111694.c: New.
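The underlying issue is that +0.0 and -0.0 compare equal, so knowing x == y does not pin down the sign of a zero. The sketch below illustrates the idea of widening a range so both zeros are covered; frange_t and include_both_zeros are hypothetical names used for illustration and do not reproduce GCC's adjust_equivalence_range.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical closed floating-point range [lo, hi].  */
typedef struct { double lo, hi; } frange_t;

/* +0.0 and -0.0 compare equal but are distinguishable by sign bit.  */
bool
equal_but_distinguishable (double a, double b)
{
  return a == b && (signbit (a) != 0) != (signbit (b) != 0);
}

/* If a bound is a zero of either sign, widen the range so that both
   -0.0 and +0.0 are included.  */
frange_t
include_both_zeros (frange_t r)
{
  if (r.lo == 0.0)
    r.lo = -0.0;   /* lower bound becomes negative zero */
  if (r.hi == 0.0)
    r.hi = 0.0;    /* upper bound becomes positive zero */
  return r;
}
```

A range derived from an equivalence with 0.0 would, under this model, become [-0.0, +0.0] rather than a single signed zero.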
gcc/analyzer/ChangeLog: * access-diagram.cc (boundaries::add): Explicitly state "boundaries::" scope for "kind" enum. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Verifier checks have recently been strengthened to check that all counts and probabilities are initialized. The checks fired during autoprofiledbootstrap build and this patch fixes it. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * auto-profile.cc (afdo_calculate_branch_prob): Fix count comparisons * tree-vect-loop-manip.cc (vect_do_peeling): Guard against zero count when scaling loop profile
For RVV, we have VLS modes enabled according to TARGET_MIN_VLEN from M1 to M8. For example, when TARGET_MIN_VLEN = 128 bits, we enable 128/256/512/1024 bits VLS modes. This patch fixes the following FAILs: FAIL: gcc.dg/vect/bb-slp-subgroups-2.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "optimized: basic block" 2 FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: basic block" 2 gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add 256/512/1024
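The arithmetic behind the 128/256/512/1024 example is just the minimum VLEN multiplied by LMUL = 1, 2, 4, 8. A small sketch, where vls_sizes is a hypothetical helper and not part of GCC or the testsuite:

```c
/* Fill sizes[0..3] with the vector sizes in bits obtained by scaling
   min_vlen by LMUL = 1, 2, 4 and 8 (M1 to M8).  */
void
vls_sizes (unsigned min_vlen, unsigned sizes[4])
{
  for (int i = 0; i < 4; i++)
    sizes[i] = min_vlen << i;
}
```

For min_vlen = 128 this yields 128, 256, 512 and 1024, matching the sizes named in the commit message.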
Refurbish add compare patterns: use 'r' constraint, fix indentation, and fix pattern to match 'if (a+b) { ... }' constructions. gcc/ * config/arc/arc.cc (arc_select_cc_mode): Match NEG code with the first operand. * config/arc/arc.md (addsi_compare): Make pattern canonical. (addsi_compare_2): Fix indentation, constraint letters. (addsi_compare_3): Likewise. gcc/testsuite/ * gcc.target/arc/add_f-combine.c: New test. Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
Here is the reference comparing dump IR between ARM SVE and RVV. https://godbolt.org/z/zqess8Gss We can see RVV has one more dump IR: optimized: basic block part vectorized using 128 byte vectors since RVV has 1024 bit vectors. The codegen is reasonably good. However, I saw that GCN also has 1024 bit vectors. Could this patch cause this case to FAIL on the GCN port? Hi, GCN folks, could you check this patch on the GCN port for me? gcc/testsuite/ChangeLog: * gcc.dg/vect/bb-slp-pr65935.c: Add vect1024 variant. * lib/target-supports.exp: Ditto.
The following fixes fallout of r10-7145-g1dc00a8ec9aeba which made us cautious about CSEing a load to an object that has padding bits. The added check also triggers for BLKmode entities like STRING_CSTs but by definition a BLKmode entity does not have padding bits. PR tree-optimization/111751 * tree-ssa-sccvn.cc (visit_reference_op_load): Exempt BLKmode result from the padding bits check.
Add testcase for PR111751 which has been fixed: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632474.html PR target/111751 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr111751.c: New test.
…ning gcc/ada/ * sem_attr.adb (Analyze_Attribute): Protect the frontend against replacing 'Size by its static value if 'Size is not known at compile time and we are processing pragmas Compile_Time_Warning or Compile_Time_Errors.
The concept of extended nodes was retired at the same time Gen_IL was introduced, but there was a reference to that concept left over in a comment. This patch removes that reference. Also, the description of the field Comes_From_Check_Or_Contract was incorrectly placed in a section for fields present in all nodes in sinfo.ads. This patch fixes this. gcc/ada/ * atree.ads, nlists.ads, types.ads: Remove references to extended nodes. Fix typo. * sinfo.ads: Likewise and fix position of Comes_From_Check_Or_Contract description.
This patch fixes the behavior of Ada.Directories.Search when being requested to filter out regular files or directories. One of the configurations in which that behavior was incorrect was that when the caller requested only the regular and special files but not the directories, the directories would still be returned. gcc/ada/ * libgnat/a-direct.adb: Fix filesystem entry filtering.
This occurs when one of the types has an incomplete declaration in addition to its full declaration in its package. In this case AI05-129 says that the incomplete type is not part of the limited view of the package, i.e. only the full view is. Now, in the GNAT implementation, it's the opposite in the regular view of the package, i.e. the incomplete type is the visible one. That's why the implementation needs to also swap the types on the visibility chain while it is swapping the views when the clauses are either installed or removed. This works correctly for the installation, but does not for the removal, so this change rewrites the code doing the latter. gcc/ada/ PR ada/111434 * sem_ch10.adb (Replace): New procedure to replace an entity with another on the homonym chain. (Install_Limited_With_Clause): Rename Non_Lim_View to Typ for the sake of consistency. Call Replace to do the replacements and split the code into the regular and the special cases. Add debugging output controlled by -gnatdi. (Install_With_Clause): Print the Parent_With and Implicit_With flags in the debugging output controlled by -gnatdi. (Remove_Limited_With_Unit.Restore_Chain_For_Shadow (Shadow)): Rewrite using a direct replacement of E4 by E2. Call Replace to do the replacements. Add debugging output controlled by -gnatdi.
This happens when the conditional expression is immediately returned, for example in an expression function. gcc/ada/ * exp_aggr.adb (Is_Build_In_Place_Aggregate_Return): Return true if the aggregate is a dependent expression of a conditional expression being returned from a build-in-place function.
It is only called once. gcc/ada/ * sem_util.ads (Set_Scope_Is_Transient): Delete. * sem_util.adb (Set_Scope_Is_Transient): Likewise. * exp_ch7.adb (Create_Transient_Scope): Set Is_Transient directly.
The purpose of this patch is to work around false-positive warnings emitted by GNAT SAS (also known as CodePeer). It does not change the behavior of the modified subprogram. gcc/ada/ * libgnat/a-direct.adb (Start_Search_Internal): Tweak subprogram body.
…component This is a small bug present on strict-alignment platforms for questionable representation clauses. gcc/ada/ * gcc-interface/decl.cc (inline_status_for_subprog): Minor tweak. (gnat_to_gnu_field): Try harder to get a packable form of the type for a bitfield.
…etation The following ups the limit in fold_view_convert_expr to handle 1024bit vectors as used by GCN and RVV. It also robustifies the handling in visit_reference_op_load to properly give up when constants cannot be re-interpreted. PR tree-optimization/111751 * fold-const.cc (fold_view_convert_expr): Up the buffer size to 128 bytes. * tree-ssa-sccvn.cc (visit_reference_op_load): Special case constants, giving up when re-interpretation to the target type fails.
When ifcvt was initially added masking was not a thing and as such it was rather conservative in what it supported. For builtins it only allowed C99 builtin functions which it knew it can fold away. These days the vectorizer is able to deal with needing to mask IFNs itself. vectorizable_call is able to vectorize the IFN by emitting a VEC_PERM_EXPR after the operation to emulate the masking. This is then used by match.pd to convert the IFN into a masked variant if it's available. For these reasons the restriction in ifconvert is no longer required and we needlessly block vectorization when we can effectively handle the operations. Bootstrapped and regtested on aarch64-none-linux-gnu with no issues. Note: This patch is part of a series and tests for it are added in the AArch64 patch that adds support for the optab. gcc/ChangeLog: PR tree-optimization/109154 * tree-if-conv.cc (if_convertible_stmt_p): Allow any const IFN.
This rewrites the simd MOV patterns to use the new compact syntax. No change in semantics is expected. This will be needed in follow on patches. This also merges the splits into the define_insn which will also be needed soon. gcc/ChangeLog: PR tree-optimization/109154 * config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>): Rewrite to new syntax. (*aarch64_simd_mov<VQMOV:mode>): Rewrite to new syntax and merge in splits.
This refactors the code to remove the args cache and index lookups in favor of a single structure. It also, again, removes the use of std::sort as previously requested but avoids the new asserts in trunk. gcc/ChangeLog: PR tree-optimization/109154 * tree-if-conv.cc (INCLUDE_ALGORITHM): Remove. (typedef struct ifcvt_arg_entry): New. (cmp_arg_entry): New. (gen_phi_arg_condition, gen_phi_nest_statement, predicate_scalar_phi): Use them.
This adds support for the minimum OS version data in assembler files. At present, we have no mechanism to detect the SDK version in use, and so that is omitted from build_versions. We follow the implementation in clang, '.build_version' is only emitted (where supported) for target macOS versions >= 10.14. For earlier macOS we fall back to using a '.macosx_version_min' directive. This latter is also emitted when the assembler supports it, but not build_version. gcc/ChangeLog: * config.in: Regenerate. * config/darwin.cc (darwin_file_start): Add assembler directives for the target OS version, where these are supported by the assembler. (darwin_override_options): Check for building >= macOS 10.14. * configure: Regenerate. * configure.ac: Check for assembler support of .build_version directives. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog: PR target/111093 * config/nvptx/nvptx.cc (nvptx_option_override): Issue fatal error instead of an assert ICE when no -march= has been specified.
GCC ICEs on the first testcase. A successful match_uaddc_usubc ends up with some dead stmts which DCE will (hopefully) remove later. The ICE is because one of the dead stmts refers to a freed SSA_NAME. The code already gsi_removes a couple of stmts in the /* Remove some statements which can't be kept in the IL because they use SSA_NAME whose setter is going to be removed too. */ section for the same reason (the reason for the freed SSA_NAMEs is that we don't really have a replacement for those cases - all we have after a match is combined overflow from the addition/subtraction of 2 operands + a [0, 1] carry in, but not the individual overflows from the former 2 additions), but for the last (most significant) limb case, where we try to match x = op1 + op2 + carry1 + carry2; or x = op1 - op2 - carry1 - carry2; we just gsi_replace the final stmt, but leave the 2 temporary stmts around as dead; if we were unlucky enough that those referenced the carry flag that went away, it ICEs. So, the following patch remembers those temporary statements (rather than trying to rediscover them more expensively) and removes them before the final one is replaced. While working on it, I've noticed we didn't support all the reassociated possibilities of writing the addition of 4 operands or subtracting 3 operands from one, we supported e.g. x = ((op1 + op2) + op3) + op4; x = op1 + ((op2 + op3) + op4); but not x = (op1 + (op2 + op3)) + op4; x = op1 + (op2 + (op3 + op4)); Fixed by the change to inspect also rhs[2] when rhs[1] didn't yield what we were searching for (if non-NULL) - rhs[0] is inspected in the first loop and has different handling for the MINUS_EXPR case. 2023-10-18 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/111845 * tree-ssa-math-opts.cc (match_uaddc_usubc): Remember temporary statements for the 4 operand addition or subtraction of 3 operands from 1 operand cases and remove them when successful. Look for nested additions even from rhs[2], not just rhs[1].
* gcc.dg/pr111845.c: New test. * gcc.target/i386/pr111845.c: New test.
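For readers unfamiliar with the source shape this commit message discusses, here is a hedged sketch of a three-limb addition whose most significant limb has the form x = op1 + op2 + carry1 + carry2. It assumes GCC or Clang (for __builtin_add_overflow); add192 is a hypothetical name and this is not one of the new testcases.

```c
#include <stdint.h>

/* r = a + b over three 64-bit limbs, least significant limb first.
   Overflow out of the top limb is discarded.  */
void
add192 (uint64_t r[3], const uint64_t a[3], const uint64_t b[3])
{
  uint64_t t;

  /* Limb 0: plain add with one carry out.  */
  uint64_t c0 = __builtin_add_overflow (a[0], b[0], &r[0]);

  /* Limb 1: two overflow checks (operands, then carry in); at most
     one of them can report an overflow.  */
  uint64_t c1 = __builtin_add_overflow (a[1], b[1], &t);
  uint64_t c2 = __builtin_add_overflow (t, c0, &r[1]);

  /* Most significant limb: sum of two operands and both carries,
     with no overflow result needed.  */
  r[2] = a[2] + b[2] + c1 + c2;
}
```

The last statement is the 4-operand addition; reassociating its operands gives the other source forms listed in the commit message.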
gcc/ChangeLog: * gimplify.cc (gimplify_bind_expr): Remove "omp allocate" attribute to avoid that auxiliary statement list reaches LTO. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-13a.f90: New test.
In the discussion of promoting some pedwarns to be errors by default, rather than move them all into -fpermissive it seems to me to make sense to support DK_PERMERROR with an option flag. This way will also work with -fpermissive, but users can also still use -Wno-error=narrowing to downgrade that specific diagnostic rather than everything affected by -fpermissive. So, for diagnostics that we want to make errors by default we can just change the pedwarn call to permerror. The tests check desired behavior for such a permerror in a system header with various flags. The patch preserves the existing permerror behavior of ignoring -w and system headers by default, but respecting them when downgraded to a warning by -fpermissive. This seems similar to but a bit better than the approach of forcing -pedantic-errors that I previously used for -Wnarrowing: specifically, in that now -w by itself is not enough to silence the -Wnarrowing error (integer-pack2.C). gcc/ChangeLog: * doc/invoke.texi: Move -fpermissive to Warning Options. * diagnostic.cc (update_effective_level_from_pragmas): Remove redundant system header check. (diagnostic_report_diagnostic): Move down syshdr/-w check. (diagnostic_impl): Handle DK_PERMERROR with an option number. (permerror): Add new overloads. * diagnostic-core.h (permerror): Declare them. gcc/cp/ChangeLog: * typeck2.cc (check_narrowing): Use permerror. gcc/testsuite/ChangeLog: * g++.dg/ext/integer-pack2.C: Add -fpermissive. * g++.dg/diagnostic/sys-narrow.h: New test. * g++.dg/diagnostic/sys-narrow1.C: New test. * g++.dg/diagnostic/sys-narrow1a.C: New test. * g++.dg/diagnostic/sys-narrow1b.C: New test. * g++.dg/diagnostic/sys-narrow1c.C: New test. * g++.dg/diagnostic/sys-narrow1d.C: New test. * g++.dg/diagnostic/sys-narrow1e.C: New test. * g++.dg/diagnostic/sys-narrow1f.C: New test. * g++.dg/diagnostic/sys-narrow1g.C: New test. * g++.dg/diagnostic/sys-narrow1h.C: New test. * g++.dg/diagnostic/sys-narrow1i.C: New test.
Before the r5-3834 commit for PR63362, GCC 4.8-4.9 refuses to compile cse.cc which contains a variable with rtx_def type, because rtx_def contains a union with poly_uint16 element. poly_int template has defaulted default constructor and a variadic template constructor which could have empty parameter pack. GCC < 5 treated it as non-trivially constructible class and deleted rtunion and rtx_def default constructors. For the cse_insn purposes, all we need is a variable with size and alignment of rtx_def, not necessarily rtx_def itself, which we then memset to 0 and fill in like rtx is normally allocated from heap, so this patch for GCC_VERSION < 5000 uses an unsigned char array of the right size/alignment. 2023-10-18 Jakub Jelinek <jakub@redhat.com> PR bootstrap/111852 * cse.cc (cse_insn): Add workaround for GCC 4.8-4.9, instead of using rtx_def type for memory_extend_buf, use unsigned char array with size of rtx_def and its alignment.
gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_test_fractional_cost): Test <= instead of testing < twice.
libgcc/config/avr/libf7/ * libf7-asm.sx (mul_mant): Implement for devices without MUL. * asm-defs.h (wmov) [!HAVE_MUL]: Fix regno computation. * t-libf7 (F7_ASM_FLAGS): Add -g0.
This patch slightly improves the embench-iot benchmark score for PRU code size. There is also a small improvement in a few real-world firmware programs.

Embench-iot size
------------------------------------------
Benchmark         before   after   delta
---------         ------   -----   -----
aha-mont64          4.15    4.15       0
crc32               6.04    6.04       0
cubic              21.64   21.62   -0.02
edn                 6.37    6.37       0
huffbench          18.63   18.55   -0.08
matmult-int         5.44    5.44       0
md5sum             25.56   25.43   -0.13
minver             12.82   12.76   -0.06
nbody              15.09   14.97   -0.12
nettle-aes          4.75    4.75       0
nettle-sha256       4.67    4.67       0
nsichneu            3.77    3.77       0
picojpeg            4.11    4.11       0
primecount          7.90    7.90       0
qrduino             7.18    7.16   -0.02
sglib-combined     13.63   13.59   -0.04
slre                5.19    5.19       0
st                 14.23   14.12   -0.11
statemate           2.34    2.34       0
tarfind            36.85   36.64   -0.21
ud                 10.51   10.46   -0.05
wikisort            7.44    7.41   -0.03
---------         ------   -----
Geometric mean      8.42    8.40   -0.02
Geometric SD        2.00    2.00       0
Geometric range    12.68   12.62   -0.06

gcc/ChangeLog:
* config/pru/pru.cc (pru_insn_cost): New function.
(TARGET_INSN_COST): Define for PRU.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
gcc/ChangeLog: PR tree-optimization/111648 * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): If a1 chooses base element from arg, ensure that it's a natural stepped sequence. (build_vec_cst_rand): New param natural_stepped and use it to construct a naturally stepped sequence. (test_nunits_min_2): Add new unit tests Case 6 and Case 7.
This is a simple error recovery issue when c_safe_arg_type_equiv_p was added in r8-5312-gc65e18d3331aa999. The issue is that after an error, an argument type (of a function type) might turn into an error mark node and c_safe_arg_type_equiv_p was not ready for that. So this just adds a check for error operand for its arguments before getting the main variant. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR c/101285 gcc/c/ChangeLog: * c-typeck.cc (c_safe_arg_type_equiv_p): Return true for error operands early. gcc/testsuite/ChangeLog: * gcc.dg/pr101285-1.c: New test.
…ot checking for error When checking to see if a function declaration has a conflict due to promotions, there is no test to see if the type was an error mark before calling c_type_promotes_to. c_type_promotes_to is not ready for error_mark and causes an ICE. This adds a check for error before the call of c_type_promotes_to. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR c/101364 gcc/c/ChangeLog: * c-decl.cc (diagnose_arglist_conflict): Test for error mark before calling of c_type_promotes_to. gcc/testsuite/ChangeLog: * gcc.dg/pr101364-1.c: New test.
I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670 where we would remove the `& CST` part if we ended up not calling expand_single_bit_test. This fixes the problem by introducing a new variable that will be used for calling expand_single_bit_test. As far as I know this can only show up when disabling optimization passes as this above form would have been optimized away. Committed as obvious after a bootstrap/test on x86_64-linux-gnu. PR middle-end/111863 gcc/ChangeLog: * expr.cc (do_store_flag): Don't overwrite arg0 when stripping off `& POW2`. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr111863-1.c: New test.
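A single-bit test is a comparison of `x & POW2` against zero, which can equivalently be computed by shifting the tested bit down and masking with 1. The sketch below shows the two forms side by side for bit 3; the function names are hypothetical, and this illustrates the equivalence only, not the code in do_store_flag or the new testcase.

```c
#include <stdbool.h>
#include <stdint.h>

/* Source form: mask with a power of two and compare against zero.  */
bool
bit3_set_mask (uint32_t x)
{
  return (x & 8u) != 0;
}

/* Equivalent form: shift the tested bit to position 0, then mask.  */
bool
bit3_set_shift (uint32_t x)
{
  return ((x >> 3) & 1u) != 0;
}
```

The mask must stay attached to the value being tested: dropping the `& 8u` while still comparing against zero would test the whole of x, which is the kind of miscompile the fix guards against.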
I will create duplicate to my branch.
Sure, just let me know which one you mean.
@cooljeanius
ok I changed the base branch
Did you finish to develop?
I am done with this particular yaml file for now, yes. There are still some further improvements that could be made, like getting caching to work properly, or getting uploading of logfiles to work properly, but I'm not planning on focusing on either of those any more at the moment.
Can you create changes only the yaml file?
All right, I think that's what my
OK, I opened PR #4 instead.
Is this what you meant? Or did you mean from one of my other branches to one of your other branches?