Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge from my "master" branch #3

Closed
wants to merge 2,394 commits into from

Conversation

cooljeanius
Copy link

Is this what you meant? Or did you mean from one of my other branches to one of your other branches?

zhongjuzhe and others added 30 commits October 9, 2023 21:10
RVV use load_lanes with stride = 5 vectorize this case with -fno-vect-cost-model
instead of SLP.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr45752.c: Adapt dump check for target supports load_lanes with stride = 5.
Update in v2

* Remove emit helper functions.
* Take expand_binop instead.

Original log:

This patch would like to refine the code gen for the bswap16.

We will have VEC_PERM_EXPR after rtl expand when invoking
__builtin_bswap. It will generate about 9 instructions in
loop as below, no matter it is bswap16, bswap32 or bswap64.

  .L2:
1 vle16.v v4,0(a0)
2 vmv.v.x v2,a7
3 vand.vv v2,v6,v2
4 slli    a2,a5,1
5 vrgatherei16.vv v1,v4,v2
6 sub     a4,a4,a5
7 vse16.v v1,0(a3)
8 add     a0,a0,a2
9 add     a3,a3,a2
  bne     a4,zero,.L2

But for bswap16 we may have a even simple code gen, which
has only 7 instructions in loop as below.

  .L5
1 vle8.v  v2,0(a5)
2 addi    a5,a5,32
3 vsrl.vi v4,v2,8
4 vsll.vi v2,v2,8
5 vor.vv  v4,v4,v2
6 vse8.v  v4,0(a4)
7 addi    a4,a4,32
  bne     a5,a6,.L5

Unfortunately, this way will make the insn in loop will grow up to
13 and 24 for bswap32 and bswap64. Thus, we will refine the code
gen for the bswap16 only, and leave both the bswap32 and bswap64
as is.

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (shuffle_bswap_pattern): New func impl
	for shuffle bswap.
	(expand_vec_perm_const_1): Add handling for shuffle bswap pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker.
	* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
The following improves basic TBAA for access paths formed by
C++ abstraction where we are able to combine a path from an
address-taking operation with a path based on that access using
a pun to avoid memory access semantics on the address-taking part.

The trick is to identify the point the semantic memory access path
starts which allows us to use the alias set of the outermost access
instead of only that of the base of this path.

	PR tree-optimization/111715
	* alias.cc (reference_alias_ptr_type_1): When we have
	a type-punning ref at the base search for the access
	path part that's still semantically valid.

	* gcc.dg/tree-ssa/ssa-fre-102.c: New testcase.
When generating CFI directives for the store-pair instruction,
if we add two parallel REG_FRAME_RELATED_EXPR expr_lists like
  (expr_list:REG_FRAME_RELATED_EXPR (set (mem/c:DI (plus:DI (reg/f:DI 2 sp)
    (const_int 8 [0x8])) [1  S8 A64])
    (reg:DI 1 ra))
  (expr_list:REG_FRAME_RELATED_EXPR (set (mem/c:DI (reg/f:DI 2 sp) [1  S8 A64])
    (reg:DI 8 s0))
only the first expr_list will be recognized by dwarf2out_frame_debug
funciton. So, here we generate a SEQUENCE expression of REG_FRAME_RELATED_EXPR,
which includes two sub-expressions of RTX_FRAME_RELATED_P. Then the
dwarf2out_frame_debug_expr function will iterate through all the sub-expressions
and generate the corresponding CFI directives.

gcc/
	* config/riscv/thead.cc (th_mempair_save_regs): Fix missing CFI
	directives for store-pair instruction.

gcc/testsuite/
	* gcc.target/riscv/xtheadmempair-4.c: New test.
This patch fixed these following FAILs in regressions:
FAIL: gcc.dg/vect/slp-perm-11.c -flto -ffat-lto-objects  scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-perm-11.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/vect-bitfield-read-2.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-2.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-4.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-read-4.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-2.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-2.c scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-3.c -flto -ffat-lto-objects  scan-tree-dump-not optimized "Invalid sum"
FAIL: gcc.dg/vect/vect-bitfield-write-3.c scan-tree-dump-not optimized "Invalid sum"

Previously, I removed the movmisalign pattern to fix the execution FAILs in this commit:
gcc-mirror@f7bff24

I was thinking that RVV doesn't allow misaligned at the beginning so I removed that pattern.
However, after deep investigation && reading RVV ISA again and experiment on SPIKE,
I realized I was wrong.

RVV ISA reference: https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-memory-alignment-constraints

"If an element accessed by a vector memory instruction is not naturally aligned to the size of the element,
 either the element is transferred successfully or an address misaligned exception is raised on that element."

It's obvious that RVV ISA does allow misaligned vector load/store.

And experiment and confirm on SPIKE:

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64  a.out
bbl loader
z  0000000000000000 ra 0000000000010158 sp 0000003ffffffb40 gp 0000000000012c48
tp 0000000000000000 t0 00000000000110da t1 000000000000000f t2 0000000000000000
s0 0000000000013460 s1 0000000000000000 a0 0000000000012ef5 a1 0000000000012018
a2 0000000000012a71 a3 000000000000000d a4 0000000000000004 a5 0000000000012a71
a6 0000000000012a71 a7 0000000000012018 s2 0000000000000000 s3 0000000000000000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000000 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
pc 0000000000010258 va/inst 00000000020660a7 sr 8000000200006620
Store/AMO access fault!

[jzzhong@rios-cad122:/work/home/jzzhong/work/toolchain/riscv/gcc/gcc/testsuite/gcc.dg/vect]$~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/bin/spike --misaligned --isa=rv64gcv --varch=vlen:128,elen:64 ~/work/toolchain/riscv/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug/install/riscv64-unknown-elf/bin/pk64  a.out
bbl loader

We can see SPIKE can pass previous *FAILED* execution tests with specifying --misaligned to SPIKE.

So, to honor RVV ISA SPEC, we should add movmisalign pattern back base on the investigations I have done since
it can improve multiple vectorization tests and fix dumple FAILs.

This patch adds TARGET_VECTOR_MISALIGN_SUPPORTED to decide whether we support misalign pattern for VLA modes (By default it is enabled).

Consider this following case:

struct s {
    unsigned i : 31;
    char a : 4;
};

#define N 32
#define ELT0 {0x7FFFFFFFUL, 0}
#define ELT1 {0x7FFFFFFFUL, 1}
#define ELT2 {0x7FFFFFFFUL, 2}
#define ELT3 {0x7FFFFFFFUL, 3}
#define RES 48
struct s A[N]
  = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3,
      ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3};

int __attribute__ ((noipa))
f(struct s *ptr, unsigned n) {
    int res = 0;
    for (int i = 0; i < n; ++i)
      res += ptr[i].a;
    return res;
}

-O3 -S -fno-vect-cost-model (default strict-align):

f:
	mv	a4,a0
	beq	a1,zero,.L9
	addiw	a5,a1,-1
	li	a3,14
	vsetivli	zero,16,e64,m8,ta,ma
	bleu	a5,a3,.L3
	andi	a5,a0,127
	bne	a5,zero,.L3
	srliw	a3,a1,4
	slli	a3,a3,7
	li	a0,15
	slli	a0,a0,32
	add	a3,a3,a4
	mv	a5,a4
	li	a2,32
	vmv.v.x	v16,a0
	vsetvli	zero,zero,e32,m4,ta,ma
	vmv.v.i	v4,0
.L4:
	vsetvli	zero,zero,e64,m8,ta,ma
	vle64.v	v8,0(a5)
	addi	a5,a5,128
	vand.vv	v8,v8,v16
	vsetvli	zero,zero,e32,m4,ta,ma
	vnsrl.wx	v8,v8,a2
	vadd.vv	v4,v4,v8
	bne	a5,a3,.L4
	li	a3,0
	andi	a5,a1,15
	vmv.s.x	v1,a3
	andi	a3,a1,-16
	vredsum.vs	v1,v4,v1
	vmv.x.s	a0,v1
	mv	a2,a0
	beq	a5,zero,.L15
	slli	a5,a3,3
	add	a5,a4,a5
	lw	a0,4(a5)
	andi	a0,a0,15
	addiw	a4,a3,1
	addw	a0,a0,a2
	bgeu	a4,a1,.L15
	lw	a2,12(a5)
	andi	a2,a2,15
	addiw	a4,a3,2
	addw	a0,a2,a0
	bgeu	a4,a1,.L15
	lw	a2,20(a5)
	andi	a2,a2,15
	addiw	a4,a3,3
	addw	a0,a2,a0
	bgeu	a4,a1,.L15
	lw	a2,28(a5)
	andi	a2,a2,15
	addiw	a4,a3,4
	addw	a0,a2,a0
	bgeu	a4,a1,.L15
	lw	a2,36(a5)
	andi	a2,a2,15
	addiw	a4,a3,5
	addw	a0,a2,a0
	bgeu	a4,a1,.L15
	lw	a2,44(a5)
	andi	a2,a2,15
	addiw	a4,a3,6
	addw	a0,a2,a0
	bgeu	a4,a1,.L15
	lw	a2,52(a5)
	andi	a2,a2,15
	addiw	a4,a3,7
	addw	a0,a2,a0
	bgeu	a4,a1,.L15
	lw	a4,60(a5)
	andi	a4,a4,15
	addw	a4,a4,a0
	addiw	a2,a3,8
	mv	a0,a4
	bgeu	a2,a1,.L15
	lw	a0,68(a5)
	andi	a0,a0,15
	addiw	a2,a3,9
	addw	a0,a0,a4
	bgeu	a2,a1,.L15
	lw	a2,76(a5)
	andi	a2,a2,15
	addiw	a4,a3,10
	addw	a0,a2,a0
	bgeu	a4,a1,.L15
	lw	a2,84(a5)
	andi	a2,a2,15
	addiw	a4,a3,11
	addw	a0,a2,a0
	bgeu	a4,a1,.L15
	lw	a2,92(a5)
	andi	a2,a2,15
	addiw	a4,a3,12
	addw	a0,a2,a0
	bgeu	a4,a1,.L15
	lw	a2,100(a5)
	andi	a2,a2,15
	addiw	a4,a3,13
	addw	a0,a2,a0
	bgeu	a4,a1,.L15
	lw	a4,108(a5)
	andi	a4,a4,15
	addiw	a3,a3,14
	addw	a0,a4,a0
	bgeu	a3,a1,.L15
	lw	a5,116(a5)
	andi	a5,a5,15
	addw	a0,a5,a0
	ret
.L9:
	li	a0,0
.L15:
	ret
.L3:
	mv	a5,a4
	slli	a4,a1,32
	srli	a1,a4,29
	add	a1,a5,a1
	li	a0,0
.L7:
	lw	a4,4(a5)
	andi	a4,a4,15
	addi	a5,a5,8
	addw	a0,a4,a0
	bne	a5,a1,.L7
	ret

-O3 -S -mno-strict-align -fno-vect-cost-model:

f:
	beq	a1,zero,.L4
	slli	a1,a1,32
	li	a5,15
	vsetvli	a4,zero,e64,m1,ta,ma
	slli	a5,a5,32
	srli	a1,a1,32
	li	a6,32
	vmv.v.x	v3,a5
	vsetvli	zero,zero,e32,mf2,ta,ma
	vmv.v.i	v2,0
.L3:
	vsetvli	a5,a1,e64,m1,ta,ma
	vle64.v	v1,0(a0)
	vsetvli	a3,zero,e64,m1,ta,ma
	slli	a2,a5,3
	vand.vv	v1,v1,v3
	sub	a1,a1,a5
	vsetvli	zero,zero,e32,mf2,ta,ma
	add	a0,a0,a2
	vnsrl.wx	v1,v1,a6
	vsetvli	zero,a5,e32,mf2,tu,ma
	vadd.vv	v2,v2,v1
	bne	a1,zero,.L3
	li	a5,0
	vsetvli	a3,zero,e32,mf2,ta,ma
	vmv.s.x	v1,a5
	vredsum.vs	v2,v2,v1
	vmv.x.s	a0,v2
	ret
.L4:
	li	a0,0
	ret

We can see it improves this case codegen a lot.

gcc/ChangeLog:

	* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED): New macro.
	* config/riscv/riscv.cc (riscv_support_vector_misalignment): Depend on movmisalign pattern.
	* config/riscv/vector.md (movmisalign<mode>): New pattern.
This adds a pipeline description for a generic out-of-order core.
Latency and units are not based on any real processor but more or less
educated guesses what such a processor would look like.

In order to account for latency scaling by LMUL != 1, sched_adjust_cost
is implemented.  It will scale an instruction's latency by its LMUL
so an LMUL == 8 instruction will take 8 times the number of cycles
the same instruction with LMUL == 1 would take.
As this potentially causes very high latencies which, in turn, might
lead to scheduling anomalies and a higher number of vsetvls emitted
this feature is only enabled when specifying -madjust-lmul-cost.

Additionally, in order to easily recognize pre-RA vsetvls this patch
introduces an insn type vsetvl_pre which is used in sched_adjust_cost.

In the future we might also want a latency adjustment similar to lmul
for reductions, i.e. make the latency dependent on the type and its
number of units.

gcc/ChangeLog:

	* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
	* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
	Add generic_ooo.
	* config/riscv/riscv.cc (riscv_sched_adjust_cost): Implement
	scheduler hook.
	(TARGET_SCHED_ADJUST_COST): Define.
	* config/riscv/riscv.md (no,yes"): Include generic-ooo.md
	* config/riscv/riscv.opt: Add -madjust-lmul-cost.
	* config/riscv/generic-ooo.md: New file.
	* config/riscv/vector.md: Add vsetvl_pre.
Like ARM SVE, RVV is vectorizing these 2 cases in the same way.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-23.c: Add RVV like ARM SVE.
	* gcc.dg/vect/slp-perm-10.c: Ditto.
RVV vectortizes this case with stride8 load_lanes.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-reduc-4.c: Adapt test for stride8 load_lanes.
This case is vectorized by stride8 load_lanes.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-12a.c: Adapt for stride 8 load_lanes.
These cases are vectorized by vec_load_lanes with strided = 8 instead of SLP
with -fno-vect-cost-model.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr97832-2.c: Adapt dump check for target supports load_lanes with stride = 8.
	* gcc.dg/vect/pr97832-3.c: Ditto.
	* gcc.dg/vect/pr97832-4.c: Ditto.
RVV vectorize it with stride5 load_lanes.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-perm-4.c: Adapt test for stride5 load_lanes.
Turns out we didnt need this as there is no unordered relations
managed by the oracle.

	* gimple-range-gori.cc (gori_compute::compute_operand1_range): Do
	not call get_identity_relation.
	(gori_compute::compute_operand2_range): Ditto.
	* value-relation.cc (get_identity_relation): Remove.
	* value-relation.h (get_identity_relation): Remove protyotype.
A floating point equivalence may not properly reflect both signs of
zero, so be pessimsitic and ensure both signs are included.

	PR tree-optimization/111694
	gcc/
	* gimple-range-cache.cc (ranger_cache::fill_block_cache): Adjust
	equivalence range.
	* value-relation.cc (adjust_equivalence_range): New.
	* value-relation.h (adjust_equivalence_range): New prototype.

	gcc/testsuite/
	* gcc.dg/pr111694.c: New.
gcc/analyzer/ChangeLog:
	* access-diagram.cc (boundaries::add): Explicitly state
	"boundaries::" scope for "kind" enum.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Verifier checks have recently been strengthened to check that
all counts and probabilities are initialized. The checks fired
during autoprofiledbootstrap build and this patch fixes it.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
	* auto-profile.cc (afdo_calculate_branch_prob): Fix count comparisons
	* tree-vect-loop-manip.cc (vect_do_peeling): Guard against zero count
	when scaling loop profile
For RVV, we have VLS modes enable according to TARGET_MIN_VLEN
from M1 to M8.

For example, when TARGET_MIN_VLEN = 128 bits, we enable
128/256/512/1024 bits VLS modes.

This patch fixes following FAIL:
FAIL: gcc.dg/vect/bb-slp-subgroups-2.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 "optimized: basic block" 2
FAIL: gcc.dg/vect/bb-slp-subgroups-2.c scan-tree-dump-times slp2 "optimized: basic block" 2

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp: Add 256/512/1024
Refurbish add compare patterns: use 'r' constraint, fix identation,
and fix pattern to match 'if (a+b) { ... }' constructions.

gcc/

	* config/arc/arc.cc (arc_select_cc_mode): Match NEG code with
	the first operand.
	* config/arc/arc.md (addsi_compare): Make pattern canonical.
	(addsi_compare_2): Fix identation, constraint letters.
	(addsi_compare_3): Likewise.

gcc/testsuite/

	* gcc.target/arc/add_f-combine.c: New test.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
Here is the reference comparing dump IR between ARM SVE and RVV.

https://godbolt.org/z/zqess8Gss

We can see RVV has one more dump IR:
optimized: basic block part vectorized using 128 byte vectors
since RVV has 1024 bit vectors.

The codegen is reasonable good.

However, I saw GCN also has 1024 bit vector.
This patch may cause this case FAIL in GCN port ?

Hi, GCN folk, could you check this patch in GCN port for me ?

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/bb-slp-pr65935.c: Add vect1024 variant.
	* lib/target-supports.exp: Ditto.
The following fixes fallout of r10-7145-g1dc00a8ec9aeba which made
us cautionous about CSEing a load to an object that has padding bits.
The added check also triggers for BLKmode entities like STRING_CSTs
but by definition a BLKmode entity does not have padding bits.

	PR tree-optimization/111751
	* tree-ssa-sccvn.cc (visit_reference_op_load): Exempt
	BLKmode result from the padding bits check.
Add testcase for PR111751 which has been fixed:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632474.html

	PR target/111751

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr111751.c: New test.
…ning

gcc/ada/

	* sem_attr.adb (Analyze_Attribute): Protect the frontend against
	replacing 'Size by its static value if 'Size is not known at
	compile time and we are processing pragmas Compile_Time_Warning or
	Compile_Time_Errors.
The concept of extended nodes was retired at the same time Gen_IL
was introduced, but there was a reference to that concept left over
in a comment. This patch removes that reference.

Also, the description of the field Comes_From_Check_Or_Contract was
incorrectly placed in a section for fields present in all nodes in
sinfo.ads. This patch fixes this.

gcc/ada/

	* atree.ads, nlists.ads, types.ads: Remove references to extended
	nodes. Fix typo.
	* sinfo.ads: Likewise and fix position of
	Comes_From_Check_Or_Contract description.
This patch fixes the behavior of Ada.Directories.Search when being
requested to filter out regular files or directories. One of the
configurations in which that behavior was incorrect was that when the
caller requested only the regular and special files but not the
directories, the directories would still be returned.

gcc/ada/

	* libgnat/a-direct.adb: Fix filesystem entry filtering.
This occurs when one of the types has an incomplete declaration in addition
to its full declaration in its package. In this case AI05-129 says that the
incomplete type is not part of the limited view of the package, i.e. only
the full view is. Now, in the GNAT implementation, it's the opposite in the
regular view of the package, i.e. the incomplete type is the visible one.

That's why the implementation needs to also swap the types on the visibility
chain while it is swapping the views when the clauses are either installed
or removed. This works correctly for the installation, but does not for the
removal, so this change rewrites the code doing the latter.

gcc/ada/
	PR ada/111434
	* sem_ch10.adb (Replace): New procedure to replace an entity with
	another on the homonym chain.
	(Install_Limited_With_Clause): Rename Non_Lim_View to Typ for the
	sake of consistency.  Call Replace to do the replacements and split
	the code into the regular and the special cases.  Add debuggging
	output controlled by -gnatdi.
	(Install_With_Clause): Print the Parent_With and Implicit_With flags
	in the debugging output controlled by -gnatdi.
	(Remove_Limited_With_Unit.Restore_Chain_For_Shadow (Shadow)): Rewrite
	using a direct replacement of E4 by E2.   Call Replace to do the
	replacements.  Add debuggging output controlled by -gnatdi.
This happens when the conditional expression is immediately returned, for
example in an expression function.

gcc/ada/

	* exp_aggr.adb (Is_Build_In_Place_Aggregate_Return): Return true
	if the aggregate is a dependent expression of a conditional
	expression being returned from a build-in-place function.
It is only called once.

gcc/ada/

	* sem_util.ads (Set_Scope_Is_Transient): Delete.
	* sem_util.adb (Set_Scope_Is_Transient): Likewise.
	* exp_ch7.adb (Create_Transient_Scope): Set Is_Transient directly.
The purpose of this patch is to work around false-positive warnings
emitted by GNAT SAS (also known as CodePeer). It does not change
the behavior of the modified subprogram.

gcc/ada/

	* libgnat/a-direct.adb (Start_Search_Internal): Tweak subprogram
	body.
…component

This is a small bug present on strict-alignment platforms for questionable
representation clauses.

gcc/ada/

	* gcc-interface/decl.cc (inline_status_for_subprog): Minor tweak.
	(gnat_to_gnu_field): Try harder to get a packable form of the type
	for a bitfield.
…etation

The following ups the limit in fold_view_convert_expr to handle
1024bit vectors as used by GCN and RVV.  It also robustifies
the handling in visit_reference_op_load to properly give up when
constants cannot be re-interpreted.

	PR tree-optimization/111751
	* fold-const.cc (fold_view_convert_expr): Up the buffer size
	to 128 bytes.
	* tree-ssa-sccvn.cc (visit_reference_op_load): Special case
	constants, giving up when re-interpretation to the target type
	fails.
TamarChristinaArm and others added 18 commits October 18, 2023 09:53
When ifcvt was initially added masking was not a thing and as such it was
rather conservative in what it supported.

For builtins it only allowed C99 builtin functions which it knew it can fold
away.

These days the vectorizer is able to deal with needing to mask IFNs itself.
vectorizable_call is able vectorize the IFN by emitting a VEC_PERM_EXPR after
the operation to emulate the masking.

This is then used by match.pd to conver the IFN into a masked variant if it's
available.

For these reasons the restriction in ifconvert is no longer require and we
needless block vectorization when we can effectively handle the operations.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Note: This patch is part of a testseries and tests for it are added in the
AArch64 patch that adds supports for the optab.

gcc/ChangeLog:

	PR tree-optimization/109154
	* tree-if-conv.cc (if_convertible_stmt_p): Allow any const IFN.
This rewrites the simd MOV patterns to use the new compact syntax.
No change in semantics is expected.  This will be needed in follow on patches.

This also merges the splits into the define_insn which will also be needed soon.

gcc/ChangeLog:

	PR tree-optimization/109154
	* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
	Rewrite to new syntax.
	(*aarch64_simd_mov<VQMOV:mode): Rewrite to new syntax and merge in
	splits.
This refactors the code to remove the args cache and index lookups
in favor of a single structure. It also again, removes the use of
std::sort as previously requested but avoids the new asserts in
trunk.

gcc/ChangeLog:

	PR tree-optimization/109154
	* tree-if-conv.cc (INCLUDE_ALGORITHM): Remove.
	(typedef struct ifcvt_arg_entry): New.
	(cmp_arg_entry): New.
	(gen_phi_arg_condition, gen_phi_nest_statement,
	predicate_scalar_phi): Use them.
This adds support for the minimum OS version data in assembler files.
At present, we have no mechanism to detect the SDK version in use, and
so that is omitted from build_versions.

We follow the implementation in clang, '.build_version' is only emitted
(where supported) for target macOS versions >= 10.14.  For earlier macOS
we fall back to using a '.macosx_version_min' directive.  This latter is
also emitted when the assembler supports it, but not build_version.

gcc/ChangeLog:

	* config.in: Regenerate.
	* config/darwin.cc (darwin_file_start): Add assembler directives
	for the target OS version, where these are supported by the
	assembler.
	(darwin_override_options): Check for building >= macOS 10.14.
	* configure: Regenerate.
	* configure.ac: Check for assembler support of .build_version
	directives.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:

	PR target/111093
	* config/nvptx/nvptx.cc (nvptx_option_override): Issue fatal error
	instead of an assert ICE when no -march= has been specified.
GCC ICEs on the first testcase.  Successful match_uaddc_usubc ends up with
some dead stmts which DCE will remove (hopefully) later all.
The ICE is because one of the dead stmts refers to a freed SSA_NAME.
The code already gsi_removes a couple of stmts in the
  /* Remove some statements which can't be kept in the IL because they
     use SSA_NAME whose setter is going to be removed too.  */
section for the same reason (the reason for the freed SSA_NAMEs is that
we don't really have a replacement for those cases - all we have after
a match is combined overflow from the addition/subtraction of 2 operands + a
[0, 1] carry in, but not the individual overflows from the former 2
additions), but for the last (most significant) limb case, where we try
to match x = op1 + op2 + carry1 + carry2; or
x = op1 - op2 - carry1 - carry2; we just gsi_replace the final stmt, but
left around the 2 temporary stmts as dead; if we were unlucky enough that
those referenced the carry flag that went away, it ICEs.

So, the following patch remembers those temporary statements (rather than
trying to rediscover them more expensively) and removes them before the
final one is replaced.

While working on it, I've noticed we didn't support all the reassociated
possibilities of writing the addition of 4 operands or subtracting 3
operands from one, we supported e.g.
x = ((op1 + op2) + op3) + op4;
x = op1 + ((op2 + op3) + op4);
but not
x = (op1 + (op2 + op3)) + op4;
x = op1 + (op2 + (op3 + op4));
Fixed by the change to inspect also rhs[2] when rhs[1] didn't yield what
we were searching for (if non-NULL) - rhs[0] is inspected in the first
loop and has different handling for the MINUS_EXPR case.

2023-10-18  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/111845
	* tree-ssa-math-opts.cc (match_uaddc_usubc): Remember temporary
	statements for the 4 operand addition or subtraction of 3 operands
	from 1 operand cases and remove them when successful.  Look for
	nested additions even from rhs[2], not just rhs[1].

	* gcc.dg/pr111845.c: New test.
	* gcc.target/i386/pr111845.c: New test.
gcc/ChangeLog:

	* gimplify.cc (gimplify_bind_expr): Remove "omp allocate" attribute
	to avoid that auxillary statement list reaches LTO.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/allocate-13a.f90: New test.
In the discussion of promoting some pedwarns to be errors by default, rather
than move them all into -fpermissive it seems to me to make sense to support
DK_PERMERROR with an option flag.  This way will also work with
-fpermissive, but users can also still use -Wno-error=narrowing to downgrade
that specific diagnostic rather than everything affected by -fpermissive.

So, for diagnostics that we want to make errors by default we can just
change the pedwarn call to permerror.

The tests check desired behavior for such a permerror in a system header
with various flags.  The patch preserves the existing permerror behavior of
ignoring -w and system headers by default, but respecting them when
downgraded to a warning by -fpermissive.

This seems similar to but a bit better than the approach of forcing
-pedantic-errors that I previously used for -Wnarrowing: specifically, in
that now -w by itself is not enough to silence the -Wnarrowing
error (integer-pack2.C).

gcc/ChangeLog:

	* doc/invoke.texi: Move -fpermissive to Warning Options.
	* diagnostic.cc (update_effective_level_from_pragmas): Remove
	redundant system header check.
	(diagnostic_report_diagnostic): Move down syshdr/-w check.
	(diagnostic_impl): Handle DK_PERMERROR with an option number.
	(permerror): Add new overloads.
	* diagnostic-core.h (permerror): Declare them.

gcc/cp/ChangeLog:

	* typeck2.cc (check_narrowing): Use permerror.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/integer-pack2.C: Add -fpermissive.
	* g++.dg/diagnostic/sys-narrow.h: New test.
	* g++.dg/diagnostic/sys-narrow1.C: New test.
	* g++.dg/diagnostic/sys-narrow1a.C: New test.
	* g++.dg/diagnostic/sys-narrow1b.C: New test.
	* g++.dg/diagnostic/sys-narrow1c.C: New test.
	* g++.dg/diagnostic/sys-narrow1d.C: New test.
	* g++.dg/diagnostic/sys-narrow1e.C: New test.
	* g++.dg/diagnostic/sys-narrow1f.C: New test.
	* g++.dg/diagnostic/sys-narrow1g.C: New test.
	* g++.dg/diagnostic/sys-narrow1h.C: New test.
	* g++.dg/diagnostic/sys-narrow1i.C: New test.
Before the r5-3834 commit for PR63362, GCC 4.8-4.9 refuses to compile
cse.cc which contains a variable with rtx_def type, because rtx_def
contains a union with poly_uint16 element.  poly_int template has
defaulted default constructor and a variadic template constructor which
could have empty parameter pack. GCC < 5 treated it as non-trivially
constructible class and deleted rtunion and rtx_def default constructors.

For the cse_insn purposes, all we need is a variable with size and alignment
of rtx_def, not necessarily rtx_def itself, which we then memset to 0 and
fill in like rtx is normally allocated from heap, so this patch for
GCC_VERSION < 5000 uses an unsigned char array of the right size/alignment.

2023-10-18  Jakub Jelinek  <jakub@redhat.com>

	PR bootstrap/111852
	* cse.cc (cse_insn): Add workaround for GCC 4.8-4.9, instead of
	using rtx_def type for memory_extend_buf, use unsigned char
	arrayy with size of rtx_def and its alignment.
gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_test_fractional_cost):
	Test <= instead of testing < twice.
libgcc/config/avr/libf7/
	* libf7-asm.sx (mul_mant): Implement for devices without MUL.
	* asm-defs.h (wmov) [!HAVE_MUL]: Fix regno computation.
	* t-libf7 (F7_ASM_FLAGS): Add -g0.
This patch slightly improves the embench-iot benchmark score for
PRU code size.  There is also small improvement in a few real-world
firmware programs.

  Embench-iot size
  ------------------------------------------
  Benchmark          before   after    delta
  ---------           ----    ----     -----
  aha-mont64          4.15    4.15         0
  crc32               6.04    6.04         0
  cubic              21.64   21.62     -0.02
  edn                 6.37    6.37         0
  huffbench          18.63   18.55     -0.08
  matmult-int         5.44    5.44         0
  md5sum             25.56   25.43     -0.13
  minver             12.82   12.76     -0.06
  nbody              15.09   14.97     -0.12
  nettle-aes          4.75    4.75         0
  nettle-sha256       4.67    4.67         0
  nsichneu            3.77    3.77         0
  picojpeg            4.11    4.11         0
  primecount          7.90    7.90         0
  qrduino             7.18    7.16     -0.02
  sglib-combined     13.63   13.59     -0.04
  slre                5.19    5.19         0
  st                 14.23   14.12     -0.11
  statemate           2.34    2.34         0
  tarfind            36.85   36.64     -0.21
  ud                 10.51   10.46     -0.05
  wikisort            7.44    7.41     -0.03
  ---------          -----   -----
  Geometric mean      8.42    8.40     -0.02
  Geometric SD        2.00    2.00         0
  Geometric range    12.68   12.62     -0.06

gcc/ChangeLog:

	* config/pru/pru.cc (pru_insn_cost): New function.
	(TARGET_INSN_COST): Define for PRU.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
gcc/ChangeLog:
	PR tree-optimization/111648
	* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): If a1
	chooses base element from arg, ensure that it's a natural stepped
	sequence.
	(build_vec_cst_rand): New param natural_stepped and use it to
	construct a naturally stepped sequence.
	(test_nunits_min_2): Add new unit tests Case 6 and Case 7.
This is a simple error recovery issue when c_safe_arg_type_equiv_p
was added in r8-5312-gc65e18d3331aa999. The issue is that after
an error, an argument type (of a function type) might turn
into an error mark node and c_safe_arg_type_equiv_p was not ready
for that. So this just adds a check for error operand for its
arguments before getting the main variant.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

	PR c/101285

gcc/c/ChangeLog:

	* c-typeck.cc (c_safe_arg_type_equiv_p): Return true for error
	operands early.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr101285-1.c: New test.
…ot checking for error

When checking to see if we have a function declaration has a conflict due to
promotations, there is no test to see if the type was an error mark and then calls
c_type_promotes_to. c_type_promotes_to is not ready for error_mark and causes an
ICE.

This adds a check for error before the call of c_type_promotes_to.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

	PR c/101364

gcc/c/ChangeLog:

	* c-decl.cc (diagnose_arglist_conflict): Test for
	error mark before calling of c_type_promotes_to.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr101364-1.c: New test.
I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670
where we would remove the `& CST` part if we ended up not calling
expand_single_bit_test.
This fixes the problem by introducing a new variable that will be used
for calling expand_single_bit_test.
As afar as I know this can only show up when disabling optimization
passes as this above form would have been optimized away.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

	PR middle-end/111863

gcc/ChangeLog:

	* expr.cc (do_store_flag): Don't over write arg0
	when stripping off `& POW2`.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/pr111863-1.c: New test.
@talregev
Copy link
Owner

I will create duplicate to my branch.
Can you merge this to my duplicate?

@cooljeanius
Copy link
Author

I will create duplicate to my branch. Can you merge this to my duplicate?

Sure, just let me know which one you mean.

@talregev
Copy link
Owner

talregev commented Oct 19, 2023

@cooljeanius
this is the new branch
TalR/gcc_ci_cooljeanius

@cooljeanius cooljeanius changed the base branch from TalR/gcc_ci to TalR/gcc_ci_cooljeanius October 19, 2023 02:30
@cooljeanius
Copy link
Author

ok I changed the base branch

@talregev
Copy link
Owner

Did you finish to develop?

@cooljeanius
Copy link
Author

Did you finish to develop?

I am done with this particular yaml file for now, yes. There are still some further improvements that could be made, like getting caching to work properly, or getting uploading of logfiles to work properly, but I'm not planning on focusing on either of those any more at the moment.

@talregev
Copy link
Owner

Can you create changes only the yaml file?
For other changes, we can create a different branch, with different PR.

@cooljeanius
Copy link
Author

Can you create changes only the yaml file? For other changes, we can create a different branch, with different PR.

All right, I think that's what my me/CI branch does; let me check...

@cooljeanius
Copy link
Author

Can you create changes only the yaml file? For other changes, we can create a different branch, with different PR.

All right, I think that's what my me/CI branch does; let me check...

OK, I opened PR #4 instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.