
FYI, integration script to build complete ESP8266 #1

Closed

pfalcon opened this issue Nov 16, 2014 · 3 comments

Comments

@pfalcon

pfalcon commented Nov 16, 2014

Strictly FYI ;-). https://github.com/pfalcon/esp-open-sdk

@jcmvbkbc
Owner

Ok.

@pfalcon
Author

pfalcon commented Nov 19, 2014

Btw, can you please enable the bug tracker in https://github.com/jcmvbkbc/crosstool-NG - they're disabled by default in forks on GitHub.

@jcmvbkbc
Owner

Done.

jcmvbkbc pushed a commit that referenced this issue May 30, 2017
	* pt.c (most_specialized_instantiation): Cope with duplicate
	instantiations.

	PR c++/80891 (#1)
	* g++.dg/lookup/pr80891-1.C: New.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@248573 138bc75d-0d04-0410-961f-82ee72b054a4
jcmvbkbc pushed a commit that referenced this issue May 30, 2017
	* cp-tree.h (lookup_maybe_add): Add DEDUPING argument.
	* name-lookup.c (name_lookup): Add deduping field.
	(name_lookup::preserve_state, name_lookup::restore_state): Deal
	with deduping.
	(name_lookup::add_overload): New.
	(name_lookup::add_value, name_lookup::add_fns): Call add_overload.
	(name_lookup::search_adl): Set deduping.  Don't unmark here.
	* pt.c (most_specialized_instantiation): Revert previous change.
	Assert not given duplicates.
	* tree.c (lookup_mark): Just mark the underlying decls.
	(lookup_maybe_add): Dedup using marked decls.

	PR c++/80891 (#5)
	* g++.dg/lookup/pr80891-5.C: New.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@248578 138bc75d-0d04-0410-961f-82ee72b054a4
jcmvbkbc pushed a commit that referenced this issue Jun 18, 2018
When -fcf-protection -mcet is used, I got

FAIL: g++.dg/eh/sighandle.C

(gdb) bt
 #0  _Unwind_RaiseException (exc=exc@entry=0x416ed0)
    at /export/gnu/import/git/sources/gcc/libgcc/unwind.inc:140
 #1  0x00007ffff7d9936b in __cxxabiv1::__cxa_throw (obj=<optimized out>,
    tinfo=0x403dd0 <typeinfo for int@@CXXABI_1.3>, dest=0x0)
    at /export/gnu/import/git/sources/gcc/libstdc++-v3/libsupc++/eh_throw.cc:90
 #2  0x0000000000401255 in sighandler (signo=11, si=0x7fffffffd6f8,
    uc=0x7fffffffd5c0)
    at /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/eh/sighandle.C:9
 #3  <signal handler called> <<<< Signal frame which isn't on shadow stack
 #4  dosegv ()
    at /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/eh/sighandle.C:14
 #5  0x00000000004012e3 in main ()
    at /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/eh/sighandle.C:30
(gdb) p frames
$6 = 5
(gdb)

frame count should be 4, not 5.  This patch skips signal frames when
unwinding shadow stack.

gcc/testsuite/

	PR libgcc/85334
	* g++.dg/torture/pr85334.C: New test.

libgcc/

	PR libgcc/85334
	* unwind-generic.h (_Unwind_Frames_Increment): New.
	* config/i386/shadow-stack-unwind.h (_Unwind_Frames_Increment):
	Likewise.
	* unwind.inc (_Unwind_RaiseException_Phase2): Increment frame
	count with _Unwind_Frames_Increment.
	(_Unwind_ForcedUnwind_Phase2): Likewise.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@259502 138bc75d-0d04-0410-961f-82ee72b054a4
jcmvbkbc pushed a commit that referenced this issue Jun 19, 2018
This fixes a long-standing quirk present in the layout information for record
types displayed by the -gnatR3 switch: when a component has a variable
(starting) position, its corresponding line in the output has an irregular and
awkward format.  After this change, the format is the same as in all the other
cases.

For the following record:

    type R (m : natural) is record
        s : string (1 .. m);
        r : natural;
        b : boolean;
    end record;
    for R'alignment use 4;
    pragma Pack (R);

the output of -gnatR3 used to be:

for R'Object_Size use 17179869248;
for R'Value_Size use ((#1 + 8) * 8);
for R'Alignment use 4;
for R use record
   m at  0 range  0 .. 30;
   s at  4 range  0 .. ((#1 * 8)) - 1;
   r at bit offset (((#1 + 4) * 8)) size in bits = 31
   b at bit offset ((((#1 + 7) * 8) + 7)) size in bits = 1
end record;

and is changed into:

for R'Object_Size use 17179869248;
for R'Value_Size use ((#1 + 8) * 8);
for R'Alignment use 4;
for R use record
   m at  0 range  0 .. 30;
   s at  4 range  0 .. ((#1 * 8)) - 1;
   r at (#1 + 4) range  0 .. 30;
   b at (#1 + 7) range  7 ..  7;
end record;

2018-05-24  Eric Botcazou  <ebotcazou@adacore.com>

gcc/ada/

	* fe.h (Set_Normalized_First_Bit): Declare.
	(Set_Normalized_Position): Likewise.
	* repinfo.adb (List_Record_Layout): Do not use irregular output for a
	variable position.  Fix minor spacing issue.
	* gcc-interface/decl.c (annotate_rep): If a field has a variable
	offset, compute the normalized position and annotate it in addition to
	the bit offset.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@260669 138bc75d-0d04-0410-961f-82ee72b054a4
jcmvbkbc pushed a commit that referenced this issue Jan 28, 2019
This adds a 4th information level for the -gnatR output, where relevant
compiler-generated types are listed in addition to the information
already output by -gnatR3.

For the following package P:

package P is

  type Arr0 is array (Positive range <>) of Boolean;

    type Rec (D1 : Positive; D2 : Boolean) is record
       C1 : Integer;
       C2 : Arr0 (1 .. D1);

       case D2 is
          when False =>
             C3 : Character;
          when True =>
             C4 : String (1 .. 3);
             C5 : Float;
       end case;
    end record;

    type Arr1 is array (1 .. 8) of Rec (1, True);

end P;

the output generated by -gnatR4 must be:

Representation information for unit P (spec)
--------------------------------------------

for Arr0'Alignment use 1;
for Arr0'Component_Size use 8;

for Rec'Object_Size use 17179869344;
for Rec'Value_Size use (if (#2 != 0) then ((((#1 + 15) & -4) + 8) * 8)
else ((((#1 + 15) & -4) + 1) * 8) end);
for Rec'Alignment use 4;
for Rec use record
   D1 at  0 range  0 .. 31;
   D2 at  4 range  0 ..  7;
   C1 at  8 range  0 .. 31;
   C2 at 12 range  0 .. ((#1 * 8)) - 1;
   C3 at ((#1 + 15) & -4) range  0 ..  7;
   C4 at ((#1 + 15) & -4) range  0 .. 23;
   C5 at (((#1 + 15) & -4) + 4) range  0 .. 31;
end record;

for Arr1'Size use 1536;
for Arr1'Alignment use 4;
for Arr1'Component_Size use 192;

for Tarr1c'Size use 192;
for Tarr1c'Alignment use 4;
for Tarr1c use record
   D1 at  0 range  0 .. 31;
   D2 at  4 range  0 ..  7;
   C1 at  8 range  0 .. 31;
   C2 at 12 range  0 ..  7;
   C4 at 16 range  0 .. 23;
   C5 at 20 range  0 .. 31;
end record;

2018-11-14  Eric Botcazou  <ebotcazou@adacore.com>

gcc/ada/

	* doc/gnat_ugn/building_executable_programs_with_gnat.rst
	(-gnatR): Document new -gnatR4 level.
	* gnat_ugn.texi: Regenerate.
	* opt.ads (List_Representation_Info): Bump upper bound to 4.
	* repinfo.adb: Add with clause for GNAT.HTable.
	(Relevant_Entities_Size): New constant.
	(Entity_Header_Num): New type.
	(Entity_Hash): New function.
	(Relevant_Entities): New set implemented with GNAT.HTable.
	(List_Entities): Also list compiler-generated entities present
	in the Relevant_Entities set. Consider that the Component_Type
	of an array type is relevant.
	(List_Rep_Info): Reset Relevant_Entities for each unit.
	* switch-c.adb (Scan_Front_End_Switches): Add support for -gnatR4.
	* switch-m.adb (Normalize_Compiler_Switches): Likewise.
	* usage.adb (Usage): Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@266131 138bc75d-0d04-0410-961f-82ee72b054a4
jcmvbkbc pushed a commit that referenced this issue Jun 11, 2022
Here ever since r12-6022-gbb2a7f80a98de3 we stopped deeming the partial
specialization #2 to be more specialized than #1 ultimately because
dependent operator expressions now have a DEPENDENT_OPERATOR_TYPE type
instead of an empty type, and this made unify stop deducing T(2) == 1
for K during partial ordering for #1 and #2.

This minimal patch fixes this by making the relevant logic in unify
treat DEPENDENT_OPERATOR_TYPE like an empty type.

	PR c++/105425

gcc/cp/ChangeLog:

	* pt.cc (unify) <case TEMPLATE_PARM_INDEX>: Treat
	DEPENDENT_OPERATOR_TYPE like an empty type.

gcc/testsuite/ChangeLog:

	* g++.dg/template/partial-specialization13.C: New test.
jcmvbkbc pushed a commit that referenced this issue Jun 11, 2022
This patch fixes the second half of 64679.  Here we issue a wrong
"redefinition of 'int x'" for the following:

  struct Bar {
    Bar(int, int, int);
  };

  int x = 1;
  Bar bar(int(x), int(x), int{x}); // #1

cp_parser_parameter_declaration_list does pushdecl every time it sees
a named parameter, so the second "int(x)" causes the error.  That's
premature, since this turns out to be a constructor call after the
third argument!

If the first parameter is parenthesized, we can't push until we've
established we're looking at a function declaration.  Therefore this
could be fixed by some kind of lookahead.  I thought about introducing a
lightweight variant of cp_parser_parameter_declaration_list that would
not have any side effects and would return as soon as it figures out
whether it's looking at a declaration or expression.  Since that would
require fairly nontrivial changes, I wanted something simpler.

Something like delaying the pushdecl until we've reached the ')'
following the parameter-declaration-clause.  But we must push the
parameters before processing a default argument, as in:

  Bar bar(int(a), int(b), int c = sizeof(a));  // valid

Moreover, this code should still be accepted

  Bar f(int(i), decltype(i) j = 42);

so this patch stashes parameters into a vector when parsing tentatively
only when pushdecl-ing a parameter would result in a clash and an error
about redefinition/redeclaration.  The stashed parameters are pushed at
the end of a parameter-declaration-clause if it's followed by a ')', so
that we still diagnose redefining a parameter.

	PR c++/64679

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_parameter_declaration_clause): Maintain
	a vector of parameters that haven't been pushed yet.  Push them at the
	end of a valid parameter-declaration-clause.
	(cp_parser_parameter_declaration_list): Take a new auto_vec parameter.
	Do not pushdecl while parsing tentatively when pushdecl-ing a parameter
	would result in a hard error.
	(cp_parser_cache_defarg): Adjust the call to
	cp_parser_parameter_declaration_list.

gcc/testsuite/ChangeLog:

	* g++.dg/parse/ambig11.C: New test.
	* g++.dg/parse/ambig12.C: New test.
	* g++.dg/parse/ambig13.C: New test.
	* g++.dg/parse/ambig14.C: New test.
jcmvbkbc pushed a commit that referenced this issue Jun 11, 2022
Consider

  struct A {
    int x;
    int y = x;
  };

  struct B {
    int x = 0;
    int y = A{x}.y; // #1
  };

where for #1 we end up with

  {.x=(&<PLACEHOLDER_EXPR struct B>)->x, .y=(&<PLACEHOLDER_EXPR struct A>)->x}

that is, two PLACEHOLDER_EXPRs for different types on the same level in
a {}.  This crashes because our CONSTRUCTOR_PLACEHOLDER_BOUNDARY mechanism to
avoid replacing unrelated PLACEHOLDER_EXPRs cannot deal with it.

Here's why we wound up with those PLACEHOLDER_EXPRs: When we're performing
cp_parser_late_parsing_nsdmi for "int y = A{x}.y;" we use finish_compound_literal
on type=A, compound_literal={((struct B *) this)->x}.  When digesting this
initializer, we call get_nsdmi which creates a PLACEHOLDER_EXPR for A -- we don't
have any object to refer to yet.  After digesting, we have

  {.x=((struct B *) this)->x, .y=(&<PLACEHOLDER_EXPR struct A>)->x}

and since we've created a PLACEHOLDER_EXPR inside it, we marked the whole ctor
CONSTRUCTOR_PLACEHOLDER_BOUNDARY.  f_c_l creates a TARGET_EXPR and returns

  TARGET_EXPR <D.2384, {.x=((struct B *) this)->x, .y=(&<PLACEHOLDER_EXPR struct A>)->x}>

Then we get to

  B b = {};

and call store_init_value, which digests the {}, which produces

  {.x=NON_LVALUE_EXPR <0>, .y=(TARGET_EXPR <D.2395, {.x=(&<PLACEHOLDER_EXPR struct B>)->x, .y=(&<PLACEHOLDER_EXPR struct A>)->x}>).y}

lookup_placeholder in constexpr won't find an object to replace the
PLACEHOLDER_EXPR for B, because ctx->object will be D.2395 of type A, and we
cannot search outward from D.2395 to find 'b'.

The call to replace_placeholders in store_init_value will not do anything:
we've marked the inner { } CONSTRUCTOR_PLACEHOLDER_BOUNDARY, and it's only
a sub-expression, so replace_placeholders does nothing, so the <P_E struct B>
stays even though now is the perfect time to replace it because we have an
object for it: 'b'.

Later, in cp_gimplify_init_expr the *expr_p is

  D.2395 = {.x=(&<PLACEHOLDER_EXPR struct B>)->x, .y=(&<PLACEHOLDER_EXPR struct A>)->x}

where D.2395 is of type A, but we crash because we hit <P_E struct B>, which
has a different type.

My idea was to replace <P_E struct A> with D.2384 after creating the
TARGET_EXPR because that means we have an object we can refer to.
Then clear CONSTRUCTOR_PLACEHOLDER_BOUNDARY because we no longer have
a PLACEHOLDER_EXPR in the {}.  Then store_init_value will be able to
replace <P_E struct B> with 'b', and we should be good to go.  We must
be careful not to break guaranteed copy elision, so this replacement
happens in digest_nsdmi_init where we can see the whole initializer,
and avoid replacing any placeholders in TARGET_EXPRs used in the context
of initialization/copy elision.  This is achieved via the new function
called potential_prvalue_result_of.

While fixing this problem, I found PR105550, thus the FIXMEs in the
tests.

	PR c++/100252

gcc/cp/ChangeLog:

	* typeck2.cc (potential_prvalue_result_of): New.
	(replace_placeholders_for_class_temp_r): New.
	(digest_nsdmi_init): Call it.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1y/nsdmi-aggr14.C: New test.
	* g++.dg/cpp1y/nsdmi-aggr15.C: New test.
	* g++.dg/cpp1y/nsdmi-aggr16.C: New test.
	* g++.dg/cpp1y/nsdmi-aggr17.C: New test.
	* g++.dg/cpp1y/nsdmi-aggr18.C: New test.
	* g++.dg/cpp1y/nsdmi-aggr19.C: New test.
jcmvbkbc pushed a commit that referenced this issue Jun 11, 2022
As explained in r11-4959-gde6f64f9556ae3, the atom cache assumes two
equivalent expressions (according to cp_tree_equal) must use the same
template parameters (according to find_template_parameters).  This
assumption turned out to not hold for TARGET_EXPR, which was addressed
by that commit.

But this assumption apparently doesn't hold for PARM_DECL either:
find_template_parameters walks its DECL_CONTEXT but cp_tree_equal by
default doesn't consider DECL_CONTEXT unless comparing_specializations
is set.  Thus in the first testcase below, the atomic constraints of #1
and #2 are equivalent according to cp_tree_equal, but according to
find_template_parameters the former uses T and the latter uses both T
and U (surprisingly).

We could fix this assumption violation by setting comparing_specializations
in the atom_hasher, which would make cp_tree_equal return false for the
two atoms, but that seems overly pessimistic here.  Ideally the atoms
should continue being considered equivalent and we instead fix
find_template_parameters to return just T for #2's atom.

To that end this patch makes for_each_template_parm_r stop walking the
DECL_CONTEXT of a PARM_DECL.  This should be safe to do because
tsubst_copy / tsubst_decl only substitutes the TREE_TYPE of a PARM_DECL
and doesn't bother substituting the DECL_CONTEXT, thus the only relevant
template parameters are those used in its type.  any_template_parm_r is
currently responsible for walking its TREE_TYPE, but I suppose it now makes
sense for for_each_template_parm_r to do so instead.

In passing this patch also makes for_each_template_parm_r stop walking
the DECL_CONTEXT of a VAR_/FUNCTION_DECL since doing so after walking
DECL_TI_ARGS is redundant, I think.

I experimented with not walking DECL_CONTEXT for CONST_DECL, but the
second testcase below demonstrates it's necessary to walk it.

	PR c++/105797

gcc/cp/ChangeLog:

	* pt.cc (for_each_template_parm_r) <case FUNCTION_DECL, VAR_DECL>:
	Don't walk DECL_CONTEXT.
	<case PARM_DECL>: Likewise.  Walk TREE_TYPE.
	<case CONST_DECL>: Simplify.
	(any_template_parm_r) <case PARM_DECL>: Don't walk TREE_TYPE.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/concepts-decltype4.C: New test.
	* g++.dg/cpp2a/concepts-memfun3.C: New test.
jcmvbkbc pushed a commit that referenced this issue Jun 17, 2022
(In reply to Uroš Bizjak from comment #1)
> Instruction does not accept memory operand for operand 3:
>
> (define_insn_and_split
> "*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_ltint"
>   [(set (match_operand:<ssebytemode> 0 "register_operand" "=Yr,*x,x")
> 	(unspec:<ssebytemode>
> 	  [(match_operand:<ssebytemode> 1 "register_operand" "0,0,x")
> 	   (match_operand:<ssebytemode> 2 "vector_operand" "YrBm,*xBm,xm")
> 	   (subreg:<ssebytemode>
> 	     (lt:VI48_AVX
> 	       (match_operand:VI48_AVX 3 "register_operand" "Yz,Yz,x")
> 	       (match_operand:VI48_AVX 4 "const0_operand")) 0)]
> 	  UNSPEC_BLENDV))]
>
> The problematic insn is:
>
> (define_insn_and_split "*avx_cmp<mode>3_ltint_not"
>  [(set (match_operand:VI48_AVX  0 "register_operand")
>        (vec_merge:VI48_AVX
> 	 (match_operand:VI48_AVX 1 "vector_operand")
> 	 (match_operand:VI48_AVX 2 "vector_operand")
> 	 (unspec:<avx512fmaskmode>
> 	   [(subreg:VI48_AVX
> 	    (not:<ssebytemode>
> 	      (match_operand:<ssebytemode> 3 "vector_operand")) 0)
> 	    (match_operand:VI48_AVX 4 "const0_operand")
> 	    (match_operand:SI 5 "const_0_to_7_operand")]
> 	    UNSPEC_PCMP)))]
>
> which gets split to the above pattern.
>
> In the preparation statements we have:
>
>   if (!MEM_P (operands[3]))
>     operands[3] = force_reg (<ssebytemode>mode, operands[3]);
>   operands[3] = lowpart_subreg (<MODE>mode, operands[3], <ssebytemode>mode);
>
> Which won't fly when operand 3 is memory operand...
>

gcc/ChangeLog:

	PR target/105953
	* config/i386/sse.md (*avx_cmp<mode>3_ltint_not): Force_reg
	operands[3].

gcc/testsuite/ChangeLog:

	* g++.target/i386/pr105953.C: New test.
jcmvbkbc pushed a commit that referenced this issue Jul 18, 2022
This patch implements C++23 P2255R2, which adds two new type traits to
detect reference binding to a temporary.  They can be used to detect code
like

  std::tuple<const std::string&> t("meow");

which is incorrect because it always creates a dangling reference, because
the std::string temporary is created inside the selected constructor of
std::tuple, and not outside it.

There are two new compiler builtins, __reference_constructs_from_temporary
and __reference_converts_from_temporary.  The former is used to simulate
direct- and the latter copy-initialization context.  But I had a hard time
finding a test where there's actually a difference.  Under DR 2267, both
of these are invalid:

  struct A { } a;
  struct B { explicit B(const A&); };
  const B &b1{a};
  const B &b2(a);

so I had to peruse [over.match.ref], and eventually realized that the
difference can be seen here:

  struct G {
    operator int(); // #1
    explicit operator int&&(); // #2
  };

int&& r1(G{}); // use #2 (no temporary)
int&& r2 = G{}; // use #1 (a temporary is created to be bound to int&&)

The implementation itself was rather straightforward because we already
have the conv_binds_ref_to_prvalue function.  The main function here is
ref_xes_from_temporary.
I've changed the return type of ref_conv_binds_directly to tristate, because
previously the function didn't distinguish between an invalid conversion and
one that binds to a prvalue.  Since it no longer returns a bool, I removed
the _p suffix.

The patch also adds the relevant class and variable templates to <type_traits>.
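
A hedged usage sketch (assuming a C++23 libstdc++ that provides the
class and variable templates named in the ChangeLog below): the
dangling std::tuple case above is exactly what the new traits detect.

```
#include <string>
#include <type_traits>

// tuple<const string&> t("meow"): the string temporary is created
// inside the constructor, so the reference dangles -- the trait
// reports that the reference would bind to a temporary:
static_assert(std::reference_constructs_from_temporary_v<
		const std::string&, const char*>);
// Binding int&& to an int prvalue also materializes a temporary:
static_assert(std::reference_converts_from_temporary_v<int&&, int>);
// Direct binding, no temporary:
static_assert(!std::reference_constructs_from_temporary_v<int&, int&>);
```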

	PR c++/104477

gcc/c-family/ChangeLog:

	* c-common.cc (c_common_reswords): Add
	__reference_constructs_from_temporary and
	__reference_converts_from_temporary.
	* c-common.h (enum rid): Add RID_REF_CONSTRUCTS_FROM_TEMPORARY and
	RID_REF_CONVERTS_FROM_TEMPORARY.

gcc/cp/ChangeLog:

	* call.cc (ref_conv_binds_directly_p): Rename to ...
	(ref_conv_binds_directly): ... this.  Add a new bool parameter.  Change
	the return type to tristate.
	* constraint.cc (diagnose_trait_expr): Handle
	CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
	* cp-tree.h: Include "tristate.h".
	(enum cp_trait_kind): Add CPTK_REF_CONSTRUCTS_FROM_TEMPORARY
	and CPTK_REF_CONVERTS_FROM_TEMPORARY.
	(ref_conv_binds_directly_p): Rename to ...
	(ref_conv_binds_directly): ... this.
	(ref_xes_from_temporary): Declare.
	* cxx-pretty-print.cc (pp_cxx_trait_expression): Handle
	CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
	* method.cc (ref_xes_from_temporary): New.
	* parser.cc (cp_parser_primary_expression): Handle
	RID_REF_CONSTRUCTS_FROM_TEMPORARY and RID_REF_CONVERTS_FROM_TEMPORARY.
	(cp_parser_trait_expr): Likewise.
	(warn_for_range_copy): Adjust to call ref_conv_binds_directly.
	* semantics.cc (trait_expr_value): Handle
	CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
	(finish_trait_expr): Likewise.

libstdc++-v3/ChangeLog:

	* include/std/type_traits (reference_constructs_from_temporary,
	reference_converts_from_temporary): New class templates.
	(reference_constructs_from_temporary_v,
	reference_converts_from_temporary_v): New variable templates.
	(__cpp_lib_reference_from_temporary): Define for C++23.
	* include/std/version (__cpp_lib_reference_from_temporary): Define for
	C++23.
	* testsuite/20_util/variable_templates_for_traits.cc: Test
	reference_constructs_from_temporary_v and
	reference_converts_from_temporary_v.
	* testsuite/20_util/reference_from_temporary/value.cc: New test.
	* testsuite/20_util/reference_from_temporary/value2.cc: New test.
	* testsuite/20_util/reference_from_temporary/version.cc: New test.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/reference_constructs_from_temporary1.C: New test.
	* g++.dg/ext/reference_converts_from_temporary1.C: New test.
jcmvbkbc pushed a commit that referenced this issue Aug 17, 2022
In my previous patches I've been extending our std::move warnings,
but this tweak actually dials it down a little bit.  As reported in
bug 89780, it's questionable to warn about expressions in templates
that were type-dependent, but aren't anymore because we're instantiating
the template.  As in,

  template <typename T>
  Dest withMove() {
    T x;
    return std::move(x);
  }

  template Dest withMove<Dest>(); // #1
  template Dest withMove<Source>(); // #2

Saying that the std::move is pessimizing for #1 is not incorrect, but
it's not useful, because removing the std::move would then pessimize #2.
So the user can't really win.  At the same time, disabling the warning
just because we're in a template would be going too far, I still want to
warn for

  template <typename>
  Dest withMove() {
    Dest x;
    return std::move(x);
  }

because the std::move therein will be pessimizing for any instantiation.

So I'm using the suppress_warning machinery to that effect.
Problem: I had to add a new group to nowarn_spec_t, otherwise
suppressing the -Wpessimizing-move warning would disable a whole bunch
of other warnings, which we really don't want.

	PR c++/89780

gcc/cp/ChangeLog:

	* pt.cc (tsubst_copy_and_build) <case CALL_EXPR>: Maybe suppress
	-Wpessimizing-move.
	* typeck.cc (maybe_warn_pessimizing_move): Don't issue warnings
	if they are suppressed.
	(check_return_expr): Disable -Wpessimizing-move when returning
	a dependent expression.

gcc/ChangeLog:

	* diagnostic-spec.cc (nowarn_spec_t::nowarn_spec_t): Handle
	OPT_Wpessimizing_move and OPT_Wredundant_move.
	* diagnostic-spec.h (nowarn_spec_t): Add NW_REDUNDANT enumerator.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/Wpessimizing-move3.C: Remove dg-warning.
	* g++.dg/cpp0x/Wredundant-move2.C: Likewise.
jcmvbkbc pushed a commit that referenced this issue Oct 14, 2022
To improve compile times, the C++ library could use compiler built-ins
rather than implementing std::is_convertible (and _nothrow) as class
templates.  This patch adds the built-ins.  We already have
__is_constructible and __is_assignable, and the nothrow forms of those.

Microsoft (and clang, for compatibility) also provide an alias called
__is_convertible_to.  I did not add it, but it would be trivial to do
so.

I noticed that our __is_assignable doesn't implement the "Access checks
are performed as if from a context unrelated to either type" requirement;
therefore std::is_assignable / __is_assignable give two different results
here:

  class S {
    operator int();
    friend void g(); // #1
  };

  void
  g ()
  {
    // #1 doesn't matter
    static_assert(std::is_assignable<int&, S>::value, "");
    static_assert(__is_assignable(int&, S), "");
  }

This is not a problem if __is_assignable is not meant to be used by
the users.

This patch doesn't make libstdc++ use the new built-ins, but I had to
rename a class otherwise its name would clash with the new built-in.
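
A minimal sketch of using the built-ins directly (names taken from the
ChangeLog below; they mirror std::is_convertible and
std::is_nothrow_convertible, "from" type first):

```
// Compiles with a GCC that has this patch.
static_assert(__is_convertible(int, double), "");
static_assert(!__is_convertible(void*, int), "");
static_assert(__is_nothrow_convertible(int, long), "");
```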

	PR c++/106784

gcc/c-family/ChangeLog:

	* c-common.cc (c_common_reswords): Add __is_convertible and
	__is_nothrow_convertible.
	* c-common.h (enum rid): Add RID_IS_CONVERTIBLE and
	RID_IS_NOTHROW_CONVERTIBLE.

gcc/cp/ChangeLog:

	* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_CONVERTIBLE
	and CPTK_IS_NOTHROW_CONVERTIBLE.
	* cp-objcp-common.cc (names_builtin_p): Handle RID_IS_CONVERTIBLE
	RID_IS_NOTHROW_CONVERTIBLE.
	* cp-tree.h (enum cp_trait_kind): Add CPTK_IS_CONVERTIBLE and
	CPTK_IS_NOTHROW_CONVERTIBLE.
	(is_convertible): Declare.
	(is_nothrow_convertible): Likewise.
	* cxx-pretty-print.cc (pp_cxx_trait_expression): Handle
	CPTK_IS_CONVERTIBLE and CPTK_IS_NOTHROW_CONVERTIBLE.
	* method.cc (is_convertible): New.
	(is_nothrow_convertible): Likewise.
	* parser.cc (cp_parser_primary_expression): Handle RID_IS_CONVERTIBLE
	and RID_IS_NOTHROW_CONVERTIBLE.
	(cp_parser_trait_expr): Likewise.
	* semantics.cc (trait_expr_value): Handle CPTK_IS_CONVERTIBLE and
	CPTK_IS_NOTHROW_CONVERTIBLE.
	(finish_trait_expr): Likewise.

libstdc++-v3/ChangeLog:

	* include/std/type_traits: Rename __is_nothrow_convertible to
	__is_nothrow_convertible_lib.
	* testsuite/20_util/is_nothrow_convertible/value_ext.cc: Likewise.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/has-builtin-1.C: Enhance to test __is_convertible and
	__is_nothrow_convertible.
	* g++.dg/ext/is_convertible1.C: New test.
	* g++.dg/ext/is_convertible2.C: New test.
	* g++.dg/ext/is_nothrow_convertible1.C: New test.
	* g++.dg/ext/is_nothrow_convertible2.C: New test.
jcmvbkbc pushed a commit that referenced this issue Nov 22, 2022
In plenty of image and video processing code it's common to modify pixel values
by a widening operation and then scale them back into range by dividing by 255.

This patch adds a named function to allow us to emit an optimized sequence
when doing an unsigned division that is equivalent to:

   x = y / (2 ^ (bitsize (y)/2)-1)

For SVE2 this means we generate for:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] = (pixel[i] * level) / 0xff;
}

the following:

        mov     z3.b, #1
.L3:
        ld1b    z0.h, p0/z, [x0, x3]
        mul     z0.h, p1/m, z0.h, z2.h
        addhnb  z1.b, z0.h, z3.h
        addhnb  z0.b, z0.h, z1.h
        st1b    z0.h, p0, [x0, x3]
        inch    x3
        whilelo p0.h, w3, w2
        b.any   .L3

instead of:

.L3:
        ld1b    z0.h, p1/z, [x0, x3]
        mul     z0.h, p0/m, z0.h, z1.h
        umulh   z0.h, p0/m, z0.h, z2.h
        lsr     z0.h, z0.h, #7
        st1b    z0.h, p1, [x0, x3]
        inch    x3
        whilelo p1.h, w3, w2
        b.any   .L3

Which results in significantly faster code.
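
As a sanity check (a sketch, not part of the patch): after
"mov z3.b, #1" each 16-bit lane of z3 reads as 0x0101 == 257, so the
two addhnb instructions compute (x + ((x + 257) >> 8)) >> 8, which
matches x / 255 over the whole uint8 product range:

```
#include <cassert>
#include <cstdint>

int main()
{
  for (uint32_t x = 0; x <= 255u * 255u; ++x)
    {
      uint16_t t = (uint16_t) (x + 257) >> 8; // addhnb z1.b, z0.h, z3.h
      uint16_t q = (uint16_t) (x + t) >> 8;   // addhnb z0.b, z0.h, z1.h
      assert (q == x / 255);
    }
}
```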

gcc/ChangeLog:

	* config/aarch64/aarch64-sve2.md (@aarch64_bitmask_udiv<mode>3): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve2/div-by-bitmask_1.c: New test.
jcmvbkbc pushed a commit that referenced this issue Nov 22, 2022
A friend declaration can only have constraints if it is defined.  If
multiple instantiations of a class template define the same friend function
signature, it's an error, but that shouldn't happen if it's constrained to
only be declared in one instantiation.

Currently we don't mangle requirements, so the foos all mangle the same and
actually instantiating #1 will break, but for now we can test that they're
considered distinct.

gcc/cp/ChangeLog:

	* pt.cc (tsubst_friend_function): Check satisfaction.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/concepts-friend11.C: New test.
jcmvbkbc pushed a commit that referenced this issue Jan 14, 2023
While looking at PR 105549, which is about fixing the ABI break
introduced in GCC 9.1 in parameter alignment with bit-fields, we
noticed that the GCC 9.1 warning is not emitted in all the cases where
it should be.  This patch fixes that and the next patch in the series
fixes the GCC 9.1 break.

We split this into two patches since patch #2 introduces a new ABI
break starting with GCC 13.1.  This way, patch #1 can be back-ported
to release branches if needed to fix the GCC 9.1 warning issue.

The main idea is to add a new global boolean that indicates whether
we're expanding the start of a function, so that aarch64_layout_arg
can emit warnings for callees as well as callers.  This removes the
need for aarch64_function_arg_boundary to warn (with its incomplete
information).  However, in the first patch there are still cases where
we emit warnings where we should not; this is fixed in patch #2 where
we can distinguish between GCC 9.1 and GCC 13.1 ABI breaks properly.

The fix in aarch64_function_arg_boundary (replacing & with &&) looks
like an oversight of a previous commit in this area which changed
'abi_break' from a boolean to an integer.

We also take the opportunity to fix the comment above
aarch64_function_arg_alignment since the value of the abi_break
parameter was changed in a previous commit, no longer matching the
description.

2022-11-28  Christophe Lyon  <christophe.lyon@arm.com>
	    Richard Sandiford  <richard.sandiford@arm.com>

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Fix
	comment.
	(aarch64_layout_arg): Factorize warning conditions.
	(aarch64_function_arg_boundary): Fix typo.
	* function.cc (currently_expanding_function_start): New variable.
	(expand_function_start): Handle
	currently_expanding_function_start.
	* function.h (currently_expanding_function_start): Declare.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/bitfield-abi-warning-align16-O2.c: New test.
	* gcc.target/aarch64/bitfield-abi-warning-align16-O2-extra.c: New
	test.
	* gcc.target/aarch64/bitfield-abi-warning-align32-O2.c: New test.
	* gcc.target/aarch64/bitfield-abi-warning-align32-O2-extra.c: New
	test.
	* gcc.target/aarch64/bitfield-abi-warning-align8-O2.c: New test.
	* gcc.target/aarch64/bitfield-abi-warning.h: New test.
	* g++.target/aarch64/bitfield-abi-warning-align16-O2.C: New test.
	* g++.target/aarch64/bitfield-abi-warning-align16-O2-extra.C: New
	test.
	* g++.target/aarch64/bitfield-abi-warning-align32-O2.C: New test.
	* g++.target/aarch64/bitfield-abi-warning-align32-O2-extra.C: New
	test.
	* g++.target/aarch64/bitfield-abi-warning-align8-O2.C: New test.
	* g++.target/aarch64/bitfield-abi-warning.h: New test.
jcmvbkbc pushed a commit that referenced this issue Jan 17, 2023
This recognises the patterns of the form:
  while (n & 1) { n >>= 1; }

Unfortunately there are currently two issues relating to this patch.

Firstly, simplify_using_initial_conditions does not recognise that
	(n != 0) and ((n & 1) == 0) implies that ((n >> 1) != 0).

These preconditions arise following the loop copy-header pass, and the
assumptions returned by number_of_iterations_exit_assumptions then
prevent final value replacement from using the niter result.

I'm not sure what is the best way to fix this - one approach could be to
modify simplify_using_initial_conditions to handle this sort of case,
but it seems that it basically wants the information that ranger could
give anyway, so would something like that be a better option?

The second issue arises in the vectoriser, which is able to determine
that the niter->assumptions are always true.
When building with -march=armv8.4-a+sve -S -O3, we get this codegen:

int foo (unsigned int b) {
    int c = 0;

    if (b == 0)
      return PREC;

    while (!(b & (1 << (PREC - 1)))) {
        b <<= 1;
        c++;
    }

    return c;
}

foo:
.LFB0:
        .cfi_startproc
        cmp     w0, 0
        cbz     w0, .L6
        blt     .L7
        lsl     w1, w0, 1
        clz     w2, w1
        cmp     w2, 14
        bls     .L8
        mov     x0, 0
        cntw    x3
        add     w1, w2, 1
        index   z1.s, #0, #1
        whilelo p0.s, wzr, w1
.L4:
        add     x0, x0, x3
        mov     p1.b, p0.b
        mov     z0.d, z1.d
        whilelo p0.s, w0, w1
        incw    z1.s
        b.any   .L4
        add     z0.s, z0.s, #1
        lastb   w0, p1, z0.s
        ret
        .p2align 2,,3
.L8:
        mov     w0, 0
        b       .L3
        .p2align 2,,3
.L13:
        lsl     w1, w1, 1
.L3:
        add     w0, w0, 1
        tbz     w1, #31, .L13
        ret
        .p2align 2,,3
.L6:
        mov     w0, 32
        ret
        .p2align 2,,3
.L7:
        mov     w0, 0
        ret
        .cfi_endproc

In essence, the vectoriser uses the niter information to determine
exactly how many iterations of the loop it needs to run. It then uses
SVE whilelo instructions to run this number of iterations. The original
loop counter is also vectorised, despite only being used in the final
iteration, and then the final value of this counter is used as the
return value (which is the same as the number of iterations it computed
in the first place).

This vectorisation is obviously bad, and I think it exposes a latent
bug in the vectoriser, rather than being an issue caused by this
specific patch.

gcc/ChangeLog:

	* tree-ssa-loop-niter.cc (number_of_iterations_cltz): New.
	(number_of_iterations_bitcount): Add call to the above.
	(number_of_iterations_exit_assumptions): Add EQ_EXPR case for
	c[lt]z idiom recognition.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/cltz-max.c: New test.
	* gcc.dg/tree-ssa/clz-char.c: New test.
	* gcc.dg/tree-ssa/clz-int.c: New test.
	* gcc.dg/tree-ssa/clz-long-long.c: New test.
	* gcc.dg/tree-ssa/clz-long.c: New test.
	* gcc.dg/tree-ssa/ctz-char.c: New test.
	* gcc.dg/tree-ssa/ctz-int.c: New test.
	* gcc.dg/tree-ssa/ctz-long-long.c: New test.
	* gcc.dg/tree-ssa/ctz-long.c: New test.
jcmvbkbc pushed a commit that referenced this issue Feb 13, 2023
Here the ahead-of-time overload set pruning in finish_call_expr is
unintentionally returning a CALL_EXPR whose (pruned) callee is wrapped
in an ADDR_EXPR, despite the original callee not being wrapped in an
ADDR_EXPR.  This ends up causing a bogus declaration mismatch error in
the below testcase because the call to min in #1 gets expressed as a
CALL_EXPR of ADDR_EXPR of FUNCTION_DECL, whereas the level-lowered call
to min in #2 gets expressed instead as a CALL_EXPR of FUNCTION_DECL.

This patch fixes this by stripping the spurious ADDR_EXPR appropriately.
Thus the first call to min now also gets expressed as a CALL_EXPR of
FUNCTION_DECL, matching the behavior before r12-6075-g2decd2cabe5a4f.

	PR c++/107461

gcc/cp/ChangeLog:

	* semantics.cc (finish_call_expr): Strip ADDR_EXPR from
	the selected callee during overload set pruning.

gcc/testsuite/ChangeLog:

	* g++.dg/template/call9.C: New test.
jcmvbkbc pushed a commit that referenced this issue Feb 13, 2023
After r13-5684-g59e0376f607805 the (pruned) callee of a non-dependent
CALL_EXPR is a bare FUNCTION_DECL rather than ADDR_EXPR of FUNCTION_DECL.
This innocent change revealed that cp_tree_equal doesn't first check
dependence of a CALL_EXPR before treating a FUNCTION_DECL callee as a
dependent name, which leads to us incorrectly accepting the first two
testcases below and rejecting the third:

 * In the first testcase, cp_tree_equal incorrectly returns true for
   the two non-dependent CALL_EXPRs f(0) and f(0) (whose CALL_EXPR_FN
   are different FUNCTION_DECLs) which causes us to treat #2 as a
   redeclaration of #1.

 * Same issue in the second testcase, for f<int*>() and f<char>().

 * In the third testcase, cp_tree_equal incorrectly returns true for
   f<int>() and f<void(*)(int)>() which causes us to conflate the two
   dependent specializations A<decltype(f<int>()(U()))> and
   A<decltype(f<void(*)(int)>()(U()))>.

This patch fixes this by making called_fns_equal treat two callees as
dependent names only if the overall CALL_EXPRs are dependent, via a new
convenience function call_expr_dependent_name that is like dependent_name
but also checks dependence of the overall CALL_EXPR.

	PR c++/107461

gcc/cp/ChangeLog:

	* cp-tree.h (call_expr_dependent_name): Declare.
	* pt.cc (iterative_hash_template_arg) <case CALL_EXPR>: Use
	call_expr_dependent_name instead of dependent_name.
	* tree.cc (call_expr_dependent_name): Define.
	(called_fns_equal): Adjust to take two CALL_EXPRs instead of
	CALL_EXPR_FNs thereof.  Use call_expr_dependent_name instead
	of dependent_name.
	(cp_tree_equal) <case CALL_EXPR>: Adjust call to called_fns_equal.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/overload5.C: New test.
	* g++.dg/cpp0x/overload5a.C: New test.
	* g++.dg/cpp0x/overload6.C: New test.
jcmvbkbc pushed a commit that referenced this issue Mar 13, 2023
-Wmismatched-tags warns about the (harmless) struct/class mismatch.
For, e.g.,

  template<typename T> struct A { };
  class A<int> a;

it works by adding A<T> to the class2loc hash table while parsing the
class-head and then, while parsing the elaborate type-specifier, we
add A<int>.  At the end of c_parse_file we go through the table and
warn about the class-key mismatches.  In this PR we crash though; we
have

  template<typename T> struct A {
    template<typename U> struct W { };
  };
  struct A<int>::W<int> w; // #1

where while parsing A and #1 we've stashed
   A<T>
   A<T>::W<U>
   A<int>::W<int>
into class2loc.  Then in class_decl_loc_t::diag_mismatched_tags TYPE
is A<int>::W<int>, and specialization_of gets us A<int>::W<U>, which
is not in class2loc, so we crash on gcc_assert (cdlguide).  But it's
OK not to have found A<int>::W<U>, we should just look one "level" up,
that is, A<T>::W<U>.

It's important to handle class specializations, so e.g.

  template<>
  struct A<char> {
    template<typename U>
    class W { };
  };

where W's class-key is different than in the primary template above,
so we should warn depending on whether we're looking into A<char>
or into a different instantiation.

	PR c++/106259

gcc/cp/ChangeLog:

	* parser.cc (class_decl_loc_t::diag_mismatched_tags): If the first
	lookup of SPEC didn't find anything, try to look for
	most_general_template.

gcc/testsuite/ChangeLog:

	* g++.dg/warn/Wmismatched-tags-11.C: New test.
jcmvbkbc pushed a commit that referenced this issue May 8, 2023
Currently on xstormy16 SImode shifts by a single bit require two
instructions, and shifts by other non-zero integer immediate constants
require five instructions.  This patch implements the obvious optimization
that shifts by two bits can be done in four instructions, by using two
single-bit sequences.

Hence, ashift_2 was previously generated as:
        mov r7,r2 | shl r2,#2 | shl r3,#2 | shr r7,#14 | or r3,r7
        ret
and with this patch we now generate:
        shl r2,#1 | rlc r3,#1 | shl r2,#1 | rlc r3,#1
        ret

2023-04-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/stormy16/stormy16.cc (xstormy16_output_shift): Implement
	SImode shifts by two by performing a single bit SImode shift twice.

gcc/testsuite/ChangeLog
	* gcc.target/xstormy16/shiftsi.c: New test case.
jcmvbkbc pushed a commit that referenced this issue May 8, 2023
This patch contains some minor tweaks to xstormy16's machine description
most significantly providing a pattern for HImode rotate left by a single
bit that requires only two instructions.

unsigned short foo(unsigned short x)
{
  return (x << 1) | (x >> 15);
}

currently with -O2 generates:
foo:    mov r7,r2
        shr r7,#15
        shl r2,#1
        or r2,r7
        ret

with this patch, GCC now generates:
foo:	shl r2,#1 | adc r2,#0
        ret

Additionally neghi2 is converted to a define_insn (so that the RTL
optimizers see the negation semantics), and HImode rotations by
8 bits can now be recognized and implemented using swpb.

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/stormy16/stormy16.md (neghi2): Convert from a define_expand
	to a define_insn.
	(*rotatehi_1): New define_insn for efficient 2 insn sequence.
	(*rotatehi_8, *rotaterthi_8): New define_insn to emit a swpb.

gcc/testsuite/ChangeLog
	* gcc.target/xstormy16/neghi2.c: New test case.
	* gcc.target/xstormy16/rotatehi-1.c: Likewise.
jcmvbkbc pushed a commit that referenced this issue May 8, 2023
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X)					\
    template <class U> struct X##1;		\
    template <class U> struct X##2;		\
    template <class U> struct X##3;		\
    template <class U> struct X##4;		\
    template <class U> struct X##5;		\
    template <class U> struct X##6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template <int I> struct A
  {
    NC3(am)
  };
  template <class...Ts> void sink(Ts...);
  template <int...Is> void g()
  {
    sink(A<Is>()...);
  }
  template <int I> void f()
  {
    g<__integer_pack(I)...>();
  }
  int main()
  {
    f<1000>();
  }

gcc/cp/ChangeLog:

	* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
	of a class template.
	(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
jcmvbkbc pushed a commit that referenced this issue May 22, 2023
Hi all,

We noticed that calls to the vadcq and vsbcq intrinsics, both of
which use __builtin_arm_set_fpscr_nzcvqc to set the Carry flag in
the FPSCR, would produce the following code:

```
< r2 is the *carry input >
vmrs	r3, FPSCR_nzcvqc
bic	r3, r3, #536870912
orr	r3, r3, r2, lsl #29
vmsr	FPSCR_nzcvqc, r3
```

when the MVE ACLE instead gives a different instruction sequence of:
```
< Rt is the *carry input >
VMRS Rs,FPSCR_nzcvqc
BFI Rs,Rt,#29,#1
VMSR FPSCR_nzcvqc,Rs
```

the bic + orr pair is slower and it's also wrong, because, if the
*carry input is greater than 1, then we risk overwriting the top two
bits of the FPSCR register (the N and Z flags).

This turned out to be a problem in the header file and the solution was
to simply add a `& 0x1u` to the `*carry` input: then the compiler knows
that we only care about the lowest bit and can optimise to a BFI.
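
A minimal sketch of the idea (an illustrative helper, not the actual
arm_mve.h code; FPSCR bit 29 is the carry bit being set here):

```
#include <cstdint>

// Masking with & 0x1u keeps only bit 0 of the carry, so writing it
// into FPSCR bit 29 cannot disturb the N and Z flags in bits 31 and
// 30, and the compiler can fold mask+shift+insert into a single BFI.
static inline uint32_t
fpscr_with_carry (uint32_t fpscr, uint32_t carry)
{
  return (fpscr & ~(1u << 29)) | ((carry & 0x1u) << 29);
}
```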

Ok for trunk?

Thanks,
Stam Markianos-Wright

gcc/ChangeLog:

	* config/arm/arm_mve.h (__arm_vadcq_s32): Fix arithmetic.
	(__arm_vadcq_u32): Likewise.
	(__arm_vadcq_m_s32): Likewise.
	(__arm_vadcq_m_u32): Likewise.
	(__arm_vsbcq_s32): Likewise.
	(__arm_vsbcq_u32): Likewise.
	(__arm_vsbcq_m_s32): Likewise.
	(__arm_vsbcq_m_u32): Likewise.
	* config/arm/mve.md (get_fpscr_nzcvqc): Make unspec_volatile.

gcc/testsuite/ChangeLog:
	* gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: New.
jcmvbkbc pushed a commit that referenced this issue May 22, 2023
This patch removes one machine instruction from the "single bit extraction
with shifting" operation, and tries to eliminate the conditional
branch if CST2_POW2 doesn't fit into signed 12 bits with the help
of ifcvt optimization.

    /* example #1 */
    int test0(int x) {
      return (x & 1048576) != 0 ? 1024 : 0;
    }
    extern int foo(void);
    int test1(void) {
      return (foo() & 1048576) != 0 ? 16777216 : 0;
    }

    ;; before
    test0:
	movi	a9, 0x400
	srai	a2, a2, 10
	and	a2, a2, a9
	ret.n
    test1:
	addi	sp, sp, -16
	s32i.n	a0, sp, 12
	call0	foo
	extui	a2, a2, 20, 1
	slli	a2, a2, 20
	beqz.n	a2, .L2
	movi.n	a2, 1
	slli	a2, a2, 24
    .L2:
	l32i.n	a0, sp, 12
	addi	sp, sp, 16
	ret.n

    ;; after
    test0:
	extui	a2, a2, 20, 1
	slli	a2, a2, 10
	ret.n
    test1:
	addi	sp, sp, -16
	s32i.n	a0, sp, 12
	call0	foo
	l32i.n	a0, sp, 12
	extui	a2, a2, 20, 1
	slli	a2, a2, 24
	addi	sp, sp, 16
	ret.n

In addition, if the left shift amount ('exact_log2(CST2_POW2)') is
between 1 and 3 and either an addition or a subtraction with another
register follows, emit an ADDX[248] or SUBX[248] machine instruction
instead of separate left shift and add/subtract ones.

    /* example #2 */
    int test2(int x, int y) {
      return ((x & 1048576) != 0 ? 4 : 0) + y;
    }
    int test3(int x, int y) {
      return ((x & 2) != 0 ? 8 : 0) - y;
    }

    ;; before
    test2:
	movi.n	a9, 4
	srai	a2, a2, 18
	and	a2, a2, a9
	add.n	a2, a2, a3
	ret.n
    test3:
	movi.n	a9, 8
	slli	a2, a2, 2
	and	a2, a2, a9
	sub	a2, a2, a3
	ret.n

    ;; after
    test2:
	extui	a2, a2, 20, 1
	addx4	a2, a2, a3
	ret.n
    test3:
	extui	a2, a2, 1, 1
	subx8	a2, a2, a3
	ret.n

gcc/ChangeLog:

	* config/xtensa/predicates.md (addsub_operator): New.
	* config/xtensa/xtensa.md (*extzvsi-1bit_ashlsi3,
	*extzvsi-1bit_addsubx): New insn_and_split patterns.
	* config/xtensa/xtensa.cc (xtensa_rtx_costs):
	Add a special case about ifcvt 'noce_try_cmove()' to handle
	constant loads that do not fit into signed 12 bits in the
	patterns added above.
jcmvbkbc pushed a commit that referenced this issue Jun 5, 2023
…n S[IF]mode

This patch optimizes the boolean evaluation of EQ/NE against zero
by adding two insn_and_split patterns similar to SImode conditional
store:

"eq_zero":
	op0 = (op1 == 0) ? 1 : 0;
	op0 = clz(op1) >> 5;  /* optimized (requires TARGET_NSA) */

"movsicc_ne0_reg_0":
	op0 = (op1 != 0) ? op2 : 0;
	op0 = op2; if (op1 == 0) ? op0 = op1;  /* optimized */

    /* example #1 */
    int bool_eqSI(int x) {
      return x == 0;
    }
    int bool_neSI(int x) {
      return x != 0;
    }

    ;; after (TARGET_NSA)
    bool_eqSI:
	nsau	a2, a2
	srli	a2, a2, 5
	ret.n
    bool_neSI:
	mov.n	a9, a2
	movi.n	a2, 1
	moveqz	a2, a9, a9
	ret.n

These also work in SFmode by ignoring their sign bits, and furthermore,
the branch if EQ/NE against zero in SFmode is also done in the same
manner.

The reasons for this optimization in SFmode are:

  - Only zero values (negative or non-negative) contain no bits of 1
    in either the exponent or the mantissa.
  - EQ/NE comparisons involving NaNs produce no signal even if they
    are signaling.
  - Even if the use of IEEE 754 single-precision floating-point
    coprocessor is configured (TARGET_HARD_FLOAT is true):
	1. Load zero value to FP register
        2. Possibly, additional FP move if the comparison target is
	   an address register
	3. FP equality check instruction
	4. Read the boolean register containing the result, or
	   conditional branch
    As noted above, a considerable number of instructions are still
    generated.
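
The first bullet is the crux; a hedged scalar sketch of what the
generated "add.n a2, a2, a2" relies on (the function is illustrative,
not from the patch):

```
#include <cstdint>
#include <cstring>

// Shifting out the sign bit leaves zero exactly for +0.0f and -0.0f:
// only zeros have all-zero exponent and mantissa bits.
static bool
float_eq_zero (float x)
{
  uint32_t bits;
  std::memcpy (&bits, &x, sizeof bits);
  return (bits << 1) == 0;
}
```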

    /* example #2 */
    int bool_eqSF(float x) {
      return x == 0;
    }
    int bool_neSF(float x) {
      return x != 0;
    }
    int bool_ltSF(float x) {
      return x < 0;
    }
    extern void foo(void);
    void cb_eqSF(float x) {
      if(x != 0)
        foo();
    }
    void cb_neSF(float x) {
      if(x == 0)
        foo();
    }
    void cb_geSF(float x) {
      if(x < 0)
        foo();
    }

    ;; after
    ;; (TARGET_NSA, TARGET_BOOLEANS and TARGET_HARD_FLOAT)
    bool_eqSF:
	add.n	a2, a2, a2
	nsau	a2, a2
	srli	a2, a2, 5
	ret.n
    bool_neSF:
	add.n	a9, a2, a2
	movi.n	a2, 1
	moveqz	a2, a9, a9
	ret.n
    bool_ltSF:
	movi.n	a9, 0
	wfr	f0, a2
	wfr	f1, a9
	olt.s	b0, f0, f1
	movi.n	a9, 0
	movi.n	a2, 1
	movf	a2, a9, b0
	ret.n
    cb_eqSF:
	add.n	a2, a2, a2
	beqz.n	a2, .L6
	j.l	foo, a9
    .L6:
	ret.n
    cb_neSF:
	add.n	a2, a2, a2
	bnez.n	a2, .L8
	j.l	foo, a9
    .L8:
	ret.n
    cb_geSF:
	addi	sp, sp, -16
	movi.n	a3, 0
	s32i.n	a12, sp, 8
	s32i.n	a0, sp, 12
	mov.n	a12, a2
	call0	__unordsf2
	bnez.n	a2, .L10
	movi.n	a3, 0
	mov.n	a2, a12
	call0	__gesf2
	bnei	a2, -1, .L10
	l32i.n	a0, sp, 12
	l32i.n	a12, sp, 8
	addi	sp, sp, 16
	j.l	foo, a9
    .L10:
	l32i.n	a0, sp, 12
	l32i.n	a12, sp, 8
	addi	sp, sp, 16
	ret.n

gcc/ChangeLog:

	* config/xtensa/predicates.md (const_float_0_operand):
	Rename from obsolete "const_float_1_operand" and change the
	constant to compare.
	(cstoresf_cbranchsf_operand, cstoresf_cbranchsf_operator):
	New.
	* config/xtensa/xtensa.cc (xtensa_expand_conditional_branch):
	Add code for EQ/NE comparison with constant zero in SFmode.
	(xtensa_expand_scc): Added code to derive boolean evaluation
	of EQ/NE with constant zero for comparison in SFmode.
	(xtensa_rtx_costs): Change cost of CONST_DOUBLE with value
	zero inside "cbranchsf4" to 0.
	* config/xtensa/xtensa.md (cbranchsf4, cstoresf4):
	Change "match_operator" and the third "match_operand" to the
	ones mentioned above.
	(movsicc_ne0_reg_zero, eq_zero): New.
jcmvbkbc pushed a commit that referenced this issue Sep 5, 2023
This code in cxx_eval_array_reference has been hard to get right.
In r12-2304 I added some code; in r13-5693 I removed some of it.

Here the problematic line is "S s = arr[0];" which causes a crash
on the assert in verify_ctor_sanity:

  gcc_assert (!ctx->object || !DECL_P (ctx->object)
              || ctx->global->get_value (ctx->object) == ctx->ctor);

ctx->object is the VAR_DECL 's', which is correct here.  The second
line points to the problem: we replaced ctx->ctor in
cxx_eval_array_reference:

  new_ctx.ctor = build_constructor (elem_type, NULL); // #1

which I think we shouldn't have; the CONSTRUCTOR we created in
cxx_eval_constant_expression/DECL_EXPR

  new_ctx.ctor = build_constructor (TREE_TYPE (r), NULL);

had the right type.

We still need #1 though.  E.g., in constexpr-96241.C, we never
set ctx.ctor/object before calling cxx_eval_array_reference, so
we have to build a CONSTRUCTOR there.  And in constexpr-101371-2.C
we have a ctx.ctor, but it has the wrong type, so we need a new one.

We can fix the problem by always clearing the object and, as an
optimization, creating/freeing a new ctor only when actually needed.
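
A hedged reconstruction of the shape of the problematic code (the
actual constexpr-110382.C testcase is not quoted above, so the details
here are illustrative):

```
struct S { int i; };

constexpr int
f ()
{
  S arr[] = { { 42 } };
  S s = arr[0];   // elementwise copy of a non-scalar element;
                  // the line that tripped verify_ctor_sanity
  return s.i;
}

static_assert (f () == 42, "");
```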

	PR c++/110382

gcc/cp/ChangeLog:

	* constexpr.cc (cxx_eval_array_reference): Create a new constructor
	only when we don't already have a matching one.  Clear the object
	when the type is non-scalar.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1y/constexpr-110382.C: New test.
jcmvbkbc pushed a commit that referenced this issue Nov 11, 2023
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:	AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo:	MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

2023-10-26  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR rtl-optimization/91865
	* combine.cc (make_compound_operation): Avoid creating a
	ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
	PR rtl-optimization/91865
	* gcc.target/msp430/pr91865.c: New test case.
jcmvbkbc pushed a commit that referenced this issue Dec 31, 2023
Since the last import from upstream libsanitizer, the output has changed
and now looks more like this:

READ of size 6 at 0x7ff7beb2a144 thread T0
    #0 0x101cf7796 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long) sanitizer_common_interceptors.inc:813
    #1 0x101cf7b99 in memcmp sanitizer_common_interceptors.inc:840
    #2 0x108a0c39f in __stack_chk_guard+0xf (dyld:x86_64+0x8039f)

so let's adjust the pattern accordingly.

gcc/testsuite/ChangeLog:

	* c-c++-common/asan/memcmp-1.c: Adjust pattern on darwin.
jcmvbkbc pushed a commit that referenced this issue Dec 31, 2023
During partial ordering, we want to look through dependent alias
template specializations within template arguments and otherwise
treat them as opaque in other contexts (see e.g. r7-7116-g0c942f3edab108
and r11-7011-g6e0a231a4aa240).  To that end template_args_equal was
given a partial_order flag that controls this behavior.  This flag
does the right thing when a dependent alias template specialization
appears as template argument of the partial specialization, e.g. in

  template<class T, class...> using first_t = T;
  template<class T> struct traits;
  template<class T> struct traits<first_t<T, T&>> { }; // #1
  template<class T> struct traits<first_t<const T, T&>> { }; // #2

we correctly consider #2 to be more specialized than #1.  But if the
alias specialization appears as a nested template argument of another
class template specialization, e.g. in

  template<class T> struct traits<A<first_t<T, T&>>> { }; // #1
  template<class T> struct traits<A<first_t<const T, T&>>> { }; // #2

then we incorrectly consider #1 and #2 to be unordered.  This is because

  1. we don't propagate the flag to recursive template_args_equal calls
  2. we don't use structural equality for class template specializations
     written in terms of dependent alias template specializations

This patch fixes the first issue by turning the partial_order flag into
a global.  This patch fixes the second issue by making us propagate
structural equality appropriately when building a class template
specialization.  In passing this patch also improves hashing of
specializations that use structural equality.
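
As a hedged illustration (assuming a primary template A; the committed
alias-decl-75*.C tests may differ), the nested case now orders the same
way as the direct one:

  template<class T> struct A { };
  template<class T, class...> using first_t = T;
  template<class T> struct traits;
  template<class T> struct traits<A<first_t<T, T&>>> { };        // #1
  template<class T> struct traits<A<first_t<const T, T&>>> { };  // #2

  traits<A<const int>> t;  // now selects #2 as more specialized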

	PR c++/90679

gcc/cp/ChangeLog:

	* cp-tree.h (comp_template_args): Remove partial_order parameter.
	(template_args_equal): Likewise.
	* pt.cc (comparing_for_partial_ordering): New global flag.
	(iterative_hash_template_arg) <case tcc_type>: Hash the template
	and arguments for specializations that use structural equality.
	(template_args_equal): Remove partial order parameter and
	use comparing_for_partial_ordering instead.
	(comp_template_args): Likewise.
	(comp_template_args_porder): Set comparing_for_partial_ordering
	instead.  Make static.
	(any_template_arguments_need_structural_equality_p): Return true
	for an argument that's a dependent alias template specialization
	or a class template specialization that itself needs structural
	equality.
	* tree.cc (cp_tree_equal) <case TREE_VEC>: Adjust call to
	comp_template_args.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/alias-decl-75a.C: New test.
	* g++.dg/cpp0x/alias-decl-75b.C: New test.
jcmvbkbc pushed a commit that referenced this issue Dec 31, 2023
Hi All,

This patch adds initial support for early break vectorization in GCC. In other
words it implements support for vectorization of loops with multiple exits.
The support is added for any target that implements a vector cbranch optab,
this includes both fully masked and non-masked targets.

Depending on the operation, the vectorizer may also require support for boolean
mask reductions using Inclusive OR/Bitwise AND.  This is, however, only
checked when the comparison would produce multiple statements.

This also fully decouples the vectorizer's notion of exit from the existing loop
infrastructure's exit.  Before this patch the vectorizer always picked the
natural loop latch-connected exit as the main exit.

After this patch the vectorizer is free to choose any exit it deems appropriate
as the main exit.  This means that even if the main exit is not countable (i.e.
the termination condition could not be determined) we might still be able to
vectorize should one of the other exits be countable.

In such situations the loop is reflowed, which enables vectorization of many
other loop forms.

Concretely the kind of loops supported are of the forms:

 for (int i = 0; i < N; i++)
 {
   <statements1>
   if (<condition>)
     {
       ...
       <action>;
     }
   <statements2>
 }

where <action> can be:
 - break
 - return
 - goto

Any number of statements can be used before the <action> occurs.
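
For instance (an illustrative sketch, not one of the committed tests), the
<action> may be a return:

 #define N 1024
 int a[N];

 int first_gt (int x)
 {
   for (int i = 0; i < N; i++)
     if (a[i] > x)
       return i;   /* early exit via "return" rather than "break" */
   return -1;
 }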

Since this is an initial version for GCC 14 it has the following limitations and
features:

- Only fixed-size iterations and buffers are supported.  That is to say any
  vectors loaded or stored must be to statically allocated arrays with known
  sizes.  N must also be known.  This limitation exists because our primary
  target for this optimization is SVE.  For VLA SVE we can't easily do
  cross-page iteration checks, and the result is likely not to be beneficial
  anyway.  For that reason we punt support for variable buffers until we have
  First-Faulting support in GCC 15.
- Any stores in <statements1> must not be to the same objects as in
  <condition>.  Loads are fine as long as they don't have the possibility to
  alias.  More concretely, we block RAW dependencies when the intermediate
  value can't be separated from the store, or the store itself can't be moved.
- Prologue peeling, alignment peeling and loop versioning are supported.
- Fully masked loops, unmasked loops and partially masked loops are supported.
- Any number of loop early exits are supported.
- No support for epilogue vectorization.  The only epilogue supported is the
  scalar final one.  Peeling code supports it but the code motion code cannot
  find instructions to make the move in the epilogue.
- Early breaks are only supported for inner loop vectorization.

With the help of IPA and LTO this still gets hit quite often.  During bootstrap
it was hit rather frequently.  Additionally TSVC s332, s481 and s482 all pass now
since these are tests for support for early exit vectorization.

This implementation does not support completely handling the early break inside
the vector loop itself but instead supports adding checks such that if we know
that we have to exit in the current iteration then we branch to scalar code to
actually do the final VF iterations, which handle all the code in <action>.

For the scalar loop we know that whatever exit you take you have to perform at
most VF iterations.  For vector code we only care about the state of fully
performed iterations and reset the scalar code to the (partially) remaining loop.

That is to say, the first vector loop executes so long as the early exit isn't
needed.  Once the exit is taken, the scalar code will perform at most VF extra
iterations.  The exact number depends on peeling, the iteration start, and which
exit was taken (natural or early).  For this scalar loop, all early exits are
treated the same.

When we vectorize we move any statement not related to the early break itself
and that would be incorrect to execute before the break (i.e. has side effects)
to after the break.  If this is not possible we decline to vectorize.  The
analysis and code motion also takes into account that it doesn't introduce a RAW
dependency after the move of the stores.

This means that we check at the start of iterations whether we are going to exit
or not.  During the analysis phase we check whether we are allowed to do this
moving of statements.  Also note that we move only the scalar statements, and
do so after peeling but just before we start transforming statements.

With this the vector flow no longer necessarily needs to match that of the
scalar code.  In addition most of the infrastructure is in place to support
general control flow safely; however, we are punting this to GCC 15.

Codegen:

for e.g.

unsigned vect_a[N];
unsigned vect_b[N];

unsigned test4(unsigned x)
{
 unsigned ret = 0;
 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i] > x)
     break;
   vect_a[i] = x;

 }
 return ret;
}

We generate for Adv. SIMD:

test4:
        adrp    x2, .LC0
        adrp    x3, .LANCHOR0
        dup     v2.4s, w0
        add     x3, x3, :lo12:.LANCHOR0
        movi    v4.4s, 0x4
        add     x4, x3, 3216
        ldr     q1, [x2, #:lo12:.LC0]
        mov     x1, 0
        mov     w2, 0
        .p2align 3,,7
.L3:
        ldr     q0, [x3, x1]
        add     v3.4s, v1.4s, v2.4s
        add     v1.4s, v1.4s, v4.4s
        cmhi    v0.4s, v0.4s, v2.4s
        umaxp   v0.4s, v0.4s, v0.4s
        fmov    x5, d0
        cbnz    x5, .L6
        add     w2, w2, 1
        str     q3, [x1, x4]
        str     q2, [x3, x1]
        add     x1, x1, 16
        cmp     w2, 200
        bne     .L3
        mov     w7, 3
.L2:
        lsl     w2, w2, 2
        add     x5, x3, 3216
        add     w6, w2, w0
        sxtw    x4, w2
        ldr     w1, [x3, x4, lsl 2]
        str     w6, [x5, x4, lsl 2]
        cmp     w0, w1
        bcc     .L4
        add     w1, w2, 1
        str     w0, [x3, x4, lsl 2]
        add     w6, w1, w0
        sxtw    x1, w1
        ldr     w4, [x3, x1, lsl 2]
        str     w6, [x5, x1, lsl 2]
        cmp     w0, w4
        bcc     .L4
        add     w4, w2, 2
        str     w0, [x3, x1, lsl 2]
        sxtw    x1, w4
        add     w6, w1, w0
        ldr     w4, [x3, x1, lsl 2]
        str     w6, [x5, x1, lsl 2]
        cmp     w0, w4
        bcc     .L4
        str     w0, [x3, x1, lsl 2]
        add     w2, w2, 3
        cmp     w7, 3
        beq     .L4
        sxtw    x1, w2
        add     w2, w2, w0
        ldr     w4, [x3, x1, lsl 2]
        str     w2, [x5, x1, lsl 2]
        cmp     w0, w4
        bcc     .L4
        str     w0, [x3, x1, lsl 2]
.L4:
        mov     w0, 0
        ret
        .p2align 2,,3
.L6:
        mov     w7, 4
        b       .L2

and for SVE:

test4:
        adrp    x2, .LANCHOR0
        add     x2, x2, :lo12:.LANCHOR0
        add     x5, x2, 3216
        mov     x3, 0
        mov     w1, 0
        cntw    x4
        mov     z1.s, w0
        index   z0.s, #0, #1
        ptrue   p1.b, all
        ptrue   p0.s, all
        .p2align 3,,7
.L3:
        ld1w    z2.s, p1/z, [x2, x3, lsl 2]
        add     z3.s, z0.s, z1.s
        cmplo   p2.s, p0/z, z1.s, z2.s
        b.any   .L2
        st1w    z3.s, p1, [x5, x3, lsl 2]
        add     w1, w1, 1
        st1w    z1.s, p1, [x2, x3, lsl 2]
        add     x3, x3, x4
        incw    z0.s
        cmp     w3, 803
        bls     .L3
.L5:
        mov     w0, 0
        ret
        .p2align 2,,3
.L2:
        cntw    x5
        mul     w1, w1, w5
        cbz     w5, .L5
        sxtw    x1, w1
        sub     w5, w5, #1
        add     x5, x5, x1
        add     x6, x2, 3216
        b       .L6
        .p2align 2,,3
.L14:
        str     w0, [x2, x1, lsl 2]
        cmp     x1, x5
        beq     .L5
        mov     x1, x4
.L6:
        ldr     w3, [x2, x1, lsl 2]
        add     w4, w0, w1
        str     w4, [x6, x1, lsl 2]
        add     x4, x1, 1
        cmp     w0, w3
        bcs     .L14
        mov     w0, 0
        ret

On the workloads this work is based on we see a 2-3x performance uplift
using this patch.

Follow up plan:
 - Boolean vectorization has several shortcomings.  I've filed PR110223 with the
   bigger ones that cause vectorization to fail with this patch.
 - SLP support.  This is planned for GCC 15 as for the majority of cases
   building SLP itself fails.  This means I'll need to spend time making this
   more robust first.  Additionally it requires:
     * Adding support for vectorizing CFG (gconds)
     * Support for CFG to differ between vector and scalar loops.
   Both of which would be disruptive to the tree and I suspect I'll be handling
   fallouts from this patch for a while.  So I plan to work on the surrounding
   building blocks first for the remainder of the year.

Additionally, the patch contains reduced test cases from issues found running over
various codebases.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Also regtested with:
 -march=armv8.3-a+sve
 -march=armv8.3-a+nosve
 -march=armv9-a
 -mcpu=neoverse-v1
 -mcpu=neoverse-n2

Bootstrapped Regtested x86_64-pc-linux-gnu and no issues.
Bootstrap and Regtest on arm-none-linux-gnueabihf and no issues.

gcc/ChangeLog:

	* tree-if-conv.cc (idx_within_array_bound): Expose.
	* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): New.
	(vect_analyze_data_ref_dependences): Use it.
	* tree-vect-loop-manip.cc (vect_iv_increment_position): New.
	(vect_set_loop_controls_directly,
	vect_set_loop_condition_partial_vectors,
	vect_set_loop_condition_partial_vectors_avx512,
	vect_set_loop_condition_normal): Support multiple exits.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Support LCSSA peeling for
	multiple exits.
	(slpeel_can_duplicate_loop_p): Change vectorizer from looking at BB
	count and instead look at loop shape.
	(vect_update_ivs_after_vectorizer): Drop asserts.
	(vect_gen_vector_loop_niters_mult_vf): Support peeled vector iterations.
	(vect_do_peeling): Support multiple exits.
	(vect_loop_versioning): Likewise.
	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialise
	early_breaks.
	(vect_analyze_loop_form): Support loop flows with more than single BB
	loop body.
	(vect_create_loop_vinfo): Support niters analysis for multiple exits.
	(vect_analyze_loop): Likewise.
	(vect_get_vect_def): New.
	(vect_create_epilog_for_reduction): Support early exit reductions.
	(vectorizable_live_operation_1): New.
	(find_connected_edge): New.
	(vectorizable_live_operation): Support early exit live operations.
	(move_early_exit_stmts): New.
	(vect_transform_loop): Use it.
	* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gcond.
	(vect_recog_bitfield_ref_pattern): Support gconds and bools.
	(vect_recog_gcond_pattern): New.
	(possible_vector_mask_operation_p): Support gcond masks.
	(vect_determine_mask_precision): Likewise.
	(vect_mark_pattern_stmts): Set gcond def type.
	(can_vectorize_live_stmts): Force early break inductions to be live.
	* tree-vect-stmts.cc (vect_stmt_relevant_p): Add relevancy analysis for
	early breaks.
	(vect_mark_stmts_to_be_vectorized): Process gcond usage.
	(perm_mask_for_reverse): Expose.
	(vectorizable_comparison_1): New.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt): Support early break and gcond.
	(vect_transform_stmt): Likewise.
	(vect_is_simple_use): Likewise.
	(vect_get_vector_types_for_stmt): Likewise.
	* tree-vectorizer.cc (pass_vectorize::execute): Update exits for value
	numbering.
	* tree-vectorizer.h (enum vect_def_type): Add vect_condition_def.
	(LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_EARLY_BRK_STORES,
	LOOP_VINFO_EARLY_BREAKS_VECT_PEELED, LOOP_VINFO_EARLY_BRK_DEST_BB,
	LOOP_VINFO_EARLY_BRK_VUSES): New.
	(is_loop_header_bb_p): Drop assert.
	(class loop): Add early_breaks, early_break_stores, early_break_dest_bb,
	early_break_vuses.
	(vect_iv_increment_position, perm_mask_for_reverse,
	ref_within_array_bound): New.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Update for early breaks.
jcmvbkbc pushed a commit that referenced this issue Feb 7, 2024
This patch adjusts the costs so that we treat REG and SUBREG expressions the
same for costing.

This was motivated by bt_skip_func and bt_find_func in xz and results in nearly
a 5% improvement in the dynamic instruction count for input #2, and smaller,
but definitely visible, improvements pretty much across the board.  Exceptions
would be perlbench input #1 and exchange2, which showed very small regressions.

In the bt_find_func and bt_skip_func cases we have something like this:

> (insn 10 7 11 2 (set (reg/v:DI 136 [ x ])
>         (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip}
>      (nil))
> (insn 11 10 12 2 (set (reg:DI 142 [ _1 ])
>         (plus:DI (reg/v:DI 136 [ x ])
>             (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3}
>      (nil))

[ ... ]> (insn 13 12 14 2 (set (reg:DI 143 [ _2 ])
>         (plus:DI (reg/v:DI 136 [ x ])
>             (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3}
>      (nil))

Note the two uses of (reg 136). The best way to handle that in combine might be
a 3->2 split.  But there's a much better approach if we look at fwprop...

(set (reg:DI 142 [ _1 ])
    (plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))
        (reg/v:DI 139 [ b ])))
change not profitable (cost 4 -> cost 8)

So that should be the same cost as a regular DImode addition when the ZBA
extension is enabled.  But it ends up costing more because the clause to cost
this variant isn't prepared to handle a SUBREG.  That results in the RTL above
having too high a cost and fwprop gives up.

One approach would be to replace the REG_P check with REG_P || SUBREG_P in the
costing code.  I ultimately decided against that and instead check if the
operand in question passes register_operand.
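
A hedged sketch of the shape of that change (illustrative only; the
actual riscv_rtx_costs edit may differ in detail):

  /* Treat (reg ...) and (subreg (reg ...)) the same by testing with
     the register_operand predicate rather than bare REG_P.  */
  if (register_operand (XEXP (x, 0), GET_MODE (XEXP (x, 0))))
    *total = COSTS_N_INSNS (1);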

By far the most important case to handle is the DImode PLUS.  But for the sake
of consistency, I changed the other instances in riscv_rtx_costs as well.  For
those other cases we're talking about improvements in the .000001% range.

While we are in stage4, this just hits cost modeling, which we've generally
agreed is still appropriate (though we were mostly talking about vector).  So
I'm going to extend that general agreement ever so slightly and include scalar
cost modeling :-)

gcc/
	* config/riscv/riscv.cc (riscv_rtx_costs): Handle SUBREG and REG
	similarly.

gcc/testsuite/

	* gcc.target/riscv/reg_subreg_costs.c: New test.

	Co-authored-by: Jivan Hakobyan <jivanhakobyan9@gmail.com>
jcmvbkbc pushed a commit that referenced this issue May 11, 2024
We evaluate constexpr functions on the original, pre-genericization bodies.
That means that the function body we're evaluating will not have gone
through cp_genericize_r's "Map block scope extern declarations to visible
declarations with the same name and type in outer scopes if any".  Here:

  constexpr bool bar() { return true; } // #1
  constexpr bool foo() {
    constexpr bool bar(void); // #2
    return bar();
  }

it means that we:
1) register_constexpr_fundef (#1)
2) cp_genericize (#1)
   nothing interesting happens
3) register_constexpr_fundef (foo)
   does copy_fn, so we have two copies of the BIND_EXPR
4) cp_genericize (foo)
   this remaps #2 to #1, but only on one copy of the BIND_EXPR
5) retrieve_constexpr_fundef (foo)
   we find it, no problem
6) retrieve_constexpr_fundef (#2)
   and here #2 isn't found in constexpr_fundef_table, because
   we're working on the BIND_EXPR copy where #2 wasn't mapped to #1
   so we fail.  We've only registered #1.

It should work to use DECL_LOCAL_DECL_ALIAS (which used to be
extern_decl_map).  We evaluate constexpr functions on pre-cp_fold
bodies to avoid diagnostic problems, but the remapping I'm proposing
should not interfere with diagnostics.
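
A hedged sketch of the cvt.cc side of that (assumed shape, not the
verbatim patch):

  /* In cp_get_fndecl_from_callee: a block-scope extern declaration may
     alias a visible declaration in an outer scope; prefer the alias so
     the constexpr table lookup finds the registered definition.  */
  if (TREE_CODE (fn) == FUNCTION_DECL)
    {
      if (DECL_LOCAL_DECL_P (fn))
	if (tree alias = DECL_LOCAL_DECL_ALIAS (fn))
	  fn = alias;
      return fn;
    }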

This is not a problem for a global scope redeclaration; there we go
through duplicate_decls which keeps the DECL_UID:
  DECL_UID (olddecl) = olddecl_uid;
and DECL_UID is what constexpr_fundef_hasher::hash uses.

	PR c++/111132

gcc/cp/ChangeLog:

	* constexpr.cc (get_function_named_in_call): Use
	cp_get_fndecl_from_callee.
	* cvt.cc (cp_get_fndecl_from_callee): If there's a
	DECL_LOCAL_DECL_ALIAS, use it.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/constexpr-redeclaration3.C: New test.
	* g++.dg/cpp0x/constexpr-redeclaration4.C: New test.
jcmvbkbc pushed a commit that referenced this issue May 11, 2024
aarch64-sve.md had a pattern that combined:

	cmpeq	pb.T, pa/z, zc.T, #0
	mov	zd.T, pb/z, #1

into:

	cnot	zd.T, pa/m, zc.T

But this is only valid if pa.T is a ptrue.  In other cases, the
original would set inactive elements of zd.T to 0, whereas the
combined form would copy elements from zc.T.
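
In ACLE terms, a hedged sketch of the case where the single-instruction
combination remains valid (illustrative; the new cnot_1.c test may differ):

  #include <arm_sve.h>

  svint32_t
  f (svint32_t x)
  {
    /* An _x operation predicated on an all-true predicate: every
       element is active, so a lone CNOT is a correct expansion.  */
    return svcnot_s32_x (svptrue_b32 (), x);
  }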

gcc/
	PR target/114603
	* config/aarch64/aarch64-sve.md (@aarch64_pred_cnot<mode>): Replace
	with...
	(@aarch64_ptrue_cnot<mode>): ...this, requiring operand 1 to be
	a ptrue.
	(*cnot<mode>): Require operand 1 to be a ptrue.
	* config/aarch64/aarch64-sve-builtins-base.cc (svcnot_impl::expand):
	Use aarch64_ptrue_cnot<mode> for _x operations that are predicated
	with a ptrue.  Represent other _x operations as fully-defined _m
	operations.

gcc/testsuite/
	PR target/114603
	* gcc.target/aarch64/sve/acle/general/cnot_1.c: New test.
jcmvbkbc pushed a commit that referenced this issue May 11, 2024
…. [PR114741]

In PR114741 we see that we have a regression in codegen when SVE is enabled where
the simple testcase:

void foo(unsigned v, unsigned *p)
{
    *p = v & 1;
}

generates

foo:
        fmov    s31, w0
        and     z31.s, z31.s, #1
        str     s31, [x1]
        ret

instead of:

foo:
        and     w0, w0, 1
        str     w0, [x1]
        ret

This impacts not just code size but also performance.  This is caused
by the use of the ^ constraint modifier in the pattern <optab><mode>3.

The documentation states that this modifier should only have an effect on the
alternative costing in that a particular alternative is to be preferred unless
a non-pseudo reload is needed.

The pattern was trying to convey that whenever both r and w are required,
it should prefer r unless a reload is needed.  This is because if a reload is
needed then we can construct the constants more flexibly on the SIMD side.

We were using this to simplify the implementation and to get generic cases such
as:

double negabs (double x)
{
   unsigned long long y;
   memcpy (&y, &x, sizeof(double));
   y = y | (1UL << 63);
   memcpy (&x, &y, sizeof(double));
   return x;
}

which don't go through an expander.
However, the implementation of ^ in the register allocator does not follow
the documentation in that it also has an effect during coloring.  During initial
register class selection it applies a penalty to a class, similar to how ? does.

In this example the penalty makes the use of GP regs expensive enough that it no
longer considers them:

    r106: preferred FP_REGS, alternative NO_REGS, allocno FP_REGS
;;        3--> b  0: i   9 r106=r105&0x1
    :cortex_a53_slot_any:GENERAL_REGS+0(-1)FP_REGS+1(1)PR_LO_REGS+0(0)
                         PR_HI_REGS+0(0):model 4

which is not the expected behavior.  For GCC 14 this is a conservative fix.

1. we remove the ^ modifier from the logical optabs.

2. In order not to regress copysign we then move the copysign expansion to
   directly use the SIMD variant.  Since copysign only supports floating point
   modes this is fine and no longer relies on the register allocator to select
   the right alternative.

It once again regresses the general case, but this case wasn't optimized in
earlier GCCs either so it's not a regression in GCC 14.  This change gives
strictly better codegen than earlier GCCs and still optimizes the important cases.

gcc/ChangeLog:

	PR target/114741
	* config/aarch64/aarch64.md (<optab><mode>3): Remove ^ from alt 2.
	(copysign<GPF:mode>3): Use SIMD version of IOR directly.

gcc/testsuite/ChangeLog:

	PR target/114741
	* gcc.target/aarch64/fneg-abs_2.c: Update codegen.
	* gcc.target/aarch64/fneg-abs_4.c: xfail for now.
	* gcc.target/aarch64/pr114741.c: New test.
jcmvbkbc pushed a commit that referenced this issue May 30, 2024
The PR complains that

  void do_something(){
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wunused-label"
    start:;
    #pragma GCC diagnostic pop
  } #1

doesn't work.  That's because we call warn_for_unused_label only while we're
in finish_function, meaning we're at #1, where we're outside the #pragma
region.  We can use suppress_warning + warning_suppressed_p to fix this.
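
A hedged sketch of the c-warn.cc check (assumed shape, not the verbatim
patch):

  /* In warn_for_unused_label: skip the diagnostic when -Wunused-label
     has been suppressed for this label via #pragma GCC diagnostic.  */
  if (!warning_suppressed_p (label, OPT_Wunused_label))
    warning (OPT_Wunused_label, "label %q+D defined but not used", label);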

Note that I'm not using TREE_USED.  Propagating it in tsubst_stmt/LABEL_EXPR
from decl to label would mean that we don't warn in do_something2, but
I think we want the warning there: we're in a template and the goto is
a discarded statement.

	PR c++/113582

gcc/c-family/ChangeLog:

	* c-warn.cc (warn_for_unused_label): Don't warn if -Wunused-label has
	been suppressed for the label.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_label_for_labeled_statement): Call
	suppress_warning if the warning is not enabled at input_location.
	* pt.cc (tsubst_stmt): Call copy_warning.

gcc/testsuite/ChangeLog:

	* g++.dg/warn/Wunused-label-4.C: New test.
jcmvbkbc pushed a commit that referenced this issue May 30, 2024
This patch would like to fix below format issue of trailing operator.

=== ERROR type #1: trailing operator (4 error(s)) ===
gcc/config/riscv/riscv-vector-builtins.cc:4641:39:  if ((exts & RVV_REQUIRE_ELEN_FP_16) &&
gcc/config/riscv/riscv-vector-builtins.cc:4651:39:  if ((exts & RVV_REQUIRE_ELEN_FP_32) &&
gcc/config/riscv/riscv-vector-builtins.cc:4661:39:  if ((exts & RVV_REQUIRE_ELEN_FP_64) &&
gcc/config/riscv/riscv-vector-builtins.cc:4670:36:  if ((exts & RVV_REQUIRE_ELEN_64) &&

Passed ./contrib/check_GNU_style.sh for this patch, and double-checked that
there is no other format issue in the original patch.
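
A generic before/after illustration of the rule (hypothetical condition
names, not the exact source lines):

  /* Before: trailing operator, flagged by check_GNU_style.sh.  */
  if ((exts & RVV_REQUIRE_ELEN_FP_16) &&
      other_requirement_p)
    return false;

  /* After: the operator starts the continuation line.  */
  if ((exts & RVV_REQUIRE_ELEN_FP_16)
      && other_requirement_p)
    return false;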

Committed as format change.

gcc/ChangeLog:

	* config/riscv/riscv-vector-builtins.cc
	(validate_instance_type_required_extensions): Remove the trailing
	operator and put it on a new line.

Signed-off-by: Pan Li <pan2.li@intel.com>
jcmvbkbc pushed a commit that referenced this issue Jun 17, 2024
Here during overload resolution we have two strictly viable ambiguous
candidates #1 and #2, and two non-strictly viable candidates #3 and #4
which we have held on to ever since r14-6522.  These latter candidates have
an empty second arg conversion since the first arg conversion was deemed
bad, and this trips up joust when called on #3 and #4 which assumes all
arg conversions are there.

We can fix this by making joust robust to empty arg conversions, but in
this situation we shouldn't need to compare #3 and #4 at all given that
we have a strictly viable candidate.  To that end, this patch makes
tourney shortcut considering non-strictly viable candidates upon
encountering ambiguity between two strictly viable candidates (taking
advantage of the fact that the candidates list is sorted according to
viability via splice_viable).
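
A hypothetical sketch of the shape of such an overload set (the actual
error7.C test may differ):

  void f (long, int);    // #1 strictly viable
  void f (int, long);    // #2 strictly viable, ambiguous with #1
  void f (int*, int);    // #3 non-strictly viable: bad first conversion
  void f (int*, long);   // #4 likewise

  void g () { f (1, 1); }  // ambiguous between #1 and #2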

	PR c++/115239

gcc/cp/ChangeLog:

	* call.cc (tourney): Don't consider a non-strictly viable
	candidate as the champ if there was ambiguity between two
	strictly viable candidates.

gcc/testsuite/ChangeLog:

	* g++.dg/overload/error7.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
jcmvbkbc pushed a commit that referenced this issue Jul 15, 2024
This patch improves GCC’s vectorization of __builtin_popcount for the aarch64 target
by adding popcount patterns for vector modes besides QImode, i.e., HImode,
SImode and DImode.

With this patch, we now generate the following for V8HI:
  cnt     v1.16b, v0.16b
  uaddlp  v2.8h, v1.16b

For V4HI, we generate:
  cnt     v1.8b, v0.8b
  uaddlp  v2.4h, v1.8b

For V4SI, we generate:
  cnt     v1.16b, v0.16b
  uaddlp  v2.8h, v1.16b
  uaddlp  v3.4s, v2.8h

For V4SI with TARGET_DOTPROD, we generate the following instead:
  movi    v0.4s, #0
  movi    v1.16b, #1
  cnt     v3.16b, v2.16b
  udot    v0.4s, v3.16b, v1.16b

For V2SI, we generate:
  cnt     v1.8b, v0.8b
  uaddlp  v2.4h, v1.8b
  uaddlp  v3.2s, v2.4h

For V2SI with TARGET_DOTPROD, we generate the following instead:
  movi    v0.8b, #0
  movi    v1.8b, #1
  cnt     v3.8b, v2.8b
  udot    v0.2s, v3.8b, v1.8b

For V2DI, we generate:
  cnt     v1.16b, v0.16b
  uaddlp  v2.8h, v1.16b
  uaddlp  v3.4s, v2.8h
  uaddlp  v4.2d, v3.4s

For V2DI with TARGET_DOTPROD, we generate the following instead:
  movi    v0.4s, #0
  movi    v1.16b, #1
  cnt     v3.16b, v2.16b
  udot    v0.4s, v3.16b, v1.16b
  uaddlp  v0.2d, v0.4s

	PR target/113859

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (aarch64_<su>addlp<mode>): Rename to...
	(@aarch64_<su>addlp<mode>): ... This.
	(popcount<mode>2): New define_expand.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/popcnt-udot.c: New test.
	* gcc.target/aarch64/popcnt-vec.c: New test.

Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>
jcmvbkbc pushed a commit that referenced this issue Nov 10, 2024
We currently crash upon the following invalid code (notice the "void
void**" parameter)

=== cut here ===
using size_t = decltype(sizeof(int));
void *operator new(size_t, void void **p) noexcept { return p; }
int x;
void f() {
    int y;
    new (&y) int(x);
}
=== cut here ===

The problem is that in this case, we end up with a NULL_TREE parameter
list for the new operator because of the error, and (1) coerce_new_type
wrongly complains about the first parameter type not being size_t,
(2) std_placement_new_fn_p blindly accesses the parameter list, hence a
crash.

This patch does NOT address #1 since we can't easily distinguish
a new operator declaration without parameters from one with erroneous
parameters (and it's not worth the risk to refactor and break things for
an error recovery issue) hence a dg-bogus in new52.C, but it does
address #2 and the ICE by simply checking the first parameter against
NULL_TREE.
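
A hedged sketch of the #2 fix (assumed shape of the init.cc check):

  /* In std_placement_new_fn_p: with erroneous parameters the parameter
     list can be NULL_TREE, so check before using it.  */
  tree first_arg = FUNCTION_FIRST_USER_PARMTYPE (alloc_fn);
  if (first_arg == NULL_TREE)
    return false;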

It also adds a new testcase checking that we complain about new
operators with no or invalid first parameters, since we did not have
any.

	PR c++/117101

gcc/cp/ChangeLog:

	* init.cc (std_placement_new_fn_p): Check first_arg against
	NULL_TREE.

gcc/testsuite/ChangeLog:

	* g++.dg/init/new52.C: New test.
	* g++.dg/init/new53.C: New test.
jcmvbkbc pushed a commit that referenced this issue Nov 10, 2024
Update the test case for armv8.1-m.main, which supports conditional
arithmetic.

armv7-m:
        push    {r4, lr}
        ldr     r4, .L6
        ldr     r4, [r4]
        lsls    r4, r4, #29
        it      mi
        addmi   r2, r2, #1
        bl      bar
        movs    r0, #0
        pop     {r4, pc}

armv8.1-m.main:
        push    {r3, r4, r5, lr}
        ldr     r4, .L5
        ldr     r5, [r4]
        tst     r5, #4
        csinc   r2, r2, r2, eq
        bl      bar
        movs    r0, #0
        pop     {r3, r4, r5, pc}

gcc/testsuite/ChangeLog:

	* gcc.target/arm/epilog-1.c: Use check-function-bodies.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
jcmvbkbc pushed a commit that referenced this issue Nov 10, 2024
The second source register of insn "*extzvsi-1bit_addsubx" cannot be the
same as the destination register, because that register will be overwritten
with an intermediate value after insn splitting.

     /* example #1 */
     int test1(int b, int a) {
       return ((a & 1024) ? 4 : 0) + b;
     }

     ;; result #1 (incorrect)
     test1:
     	extui	a2, a3, 10, 1	;; overwrites A2 before used
     	addx4	a2, a2, a2
     	ret.n

This patch fixes that.

     ;; result #1 (correct)
     test1:
     	extui	a3, a3, 10, 1	;; uses A3 and then overwrites
     	addx4	a2, a3, a2
     	ret.n

However, it should be noted that the first source register can be the same
as the destination without any problems.

     /* example #2 */
     int test2(int a, int b) {
       return ((a & 1024) ? 4 : 0) + b;
     }

     ;; result (correct)
     test2:
     	extui	a2, a2, 10, 1	;; uses A2 and then overwrites
     	addx4	a2, a2, a3
     	ret.n

gcc/ChangeLog:

	* config/xtensa/xtensa.md (*extzvsi-1bit_addsubx):
	Add '&' to the destination register constraint to indicate that
	it is 'earlyclobber', append '0' to the first source register
	constraint to indicate that it can be the same as the destination
	register, and change the split condition from 1 to reload_completed
	so that the insn will be split only after RA in order to obtain
	allocated registers that satisfy the above constraints.