Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"register" directive not honored correctly. #6

Closed
cnlohr opened this issue Jul 28, 2016 · 2 comments
Closed

"register" directive not honored correctly. #6

cnlohr opened this issue Jul 28, 2016 · 2 comments

Comments

@cnlohr
Copy link

cnlohr commented Jul 28, 2016

Inside functions, I can declare a "register" directive to a variable. Amazingly, GCC sometimes follows it, but most of the time, it does not. Though it will usually generate code using the variable as a register, it doesn't always seem to work. Additionally, sometimes GCC will clobber my registers with its own code.

Furthermore, GCC does not seem to back up these registers inside a function call. Even if I list a register as clobbering in my volatile asm, it simply ignores that. I may clobber registers that could have serious consequences and GCC won't back them up. Ordinarily I'd just work around this, but the whole not saving registers thing is a pretty big annoyance.

@jcmvbkbc
Copy link
Owner

I can declare a "register" directive to a variable. Amazingly, GCC sometimes follows it, but most of the time, it does not. ... Furthermore, GCC does not seem to back up these registers inside a function call.

It doesn't have to. Please see https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables for more details.
If you feel that you have a case that should work according to the documentation, please provide an example code to reproduce it and description of what exactly was expected and what's wrong.

@cnlohr
Copy link
Author

cnlohr commented Jul 29, 2016

I did not realize it would only be for io operands, I was sticking them in my clobber lists and it was still re-using them, though.

I don't know how much I care about this any more, since I've just taken to writing my function in the assembler since I need to get much more control over my stack and when/how I push things. Thank you for that reference though. I've never seen that page!

P.S. If the issue comes up again, I will re-open ticket.

@cnlohr cnlohr closed this as completed Jul 29, 2016
jcmvbkbc pushed a commit that referenced this issue Jun 11, 2022
This simple patch implements Richard Biener's suggestion in comment #6
of PR tree-optimization/52171 (from February 2013) that the insn-preds
code generated by genpreds can avoid using strncmp when matching constant
strings of length one.

The effect of this patch is best explained by the diff of insn-preds.cc:
<       if (!strncmp (str + 1, "g", 1))
---
>       if (str[1] == 'g')
3104c3104
<       if (!strncmp (str + 1, "m", 1))
---
>       if (str[1] == 'm')
3106c3106
<       if (!strncmp (str + 1, "c", 1))
---
>       if (str[1] == 'c')
...

The equivalent optimization is performed by GCC (but perhaps not by the
host compiler), but generating simpler/smaller code may encourage further
optimizations (such as use of a switch statement).

2022-05-24  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* genpreds.cc (write_lookup_constraint_1): Avoid generating a call
	to strncmp for strings of length one.
jcmvbkbc pushed a commit that referenced this issue Jun 11, 2022
This patch fixes both ICE regressions PR middle-end/105853 and
PR target/105856 caused by my recent patch to expand small const structs
as immediate constants.  That patch updated code generation in three
places: two in expr.cc that call store_constructor directly, and the
third in calls.cc's load_register_parameters that expands its CONSTRUCTOR
via expand_expr, as store_constructor is local/static to expr.cc, and
the "public" API, should usually simply forward the constructor to the
appropriate store_constructor function.

Alas, despite the clean regression testing on multiple targets, the above
ICEs show that expand_expr isn't a suitable proxy for store_constructor,
and things that (I'd assumed) shouldn't affect how/whether a struct is
placed in a register [such as whether the struct is considered packed/
aligned or not] actually interfere with the optimization that is being
attempted.

The (proposed) solution is to export store_constructor (and it's helper
function int_expr_size) from expr.cc, by removing their static qualifier
and prototyping both functions in expr.h, so they can be called directly
from load_register_parameters in calls.cc.  This cures both ICEs, but
almost as importantly improves code generation over GCC 12.

For PR 105853, GCC 12 generates:

compose_nd_na_ipv6_src:
	movzx eax, WORD PTR eth_addr_zero[rip+2]
	movzx edx, WORD PTR eth_addr_zero[rip]
	movzx edi, WORD PTR eth_addr_zero[rip+4]
	sal rax, 16
	or rax, rdx
	sal rdi, 32
	or rdi, rax
	xor eax, eax
	jmp packet_set_nd
eth_addr_zero:	.zero 6

where now (with this fix) GCC 13 generates:
compose_nd_na_ipv6_src:
        xorl    %edi, %edi
        xorl    %eax, %eax
        jmp     packet_set_nd

Likewise, for PR 105856 on ARM, we'd previously generate:
g_329_3:
	movw r3, #:lower16:.LANCHOR0
	movt r3, #:upper16:.LANCHOR0
	ldr r0, [r3]
	b func_19

but with this optimization we now generate:
g_329_3:
        mov     r0, #6
        b       func_19

2022-06-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR middle-end/105853
	PR target/105856
	* calls.cc (load_register_parameters): Call store_constructor
	and int_expr_size directly instead of expanding via expand_expr.
	* expr.cc (static void store_constructor): Don't prototype here.
	(static HOST_WIDE_INT int_expr_size): Likewise.
	(store_constructor): No longer static.
	(int_expr_size): Likewise, no longer static.
	* expr.h (store_constructor): Prototype here.
	(int_expr_size): Prototype here.

gcc/testsuite/ChangeLog
	PR middle-end/105853
	PR target/105856
	* gcc.dg/pr105853.c: New test case.
	* gcc.dg/pr105856.c: New test case.
jcmvbkbc pushed a commit that referenced this issue Jun 11, 2022
This patch addresses the issue in comment #6 of PR rtl-optimization/7061
(a four digit PR number) from 2006 where on x86_64 complex number arguments
are unconditionally spilled to the stack.

For the test cases below:
float re(float _Complex a) { return __real__ a; }
float im(float _Complex a) { return __imag__ a; }

GCC with -O2 currently generates:

re:	movq    %xmm0, -8(%rsp)
        movss   -8(%rsp), %xmm0
        ret
im:	movq    %xmm0, -8(%rsp)
        movss   -4(%rsp), %xmm0
        ret

with this patch we now generate:

re:	ret
im:	movq    %xmm0, %rax
        shrq    $32, %rax
        movd    %eax, %xmm0
        ret

[Technically, this shift can be performed on %xmm0 in a single
instruction, but the backend needs to be taught to do that, the
important bit is that the SCmode argument isn't written to the
stack].

The patch itself is to emit_group_store where just before RTL
expansion commits to writing to the stack, we check if the store
group consists of a single scalar integer register that holds
a complex mode value; on x86_64 SCmode arguments are passed in
DImode registers.  If this is the case, we can use a SUBREG to
"view_convert" the integer to the equivalent complex mode.

An interesting corner case that showed up during testing is that
x86_64 also passes HCmode arguments in DImode registers(!), i.e.
using modes of different sizes.  This is easily handled/supported
by first converting to an integer mode of the correct size, and
then generating a complex mode SUBREG of this.  This is similar
in concept to the patch I proposed here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html

2020-06-10  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR rtl-optimization/7061
	* expr.cc (emit_group_store): For groups that consist of a single
	scalar integer register that hold a complex mode value, use
	gen_lowpart to generate a SUBREG to "view_convert" to the complex
	mode.  For modes of different sizes, first convert to an integer
	mode of the appropriate size.

gcc/testsuite/ChangeLog
	PR rtl-optimization/7061
	* gcc.target/i386/pr7061-1.c: New test case.
	* gcc.target/i386/pr7061-2.c: New test case.
jcmvbkbc pushed a commit that referenced this issue May 8, 2023
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X)					\
    template <class U> struct X##1;		\
    template <class U> struct X##2;		\
    template <class U> struct X##3;		\
    template <class U> struct X##4;		\
    template <class U> struct X##5;		\
    template <class U> struct X##6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template <int I> struct A
  {
    NC3(am)
  };
  template <class...Ts> void sink(Ts...);
  template <int...Is> void g()
  {
    sink(A<Is>()...);
  }
  template <int I> void f()
  {
    g<__integer_pack(I)...>();
  }
  int main()
  {
    f<1000>();
  }

gcc/cp/ChangeLog:

	* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
	of a class template.
	(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
jcmvbkbc pushed a commit that referenced this issue May 11, 2024
One known missing piece in the modules implementation is merging of a
streamed-in local type (class or enum) with the corresponding in-TU
version of the local type.  This missing piece turns out to cause a
hard-to-reduce use-after-free GC issue due to the entity_ary not being
marked as a GC root (deliberately), and manifests as a serialization
error on stream-in as in PR99426 (see comment #6 for a reduction).  It's
also reproducible on trunk when running the xtreme-header tests without
-fno-module-lazy.

This patch implements this missing piece, making us merge such local
types according to their position within the containing function's
definition, analogous to how we merge FIELD_DECLs of a class according
to their index in the TYPE_FIELDS list.

	PR c++/99426

gcc/cp/ChangeLog:

	* module.cc (merge_kind::MK_local_type): New enumerator.
	(merge_kind_name): Update.
	(trees_out::chained_decls): Move BLOCK-specific handling
	of DECL_LOCAL_DECL_P decls to ...
	(trees_out::core_vals) <case BLOCK>: ... here.  Stream
	BLOCK_VARS manually.
	(trees_in::core_vals) <case BLOCK>: Stream BLOCK_VARS
	manually.  Handle deduplicated local types..
	(trees_out::key_local_type): Define.
	(trees_in::key_local_type): Define.
	(trees_out::get_merge_kind) <case FUNCTION_DECL>: Return
	MK_local_type for a local type.
	(trees_out::key_mergeable) <case FUNCTION_DECL>: Use
	key_local_type.
	(trees_in::key_mergeable) <case FUNCTION_DECL>: Likewise.
	(trees_in::is_matching_decl): Be flexible with type mismatches
	for local entities.
	(trees_in::register_duplicate): Also register the
	DECL_TEMPLATE_RESULT of a TEMPLATE_DECL as a duplicate.
	(depset_cmp): Return 0 for equal IDENTIFIER_HASH_VALUEs.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/merge-17.h: New test.
	* g++.dg/modules/merge-17_a.H: New test.
	* g++.dg/modules/merge-17_b.C: New test.
	* g++.dg/modules/xtreme-header-7_a.H: New test.
	* g++.dg/modules/xtreme-header-7_b.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants