-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"register" directive not honored correctly. #6
Comments
It doesn't have to. Please see https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables for more details. |
I did not realize it would only be for io operands, I was sticking them in my clobber lists and it was still re-using them, though. I don't know how much I care about this any more, since I've just taken to writing my function in the assembler since I need to get much more control over my stack and when/how I push things. Thank you for that reference though. I've never seen that page! P.S. If the issue comes up again, I will re-open ticket. |
This simple patch implements Richard Biener's suggestion in comment #6 of PR tree-optimization/52171 (from February 2013) that the insn-preds code generated by genpreds can avoid using strncmp when matching constant strings of length one. The effect of this patch is best explained by the diff of insn-preds.cc: < if (!strncmp (str + 1, "g", 1)) --- > if (str[1] == 'g') 3104c3104 < if (!strncmp (str + 1, "m", 1)) --- > if (str[1] == 'm') 3106c3106 < if (!strncmp (str + 1, "c", 1)) --- > if (str[1] == 'c') ... The equivalent optimization is performed by GCC (but perhaps not by the host compiler), but generating simpler/smaller code may encourage further optimizations (such as use of a switch statement). 2022-05-24 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * genpreds.cc (write_lookup_constraint_1): Avoid generating a call to strncmp for strings of length one.
This patch fixes both ICE regressions PR middle-end/105853 and PR target/105856 caused by my recent patch to expand small const structs as immediate constants. That patch updated code generation in three places: two in expr.cc that call store_constructor directly, and the third in calls.cc's load_register_parameters that expands its CONSTRUCTOR via expand_expr, as store_constructor is local/static to expr.cc, and the "public" API, should usually simply forward the constructor to the appropriate store_constructor function. Alas, despite the clean regression testing on multiple targets, the above ICEs show that expand_expr isn't a suitable proxy for store_constructor, and things that (I'd assumed) shouldn't affect how/whether a struct is placed in a register [such as whether the struct is considered packed/ aligned or not] actually interfere with the optimization that is being attempted. The (proposed) solution is to export store_constructor (and it's helper function int_expr_size) from expr.cc, by removing their static qualifier and prototyping both functions in expr.h, so they can be called directly from load_register_parameters in calls.cc. This cures both ICEs, but almost as importantly improves code generation over GCC 12. For PR 105853, GCC 12 generates: compose_nd_na_ipv6_src: movzx eax, WORD PTR eth_addr_zero[rip+2] movzx edx, WORD PTR eth_addr_zero[rip] movzx edi, WORD PTR eth_addr_zero[rip+4] sal rax, 16 or rax, rdx sal rdi, 32 or rdi, rax xor eax, eax jmp packet_set_nd eth_addr_zero: .zero 6 where now (with this fix) GCC 13 generates: compose_nd_na_ipv6_src: xorl %edi, %edi xorl %eax, %eax jmp packet_set_nd Likewise, for PR 105856 on ARM, we'd previously generate: g_329_3: movw r3, #:lower16:.LANCHOR0 movt r3, #:upper16:.LANCHOR0 ldr r0, [r3] b func_19 but with this optimization we now generate: g_329_3: mov r0, #6 b func_19 2022-06-07 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR middle-end/105853 PR target/105856 * calls.cc (load_register_parameters): Call store_constructor and int_expr_size directly instead of expanding via expand_expr. * expr.cc (static void store_constructor): Don't prototype here. (static HOST_WIDE_INT int_expr_size): Likewise. (store_constructor): No longer static. (int_expr_size): Likewise, no longer static. * expr.h (store_constructor): Prototype here. (int_expr_size): Prototype here. gcc/testsuite/ChangeLog PR middle-end/105853 PR target/105856 * gcc.dg/pr105853.c: New test case. * gcc.dg/pr105856.c: New test case.
This patch addresses the issue in comment #6 of PR rtl-optimization/7061 (a four digit PR number) from 2006 where on x86_64 complex number arguments are unconditionally spilled to the stack. For the test cases below: float re(float _Complex a) { return __real__ a; } float im(float _Complex a) { return __imag__ a; } GCC with -O2 currently generates: re: movq %xmm0, -8(%rsp) movss -8(%rsp), %xmm0 ret im: movq %xmm0, -8(%rsp) movss -4(%rsp), %xmm0 ret with this patch we now generate: re: ret im: movq %xmm0, %rax shrq $32, %rax movd %eax, %xmm0 ret [Technically, this shift can be performed on %xmm0 in a single instruction, but the backend needs to be taught to do that, the important bit is that the SCmode argument isn't written to the stack]. The patch itself is to emit_group_store where just before RTL expansion commits to writing to the stack, we check if the store group consists of a single scalar integer register that holds a complex mode value; on x86_64 SCmode arguments are passed in DImode registers. If this is the case, we can use a SUBREG to "view_convert" the integer to the equivalent complex mode. An interesting corner case that showed up during testing is that x86_64 also passes HCmode arguments in DImode registers(!), i.e. using modes of different sizes. This is easily handled/supported by first converting to an integer mode of the correct size, and then generating a complex mode SUBREG of this. This is similar in concept to the patch I proposed here: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html 2020-06-10 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR rtl-optimization/7061 * expr.cc (emit_group_store): For groups that consist of a single scalar integer register that hold a complex mode value, use gen_lowpart to generate a SUBREG to "view_convert" to the complex mode. For modes of different sizes, first convert to an integer mode of the appropriate size. gcc/testsuite/ChangeLog PR rtl-optimization/7061 * gcc.target/i386/pr7061-1.c: New test case. * gcc.target/i386/pr7061-2.c: New test case.
I noticed that for member class templates of a class template we were unnecessarily substituting both the template and its type. Avoiding that duplication speeds compilation of this silly testcase from ~12s to ~9s on my laptop. It's unlikely to make a difference on any real code, but the simplification is also nice. We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation of the template class, but it makes more sense to do that in tsubst_template_decl anyway. #define NC(X) \ template <class U> struct X##1; \ template <class U> struct X##2; \ template <class U> struct X##3; \ template <class U> struct X##4; \ template <class U> struct X##5; \ template <class U> struct X##6; #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f) #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E) template <int I> struct A { NC3(am) }; template <class...Ts> void sink(Ts...); template <int...Is> void g() { sink(A<Is>()...); } template <int I> void f() { g<__integer_pack(I)...>(); } int main() { f<1000>(); } gcc/cp/ChangeLog: * pt.cc (instantiate_class_template): Skip the RECORD_TYPE of a class template. (tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
One known missing piece in the modules implementation is merging of a streamed-in local type (class or enum) with the corresponding in-TU version of the local type. This missing piece turns out to cause a hard-to-reduce use-after-free GC issue due to the entity_ary not being marked as a GC root (deliberately), and manifests as a serialization error on stream-in as in PR99426 (see comment #6 for a reduction). It's also reproducible on trunk when running the xtreme-header tests without -fno-module-lazy. This patch implements this missing piece, making us merge such local types according to their position within the containing function's definition, analogous to how we merge FIELD_DECLs of a class according to their index in the TYPE_FIELDS list. PR c++/99426 gcc/cp/ChangeLog: * module.cc (merge_kind::MK_local_type): New enumerator. (merge_kind_name): Update. (trees_out::chained_decls): Move BLOCK-specific handling of DECL_LOCAL_DECL_P decls to ... (trees_out::core_vals) <case BLOCK>: ... here. Stream BLOCK_VARS manually. (trees_in::core_vals) <case BLOCK>: Stream BLOCK_VARS manually. Handle deduplicated local types.. (trees_out::key_local_type): Define. (trees_in::key_local_type): Define. (trees_out::get_merge_kind) <case FUNCTION_DECL>: Return MK_local_type for a local type. (trees_out::key_mergeable) <case FUNCTION_DECL>: Use key_local_type. (trees_in::key_mergeable) <case FUNCTION_DECL>: Likewise. (trees_in::is_matching_decl): Be flexible with type mismatches for local entities. (trees_in::register_duplicate): Also register the DECL_TEMPLATE_RESULT of a TEMPLATE_DECL as a duplicate. (depset_cmp): Return 0 for equal IDENTIFIER_HASH_VALUEs. gcc/testsuite/ChangeLog: * g++.dg/modules/merge-17.h: New test. * g++.dg/modules/merge-17_a.H: New test. * g++.dg/modules/merge-17_b.C: New test. * g++.dg/modules/xtreme-header-7_a.H: New test. * g++.dg/modules/xtreme-header-7_b.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>
Inside functions, I can declare a "register" directive to a variable. Amazingly, GCC sometimes follows it, but most of the time, it does not. Though it will usually generate code using the variable as a register, it doesn't always seem to work. Additionally, sometimes GCC will clobber my registers with its own code.
Furthermore, GCC does not seem to back up these registers inside a function call. Even if I list a register as clobbering in my volatile asm, it simply ignores that. I may clobber registers that could have serious consequences and GCC won't back them up. Ordinarily I'd just work around this, but the whole not saving registers thing is a pretty big annoyance.
The text was updated successfully, but these errors were encountered: