Xtensa ELF info/hints? #3

Closed
pfalcon opened this issue May 14, 2015 · 9 comments

Comments

@pfalcon

pfalcon commented May 14, 2015

Another support request regarding Xtensa stuff:

  1. Are there formal Xtensa ELF ABI references that describe what R_XTENSA_SLOT0_OP and friends are? I have seen such documents for other architectures, e.g. PowerPC, but googling for "R_XTENSA_SLOT0_OP pdf" gives nothing, and for "R_XTENSA_SLOT0_OP" only noise.
  2. Does the Xtensa arch support linker-relocated, non-PIC shared libraries? E.g. good old x86 supports that, while x86_64 explicitly doesn't. A quick try for Xtensa gives: "dangerous relocation: invalid relocation for dynamic symbol: memset", "dangerous relocation: dynamic relocation in read-only section", etc. I still wonder if there's a definitive, formal answer.

Thanks.

Context: well, if you make things like https://github.com/jcmvbkbc/esp-elf-rom yourself, you shouldn't be surprised someone else asks such questions ;-). And I did a "@jcmvbkbc" mention in another project's ticket, so just leaving it here: https://github.com/pfalcon/ScratchABit

@jcmvbkbc
Owner

> Is there formal Xtensa ELF ABI references, which described what R_XTENSA_SLOT0_OP and friends are?

No, AFAIK. The best description I know of is in the binutils source: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/bfd-in2.h;h=ade49ffc6188210ad2d6484c154853eb6c75613e;hb=HEAD#l5359 and https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/elf32-xtensa.c;h=25236707dae46e7190c646de1601fb1f6ff088fc;hb=HEAD#l165
Some notes on TLS-specific relocations are here: http://wiki.linux-xtensa.org/index.php/ABI_Interface
I guess I'll spend some time this year developing xtensa support bits for elfutils, looks like it'd be a good time to document these pieces of ABI.

> Does Xtensa arch support linker-relocated, non-PIC shared libraries?

No, AFAIK. Can you give an example of such a library? I'm curious what the linking command looks like for it.
OTOH there's overlay support in the xtensa tools, but I don't know anything about it.

> if you make things like https://github.com/jcmvbkbc/esp-elf-rom yourself, you shouldn't be surprised someone else asks such questions ;-)

I'm not surprised at all, but that reference doesn't explain much. esp-elf-rom is made to ease debugging with gdb. But from what you're saying it looks like you're developing a dynamic loader, right?

> And did a "@jcmvbkbc" in another project's ticket, so just leaving it here:

-ENOPARSE. Can't find anything related by your link.

@pfalcon
Author

pfalcon commented May 14, 2015

> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/elf32-xtensa.c;h=25236707dae46e7190c646de1601fb1f6ff088fc;hb=HEAD#l165

Thanks. So, is the purpose of R_XTENSA_ASM_EXPAND (https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/elf32-xtensa.c;h=25236707dae46e7190c646de1601fb1f6ff088fc;hb=HEAD#l1965), for example, only to mark a place for the linker to check, without really changing the instruction's args? Also, does R_XTENSA_NONE mean "there was a relocation needed, but now it's done somehow" or "void entry, don't assume there was a relocation needed at all"? (See below for the reasoning.)

> No, AFAIK. Can you give an example of such library, I'm curious how linking command looks for it?
> OTOH there's overlay support in the xtensa tools, but I don't know anything about it.

This gives an example: http://stackoverflow.com/a/6570000/496009 . Again, only a few archs, like i386, support relocatable (vs PIC) shlibs.

> I'm not surprised at all, but that reference doesn't explain much. esp-elf-rom is made to ease debugging with gdb. But from what you're saying it looks like you're developing dynamic loader, right?

Well, I'm looking for a way to automatically tell which instruction operands are addresses and which are not. One way to do that is by using relocs. At the same time, I need the code to be linked already (all xrefs resolved, and all addresses present in the code). That's done by applying relocations, which are no longer needed after that and are discarded. So, I was looking for a way to get both ;-). ld -r doesn't work, as it explicitly produces an object, not an executable file, and then the 2nd idea was to cheat by producing a shlib instead of an executable. That doesn't appear to work, so it looks like I'll need to write a kind of linker ;-).

> -ENOPARSE. Can't find anything related by your link.

It was this: tommie/lx106-hal#1 (comment)

@jcmvbkbc
Owner

> does R_XTENSA_ASM_EXPAND's purpose for example to only serve as a place of linker to check, not really change instruction's args?

Yes, it marks the places for link-time relaxation.

> Also, is meaning of R_XTENSA_NONE "there was a relocation needed, but now it's done somehow" or "void entry, don't assume there was a relocation needed at all"? (See below for argumentation.)

I think R_XTENSA_NONE should never appear in objects/executables. If it does it's most likely a bug.

> Well, so I'm looking for a way to automatically tell which instruction operands are addresses and which are not. One way to do that is by using relocs.

Not sure I understand. The instruction defines how its operand is used, e.g. in "l32r a0, x", x is always an address. You probably care whether the value loaded from x is an address, right?

If so then I don't see why having a PIC shared object is bad: addresses will anyway be represented as literals with relocations against them, and when you disassemble an instruction you'd be able to see that it refers to such a literal.

If for some other reason GOT and PLT need to be avoided, it may still be easier to relax the ld restrictions on relocation placement and allow leaving R_XTENSA_SLOT*_OP type relocations in the linked shared object. One of the reasons it's not allowed now is that these relocation types don't describe the relocation completely: the instruction the relocation points at must be analyzed in order to understand how its immediate subfield must be changed. That'd be very expensive for a dynamic linker, but doesn't matter for static analysis.
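To illustrate why a SLOT0_OP relocation needs the instruction itself, here is a minimal Python sketch that recovers the referenced address from a 24-bit instruction word. It assumes a little-endian core, covers only L32R and CALLn, and ignores the extended L32R (LITBASE) option and wide FLIX formats; the function names are hypothetical and the field layouts are my reading of the ISA reference, so treat it as a sketch rather than a reference decoder.

```python
def sext(value, bits):
    """Sign-extend the low `bits` bits of value."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def slot0_target(pc, word):
    """Return the address referenced by a 24-bit instruction word, or None.

    Assumption: little-endian encoding, op0 in bits 0..3.
    """
    op0 = word & 0xF
    if op0 == 0x1:
        # L32R: imm16 in bits 8..23 is a word offset that is always negative,
        # added to the PC rounded up to a 4-byte boundary.
        imm16 = (word >> 8) & 0xFFFF
        offset = (imm16 << 2) - (1 << 18)
        return (((pc + 3) & ~3) + offset) & 0xFFFFFFFF
    if op0 == 0x5:
        # CALL0/4/8/12: signed offset18 in bits 6..23, in words,
        # relative to the PC rounded down to a 4-byte boundary, plus 4.
        offset = sext(word >> 6, 18) << 2
        return ((pc & ~3) + offset + 4) & 0xFFFFFFFF
    return None  # other opcodes are not handled in this sketch
```

The two immediates live in different bit fields and use different scaling and bases, which is the per-instruction analysis that the relocation type alone does not convey.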

@pfalcon
Author

pfalcon commented May 15, 2015

> The instruction defines how its operand is used, e.g. in l32r a0, x x is always an address.

Well, yeah, the beauty of RISC. But that's not true in the general case, e.g. if something is linked at address 0, N in "movi aX, N" can be either a literal numeric value or an address. For archs where "move immediate" is full-range, or for RISCs which emulate it with an l32r-like instruction, the issue is also apparent.

> I think R_XTENSA_NONE should never appear in objects/executables. If it does it's most likely a bug.

In an object file produced by "ld -r"ing together all objects from exploded esp8266 sdk libs:

$ readelf --all blob.o | grep R_XTENSA_NONE | wc
   8167   32668  413452

And generally, if those mark places which were already fixed up (e.g. a SLOT0_OP which was undefined in a single object, but which was fixed up with relative addressing), it's better (for my use case) to have at least NONE than nothing at all.
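For a per-type breakdown rather than a single count, the same readelf output could be tallied with a few lines of Python. This is only a sketch: the function name is hypothetical, it assumes the relocation names appear literally in the text as readelf prints them, and unlike `grep | wc` it counts occurrences rather than lines.

```python
import re
from collections import Counter


def tally_xtensa_relocs(readelf_text):
    """Count every R_XTENSA_* name that appears in readelf output text."""
    return Counter(re.findall(r"\bR_XTENSA_\w+", readelf_text))


# Hypothetical sample resembling `readelf -r` lines, for illustration only.
sample = """\
00000ee0  00000000 R_XTENSA_NONE     00000000
0000101f  00000000 R_XTENSA_NONE     000000a8
0000101f  00002a14 R_XTENSA_SLOT0_OP 00000000   system_rtc_mem_read + 0
"""
counts = tally_xtensa_relocs(sample)
```

In practice the text would come from something like `readelf -r blob.o`, captured to a file or via a pipe.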

> If so then I don't see why having PIC shared object is bad

It's not bad. The question was whether non-PIC objects can be put into a shared lib: I just took an esp8266 build which produces an ELF (from which the actual ROM image is to be extracted) and added the --shared option, leading to the bunch of errors quoted above, so I just wondered if something could be done about that, but I assume not.

From the Linux point of view, requiring a shlib to always be PIC makes good sense, given that it simplifies the dynamic linker and gives a 100% sharable image without the need for pages dirtied by relocations.

Well, thanks for the discussion, it was helpful. As I mentioned, I started writing a kind of load-linker for ScratchABit, even if it will be just a proof of concept.

@pfalcon pfalcon closed this as completed May 15, 2015
@jcmvbkbc
Owner

> I think R_XTENSA_NONE should never appear in objects/executables. If it does it's most likely a bug.

> In an object file produced by "ld -r"ing together all objects from exploded esp8266 sdk libs

Interesting. I looked at the produced object file and saw that

  • some of them are pure garbage, e.g.:
     ee0:       f0c112          addi    a1, a1, -16
                        ee0: R_XTENSA_NONE      *ABS*
  • most (all?) others are accompanied with valid relocations, e.g.:
    101f:       0074c5          call0   176c <system_rtc_mem_read>
                        101f: R_XTENSA_NONE     *ABS*+0xa8
                        101f: R_XTENSA_SLOT0_OP system_rtc_mem_read

I still think that these are bugs.
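The two cases above can be told apart mechanically. A minimal Python sketch, assuming the relocations have already been extracted as (offset, type name) pairs by some other tool; the function name and the input shape are hypothetical:

```python
from collections import defaultdict


def classify_none_relocs(relocs):
    """Split R_XTENSA_NONE offsets into (standalone, accompanied) lists.

    relocs: iterable of (offset, type_name) pairs.
    'accompanied' means another relocation type exists at the same offset.
    """
    types_at = defaultdict(set)
    for offset, type_name in relocs:
        types_at[offset].add(type_name)

    standalone, accompanied = [], []
    for offset in sorted(types_at):
        types = types_at[offset]
        if "R_XTENSA_NONE" not in types:
            continue
        (accompanied if len(types) > 1 else standalone).append(offset)
    return standalone, accompanied


# The two examples from the object file discussed above.
relocs = [
    (0xEE0, "R_XTENSA_NONE"),
    (0x101F, "R_XTENSA_NONE"),
    (0x101F, "R_XTENSA_SLOT0_OP"),
]
standalone, accompanied = classify_none_relocs(relocs)
```

A static analysis tool could then ignore the standalone entries and rely on the accompanying relocation for the rest.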

BTW, have you tried these linker options?

`-q'
`--emit-relocs'
     Leave relocation sections and contents in fully linked executables.
     Post link analysis and optimization tools may need this
     information in order to perform correct modifications of
     executables.  This results in larger executables.
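With the relocation sections kept in the linked file, a post-link tool can walk the raw entries itself. Below is a minimal Python sketch for decoding the contents of an ELF32 RELA section: the 12-byte entry layout and the `r_info` split are standard ELF32, while the relocation numbers in the table are a small subset as I recall them from the binutils headers and should be checked against `include/elf/xtensa.h`. The function name is hypothetical, and locating the section inside the file is left out.

```python
import struct

# Assumption: subset of Xtensa relocation numbers, to be verified against
# binutils include/elf/xtensa.h before relying on them.
XTENSA_RELOC_NAMES = {
    0: "R_XTENSA_NONE",
    1: "R_XTENSA_32",
    11: "R_XTENSA_ASM_EXPAND",
    20: "R_XTENSA_SLOT0_OP",
}


def parse_rela32(data, little_endian=True):
    """Decode raw Elf32_Rela entries into (offset, symbol index, type, addend)."""
    fmt = ("<" if little_endian else ">") + "IIi"  # r_offset, r_info, r_addend
    size = struct.calcsize(fmt)  # 12 bytes per entry
    entries = []
    for pos in range(0, len(data) - len(data) % size, size):
        r_offset, r_info, r_addend = struct.unpack_from(fmt, data, pos)
        r_sym = r_info >> 8      # ELF32_R_SYM
        r_type = r_info & 0xFF   # ELF32_R_TYPE
        name = XTENSA_RELOC_NAMES.get(r_type, "type %d" % r_type)
        entries.append((r_offset, r_sym, name, r_addend))
    return entries
```

Each decoded offset marks an instruction or data word whose operand is an address, which is the information that is normally thrown away after linking.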

@pfalcon
Author

pfalcon commented May 15, 2015

> --emit-relocs

Great, exactly what I need! I tried to look thru ld --help, but apparently quit that too early switching to google instead. Thanks for the hint!

@pfalcon
Author

pfalcon commented May 16, 2015

Another question, not directly related to the above, but to not create another ticket:

Reading Xtensa ISA RefMan, s.8.3.1:

> The assembler substitutes a different instruction when an operand is out of range.
> For example, it turns MOVI into L32R when the immediate is outside the range
> -2048 to 2047.

Suppose I want to perform the reverse transform, i.e. turn L32R into MOVI, but want to make it distinguishable from a real MOVI: what naming would you suggest? So far I use "movi*", but maybe some form would be more "Xtensa-ic", e.g. "movi.l"?
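The range rule quoted from the manual is simple enough to state as code. A tiny Python sketch, with a hypothetical function name, covering only the MOVI/L32R case named in the quote (other substitutions an assembler may perform are not modelled):

```python
def load_immediate_form(value):
    """Return which opcode a 'movi' with this immediate would assemble to."""
    # MOVI holds a signed 12-bit immediate; anything else goes via a literal.
    return "movi" if -2048 <= value <= 2047 else "l32r"
```

For example, `load_immediate_form(2047)` gives `"movi"` and `load_immediate_form(0x4000e328)` gives `"l32r"`, so every address in the ESP8266 ROM range necessarily shows up as an L32R.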

@jcmvbkbc
Copy link
Owner

> Suppose I want to perform reverse transform - turn L32R into MOVI, but want to make it distinguishable from real MOVI - what naming would you suggest?

Make it distinguishable in what context? You mean disassembling l32r into movi? Don't know. To my taste literal disassembly with loaded value in comment is the best.

> maybe some form would be more "Xtensa-ic"

No, AFAIK: we only make opcode substitution at assembly time, not at disassembly. And if you write in assembly you usually just use movi regardless of the immediate value.

@pfalcon
Author

pfalcon commented May 16, 2015

> Make it distinguishable in what context? You mean disassembling l32r into movi? Don't know. To my taste literal disassembly with loaded value in comment is the best.

Yes, in the context of producing human-readable disassembly (which is the context of ScratchABit mentioned above). You prefer that because you use Xtensa asm daily; for other people it's a nuisance to remember the difference between l32i & l32r ;-). Also, comments are just that, a sequence of chars, while arguments are objects and have a type (numeric value/address at least). So, in the current prototype of this feature for ida-xtensa I have argument vs comment the other way around:

4000011f   movi*           a2, 0x4000e328 ; via 0x40000098

4000011f   movi*           a2, _rom_store_table ; via 0x40000098 

So, if you don't have better suggestions than "movi*", let it stay that way ;-).
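As an illustration of the "typed argument, address in the comment" layout shown above, here is a minimal Python sketch of the rendering step. It is not the ida-xtensa code: the function name, the `symbols` mapping and the column widths are assumptions made for the example, and the loaded value is taken as already read from the literal pool.

```python
def render_l32r_as_movi(pc, reg, literal_addr, value, symbols=None):
    """Render an L32R as a pseudo 'movi*' line.

    The loaded value is the operand (symbolic if a name is known) and the
    literal's own address is relegated to the comment.
    """
    name = (symbols or {}).get(value)
    operand = name if name is not None else "0x%08x" % value
    return "%08x   %-16s%s, %s ; via 0x%08x" % (pc, "movi*", reg, operand, literal_addr)


plain = render_l32r_as_movi(0x4000011F, "a2", 0x40000098, 0x4000E328)
named = render_l32r_as_movi(
    0x4000011F, "a2", 0x40000098, 0x4000E328,
    symbols={0x4000E328: "_rom_store_table"},
)
```

Keeping the value as a real operand means a tool can later retype it (number vs address) without having to parse it back out of a comment string.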
