Xtensa ELF info/hints? #3

pfalcon · 2015-05-14T20:05:17Z

Another support request regarding Xtensa stuff:

Is there a formal Xtensa ELF ABI reference that describes what R_XTENSA_SLOT0_OP and friends are? I've seen such documents, e.g. for PowerPC, but googling for "R_XTENSA_SLOT0_OP pdf" gives nothing, and "R_XTENSA_SLOT0_OP" alone gives only noise.
Does the Xtensa architecture support linker-relocated, non-PIC shared libraries? E.g. good old x86 supports that, while x86_64 explicitly doesn't. A quick try on Xtensa gives: "dangerous relocation: invalid relocation for dynamic symbol: memset", "dangerous relocation: dynamic relocation in read-only section", etc. I still wonder if there's a definitive, formal answer.

Thanks.

Context: well, if you make things like https://github.com/jcmvbkbc/esp-elf-rom yourself, you shouldn't be surprised someone else asks such questions ;-). And I did an "@jcmvbkbc" mention in another project's ticket, so I'm just leaving this here: https://github.com/pfalcon/ScratchABit

jcmvbkbc · 2015-05-14T21:22:10Z

Is there a formal Xtensa ELF ABI reference that describes what R_XTENSA_SLOT0_OP and friends are?

No, AFAIK. The best description I know of is in the binutils source: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/bfd-in2.h;h=ade49ffc6188210ad2d6484c154853eb6c75613e;hb=HEAD#l5359 and https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/elf32-xtensa.c;h=25236707dae46e7190c646de1601fb1f6ff088fc;hb=HEAD#l165
Some notes on TLS-specific relocations are here: http://wiki.linux-xtensa.org/index.php/ABI_Interface
I guess I'll spend some time this year developing Xtensa support bits for elfutils; that looks like a good time to document these pieces of the ABI.
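For orientation, a minimal Python sketch that splits an Elf32_Rela r_info field into its symbol index and relocation type (standard ELF32 layout: symbol = r_info >> 8, type = r_info & 0xff) and names a few Xtensa types. The numeric values are assumed from binutils' include/elf/xtensa.h and should be checked against the header shipped with your toolchain.

```python
# Relocation type numbers assumed from binutils include/elf/xtensa.h;
# check them against the header shipped with your toolchain.
XTENSA_RELOC_NAMES = {
    0: "R_XTENSA_NONE",
    1: "R_XTENSA_32",
    11: "R_XTENSA_ASM_EXPAND",
    12: "R_XTENSA_ASM_SIMPLIFY",
    14: "R_XTENSA_32_PCREL",
}
# R_XTENSA_SLOT0_OP .. R_XTENSA_SLOT14_OP are assumed to be numbered 20..34.
for _slot in range(15):
    XTENSA_RELOC_NAMES[20 + _slot] = f"R_XTENSA_SLOT{_slot}_OP"


def split_r_info(r_info: int) -> tuple[int, int]:
    """Split an ELF32 r_info word into (symbol index, relocation type)."""
    return r_info >> 8, r_info & 0xFF


def reloc_name(r_type: int) -> str:
    """Human-readable name for an Xtensa relocation type number."""
    return XTENSA_RELOC_NAMES.get(r_type, f"<unknown type {r_type}>")


if __name__ == "__main__":
    sym, rtype = split_r_info((42 << 8) | 20)
    print(sym, reloc_name(rtype))  # 42 R_XTENSA_SLOT0_OP
```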

Does the Xtensa architecture support linker-relocated, non-PIC shared libraries?

No, AFAIK. Can you give an example of such a library? I'm curious what the linking command looks like for it.
OTOH there's overlay support in the xtensa tools, but I don't know anything about it.

if you make things like https://github.com/jcmvbkbc/esp-elf-rom yourself, you shouldn't be surprised someone else asks such questions ;-)

I'm not surprised at all, but that reference doesn't explain much. esp-elf-rom is made to ease debugging with gdb. But from what you're saying, it looks like you're developing a dynamic loader, right?

And I did an "@jcmvbkbc" mention in another project's ticket, so I'm just leaving this here:

-ENOPARSE. I can't find anything related at your link.

pfalcon · 2015-05-14T21:59:24Z

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/elf32-xtensa.c;h=25236707dae46e7190c646de1601fb1f6ff088fc;hb=HEAD#l165

Thanks. So, is the purpose of R_XTENSA_ASM_EXPAND (https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/elf32-xtensa.c;h=25236707dae46e7190c646de1601fb1f6ff088fc;hb=HEAD#l1965), for example, only to mark a place for the linker to check, without actually changing the instruction's operands? Also, does R_XTENSA_NONE mean "there was a relocation needed, but it's already been done somehow" or "void entry, don't assume there was a relocation needed at all"? (See below for the reasoning.)

No, AFAIK. Can you give an example of such a library? I'm curious what the linking command looks like for it.
OTOH there's overlay support in the xtensa tools, but I don't know anything about it.

This gives an example: http://stackoverflow.com/a/6570000/496009 . Again, only a few archs, like i386, support relocatable (as opposed to PIC) shlibs.

I'm not surprised at all, but that reference doesn't explain much. esp-elf-rom is made to ease debugging with gdb. But from what you're saying, it looks like you're developing a dynamic loader, right?

Well, so I'm looking for a way to automatically tell which instruction operands are addresses and which are not. One way to do that is by using relocs. At the same time, I need the code to be linked already (all xrefs resolved, and all addresses filled into the code). That's done by applying relocations, which are then discarded since they're no longer needed. So, I was looking for a way to get both ;-). ld -r doesn't work, as it explicitly produces an object file, not an executable, so my second idea was to cheat by producing a shlib instead of an executable. That doesn't appear to work, so it looks like I'll need to write a kind of linker ;-).
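The reloc-based idea can be sketched roughly like this in Python: keep the offsets of relocations that actually describe an operand value, and skip pure marker types. Treating R_XTENSA_NONE, R_XTENSA_ASM_EXPAND and R_XTENSA_ASM_SIMPLIFY as markers is an assumption (the reply below confirms it for ASM_EXPAND), and the (offset, type, symbol) tuple input is a hypothetical format you'd build from readelf or objdump output.

```python
from typing import Iterable

# Relocation types assumed to carry no operand value, only linker hints/markers.
HINT_TYPES = {"R_XTENSA_NONE", "R_XTENSA_ASM_EXPAND", "R_XTENSA_ASM_SIMPLIFY"}


def address_operand_sites(relocs: Iterable[tuple[int, str, str]]) -> dict[int, str]:
    """Map offset -> symbol for relocations that put an address into code or data.

    Each input item is a hypothetical (offset, type_name, symbol) tuple.
    """
    sites: dict[int, str] = {}
    for offset, rtype, symbol in relocs:
        if rtype in HINT_TYPES:
            continue  # marker only; it does not say the operand is an address
        sites[offset] = symbol
    return sites


if __name__ == "__main__":
    relocs = [
        (0x101F, "R_XTENSA_NONE", "*ABS*+0xa8"),
        (0x101F, "R_XTENSA_SLOT0_OP", "system_rtc_mem_read"),
        (0x0EE0, "R_XTENSA_NONE", "*ABS*"),
    ]
    print(address_operand_sites(relocs))  # {4127: 'system_rtc_mem_read'}
```

R_XTENSA_32 entries (e.g. in literal pools) are kept, since they also mark address-valued words.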

-ENOPARSE. I can't find anything related at your link.

It was this: tommie/lx106-hal#1 (comment)

jcmvbkbc · 2015-05-14T23:16:00Z

is the purpose of R_XTENSA_ASM_EXPAND, for example, only to mark a place for the linker to check, without actually changing the instruction's operands?

Yes, it marks the places for link-time relaxation.

Also, does R_XTENSA_NONE mean "there was a relocation needed, but it's already been done somehow" or "void entry, don't assume there was a relocation needed at all"? (See below for the reasoning.)

I think R_XTENSA_NONE should never appear in objects/executables. If it does, it's most likely a bug.

Well, so I'm looking for a way to automatically tell which instruction operands are addresses and which are not. One way to do that is by using relocs.

Not sure I understand. The instruction defines how its operand is used, e.g. in "l32r a0, x", x is always an address. You probably care whether the value loaded from x is an address, right?

If so, then I don't see why having a PIC shared object is bad: addresses will be represented as literals with relocations against them anyway, and when you disassemble an instruction you'd be able to see that it refers to such a literal.

If for some other reason the GOT and PLT need to be avoided, it may still be easier to relax ld's restrictions on relocation placement and allow R_XTENSA_SLOT*_OP relocations to remain in the linked shared object. One of the reasons this isn't allowed now is that these relocation types don't describe the relocation completely: the instruction the relocation points to must be analyzed to determine how its immediate subfield must be changed. That would be very expensive for a dynamic linker, but doesn't matter for static analysis.
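A rough Python illustration of why R_XTENSA_SLOT0_OP alone is not enough: the same relocation type has to be applied differently depending on the opcode it lands on. This sketch handles only CALL0 and L32R, using the field layouts and PC-relative rules from the ISA RefMan as I understand them (CALL0: signed offset18 in bits 23..6, target = (PC & ~3) + 4 + offset18 * 4; L32R: one-extended imm16 in bits 23..8, literal = ((PC + 3) & ~3) + (imm16 - 0x10000) * 4). The real logic in elf32-xtensa.c covers many more formats and FLIX slots; the test values are taken from the call0 at 0x101f in the dump quoted further down.

```python
def patch_slot0_op(word: int, pc: int, target: int) -> int:
    """Re-encode the PC-relative immediate of a 24-bit instruction to point at target.

    Only CALL0 and L32R are handled. The relocation type alone does not say which
    field to patch, so the opcode has to be decoded first.
    """
    op0 = word & 0xF
    if op0 == 0x5 and ((word >> 4) & 0x3) == 0:
        # CALL0: signed offset18 in bits 23..6, target = (PC & ~3) + 4 + offset18 * 4
        delta = target - ((pc & ~3) + 4)
        if delta % 4 != 0 or not (-(1 << 19) <= delta < (1 << 19)):
            raise ValueError("CALL0 target misaligned or out of range")
        return (((delta >> 2) & 0x3FFFF) << 6) | (word & 0x3F)
    if op0 == 0x1:
        # L32R: imm16 in bits 23..8, literal = ((PC + 3) & ~3) + (imm16 - 0x10000) * 4
        delta = target - ((pc + 3) & ~3)
        if delta % 4 != 0 or not (-(1 << 18) <= delta < 0):
            raise ValueError("L32R literal misaligned or out of range")
        return (((delta >> 2) & 0xFFFF) << 8) | (word & 0xFF)
    raise NotImplementedError(f"no SLOT0_OP handling sketched for op0={op0:#x}")
```

For example, patch_slot0_op(0x000005, 0x101F, 0x176C) rebuilds 0x0074c5, the call0 word shown in the dump for "call0 176c <system_rtc_mem_read>".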

pfalcon · 2015-05-15T16:25:01Z

The instruction defines how its operand is used, e.g. in "l32r a0, x", x is always an address.

Well, yeah, the beauty of RISC. But that's not true in the general case: e.g. if something is linked at address 0, N in "movi aX, N" can be either a literal numeric value or an address. On archs where "move immediate" is full-range, or on RISCs that emulate it with an l32r-like instruction, the issue is also apparent.

I think R_XTENSA_NONE should never appear in objects/executables. If it does, it's most likely a bug.

In an object file produced by "ld -r"-ing together all objects from the exploded esp8266 SDK libs:

$ readelf --all blob.o | grep R_XTENSA_NONE | wc
   8167   32668  413452

And generally, if these mark a place that was already fixed up (e.g. a SLOT0_OP whose symbol was undefined in a single object, but which was fixed up with relative addressing), then for my use case it's better to have at least NONE than nothing at all.

If so, then I don't see why having a PIC shared object is bad

It's not bad. The question was whether non-PIC objects can be put into a shared lib: I just took an esp8266 build that produces an ELF file (from which the actual ROM image is to be extracted), added the --shared option, and got the bunch of errors quoted above. So I just wondered whether something could be done about that, but I assume not.

From the Linux point of view, requiring shlibs to always be PIC makes good sense, given that it simplifies the dynamic linker and gives a 100% shareable image without pages dirtied by relocations.

Well, thanks for the discussion, it was helpful. As I mentioned, I've started writing a kind of load-linker for ScratchABit, even if it ends up being just a proof of concept.

jcmvbkbc · 2015-05-15T16:54:52Z

I think R_XTENSA_NONE should never appear in objects/executables. If it does, it's most likely a bug.

In an object file produced by "ld -r"-ing together all objects from the exploded esp8266 SDK libs

Interesting. I looked at the produced object file and saw that

some of them are pure garbage, e.g.:

     ee0:       f0c112          addi    a1, a1, -16
                        ee0: R_XTENSA_NONE      *ABS*

most (all?) others are accompanied with valid relocations, e.g.:

    101f:       0074c5          call0   176c <system_rtc_mem_read>
                        101f: R_XTENSA_NONE     *ABS*+0xa8
                        101f: R_XTENSA_SLOT0_OP system_rtc_mem_read

I still think that these are bugs.
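To see how many of those R_XTENSA_NONE entries are orphans versus companions of a real relocation, one could scan "objdump -dr" style lines like the ones above. A minimal Python sketch; the regular expression is an assumption tuned to the line shape shown here and may need adjusting for other objdump versions.

```python
import re
from collections import defaultdict

# Matches relocation lines such as "    101f: R_XTENSA_SLOT0_OP system_rtc_mem_read".
RELOC_LINE = re.compile(r"^\s*([0-9a-fA-F]+):\s+(R_XTENSA_\w+)\s+(\S+)")


def classify_none_relocs(lines):
    """Return (orphan_offsets, accompanied_offsets) for R_XTENSA_NONE entries."""
    types_at = defaultdict(set)
    for line in lines:
        m = RELOC_LINE.match(line)
        if m:  # disassembly lines do not match, only relocation annotations do
            types_at[int(m.group(1), 16)].add(m.group(2))
    orphans, accompanied = [], []
    for offset, types in sorted(types_at.items()):
        if "R_XTENSA_NONE" not in types:
            continue
        # Any other relocation type at the same offset counts as a companion.
        (accompanied if len(types) > 1 else orphans).append(offset)
    return orphans, accompanied
```

On the two excerpts above, the entry at 0xee0 comes out as an orphan and the one at 0x101f as accompanied.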

BTW, have you tried the linker options

`-q'
`--emit-relocs'
     Leave relocation sections and contents in fully linked executables.
     Post link analysis and optimization tools may need this
     information in order to perform correct modifications of
     executables.  This results in larger executables.
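With --emit-relocs, the .rela.* sections survive in the final executable, so a tool can read them directly. A minimal Python sketch of decoding the raw contents of a little-endian ELF32 RELA section with the standard struct module; it assumes a little-endian Xtensa configuration (as on ESP8266), leaves out locating the section via the ELF section headers, and uses made-up bytes in the check.

```python
import struct
from typing import Iterator, NamedTuple


class Rela(NamedTuple):
    offset: int  # r_offset: address of the relocated field in the linked image
    sym: int     # symbol index, ELF32_R_SYM(r_info)
    rtype: int   # relocation type, ELF32_R_TYPE(r_info)
    addend: int  # r_addend (signed)


ELF32_RELA = struct.Struct("<IIi")  # r_offset, r_info, r_addend: 12 bytes, little-endian


def iter_rela(section: bytes) -> Iterator[Rela]:
    """Yield decoded entries from the raw contents of an ELF32 .rela.* section."""
    if len(section) % ELF32_RELA.size:
        raise ValueError("section size is not a multiple of sizeof(Elf32_Rela)")
    for r_offset, r_info, r_addend in ELF32_RELA.iter_unpack(section):
        yield Rela(r_offset, r_info >> 8, r_info & 0xFF, r_addend)
```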

pfalcon · 2015-05-15T17:31:56Z

--emit-relocs

Great, exactly what I need! I tried looking through ld --help, but apparently gave up too early and switched to Google instead. Thanks for the hint!

pfalcon · 2015-05-16T00:33:34Z

Another question, not directly related to the above, but asking here to avoid creating another ticket:

Reading Xtensa ISA RefMan, s.8.3.1:

The assembler substitutes a different instruction when an operand is out of range.
For example, it turns MOVI into L32R when the immediate is outside the range
-2048 to 2047.

Suppose I want to perform the reverse transform, i.e. turn L32R into MOVI, but keep it distinguishable from a real MOVI: what naming would you suggest? So far I use "movi*", but maybe some form would be more "Xtensa-ic", e.g. "movi.l"?
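The assembler's choice described in the quote can be sketched in Python as below. The MOVI field layout used here (RRI8: op0 = 0b0010, t = target register, s = imm12[11:8], r = 0b1010, imm8 = imm12[7:0]) is my reading of the ISA RefMan; double-check it there before relying on the encoder.

```python
MOVI_MIN, MOVI_MAX = -2048, 2047  # signed 12-bit immediate range of MOVI


def movi_fits(value: int) -> bool:
    """True if the assembler can emit MOVI directly instead of falling back to L32R."""
    return MOVI_MIN <= value <= MOVI_MAX


def encode_movi(reg: int, value: int) -> int:
    """Encode MOVI a<reg>, value as a 24-bit word (assumed RRI8 layout)."""
    if not movi_fits(value):
        raise ValueError("immediate out of MOVI range; the assembler would use L32R")
    imm12 = value & 0xFFF
    return ((imm12 & 0xFF) << 16) | (0b1010 << 12) | ((imm12 >> 8) << 8) | (reg << 4) | 0b0010


def decode_movi(word: int) -> tuple[int, int]:
    """Inverse of encode_movi: return (register, signed immediate)."""
    if (word & 0xF) != 0b0010 or ((word >> 12) & 0xF) != 0b1010:
        raise ValueError("not a MOVI")
    imm12 = (((word >> 8) & 0xF) << 8) | ((word >> 16) & 0xFF)
    value = imm12 - 0x1000 if imm12 & 0x800 else imm12  # sign-extend 12 bits
    return (word >> 4) & 0xF, value
```

A disassembler producing the "movi*" form essentially does the opposite: it sees an L32R, loads the 32-bit literal, and displays it as if it were a MOVI immediate that could not fit this range.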

jcmvbkbc · 2015-05-16T00:59:08Z

Suppose I want to perform the reverse transform, i.e. turn L32R into MOVI, but keep it distinguishable from a real MOVI: what naming would you suggest?

Make it distinguishable in what context? Do you mean disassembling l32r as movi? I don't know. To my taste, literal disassembly with the loaded value in a comment is best.

maybe some form would be more "Xtensa-ic"

No, AFAIK: we only do opcode substitution at assembly time, not at disassembly time. And if you write assembly, you usually just use movi regardless of the immediate value.

pfalcon · 2015-05-16T14:34:58Z

Make it distinguishable in what context? Do you mean disassembling l32r as movi? I don't know. To my taste, literal disassembly with the loaded value in a comment is best.

Yes, in the context of producing human-readable disassembly (which is the context of ScratchABit, mentioned above). You prefer that because you use Xtensa asm daily; for other people, it's a nuisance to remember the difference between l32i and l32r ;-). Also, comments are just that, a sequence of chars, while arguments are objects and have a type (at least numeric value vs. address). So, in the current prototype of this feature for ida-xtensa, I have argument and comment the other way around:

4000011f   movi*           a2, 0x4000e328 ; via 0x40000098

4000011f   movi*           a2, _rom_store_table ; via 0x40000098

So, if you don't have better suggestions than "movi*", let it stay that way ;-).
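For reference, a small Python sketch of the two display styles discussed above: literal disassembly with the loaded value in a comment, and the "movi*" pseudo-op with the literal address in a comment. The function name, the column widths and the symbol table argument are hypothetical; they only approximate the listing format shown.

```python
from typing import Mapping, Optional


def render_l32r(pc: int, reg: int, literal_addr: int, value: int,
                style: str = "movi*", symbols: Optional[Mapping[int, str]] = None) -> str:
    """Format an L32R either literally or as a "movi*" pseudo-instruction."""
    symbols = symbols or {}
    if style == "movi*":
        # Loaded value becomes the operand (a symbol if known); literal address goes to the comment.
        operand = symbols.get(value, f"{value:#010x}")
        return f"{pc:08x}   {'movi*':<16}a{reg}, {operand} ; via {literal_addr:#010x}"
    if style == "literal":
        # Plain l32r with the literal address as operand and the loaded value as a comment.
        return f"{pc:08x}   {'l32r':<16}a{reg}, {literal_addr:#010x} ; {value:#010x}"
    raise ValueError(f"unknown style {style!r}")
```

Keeping the loaded value as a real operand (rather than comment text) is what lets it be replaced by a symbol such as _rom_store_table when one is known.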


pfalcon closed this as completed May 15, 2015

khcnz mentioned this issue Jun 5, 2017

Building an Espressif OTA (user{1,2}.bin) compatible Tasmota FW image arendst/Tasmota#476

Closed
