GDB needs a sane default set of CSRs in the absence of a target description #1

jeremybennett · 2020-08-17T19:53:05Z

The set of default CSRs used by GDB in the absence of an XML target description is not appropriate for CORE-V. It will cause the GDB server (e.g. OpenOCD) to time out.

A minimal set of mandatory CSRs is appropriate as the default in these circumstances.

WIP: Added support for PULP hardware loop

zcee: add option flag & testcases

Fedora Rawhide is now using gcc-12.0. As part of updating to the gcc-12.0 package set, Rawhide is also now using a version of libgcc_s which lacks a .data section. This causes gdb to fail in the following fashion while debugging a program (such as gdb) which uses libgcc_s: (top-gdb) run Starting program: rawhide-master/bld/gdb/gdb ... objfiles.h:467: internal-error: sect_index_data not initialized A problem internal to GDB has been detected, further debugging may prove unreliable. ... I snipped the backtrace from the above output. Instead, here's a portion of a backtrace obtained using GDB's backtrace command. (Obviously, in order to obtain it, I used a GDB which has been patched with this commit.) #0 internal_error ( file=0xc6a508 "gdb/objfiles.h", line=467, fmt=0xc6a4e8 "sect_index_data not initialized") at gdbsupport/errors.cc:51 openhwgroup#1 0x00000000005f9651 in objfile::data_section_offset (this=0x4fa48f0) at gdb/objfiles.h:467 openhwgroup#2 0x000000000097c5f8 in relocate_address (address=0x17244, objfile=0x4fa48f0) at gdb/stap-probe.c:1333 openhwgroup#3 0x000000000097c630 in stap_probe::get_relocated_address (this=0xa1a17a0, objfile=0x4fa48f0) at gdb/stap-probe.c:1341 openhwgroup#4 0x00000000004d7025 in create_exception_master_breakpoint_probe ( objfile=0x4fa48f0) at gdb/breakpoint.c:3505 openhwgroup#5 0x00000000004d7426 in create_exception_master_breakpoint () at gdb/breakpoint.c:3575 openhwgroup#6 0x00000000004efcc1 in breakpoint_re_set () at gdb/breakpoint.c:13407 openhwgroup#7 0x0000000000956998 in solib_add (pattern=0x0, from_tty=0, readsyms=1) at gdb/solib.c:1001 openhwgroup#8 0x00000000009576a8 in handle_solib_event () at gdb/solib.c:1269 ... The function 'relocate_address' in gdb/stap-probe.c attempts to do its "relocation" by using objfile->data_section_offset(). That method, data_section_offset() is defined as follows in objfiles.h: CORE_ADDR data_section_offset () const { return section_offsets[SECT_OFF_DATA (this)]; } The internal error occurs when the SECT_OFF_DATA macro finds that the 'sect_index_data' field is -1: #define SECT_OFF_DATA(objfile) \ ((objfile->sect_index_data == -1) \ ? (internal_error (__FILE__, __LINE__, \ _("sect_index_data not initialized")), -1) \ : objfile->sect_index_data) relocate_address() is obtaining the section offset in order to compute a relocated address. For some ABIs, such as the System V ABI, the section offsets will all be the same. So for those ABIs, it doesn't matter which offset is used. However, other ABIs, such as the FDPIC ABI, will have different offsets for the various sections. Thus, for those ABIs, it is vital that this and other relocation code use the correct offset. In stap_probe::get_relocated_address, the address to which to add the offset (thus forming the relocated address) is obtained via this->get_address (); get_address is a getter for m_address in probe.h. It's documented/defined as follows (also in probe.h): /* The address where the probe is inserted, relative to SECT_OFF_TEXT. */ CORE_ADDR m_address; (Thanks to Tom Tromey for this observation.) So, based on this, the current use of data_section_offset / SECT_OFF_DATA is wrong. This relocation code should have been using text_section_offset / SECT_OFF_TEXT all along. That being the case, I've adjusted the stap-probe.c relocation code accordingly. Searching the sources turned up one other use of data_section_offset, in gdb/dtrace-probe.c, so I've updated that code as well. The same reasoning presented above applies to this case too. Summary: * gdb/dtrace-probe.c (dtrace_probe::get_relocated_address): Use method text_section_offset instead of data_section_offset. * gdb/stap-probe.c (relocate_address): Likewise.

g++ 11.1.0 has a bug where it will emit a negative DW_AT_data_member_location in some cases: $ cat test.cpp #include <memory> int main() { std::unique_ptr<int> ptr; } $ g++ -g test.cpp $ llvm-dwarfdump -F a.out ... 0x00000964: DW_TAG_member DW_AT_name [DW_FORM_strp] ("_M_head_impl") DW_AT_decl_file [DW_FORM_data1] ("/usr/include/c++/11.1.0/tuple") DW_AT_decl_line [DW_FORM_data1] (125) DW_AT_decl_column [DW_FORM_data1] (0x27) DW_AT_type [DW_FORM_ref4] (0x0000067a "default_delete<int>") DW_AT_data_member_location [DW_FORM_sdata] (-1) ... This leads to a GDB crash (when built with ASan, otherwise probably garbage results), since it tries to read just before (to the left, in ASan speak) of the value's buffer: ==888645==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000c52af at pc 0x7f711b239f4b bp 0x7fff356bd470 sp 0x7fff356bcc18 READ of size 1 at 0x6020000c52af thread T0 #0 0x7f711b239f4a in __interceptor_memcpy /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 openhwgroup#1 0x555c4977efa1 in value_contents_copy_raw /home/simark/src/binutils-gdb/gdb/value.c:1347 openhwgroup#2 0x555c497909cd in value_primitive_field(value*, long, int, type*) /home/simark/src/binutils-gdb/gdb/value.c:3126 openhwgroup#3 0x555c478f2eaa in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:333 openhwgroup#4 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513 openhwgroup#5 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 openhwgroup#6 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513 openhwgroup#7 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 openhwgroup#8 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513 openhwgroup#9 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 openhwgroup#10 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383 openhwgroup#11 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438 openhwgroup#12 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632 openhwgroup#13 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048 openhwgroup#14 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151 openhwgroup#15 0x555c478f2fcb in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:335 openhwgroup#16 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513 openhwgroup#17 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 openhwgroup#18 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383 openhwgroup#19 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438 openhwgroup#20 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632 openhwgroup#21 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048 openhwgroup#22 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151 openhwgroup#23 0x555c478f2fcb in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:335 openhwgroup#24 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383 openhwgroup#25 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438 openhwgroup#26 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632 openhwgroup#27 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048 openhwgroup#28 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151 openhwgroup#29 0x555c4760f04c in c_value_print(value*, ui_file*, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:587 openhwgroup#30 0x555c483ff954 in language_defn::value_print(value*, ui_file*, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:614 openhwgroup#31 0x555c49759f61 in value_print(value*, ui_file*, value_print_options const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1189 openhwgroup#32 0x555c48950f70 in print_formatted /home/simark/src/binutils-gdb/gdb/printcmd.c:337 openhwgroup#33 0x555c48958eda in print_value(value*, value_print_options const&) /home/simark/src/binutils-gdb/gdb/printcmd.c:1258 openhwgroup#34 0x555c48959891 in print_command_1 /home/simark/src/binutils-gdb/gdb/printcmd.c:1367 openhwgroup#35 0x555c4895a3df in print_command /home/simark/src/binutils-gdb/gdb/printcmd.c:1458 openhwgroup#36 0x555c4767f974 in do_simple_func /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:97 openhwgroup#37 0x555c47692e25 in cmd_func(cmd_list_element*, char const*, int) /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:2475 openhwgroup#38 0x555c4936107e in execute_command(char const*, int) /home/simark/src/binutils-gdb/gdb/top.c:670 openhwgroup#39 0x555c485f1bff in catch_command_errors /home/simark/src/binutils-gdb/gdb/main.c:523 openhwgroup#40 0x555c485f249c in execute_cmdargs /home/simark/src/binutils-gdb/gdb/main.c:618 openhwgroup#41 0x555c485f6677 in captured_main_1 /home/simark/src/binutils-gdb/gdb/main.c:1317 openhwgroup#42 0x555c485f6c83 in captured_main /home/simark/src/binutils-gdb/gdb/main.c:1338 openhwgroup#43 0x555c485f6d65 in gdb_main(captured_main_args*) /home/simark/src/binutils-gdb/gdb/main.c:1363 openhwgroup#44 0x555c46e41ba8 in main /home/simark/src/binutils-gdb/gdb/gdb.c:32 openhwgroup#45 0x7f71198bcb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24) openhwgroup#46 0x555c46e4197d in _start (/home/simark/build/binutils-gdb-one-target/gdb/gdb+0x77f197d) 0x6020000c52af is located 1 bytes to the left of 8-byte region [0x6020000c52b0,0x6020000c52b8) allocated by thread T0 here: #0 0x7f711b2b7459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154 openhwgroup#1 0x555c470acdc9 in xcalloc /home/simark/src/binutils-gdb/gdb/alloc.c:100 openhwgroup#2 0x555c49b775cd in xzalloc(unsigned long) /home/simark/src/binutils-gdb/gdbsupport/common-utils.cc:29 openhwgroup#3 0x555c4977bdeb in allocate_value_contents /home/simark/src/binutils-gdb/gdb/value.c:1029 openhwgroup#4 0x555c4977be25 in allocate_value(type*) /home/simark/src/binutils-gdb/gdb/value.c:1040 openhwgroup#5 0x555c4979030d in value_primitive_field(value*, long, int, type*) /home/simark/src/binutils-gdb/gdb/value.c:3092 openhwgroup#6 0x555c478f6280 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:501 openhwgroup#7 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 openhwgroup#8 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513 openhwgroup#9 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 openhwgroup#10 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513 openhwgroup#11 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 openhwgroup#12 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383 openhwgroup#13 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438 openhwgroup#14 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632 openhwgroup#15 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048 openhwgroup#16 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151 openhwgroup#17 0x555c478f2fcb in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:335 openhwgroup#18 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513 openhwgroup#19 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161 openhwgroup#20 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383 openhwgroup#21 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438 openhwgroup#22 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632 openhwgroup#23 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048 openhwgroup#24 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151 openhwgroup#25 0x555c478f2fcb in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:335 openhwgroup#26 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383 openhwgroup#27 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438 openhwgroup#28 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632 openhwgroup#29 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048 Since there are some binaries with this in the wild, I think it would be useful for GDB to work around this. I did the obvious simple thing, if the DW_AT_data_member_location's value is -1, replace it with 0. I added a producer check to only apply this fixup for GCC 11. The idea is that if some other compiler ever uses a DW_AT_data_member_location value of -1 by mistake, we don't know (before analyzing the bug at least) if they did mean 0 or some other value. So I wouldn't want to apply the fixup in that case. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28063 Change-Id: Ieef3459b0b9bbce8bdad838ba83b4b64e7269d42

Starting with commit commit 1da5d0e Date: Tue Jan 4 08:02:24 2022 -0700 Change how Python architecture and language are handled we see a failure in gdb.threads/killed-outside.exp: ... Executing on target: kill -9 16622 (timeout = 300) builtin_spawn -ignore SIGHUP kill -9 16622 continue Continuing. Couldn't get registers: No such process. (gdb) [Thread 0x7ffff77c2700 (LWP 16626) exited] Program terminated with signal SIGKILL, Killed. The program no longer exists. FAIL: gdb.threads/killed-outside.exp: prompt after first continue (timeout) This is not a regression but a failure due to a change in GDB's output. Prior to the aforementioned commit, GDB has been printing the "Couldn't get registers: No such process." message twice. The second one came from (top-gdb) bt #0 amd64_linux_nat_target::fetch_registers (this=0x555557f31440 <the_amd64_linux_nat_target>, regcache=0x555558805ce0, regnum=16) at /gdb-up/gdb/amd64-linux-nat.c:225 openhwgroup#1 0x000055555640ac5f in target_ops::fetch_registers (this=0x555557d636d0 <the_thread_db_target>, arg0=0x555558805ce0, arg1=16) at /gdb-up/gdb/target-delegates.c:502 openhwgroup#2 0x000055555641a647 in target_fetch_registers (regcache=0x555558805ce0, regno=16) at /gdb-up/gdb/target.c:3945 openhwgroup#3 0x0000555556278e68 in regcache::raw_update (this=0x555558805ce0, regnum=16) at /gdb-up/gdb/regcache.c:587 openhwgroup#4 0x0000555556278f14 in readable_regcache::raw_read (this=0x555558805ce0, regnum=16, buf=0x555558881950 "") at /gdb-up/gdb/regcache.c:601 openhwgroup#5 0x00005555562792aa in readable_regcache::cooked_read (this=0x555558805ce0, regnum=16, buf=0x555558881950 "") at /gdb-up/gdb/regcache.c:690 openhwgroup#6 0x000055555627965e in readable_regcache::cooked_read_value (this=0x555558805ce0, regnum=16) at /gdb-up/gdb/regcache.c:748 openhwgroup#7 0x0000555556352a37 in sentinel_frame_prev_register (this_frame=0x555558181090, this_prologue_cache=0x5555581810a8, regnum=16) at /gdb-up/gdb/sentinel-frame.c:53 openhwgroup#8 0x0000555555fa4773 in frame_unwind_register_value (next_frame=0x555558181090, regnum=16) at /gdb-up/gdb/frame.c:1235 openhwgroup#9 0x0000555555fa420d in frame_register_unwind (next_frame=0x555558181090, regnum=16, optimizedp=0x7fffffffd570, unavailablep=0x7fffffffd574, lvalp=0x7fffffffd57c, addrp=0x7fffffffd580, realnump=0x7fffffffd578, bufferp=0x7fffffffd5b0 "") at /gdb-up/gdb/frame.c:1143 openhwgroup#10 0x0000555555fa455f in frame_unwind_register (next_frame=0x555558181090, regnum=16, buf=0x7fffffffd5b0 "") at /gdb-up/gdb/frame.c:1199 openhwgroup#11 0x00005555560178e2 in i386_unwind_pc (gdbarch=0x5555587c4a70, next_frame=0x555558181090) at /gdb-up/gdb/i386-tdep.c:1972 openhwgroup#12 0x0000555555cd2b9d in gdbarch_unwind_pc (gdbarch=0x5555587c4a70, next_frame=0x555558181090) at /gdb-up/gdb/gdbarch.c:3007 openhwgroup#13 0x0000555555fa3a5b in frame_unwind_pc (this_frame=0x555558181090) at /gdb-up/gdb/frame.c:948 openhwgroup#14 0x0000555555fa7621 in get_frame_pc (frame=0x555558181160) at /gdb-up/gdb/frame.c:2572 openhwgroup#15 0x0000555555fa7706 in get_frame_address_in_block (this_frame=0x555558181160) at /gdb-up/gdb/frame.c:2602 openhwgroup#16 0x0000555555fa77d0 in get_frame_address_in_block_if_available (this_frame=0x555558181160, pc=0x7fffffffd708) at /gdb-up/gdb/frame.c:2665 openhwgroup#17 0x0000555555fa5f8d in select_frame (fi=0x555558181160) at /gdb-up/gdb/frame.c:1890 openhwgroup#18 0x0000555555fa5bab in lookup_selected_frame (a_frame_id=..., frame_level=-1) at /gdb-up/gdb/frame.c:1720 openhwgroup#19 0x0000555555fa5e47 in get_selected_frame (message=0x0) at /gdb-up/gdb/frame.c:1810 openhwgroup#20 0x0000555555cc9c6e in get_current_arch () at /gdb-up/gdb/arch-utils.c:848 openhwgroup#21 0x000055555625b239 in gdbpy_before_prompt_hook (extlang=0x555557451f20 <extension_language_python>, current_gdb_prompt=0x555557f4d890 <top_prompt+16> "(gdb) ") at /gdb-up/gdb/python/python.c:1063 openhwgroup#22 0x0000555555f7cfbb in ext_lang_before_prompt (current_gdb_prompt=0x555557f4d890 <top_prompt+16> "(gdb) ") at /gdb-up/gdb/extension.c:922 openhwgroup#23 0x0000555555f7d442 in std::_Function_handler<void (char const*), void (*)(char const*)>::_M_invoke(std::_Any_data const&, char const*&&) (__functor=..., __args#0=@0x7fffffffd900: 0x555557f4d890 <top_prompt+16> "(gdb) ") at /usr/include/c++/7/bits/std_function.h:316 openhwgroup#24 0x0000555555f752dd in std::function<void (char const*)>::operator()(char const*) const (this=0x55555817d838, __args#0=0x555557f4d890 <top_prompt+16> "(gdb) ") at /usr/include/c++/7/bits/std_function.h:706 openhwgroup#25 0x0000555555f75100 in gdb::observers::observable<char const*>::notify (this=0x555557f49060 <gdb::observers::before_prompt>, args#0=0x555557f4d890 <top_prompt+16> "(gdb) ") at /gdb-up/gdb/../gdbsupport/observable.h:150 openhwgroup#26 0x0000555555f736dc in top_level_prompt () at /gdb-up/gdb/event-top.c:444 openhwgroup#27 0x0000555555f735ba in display_gdb_prompt (new_prompt=0x0) at /gdb-up/gdb/event-top.c:411 openhwgroup#28 0x00005555564611a7 in tui_on_command_error () at /gdb-up/gdb/tui/tui-interp.c:205 openhwgroup#29 0x0000555555c2173f in std::_Function_handler<void (), void (*)()>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/7/bits/std_function.h:316 openhwgroup#30 0x0000555555e10c20 in std::function<void ()>::operator()() const (this=0x5555580f9028) at /usr/include/c++/7/bits/std_function.h:706 openhwgroup#31 0x0000555555e10973 in gdb::observers::observable<>::notify() const (this=0x555557f48d20 <gdb::observers::command_error>) at /gdb-up/gdb/../gdbsupport/observable.h:150 openhwgroup#32 0x00005555560e9b3f in start_event_loop () at /gdb-up/gdb/main.c:438 openhwgroup#33 0x00005555560e9bcc in captured_command_loop () at /gdb-up/gdb/main.c:481 openhwgroup#34 0x00005555560eb616 in captured_main (data=0x7fffffffddd0) at /gdb-up/gdb/main.c:1348 openhwgroup#35 0x00005555560eb67c in gdb_main (args=0x7fffffffddd0) at /gdb-up/gdb/main.c:1363 openhwgroup#36 0x0000555555c1b6b3 in main (argc=12, argv=0x7fffffffded8) at /gdb-up/gdb/gdb.c:32 Commit 1da5d0e eliminated the call to 'get_current_arch' in 'gdbpy_before_prompt_hook'. Hence, the second instance of "Couldn't get registers: No such process." does not appear anymore. Fix the failure by updating the regular expression in the test.

…ync." Commit 14b3360 ("do_target_wait_1: Clear TARGET_WNOHANG if the target isn't async.") broke some multi-target tests, such as gdb.multi/multi-target-info-inferiors.exp. The symptom is that execution just hangs at some point. What happens is: 1. One remote inferior is started, and now sits stopped at a breakpoint. It is not "async" at this point (but it "can async"). 2. We run a native inferior, the event loop gets woken up by the native target's fd. 3. In do_target_wait, we randomly choose an inferior to call target_wait on first, it happens to be the remote inferior. 4. Because the target is currently not "async", we clear TARGET_WNOHANG, resulting in synchronous wait. We therefore block here: #0 0x00007fe9540dbb4d in select () from /usr/lib/libc.so.6 openhwgroup#1 0x000055fc7e821da7 in gdb_select (n=15, readfds=0x7ffdb77c1fb0, writefds=0x0, exceptfds=0x7ffdb77c2050, timeout=0x7ffdb77c1f90) at /home/simark/src/binutils-gdb/gdb/posix-hdep.c:31 openhwgroup#2 0x000055fc7ddef905 in interruptible_select (n=15, readfds=0x7ffdb77c1fb0, writefds=0x0, exceptfds=0x7ffdb77c2050, timeout=0x7ffdb77c1f90) at /home/simark/src/binutils-gdb/gdb/event-top.c:1134 openhwgroup#3 0x000055fc7eda58e4 in ser_base_wait_for (scb=0x6250002e4100, timeout=1) at /home/simark/src/binutils-gdb/gdb/ser-base.c:240 openhwgroup#4 0x000055fc7eda66ba in do_ser_base_readchar (scb=0x6250002e4100, timeout=-1) at /home/simark/src/binutils-gdb/gdb/ser-base.c:365 openhwgroup#5 0x000055fc7eda6ff6 in generic_readchar (scb=0x6250002e4100, timeout=-1, do_readchar=0x55fc7eda663c <do_ser_base_readchar(serial*, int)>) at /home/simark/src/binutils-gdb/gdb/ser-base.c:444 openhwgroup#6 0x000055fc7eda718a in ser_base_readchar (scb=0x6250002e4100, timeout=-1) at /home/simark/src/binutils-gdb/gdb/ser-base.c:471 openhwgroup#7 0x000055fc7edb1ecd in serial_readchar (scb=0x6250002e4100, timeout=-1) at /home/simark/src/binutils-gdb/gdb/serial.c:393 openhwgroup#8 0x000055fc7ec48b8f in remote_target::readchar (this=0x617000038780, timeout=-1) at /home/simark/src/binutils-gdb/gdb/remote.c:9446 openhwgroup#9 0x000055fc7ec4da82 in remote_target::getpkt_or_notif_sane_1 (this=0x617000038780, buf=0x6170000387a8, forever=1, expecting_notif=1, is_notif=0x7ffdb77c24f0) at /home/simark/src/binutils-gdb/gdb/remote.c:9928 openhwgroup#10 0x000055fc7ec4f045 in remote_target::getpkt_or_notif_sane (this=0x617000038780, buf=0x6170000387a8, forever=1, is_notif=0x7ffdb77c24f0) at /home/simark/src/binutils-gdb/gdb/remote.c:10037 openhwgroup#11 0x000055fc7ec354d4 in remote_target::wait_ns (this=0x617000038780, ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/remote.c:8147 openhwgroup#12 0x000055fc7ec38aa1 in remote_target::wait (this=0x617000038780, ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/remote.c:8337 openhwgroup#13 0x000055fc7f1409ce in target_wait (ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/target.c:2612 openhwgroup#14 0x000055fc7e19da98 in do_target_wait_1 (inf=0x617000038080, ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/infrun.c:3636 openhwgroup#15 0x000055fc7e19e26b in operator() (__closure=0x7ffdb77c2f90, inf=0x617000038080) at /home/simark/src/binutils-gdb/gdb/infrun.c:3697 openhwgroup#16 0x000055fc7e19f0c4 in do_target_wait (ecs=0x7ffdb77c33a0, options=...) at /home/simark/src/binutils-gdb/gdb/infrun.c:3716 openhwgroup#17 0x000055fc7e1a31f7 in fetch_inferior_event () at /home/simark/src/binutils-gdb/gdb/infrun.c:4061 Before the aforementioned commit, we would not have cleared TARGET_WNOHANG, the remote target's wait would have returned nothing, and we would have consumed the native target's event. After applying this revert, the testsuite state looks as good as before for me on Ubuntu 20.04 amd64. Change-Id: Ic17a1642935cabcc16c25cb6899d52e12c2f5c3f

The current zombie leader detection code in linux-nat.c has a race -- if a multi-threaded inferior exits just before check_zombie_leaders finds that the leader is now zombie via checking /proc/PID/status, check_zombie_leaders deletes the leader, assuming we won't get an event for that exit (which we won't in some scenarios, but not in this one). That might seem mostly harmless, but it has some downsides: - later when we continue pulling events out of the kernel, we will collect the exit event of the non-leader threads, and once we see the last lwp in our list exit, we return _that_ lwp's exit code as whole-process exit code to infrun, instead of the leader's exit code. - this can cause a hang in stop_all_threads in infrun.c. Say there are 2 threads in the process. stop_all_threads stops each of those threads, and then waits for two stop or exit events, one for each thread. If the whole process exits, and check_zombie_leaders hits the false-positive case, linux-nat.c will only return one event to GDB (the whole-process exit returned when we see the last thread, the non-leader thread, exit), making stop_all_threads hang forever waiting for a second event that will never come. However, in this false-positive scenario, where the whole process is exiting, as opposed to just the leader (with pthread_exit(), for example), we _will_ get an exit event shortly for the leader, after we collect the exit event of all the other non-leader threads. Or put another way, we _always_ get an event for the leader after we see it become zombie. I tried a number of approaches to fix this: openhwgroup#1 - My first thought to address the race was to make GDB always report the whole-process exit status for the leader thread, not for whatever is the last lwp in the list. We _always_ get a final exit (or exec) event for the leader, and when the race triggers, we're not collecting it. openhwgroup#2 - My second thought was to try to plug the race in the first place. I thought of making GDB call waitpid/WNOHANG for all non-leader threads immediately when the zombie leader is detected, assuming there would be an exit event pending for each of them waiting to be collected. Turns out that that doesn't work -- you can see the leader become zombie _before_ the kernel kills all other threads. Waitpid in that small time window returns 0, indicating no-event. Thankfully we hit that race window all the time, which avoided trading one race for another. Looking at the non-leader thread's status in /proc doesn't help either, the threads are still in running state for a bit, for the same reason. openhwgroup#3 - My next attempt, which seemed promising, was to synchronously stop and wait for the stop for each of the non-leader threads. For the scenario in question, this will collect all the exit statuses of the non-leader threads. Then, if we are left with only the zombie leader in the lwp list, it means we either have a normal while-process exit or an exec, in which case we should not delete the leader. If _only_ the leader exited, like in gdb.threads/leader-exit.exp, then after pausing threads, we will still have at least one live non-leader thread in the list, and so we delete the leader lwp. I got this working and polished, and it was only after staring at the kernel code to convince myself that this would really work (and it would, for the scenario I considered), that I realized I had failed to account for one scenario -- if any non-leader thread is _already_ stopped when some thread triggers a group exit, like e.g., if you have some threads stopped and then resume just one thread with scheduler-locking or non-stop, and that thread exits the process. I also played with PTRACE_EVENT_EXIT, see if it would help in any way to plug the race, and I couldn't find a way that it would result in any practical difference compared to looking at /proc/PID/status, with respect to having a race. So I concluded that there's no way to plug the race, we just have to deal with it. Which means, going back to approach openhwgroup#1. That is the approach taken by this patch. Change-Id: I6309fd4727da8c67951f9cea557724b77e8ee979

…pported When parsing the ptid out of a reply package, if the multi-process extensions are not supported, use current_inferior's pid as the pid of the reported thread, instead of inferior_ptid. This is needed because the inferior_ptid may be null_ptid although a legit context exists, due to a prior context switch via switch_to_inferior_no_thread. Below is a scenario that illustrates what could go wrong. First, setup a multi-target scenario. This is needed, because in a multi-target setting, the inferior_ptid is cleared out before waiting on targets. The second inferior below sits on top of a remote target. Multi-process packets are disabled. $ # First, spawn a process with PID 26253 to attach to later. $ gdb-up a.out Reading symbols from a.out... (gdb) maint set target-non-stop on (gdb) set remote multiprocess-feature-packet off (gdb) start ... (gdb) add-inferior -no-connection [New inferior 2] Added inferior 2 (gdb) inferior 2 [Switching to inferior 2 [<null>] (<noexec>)] (gdb) target extended-remote | gdbserver --multi - Remote debugging using | gdbserver --multi - Remote debugging using stdio (gdb) attach 26253 Attaching to Remote target Attached; pid = 26253 [New Thread 26253] [New inferior 3] Reading /tmp/a.out from remote target... ... [New Thread 26253] ... Reading /usr/local/lib/debug/....debug from remote target... >>> GDB seems to hang here. After attaching to a process and reading some library files, GDB seems to hang. One interesting thing to note is that [New Thread 26253] appears twice. We also see [New inferior 3] Running the same scenario with "debug infrun on" reveals more details. ... (gdb) attach 26253 [infrun] scoped_disable_commit_resumed: reason=attaching Attaching to Remote target Attached; pid = 26253 [New Thread 26253] [infrun] infrun_async: enable=1 [infrun] attach_command: immediately after attach: [infrun] attach_command: thread 26253.26253.0, executing = 1, resumed = 0, state = RUNNING [infrun] clear_proceed_status_thread: 26253.26253.0 [infrun] reset: reason=attaching [infrun] maybe_set_commit_resumed_all_targets: not requesting commit-resumed for target native, no resumed threads [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target extended-remote [infrun] fetch_inferior_event: enter [infrun] scoped_disable_commit_resumed: reason=handling event [infrun] do_target_wait: Found 2 inferiors, starting at openhwgroup#1 [infrun] random_pending_event_thread: None found. [infrun] print_target_wait_results: target_wait (-1.0.0 [Thread 0], status) = [infrun] print_target_wait_results: 26253.26253.0 [Thread 26253], [infrun] print_target_wait_results: status->kind = STOPPED, sig = GDB_SIGNAL_0 [infrun] handle_inferior_event: status->kind = STOPPED, sig = GDB_SIGNAL_0 [infrun] start_step_over: enter [infrun] start_step_over: stealing global queue of threads to step, length = 0 [infrun] operator(): step-over queue now empty [infrun] start_step_over: exit [infrun] context_switch: Switching context from 0.0.0 to 26253.26253.0 [infrun] handle_signal_stop: stop_pc=0x7f849d8cf151 [infrun] stop_waiting: stop_waiting [infrun] stop_all_threads: starting [infrun] stop_all_threads: pass=0, iterations=0 [New inferior 3] Reading /tmp/a.out from remote target... warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead. Reading /tmp/a.out from remote target... Reading symbols from target:/tmp/a.out... [New Thread 26253] [infrun] stop_all_threads: 4723.4723.0 not executing [infrun] stop_all_threads: 26253.26253.0 not executing [infrun] stop_all_threads: 42000.26253.0 executing, need stop [infrun] print_target_wait_results: target_wait (-1.0.0 [Thread 0], status) = [infrun] print_target_wait_results: -1.0.0 [Thread 0], [infrun] print_target_wait_results: status->kind = IGNORE [infrun] print_target_wait_results: target_wait (-1.0.0 [Thread 0], status) = [infrun] print_target_wait_results: -1.0.0 [Thread 0], [infrun] print_target_wait_results: status->kind = IGNORE GDB tried to stop Thread 42000.26253.0, which does not exist, and we are waiting for a stop event that will never happen. The PID in '42000.26253.0', namely 42000, is the PID of magic_null_ptid. It comes from gdb/remote.c:read_ptid: /* Since the stub is not sending a process id, then default to what's in inferior_ptid, unless it's null at this point. If so, then since there's no way to know the pid of the reported threads, use the magic number. */ if (inferior_ptid == null_ptid) pid = magic_null_ptid.pid (); else pid = inferior_ptid.pid (); if (obuf) *obuf = pp; return ptid_t (pid, tid); Because multi-process was turned off, GDB did not parse an explicitly specified PID. Furthermore, inferior_ptid == null_ptid, and eventually GDB picked the PID from magic_null_ptid. If target-non-stop is not turned on at the beginning, the same bug reveals itself as a duplicated thread as shown below. # Same setup as above, without 'maint set target-non-stop on'. ... (gdb) attach 26253 Attaching to Remote target Attached; pid = 26253 [New inferior 3] ... [New Thread 26253] ... (gdb) info threads Id Target Id Frame 1.1 process 13517 "a.out" main () at test.c:3 * 2.1 Thread 26253 "a.out" 0x00007f12750c5151 in read () from target:/lib/x86_64-linux-gnu/libc.so.6 3.1 Thread 26253 "a.out" Remote 'g' packet reply is too long (expected 560 bytes, got 2496 bytes): 00feffffffffffff000a3a75127f000051510c75127f0000000400000000000060d24ef6af5500000000000000000000680d000000000000b85b31e3fc7f0000c0283a75127f000000e55b75127f000010d04ef6af550000460200000000000060c73975127f0000a0d23975127f0000000a3a75127f0000000000000000000051510c75127f000046020000330000002b0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007f03000000000000ffff0000000000000000000000000000000000000000000080143a75127f000080143a75127f000040143a75127f000040143a75127f00007d0000007e0000007f00000080000000300c3a75127f0000300c3a75127f00000e000000000000000e0000000000000000000000000000000000000000000000ffffffffffffffffffffffffffffffff0400000004000000040000000400000020143a75127f000020143a75127f000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000801f0000000000000000000000e55b75127f0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 (gdb) Fix the problem by preferring current_inferior()'s pid instead of magic_null_ptid. Regression-tested on X86-64 Linux. Co-authored-by: Aleksandar Paunovic <aleksandar.paunovic@intel.com>

I see some failures, at least in gdb.multi/multi-re-run.exp and gdb.threads/interrupted-hand-call.exp. Running `stress -C $(nproc)` at the same time as the test makes those tests relatively frequent. Let's take gdb.multi/multi-re-run.exp as an example. The failure looks like this, an unexpected "no resumed": continue Continuing. No unwaited-for children left. (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=2: iter=1: continue until exit The situation is: - Inferior 1 is stopped somewhere, it won't really play a role here. - Inferior 2 has 2 threads, both stopped. - We resume inferior 2, the leader thread is expected to exit, making the process exit. From GDB's perspective, a failing run looks like this: [infrun] fetch_inferior_event: enter [infrun] scoped_disable_commit_resumed: reason=handling event [infrun] do_target_wait: Found 2 inferiors, starting at openhwgroup#1 [infrun] random_pending_event_thread: None found. [remote] wait: enter [remote] Packet received: T0506:20dcffffff7f0000;07:20dcffffff7f0000;10:9551555555550000;thread:pae4cd.ae4cd;core:e; [remote] wait: exit [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) = [infrun] print_target_wait_results: 713933.713933.0 [Thread 713933.713933], [infrun] print_target_wait_results: status->kind = STOPPED, sig = GDB_SIGNAL_TRAP [infrun] handle_inferior_event: status->kind = STOPPED, sig = GDB_SIGNAL_TRAP [infrun] clear_step_over_info: clearing step over info [infrun] context_switch: Switching context from 0.0.0 to 713933.713933.0 [infrun] handle_signal_stop: stop_pc=0x555555555195 [infrun] start_step_over: enter [infrun] start_step_over: stealing global queue of threads to step, length = 0 [infrun] operator(): step-over queue now empty [infrun] start_step_over: exit [infrun] process_event_stop_test: no stepping, continue [remote] Sending packet: $Z0,555555555194,1#8e [remote] Packet received: OK [infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [713933.713933.0] at 0x555555555195 [remote] Sending packet: $QPassSignals:e;10;14;17;1a;1b;1c;21;24;25;2c;4c;97;#0a [remote] Packet received: OK [remote] Sending packet: $vCont;c:pae4cd.-1#9f [infrun] prepare_to_wait: prepare_to_wait [infrun] reset: reason=handling event [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target extended-remote [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote [infrun] fetch_inferior_event: exit [infrun] fetch_inferior_event: enter [infrun] scoped_disable_commit_resumed: reason=handling event [infrun] do_target_wait: Found 2 inferiors, starting at #0 [infrun] random_pending_event_thread: None found. [remote] wait: enter [remote] Packet received: N [remote] wait: exit [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) = [infrun] print_target_wait_results: -1.0.0 [process -1], [infrun] print_target_wait_results: status->kind = NO_RESUMED [infrun] handle_inferior_event: status->kind = NO_RESUMED [remote] Sending packet: $Hgp0.0#ad [remote] Packet received: OK [remote] Sending packet: $qXfer:threads:read::0,1000#92 [remote] Packet received: l<threads>\n<thread id="pae4cb.ae4cb" core="3" name="multi-re-run-1" handle="40c7c6f7ff7f0000"/>\n<thread id="pae4cb.ae4cc" core="2" name="multi-re-run-1" handle="40b6c6f7ff7f0000"/>\n<thread id="pae4cd.ae4ce" core="1" name="multi-re-run-2" handle="40b6c6f7ff7f0000"/>\n</threads>\n [infrun] stop_waiting: stop_waiting [remote] Sending packet: $qXfer:threads:read::0,1000#92 [remote] Packet received: l<threads>\n<thread id="pae4cb.ae4cb" core="3" name="multi-re-run-1" handle="40c7c6f7ff7f0000"/>\n<thread id="pae4cb.ae4cc" core="2" name="multi-re-run-1" handle="40b6c6f7ff7f0000"/>\n<thread id="pae4cd.ae4ce" core="1" name="multi-re-run-2" handle="40b6c6f7ff7f0000"/>\n</threads>\n [infrun] infrun_async: enable=0 [infrun] reset: reason=handling event [infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target extended-remote [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote [infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target extended-remote [infrun] fetch_inferior_event: exit We can see that we resume the inferior with vCont;c, but got NO_RESUMED. When the test passes, we get an EXITED status to indicate the process has exited. From GDBserver's point of view, it looks like this. The logs contain some logging I added and that are part of this patch. [remote] getpkt: getpkt ("vCont;c:pae4cf.-1"); [no ack sent] [threads] resume: enter [threads] thread_needs_step_over: Need step over [LWP 713931]? Ignoring, should remain stopped [threads] thread_needs_step_over: Need step over [LWP 713932]? Ignoring, should remain stopped [threads] get_pc: pc is 0x555555555195 [threads] thread_needs_step_over: Need step over [LWP 713935]? No, no breakpoint found at 0x555555555195 [threads] get_pc: pc is 0x7ffff7d35a95 [threads] thread_needs_step_over: Need step over [LWP 713936]? No, no breakpoint found at 0x7ffff7d35a95 [threads] resume: Resuming, no pending status or step over needed [threads] resume_one_thread: resuming LWP 713935 [threads] proceed_one_lwp: lwp 713935 [threads] resume_one_lwp_throw: continue from pc 0x555555555195 [threads] resume_one_lwp_throw: Resuming lwp 713935 (continue, signal 0, stop not expected) [threads] resume_one_lwp_throw: NOW ptid=713935.713935.0 stopped=0 resumed=0 [threads] resume_one_thread: resuming LWP 713936 [threads] proceed_one_lwp: lwp 713936 [threads] resume_one_lwp_throw: continue from pc 0x7ffff7d35a95 [threads] resume_one_lwp_throw: Resuming lwp 713936 (continue, signal 0, stop not expected) [threads] resume_one_lwp_throw: ptrace errno = 3 (No such process) [threads] resume: exit [threads] wait_1: enter [threads] wait_1: [<all threads>] [threads] wait_for_event_filtered: waitpid(-1, ...) returned 0, ERRNO-OK [threads] resume_stopped_resumed_lwps: resuming stopped-resumed LWP LWP 713935.713936 at 7ffff7d35a95: step=0 [threads] resume_one_lwp_throw: continue from pc 0x7ffff7d35a95 [threads] resume_one_lwp_throw: Resuming lwp 713936 (continue, signal 0, stop not expected) [threads] resume_one_lwp_throw: ptrace errno = 3 (No such process) [threads] operator(): check_zombie_leaders: leader_pid=713931, leader_lp!=NULL=1, num_lwps=2, zombie=0 [threads] operator(): check_zombie_leaders: leader_pid=713935, leader_lp!=NULL=1, num_lwps=2, zombie=1 [threads] operator(): Thread group leader 713935 zombie (it exited, or another thread execd). [threads] delete_lwp: deleting 713935 [threads] wait_for_event_filtered: exit (no unwaited-for LWP) sigchld_handler [threads] wait_1: ret = null_ptid, TARGET_WAITKIND_NO_RESUMED [threads] wait_1: exit What happens is: - We resume the leader (713935) successfully. - The leader exits. - We resume the secondary thread (713936), we get ESRCH. This is expected this the leader has exited. - resume_one_lwp_throw throws, it's caught by resume_one_lwp. - resume_one_lwp checks with check_ptrace_stopped_lwp_gone that the failure can be explained by the LWP becoming zombie, and swallows the error. - Note that this means that the secondary lwp still has stopped==1. - wait_1 is called, probably because linux_process_target::resume marks the async pipe at the end. - The exit event isn't ready yet, probably because the machine is under load, so waitpid returns nothing. - check_zombie_leaders detects that the leader is zombie and deletes - We try to find a resumed (non-stopped) LWP to get an event from, there's none since the leader (that was resumed) is now deleted, and the secondary thread is still marked stopped. wait_for_event_filtered returns -1, causing wait_1 to return NO_RESUMED. What I notice here is that there is some kind of race between the availability of the process' exit notification and the call to wait_1 that results from marking the async pipe at the end of resume. I think what we want from this wait_1 invocation is to keep waiting, as we will eventually get thread exit notifications for both of our threads. The fix I came up with is to mark the secondary thread as !stopped (or resumed) when we fail to resume it. This makes wait_1 see that there is at least one resume lwp, so it won't return NO_RESUMED. I think this makes sense to consider it resumed, because we are going to receive an exit event for it. Here's the GDBserver logs with the fix applied: [threads] resume: enter [threads] thread_needs_step_over: Need step over [LWP 724595]? Ignoring, should remain stopped [threads] thread_needs_step_over: Need step over [LWP 724596]? Ignoring, should remain stopped [threads] get_pc: pc is 0x555555555195 [threads] thread_needs_step_over: Need step over [LWP 724597]? No, no breakpoint found at 0x555555555195 [threads] get_pc: pc is 0x7ffff7d35a95 [threads] thread_needs_step_over: Need step over [LWP 724598]? No, no breakpoint found at 0x7ffff7d35a95 [threads] resume: Resuming, no pending status or step over needed [threads] resume_one_thread: resuming LWP 724597 [threads] proceed_one_lwp: lwp 724597 [threads] resume_one_lwp_throw: continue from pc 0x555555555195 [threads] resume_one_lwp_throw: Resuming lwp 724597 (continue, signal 0, stop not expected) [threads] resume_one_lwp_throw: NOW ptid=724597.724597.0 stopped=0 resumed=0 [threads] resume_one_thread: resuming LWP 724598 [threads] proceed_one_lwp: lwp 724598 [threads] resume_one_lwp_throw: continue from pc 0x7ffff7d35a95 [threads] resume_one_lwp_throw: Resuming lwp 724598 (continue, signal 0, stop not expected) [threads] resume_one_lwp_throw: ptrace errno = 3 (No such process) [threads] resume: exit [threads] wait_1: enter [threads] wait_1: [<all threads>] sigchld_handler [threads] wait_for_event_filtered: waitpid(-1, ...) returned 0, ERRNO-OK [threads] operator(): check_zombie_leaders: leader_pid=724595, leader_lp!=NULL=1, num_lwps=2, zombie=0 [threads] operator(): check_zombie_leaders: leader_pid=724597, leader_lp!=NULL=1, num_lwps=2, zombie=1 [threads] operator(): Thread group leader 724597 zombie (it exited, or another thread execd). [threads] delete_lwp: deleting 724597 [threads] wait_for_event_filtered: sigsuspend'ing sigchld_handler [threads] wait_for_event_filtered: waitpid(-1, ...) returned 724598, ERRNO-OK [threads] wait_for_event_filtered: waitpid 724598 received 0 (exited) [threads] filter_event: 724598 exited [threads] wait_for_event_filtered: waitpid(-1, ...) returned 724597, ERRNO-OK [threads] wait_for_event_filtered: waitpid 724597 received 0 (exited) [threads] wait_for_event_filtered: waitpid(-1, ...) returned 0, ERRNO-OK sigchld_handler [threads] wait_1: ret = LWP 724597.724598, exited with retcode 0 [threads] wait_1: exit Change-Id: Idf0bdb4cb0313f1b49e4864071650cc83fb3c100

Bug 28980 shows that trying to value_copy an entirely optimized out value causes an internal error. The original bug report involves MI and some Python pretty printer, and is quite difficult to reproduce, but another easy way to reproduce (that is believed to be equivalent) was proposed: $ ./gdb -q -nx --data-directory=data-directory -ex "py print(gdb.Value(gdb.Value(5).type.optimized_out()))" /home/smarchi/src/binutils-gdb/gdb/value.c:1731: internal-error: value_copy: Assertion `arg->contents != nullptr' failed. This is caused by 5f8ab46 ("gdb: constify parameter of value_copy"). It added an assertion that the contents buffer is allocated if the value is not lazy: if (!value_lazy (val)) { gdb_assert (arg->contents != nullptr); This was based on the comment on value::contents, which suggest that this is the case: /* Actual contents of the value. Target byte-order. NULL or not valid if lazy is nonzero. */ gdb::unique_xmalloc_ptr<gdb_byte> contents; However, it turns out that it can also be nullptr also if the value is entirely optimized out, for example on exit of allocate_optimized_out_value. That function creates a lazy value, marks the entire value as optimized out, and then clears the lazy flag. But contents remains nullptr. This wasn't a problem for value_copy before, because it was calling value_contents_all_raw on the input value, which caused contents to be allocated before doing the copy. This means that the input value to value_copy did not have its contents allocated on entry, but had it allocated on exit. The result value had it allocated on exit. And that we copied bytes for an entirely optimized out value (i.e. meaningless bytes). From here I see two choices: 1. respect the documented invariant that contents is nullptr only and only if the value is lazy, which means making allocate_optimized_out_value allocate contents 2. extend the cases where contents can be nullptr to also include values that are entirely optimized out (note that you could still have some entirely optimized out values that do have contents allocated, it depends on how they were created) and adjust value_copy accordingly Choice openhwgroup#1 is safe, but less efficient: it's not very useful to allocate a buffer for an entirely optimized out value. It's even a bit less efficient than what we had initially, because values coming out of allocate_optimized_out_value would now always get their contents allocated. Choice openhwgroup#2 would be more efficient than what we had before: giving an optimized out value without allocated contents to value_copy would result in an optimized out value without allocated contents (and the input value would still be without allocated contents on exit). But it's more risky, since it's difficult to ensure that all users of the contents (through the various_contents* accessors) are all fine with that new invariant. In this patch, I opt for choice openhwgroup#2, since I think it is a better direction than choice openhwgroup#1. openhwgroup#1 would be a pessimization, and if we go this way, I doubt that it will ever be revisited, it will just stay that way forever. Add a selftest to test this. I initially started to write it as a Python test (since the reproducer is in Python), but a selftest is more straightforward. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28980 Change-Id: I6e2f5c0ea804fafa041fcc4345d47064b5900ed7

While working on a different patch, I triggered an assertion from the initialize_current_architecture code, specifically from one of the *_gdbarch_init functions in a *-tdep.c file. This exposes a couple of issues with GDB. This is easy enough to reproduce by adding 'gdb_assert (false)' into a suitable function. For example, I added a line into i386_gdbarch_init and can see the following issue. I start GDB and immediately hit the assert, the output is as you'd expect, except for the very last line: $ ./gdb/gdb --data-directory ./gdb/data-directory/ ../../src.dev-1/gdb/i386-tdep.c:8455: internal-error: i386_gdbarch_init: Assertion `false' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. ----- Backtrace ----- ... snip ... --------------------- ../../src.dev-1/gdb/i386-tdep.c:8455: internal-error: i386_gdbarch_init: Assertion `false' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) ../../src.dev-1/gdb/ser-event.c:212:16: runtime error: member access within null pointer of type 'struct serial' Something goes wrong when we try to query the user. Note, I configured GDB with --enable-ubsan, I suspect that without this the above "error" would actually just be a crash. The backtrace from ser-event.c:212 looks like this: (gdb) bt 10 #0 serial_event_clear (event=0x675c020) at ../../src/gdb/ser-event.c:212 openhwgroup#1 0x0000000000769456 in invoke_async_signal_handlers () at ../../src/gdb/async-event.c:211 openhwgroup#2 0x000000000295049b in gdb_do_one_event () at ../../src/gdbsupport/event-loop.cc:194 openhwgroup#3 0x0000000001f015f8 in gdb_readline_wrapper ( prompt=0x67135c0 "../../src/gdb/i386-tdep.c:8455: internal-error: i386_gdbarch_init: Assertion `false' failed.\nA problem internal to GDB has been detected,\nfurther debugging may prove unreliable.\nQuit this debugg"...) at ../../src/gdb/top.c:1141 openhwgroup#4 0x0000000002118b64 in defaulted_query(const char *, char, typedef __va_list_tag __va_list_tag *) ( ctlstr=0x2e4eb68 "%s\nQuit this debugging session? ", defchar=0 '\000', args=0x7fffffffa6e0) at ../../src/gdb/utils.c:934 openhwgroup#5 0x0000000002118f72 in query (ctlstr=0x2e4eb68 "%s\nQuit this debugging session? ") at ../../src/gdb/utils.c:1026 openhwgroup#6 0x00000000021170f6 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=0x6107bc0 <internal_error_problem>, file=0x2b976c8 "../../src/gdb/i386-tdep.c", line=8455, fmt=0x2b96d7f "%s: Assertion `%s' failed.", ap=0x7fffffffa8e8) at ../../src/gdb/utils.c:417 openhwgroup#7 0x00000000021175a0 in internal_verror (file=0x2b976c8 "../../src/gdb/i386-tdep.c", line=8455, fmt=0x2b96d7f "%s: Assertion `%s' failed.", ap=0x7fffffffa8e8) at ../../src/gdb/utils.c:485 openhwgroup#8 0x00000000029503b3 in internal_error (file=0x2b976c8 "../../src/gdb/i386-tdep.c", line=8455, fmt=0x2b96d7f "%s: Assertion `%s' failed.") at ../../src/gdbsupport/errors.cc:55 openhwgroup#9 0x000000000122d5b6 in i386_gdbarch_init (info=..., arches=0x0) at ../../src/gdb/i386-tdep.c:8455 (More stack frames follow...) It turns out that the problem is that the async event handler mechanism has been invoked, but this has not yet been initialized. If we look at gdb_init (in gdb/top.c) we can indeed see the call to gdb_init_signals is after the call to initialize_current_architecture. If I reorder the calls, moving gdb_init_signals earlier, then the initial error is resolved, however, things are still broken. I now see the same "Quit this debugging session? (y or n)" prompt, but when I provide an answer and press return GDB immediately crashes. So what's going on now? The next problem is that the call_readline field within the current_ui structure is not initialized, and this callback is invoked to process the reply I entered. The problem is that call_readline is setup as a result of calling set_top_level_interpreter, which is called from captured_main_1. Unfortunately, set_top_level_interpreter is called after gdb_init is called. I wondered how to solve this problem for a while, however, I don't know if there's an easy "just reorder some lines" solution here. Looking through captured_main_1 there seems to be a bunch of dependencies between printing various things, parsing config files, and setting up the interpreter. I'm sure there is a solution hiding in there somewhere.... I'm just not sure I want to spend any longer looking for it. So. I propose a simpler solution, more of a hack/work-around. In utils.c we already have a function filtered_printing_initialized, this is checked in a few places within internal_vproblem. In some of these cases the call gates whether or not GDB will query the user. My proposal is to add a new readline_initialized function, which checks if the current_ui has had readline initialized yet. If this is not the case then we should not attempt to query the user. After this change GDB prints the error message, the backtrace, and then aborts (including dumping core). This actually seems pretty sane as, if GDB has not yet made it through the initialization then it doesn't make much sense to allow the user to say "no, I don't want to quit the debug session" (I think).

The variable right_lib_flags is not being set correctly to define RIGHT. The value RIGHT is needed to force the address of the library functions lib1_func3 and lib2_func4 to occur at different address in the wrong and right libraries. With RIGHT defined correctly, functions lib1_func3 and lib2_func4 occur at different addresses the test runs correctly on Powerpc. The test needs the lib2 addresses to be different in the right and wrong cases. That is the point of introducing function lib2_spacer with the ifdef RIGHT compiler directive. On Intel, the ARRAY_SIZE of 1 versus 8192 is sufficient to get the dynamic linker to move the addresses of the library. You can also get the same effect on PowerPC but you must use a value much larger than 8192. The key thing is that the test was not properly setting RIGHT to defined to get the lib2_spacer function on Intel and Powerpc. Without the patch, we have the Intel backtrace for the bad libraries: backtrace #0 break_here () at /home/ ... /gdb/testsuite/gdb.base/solib-search.c:30 openhwgroup#1 0x00007ffff7fae156 in ?? () openhwgroup#2 0x00007fffffffc150 in ?? () openhwgroup#3 0x00007ffff7fbb156 in ?? () openhwgroup#4 0x00007fffffffc160 in ?? () openhwgroup#5 0x00007ffff7fae146 in ?? () openhwgroup#6 0x00007fffffffc170 in ?? () openhwgroup#7 0x00007ffff7fbb146 in ?? () openhwgroup#8 0x00007fffffffc180 in ?? () openhwgroup#9 0x0000555555555156 in main () at /home/ ... /binutils-gdb/gdb/testsuite/gdb.base/solib-search.c:23 Backtrace stopped: previous frame inner to this frame (corrupt stack?) (gdb) PASS: gdb.base/solib-search.exp: backtrace (with wrong libs) (data collection) The backtrace on Intel with the good libraries is: backtrace #0 break_here () at /.../binutils-gdb/gdb/testsuite/gdb.base/solib-search.c:30 openhwgroup#1 0x00007ffff7fae156 in lib2_func4 () at /.../binutils-gdb/gdb/testsuite/gdb.base/solib-search-lib2.c:49 openhwgroup#2 0x00007ffff7fbb156 in lib1_func3 () at /.../gdb.base/solib-search-lib1.c:49 openhwgroup#3 0x00007ffff7fae146 in lib2_func2 () at /.../testsuite/gdb.base/solib-search-lib2.c:30 openhwgroup#4 0x00007ffff7fbb146 in lib1_func1 () at /.../gdb.base/solib-search-lib1.c:30 openhwgroup#5 0x0000555555555156 in main () at /...solib-search.c:23 (gdb) PASS: gdb.base/solib-search.exp: backtrace (with right libs) (data collection) PASS: gdb.base/solib-search.exp: backtrace (with right libs) In one case the backtrace is correct and the other it is wrong on Intel. This is due to the fact that the ARRAY_SIZE caused the dynamic linker to move the library function addresses around. I believe it has to do with the default size of the data and code sections used by the dynamic linker. So without the patch the backtrace on PowerPC looks like: backtrace #0 break_here () at /.../solib-search.c:30 openhwgroup#1 0x00007ffff7f007f4 in lib2_func4 () at /.../solib-search-lib2.c:49 openhwgroup#2 0x00007ffff7f307f4 in lib1_func3 () at /.../solib-search-lib1.c:49 openhwgroup#3 0x00007ffff7f007ac in lib2_func2 () at /.../solib-search-lib2.c:30 openhwgroup#4 0x00007ffff7f307ac in lib1_func1 () at /.../solib-search-lib1.c:30 openhwgroup#5 0x000000001000074c in main () at /.../solib-search.c:23 for both the good and bad libraries. The patch fixes defining RIGHT in solib-search-lib1.c and solib-search- lib2.c. Note, without the patch the lib1_spacer and lib2_spacer functions do not show up in the object dump of the Intel or Powerpc libraries as it should. The patch fixes that by making sure RIGHT gets defined. Now with the patch the backtrace for the bad library on PowerPC looks like: backtrace #0 break_here () at /.../solib-search.c:30 openhwgroup#1 0x00007ffff7f0083c in __glink_PLTresolve () from /.../solib-search-lib2.so Backtrace stopped: frame did not save the PC And the backtrace for the good libraries on PowerPC looks like: backtrace #0 break_here () at /.../solib-search.c:30 openhwgroup#1 0x00007ffff7f0083c in lib2_func4 () at /.../solib-search-lib2.c:49 openhwgroup#2 0x00007ffff7f3083c in lib1_func3 () at /.../solib-search-lib1.c:49 openhwgroup#3 0x00007ffff7f007cc in lib2_func2 () at /.../solib-search-lib2.c:30 openhwgroup#4 0x00007ffff7f307cc in lib1_func1 () at /.../solib-search-lib1.c:30 openhwgroup#5 0x000000001000074c in main () at /.../solib-search.c:23 (gdb) PASS: gdb.base/solib-search.exp: backtrace (with right libs) (data collection) PASS: gdb.base/solib-search.exp: backtrace (with right libs) The issue then is on Power where the ARRAY_SIZE of 1 versus 8192 is not sufficient to cause the dymanic linker to allocate the libraries at different addresses. I don't claim to understand the specifics of how the dynamic linker works and what the default size is for the data and code sections are. My guess is by default PowerPC allocates a larger data size by default, which is large enough to hold array[8192]. The default size of the data section allocated by the dynamic linker on Intel is not large enough to hold array[8192] thus causing the code section on Intel to have to move when the large array is defined. Note on PowerPC, if you make ARRAY_SIZE big enough, then you will cause the library addresses to occur at different addresses as the larger data section forces the code section to a different address. That was actually my original fix for the program until I spoke with Doug Evans who originally wrote the test. Doug noticed that RIGHT was not getting defined as he originally intended in the test. With the patch to fix the definition of RIGHT, PowerPC has a bad and a good backtrace because the address of lib1_func3 and lib2_func4 both move because lib1_spacer and lib2_spacer are now defined before lib1_func3 and lib2_func4. Without the patch, the lib1_spacer and lib2_spacer function doesn't show up in the binary for the correct or incorrect library on Intel or PowerPC. With the patch, RIGHT gets defined as originally intended for the test on both architectures and lib1_spacer and lib2_spacer function show up in the binaries on both architectures changing the other function addresses as intended thus causing the test work as intended on PowerPC.

In some cases GDB will fail when attempting to complete a command that involves a rust symbol, the failure can manifest as a crash. The problem is caused by the completion_match_for_lcd object being left containing invalid data during calls to cp_symbol_name_matches_1. The first question to address is why we are calling a C++ support function when handling a rust symbol. That's due to GDB's auto language detection for msymbols, in some cases GDB can't tell if a symbol is a rust symbol, or a C++ symbol. The test application contains symbols for functions which are statically linked in from various rust support libraries. There's no DWARF for these symbols, so all GDB has is the msymbols built from the ELF symbol table. Here's the problematic symbol that leads to our crash: mangled: _ZN4core3str21_$LT$impl$u20$str$GT$5parse17h5111d2d6a50d22bdE demangled: core::str::<impl str>::parse As an msymbol this is initially created with language auto, then GDB eventually calls symbol_find_demangled_name, which loops over all languages calling language_defn::sniff_from_mangled_name, the first language that can demangle the symbol gets assigned as the language for that symbol. Unfortunately, there's overlap in the mangled symbol names, some (legacy) rust symbols can be demangled as both rust and C++, see cplus_demangle in libiberty/cplus-dem.c where this is mentioned. And so, because we check the C++ language before we check for rust, then the msymbol is (incorrectly) given the C++ language. Now it's true that is some cases we might be able to figure out that a demangled symbol is not actually a valid C++ symbol, for example, in our case, the construct '::<impl str>::' is not, I believe, valid in a C++ symbol, we could look for ':<' and '>:' and refuse to accept this as a C++ symbol. However, I'm not sure it is always possible to tell that a demangled symbol is rust or C++, so, I think, we have to accept that some times we will get this language detection wrong. If we accept that we can't fix the symbol language detection 100% of the time, then we should make sure that GDB doesn't crash when it gets the language wrong, that is what this commit addresses. In our test case the user tries to complete a symbol name like this: (gdb) complete break pars This results in GDB trying to find all symbols that match 'pars', eventually we consider our problematic symbol, and we end up with a call stack that looks like this: #0 0x0000000000f3c6bd in strncmp_iw_with_mode openhwgroup#1 0x0000000000706d8d in cp_symbol_name_matches_1 openhwgroup#2 0x0000000000706fa4 in cp_symbol_name_matches openhwgroup#3 0x0000000000df3c45 in compare_symbol_name openhwgroup#4 0x0000000000df3c91 in completion_list_add_name openhwgroup#5 0x0000000000df3f1d in completion_list_add_msymbol openhwgroup#6 0x0000000000df4c94 in default_collect_symbol_completion_matches_break_on openhwgroup#7 0x0000000000658c08 in language_defn::collect_symbol_completion_matches openhwgroup#8 0x0000000000df54c9 in collect_symbol_completion_matches openhwgroup#9 0x00000000009d98fb in linespec_complete_function openhwgroup#10 0x00000000009d99f0 in complete_linespec_component openhwgroup#11 0x00000000009da200 in linespec_complete openhwgroup#12 0x00000000006e4132 in complete_address_and_linespec_locations openhwgroup#13 0x00000000006e4ac3 in location_completer In cp_symbol_name_matches_1 we enter a loop, this loop repeatedly tries to match the demangled problematic symbol name against the user supplied text ('pars'). Each time around the loop another component of the symbol name is stripped off, thus, we check 'pars' against these options: core::str::<impl str>::parse str::<impl str>::parse <impl str>::parse parse As soon as we get a match the cp_symbol_name_matches_1 exits its loop and returns. In our case, when we're looking for 'pars', the match occurs on the last iteration of the loop, when we are comparing to 'parse'. Now the problem here is that cp_symbol_name_matches_1 uses the strncmp_iw_with_mode, and inside strncmp_iw_with_mode we allow for skipping over template parameters. This allows GDB to match the symbol name 'foo<int>(int,int)' if the user supplies 'foo(int,'. Inside strncmp_iw_with_mode GDB will record any template arguments that it has skipped over inside the completion_match_for_lcd object that is passed in as an argument. And so, when GDB tries to match against '<impl str>::parse', the first thing it sees is '<impl str>', GDB assumes this is a template argument and records this as a skipped region within the completion_match_for_lcd object. After '<impl str>' GDB sees a ':' character, which doesn't match with the 'pars' the user supplied, so strncmp_iw_with_mode returns a value indicating a non-match. GDB then removes the '<impl str>' component from the symbol name and tries again, this time comparing to 'parse', which does match. Having found a match, then in cp_symbol_name_matches_1 we record the match string, and the full symbol name within the completion_match_result object, and return. The problem here is that the skipped region, the '<impl str>' that we recorded in the penultimate loop iteration was never discarded, its still there in our returned result. If we look at what the pointers held in the completion_match_result that cp_symbol_name_matches_1 returns, this is what we see: core::str::<impl str>::parse | \________/ | | | '--- completion match string | '---skip range '--- full symbol name When GDB calls completion_match_for_lcd::finish, GDB tries to create a string using the completion match string (parse), but excluding the skip range, as the stored skip range is before the start of the completion match string, then GDB tries to do some weird string creation, which will cause GDB to crash. The reason we don't often see this problem in C++ is that for C++ symbols there is always some non-template text before the template argument. This non-template text means GDB is likely to either match the symbol, or reject the symbol without storing a skip range. However, notice, I did say, we don't often see this problem. Once I understood the issue, I was able to reproduce the crash using a pure C++ example: template<typename S> struct foo { template<typename T> foo (int p1, T a) { s = 0; } S s; }; int main () { foo<int> obj (2.3, 0); return 0; } Then in GDB: (gdb) complete break foo(int The problem here is that the C++ symbol for the constructor looks like this: foo<int>::foo<double>(int, double) When GDB enters cp_symbol_name_matches_1 the symbols it examines are: foo<int>::foo<double>(int, double) foo<double>(int, double) The first iteration of the loop will match the 'foo', then add the '<int>' template argument will be added as a skip range. When GDB find the ':' after the '<int>' the first iteration of the loop fails to match, GDB removes the 'foo<int>::' component, and starts the second iteration of the loop. Again, GDB matches the 'foo', and now adds '<double>' as a skip region. After that the '(int' successfully matches, and so the second iteration of the loop succeeds, but, once again we left the '<int>' in place as a skip region, even though this occurs before the start of our match string, and this will cause GDB to crash. This problem was reported to the mailing list, and a solution discussed in this thread: https://sourceware.org/pipermail/gdb-patches/2023-January/195166.html The solution proposed here is similar to one proposed by the original bug reported, but implemented in a different location within GDB. Instead of placing the fix in strncmp_iw_with_mode, I place the fix in cp_symbol_name_matches_1. I believe this is a better location as it is this function that implements the loop, and it is this loop, which repeatedly calls strncmp_iw_with_mode, that should be resetting the result object state (I believe). What I have done is add an assert to strncmp_iw_with_mode that the incoming result object is empty. I've also added some other asserts in related code, in completion_match_for_lcd::mark_ignored_range, I make some basic assertions about the incoming range pointers, and in completion_match_for_lcd::finish I also make some assertions about how the skip ranges relate to the match pointer. There's two new tests. The original rust example that was used in the initial bug report, and a C++ test. The rust example depends on which symbols are pulled in from the rust libraries, so it is possible that, at some future date, the problematic symbol will disappear from this test program. The C++ test should be more reliable, as this only depends on symbols from within the C++ source code. Since I originally posted this patch to the mailing list, the following patch has been merged: commit 6e7eef7 Date: Sun Mar 19 09:13:10 2023 -0600 Use rust_demangle to fix a crash This solves the problem of a rust symbol ending up in the C++ specific code by changing the order languages are sorted. However, this new commit doesn't address the issue in the C++ code which was fixed with this commit. Given that the C++ issue is real, and has a reproducer, I'm still going to merge this fix. I've left the discussion of rust in this commit message as I originally wrote it, but it should be read within the context of GDB prior to commit 6e7eef7. Co-Authored-By: Zheng Zhan <zzlossdev@163.com>

jeremybennett · 2023-04-28T08:10:36Z

This issue is no longer causing a problem with either OpenOCD or with Embdebug, so the issue can be closed.

jeremybennett · 2023-04-28T08:11:12Z

(Fixed upstream by a rewrite of the RISC-V TDEP file by Andrew Burgess)

It was pointed out on the mailing list[1] that after this commit: commit b1e0126 Date: Wed Jun 21 14:18:54 2023 +0100 gdb: don't resume vfork parent while child is still running the test gdb.base/vfork-follow-parent.exp now has some failures when run with the native-gdbserver or native-extended-gdbserver boards: FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: continue to end of inferior 2 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: inferior 1 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: print unblock_parent = 1 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: continue to break_parent (timeout) The reason that these failures don't show up when run on the standard unix board is that the test is only run in the default operating mode, so for Linux this will be all-stop on top of non-stop. If we adjust the test script so that it runs in the default mode and with target-non-stop turned off, then we see the same failures on the unix board. This commit includes this change. The way that the test is written means that it is not (currently) possible to turn on non-stop mode and have the test still work, so this commit does not do that. I have also updated the test script so that the vfork child performs an exec as well as the current exit. Exec and exit are the two ways in which a vfork child can release the vfork parent, so testing both of these cases is useful I think. In this test the inferior performs a vfork and the vfork-child immediately exits. The vfork-parent will wait for the vfork-child and then blocks waiting for gdb. Once gdb has released the vfork-parent, the vfork-parent also exits. In the test that fails, GDB sets 'detach-on-fork off' and then runs to the vfork. At this point the test tries to just "continue", but this fails as the vfork-parent is still selected, and the parent can't continue until the vfork-child completes. As the vfork-child is stopped by GDB the parent will never stop once resumed, so GDB refuses to resume it. The test script then sets 'schedule-multiple on' and once again continues. This time GDB, in theory, resumes both the parent and the child, the parent will be held blocked by the kernel, but the child will run until it exits, and which point GDB stops again, this time with inferior 2, the newly exited vfork-child, selected. What happens after this in the test script is irrelevant as far as this failure is concerned. To understand why the test started failing we should consider the behaviour of four different cases: 1. All-stop-on-non-stop before commit b1e0126, 2. All-stop-on-non-stop after commit b1e0126, 3. All-stop-on-all-stop before commit b1e0126, and 4. All-stop-on-all-stop after commit b1e0126. Only case openhwgroup#4 is failing after commit b1e0126, but I think the other cases are interesting because, (a) they inform how we might fix the regression, and (b) it turns out the behaviour of openhwgroup#2 changed too with the commit, but the change was harmless. For #1 All-stop-on-non-stop before commit b1e0126, what happens is: 1. GDB calls proceed with the vfork-parent selected, as schedule multiple is on user_visible_resume_ptid returns -1 (everything) as the resume_ptid (see proceed function), 2. As this is all-stop-on-non-stop, every thread is resumed individually, so GDB tries to resume both the vfork-parent and the vfork-child, both of which succeed, 3. The vfork-parent is held stopped by the kernel, 4. The vfork-child completes (exits) at which point the GDB sees the EXITED event for the vfork-child and the VFORK_DONE event for the vfork-parent, 5. At this point we might take two paths depending on which event GDB handles first, if GDB handles the VFORK_DONE first then: (a) As GDB is controlling both parent and child the VFORK_DONE is ignored (see handle_vfork_done), the vfork-parent will be resumed, (b) GDB processes the EXITED event, selects the (now defunct) vfork-child, and stops, returning control to the user. Alternatively, if GDB selects the EXITED event first then: (c) GDB processes the EXITED event, selects the (now defunct) vfork-child, and stops, returning control to the user. (d) At some future time the user resumes the vfork-parent, at which point the VFORK_DONE is reported to GDB, however, GDB is ignoring the VFORK_DONE (see handle_vfork_done), so the parent is resumed. For case openhwgroup#2, all-stop-on-non-stop after commit b1e0126, the important difference is in step (2) above, now, instead of resuming both the vfork-parent and the vfork-child, only the vfork-child is resumed. As such, when we get to step (5), only a single event, the EXITED event is reported. GDB handles the EXITED just as in (5)(c), then, later, when the user resumes the vfork-parent, the VFORKED_DONE is immediately delivered from the kernel, but this is ignored just as in (5)(d), and so, though the pattern of when the vfork-parent is resumed changes, the overall pattern of which events are reported and when, doesn't actually change. In fact, by not resuming the vfork-parent, the order of events (in this test) is now deterministic, which (maybe?) is a good thing. If we now consider case openhwgroup#3, all-stop-on-all-stop before commit b1e0126, then what happens is: 1. GDB calls proceed with the vfork-parent selected, as schedule multiple is on user_visible_resume_ptid returns -1 (everything) as the resume_ptid (see proceed function), 2. As this is all-stop-on-all-stop, the resume is passed down to the linux-nat target, the vfork-parent is the event thread, while the vfork-child is a sibling of the event thread, 3. In linux_nat_target::resume, GDB calls linux_nat_resume_callback for all threads, this causes the vfork-child to be resumed. Then in linux_nat_target::resume, the event thread, the vfork-parent, is also resumed. 4. The vfork-parent is held stopped by the kernel, 5. The vfork-child completes (exits) at which point the GDB sees the EXITED event for the vfork-child and the VFORK_DONE event for the vfork-parent, 6. We are now in a situation identical to step (5) as for all-stop-on-non-stop above, GDB selects one of the events to handle, and whichever we select the user sees the correct behaviour. And so, finally, we can consider openhwgroup#4, all-stop-on-all-stop after commit b1e0126, this is the case that started failing. We start out just like above, in proceed, the resume_ptid is -1 (resume everything), due to schedule multiple being on. And just like above, due to the target being all-stop, we call proceed_resume_thread_checked just once, for the current thread, which, remember, is the vfork-parent thread. The change in commit b1e0126 was to avoid resuming a vfork-parent thread, read the commit message for the justification for this change. However, this means that GDB now rejects resuming the vfork-parent in this case, which means that nothing gets resumed! Obviously, if nothing resumes, then nothing will ever stop, and so GDB appears to hang. I considered a couple of solutions which, in the end, I didn't go with, these were: 1. Move the vfork-parent check out of proceed_resume_thread_checked, and place it in proceed, but only on the all-stop-on-non-stop path, this should still address the issue seen in b1e0126, but would avoid the issue seen here. I rejected this just because it didn't feel great to split the checks that exist in proceed_resume_thread_checked like this, 2. Extend the condition in proceed_resume_thread_checked by adding a target_is_non_stop_p check. This would have the same effect as idea 1, but leaves all the checks in the same place, which I think would be better, but this still just didn't feel right to me, and so, What I noticed was that for the all-stop-on-non-stop, after commit b1e0126, we only resumed the vfork-child, and this seems fine. The vfork-parent isn't going to run anyway (the kernel will hold it back), so if feels like we there's no harm in just waiting for the child to complete, and then resuming the parent. So then I started looking at follow_fork, which is called from the top of proceed. This function already has the task of switching between the parent and child based on which the user wishes to follow. So, I wondered, could we use this to switch to the vfork-child in the case that we are attached to both? Turns out this is pretty simple to do. Having done that, now the process is for all-stop-on-all-stop after commit b1e0126, and with this new fix is: 1. GDB calls proceed with the vfork-parent selected, but, 2. In follow_fork, and follow_fork_inferior, GDB switches the selected thread to be that of the vfork-child, 3. Back in proceed user_visible_resume_ptid returns -1 (everything) as the resume_ptid still, but now, 4. When GDB calls proceed_resume_thread_checked, the vfork-child is the current selected thread, this is not a vfork-parent, and so GDB allows the proceed to continue to the linux-nat target, 5. In linux_nat_target::resume, GDB calls linux_nat_resume_callback for all threads, this does not resume the vfork-parent (because it is a vfork-parent), and then the vfork-child is resumed as this is the event thread, At this point we are back in the same situation as for all-stop-on-non-stop after commit b1e0126, that is, the vfork-child is resumed, while the vfork-parent is held stopped by GDB. Eventually the vfork-child will exit or exec, at which point the vfork-parent will be resumed. [1] https://inbox.sourceware.org/gdb-patches/3e1e1db0-13d9-dd32-b4bb-051149ae6e76@simark.ca/

After running a number of programs under Windows gdb and detaching them, I typed run in gdb, and got a hang, here: (top-gdb) bt #0 sharing_input_terminal (pid=4672) at /home/pedro/gdb/src/gdb/mingw-hdep.c:388 #1 0x00007ff71a2d8678 in sharing_input_terminal (inf=0x23bf23dafb0) at /home/pedro/gdb/src/gdb/inflow.c:269 openhwgroup#2 0x00007ff71a2d887b in child_terminal_save_inferior (self=0x23bf23de060) at /home/pedro/gdb/src/gdb/inflow.c:423 openhwgroup#3 0x00007ff71a2c80c0 in inf_child_target::terminal_save_inferior (this=0x23bf23de060) at /home/pedro/gdb/src/gdb/inf-child.c:111 openhwgroup#4 0x00007ff71a429c0f in target_terminal_is_ours_kind (desired_state=target_terminal_state::is_ours_for_output) at /home/pedro/gdb/src/gdb/target.c:1037 openhwgroup#5 0x00007ff71a429e02 in target_terminal::ours_for_output () at /home/pedro/gdb/src/gdb/target.c:1094 openhwgroup#6 0x00007ff71a2ccc8e in post_create_inferior (from_tty=0) at /home/pedro/gdb/src/gdb/infcmd.c:245 openhwgroup#7 0x00007ff71a2cd431 in run_command_1 (args=0x0, from_tty=0, run_how=RUN_NORMAL) at /home/pedro/gdb/src/gdb/infcmd.c:502 openhwgroup#8 0x00007ff71a2cd58b in run_command (args=0x0, from_tty=0) at /home/pedro/gdb/src/gdb/infcmd.c:527 The problem is that the loop around GetConsoleProcessList looped forever, because there were exactly 10 processes to return. GetConsoleProcessList's documentation says: If the buffer is too small to hold all the valid process identifiers, the return value is the required number of array elements. The function will have stored no identifiers in the buffer. In this situation, use the return value to allocate a buffer that is large enough to store the entire list and call the function again. In this case, the buffer wasn't too small, it was exactly the right size, so we should have broken out of the loop. We didn't due to a "<" check that should have been "<=". That is fixed by this patch. Approved-By: Tom Tromey <tom@tromey.com> Reviewed-By: Eli Zaretskii <eliz@gnu.org> Change-Id: I14e4909f2ac2fa83d0d9b6e64418b5831ac4e4e3

When running test-case gdb.base/add-symbol-file-attach.exp with target board unix/-m32, we run into: ... (gdb) attach 3955^M Attaching to process 3955^M Load new symbol table from "add-symbol-file-attach"? (y or n) y^M Reading symbols from add-symbol-file-attach/add-symbol-file-attach...^M Reading symbols from /lib/libm.so.6...^M Reading symbols from /usr/lib/debug/lib/libm-2.31.so-i386.debug...^M Reading symbols from /lib/libc.so.6...^M Reading symbols from /usr/lib/debug/lib/libc-2.31.so-i386.debug...^M Reading symbols from /lib/ld-linux.so.2...^M Reading symbols from /usr/lib/debug/lib/ld-2.31.so-i386.debug...^M 0xf7f53549 in __kernel_vsyscall ()^M (gdb) FAIL: gdb.base/add-symbol-file-attach.exp: attach ... The test fails because this regexp is used: ... -re ".*in \[_A-Za-z0-9\]*pause.*$gdb_prompt $" { ... The regexp attempts to detect that the exec is somewhere in pause (): ... int main (int argc, char **argv) { pause (); return 0; } ... but when the exec is blocked in pause, the backtrace is: ... (gdb) bt #0 0xf7fd2549 in __kernel_vsyscall () #1 0xf7d84966 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29 openhwgroup#2 0x0804844c in main (argc=1, argv=0xffffce84) at /data/vries/gdb/src/gdb/testsuite/gdb.base/add-symbol-file-attach.c:26 ... We could simply extend the regexp to also match __kernel_vsyscall, but the more fundamental problem is that the test is racy. The attach can happen before the exec is blocked in pause (), somewhere in the dynamic linker resolving the call to pause, in main or even earlier. Note that for the test-case to be effective, the exec is not required to be in pause (). I added a "while (1);" loop at the start of main, reverted the patch fixing the corresponding PR and reproduced the problem it's supposed to detect. Fix this by simply matching the "Reading symbols from" line, similar to what an earlier test is doing. While we're at it, rewrite the earlier test to also use the -wrap idiom. Tested on x86_64-linux.

In the case where a Fortran program has a program name of "main" and there is also a minimal symbol called main, such as with programs built with GCC version 4.4.7 or below, the backtrace will erroneously stop at the minimal symbol rather than the user specified main, e.g.: (gdb) bt #0 bar () at .../gdb/testsuite/gdb.fortran/backtrace.f90:17 #1 0x0000000000402556 in foo () at .../gdb/testsuite/gdb.fortran/backtrace.f90:21 #2 0x0000000000402575 in main () at .../gdb/testsuite/gdb.fortran/backtrace.f90:31 #3 0x00000000004025aa in main () (gdb) This patch fixes this issue by increasing the precedence of the full symbol when the language of the current frame is Fortran. Newer versions of GCC transform the program name to "MAIN__" in this case, avoiding the problem. Co-Authored-By: Maciej W. Rozycki <macro@embecosm.com>

GDB expected PC should point right after the SVC instruction when the syscall is active. But some active syscalls keep PC pointing to the SVC instruction itself. This leads to a broken backtrace like: Backtrace stopped: previous frame identical to this frame (corrupt stack?) #0 0xb6f8681c in pthread_cond_timedwait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0 #1 0xb6e21f80 in ?? () The reason is that .ARM.exidx unwinder gives up if PC does not point right after the SVC (syscall) instruction. I did not investigate why but some syscalls will point PC to the SVC instruction itself. This happens for the "futex" syscall used by pthread_cond_timedwait. That normally does not matter as ARM prologue unwinder gets called instead of the .ARM.exidx one. Unfortunately some glibc calls have more complicated prologue where the GDB unwinder fails to properly determine the return address (that is in fact an orthogonal GDB bug). I expect it is due to the "vpush" there in this case but I did not investigate it more: Dump of assembler code for function pthread_cond_timedwait@@GLIBC_2.4: 0xb6f8757c <+0>: push {r4, r5, r6, r7, r8, r9, r10, r11, lr} 0xb6f87580 <+4>: mov r10, r2 0xb6f87584 <+8>: vpush {d8} Regression tested on armv7l kernel 5.15.32-v7l+ (Raspbian 11). Approved-By: Luis Machado <luis.machado@arm.com>

When GDB fails to test the condition of a conditional breakpoint, for whatever reason, the error message looks like this: (gdb) break foo if (*(int *) 0) == 1 Breakpoint 1 at 0x40111e: file bpcond.c, line 11. (gdb) r Starting program: /tmp/bpcond Error in testing breakpoint condition: Cannot access memory at address 0x0 Breakpoint 1, foo () at bpcond.c:11 11 int a = 32; (gdb) The line I'm interested in for this commit is this one: Error in testing breakpoint condition: In the case above we can figure out that the problematic breakpoint was #1 because in the final line of the message GDB reports the stop at breakpoint #1. However, in the next few patches I plan to change this. In some cases I don't think it makes sense for GDB to report the stop as being at breakpoint #1, consider this case: (gdb) list some_func 1 int 2 some_func () 3 { 4 int *p = 0; 5 return *p; 6 } 7 8 void 9 foo () 10 { (gdb) break foo if (some_func ()) Breakpoint 1 at 0x40111e: file bpcond.c, line 11. (gdb) r Starting program: /tmp/bpcond Program received signal SIGSEGV, Segmentation fault. 0x0000000000401116 in some_func () at bpcond.c:5 5 return *p; Error in testing breakpoint condition: The program being debugged was signaled while in a function called from GDB. GDB remains in the frame where the signal was received. To change this behavior use "set unwindonsignal on". Evaluation of the expression containing the function (some_func) will be abandoned. When the function is done executing, GDB will silently stop. Program received signal SIGSEGV, Segmentation fault. Breakpoint 1, 0x0000000000401116 in some_func () at bpcond.c:5 5 return *p; (gdb) Notice that, the final lines of output reports the stop as being at breakpoint #1, even though the inferior in not located within some_func, and it's certainly not located at the breakpoint location. I find this behaviour confusing, and propose that this should be changed. However, if I make that change then every reference to breakpoint #1 will be lost from the error message. So, in this commit, in preparation for the later commits, I propose to change the 'Error in testing breakpoint condition:' line to this: Error in testing condition for breakpoint NUMBER: where NUMBER will be filled in as appropriate. Here's the first example with the updated error: (gdb) break foo if (*(int *) 0) == 0 Breakpoint 1 at 0x40111e: file bpcond.c, line 11. (gdb) r Starting program: /tmp/bpcond Error in testing condition for breakpoint 1: Cannot access memory at address 0x0 Breakpoint 1, foo () at bpcond.c:11 11 int a = 32; (gdb) The breakpoint number does now appear twice in the output, but I don't see that as a negative. This commit just changes the one line of the error, and updates the few tests that either included the old error in comments, or actually checked for the error in the expected output. As the only test that checked the line I modified is a Python test, I've added a new test that doesn't rely on Python that checks the error message in detail. While working on the new test, I spotted that it would fail when run with native-gdbserver and native-extended-gdbserver target boards. This turns out to be due to a gdbserver bug. To avoid cluttering this commit I've added a work around to the new test script so that the test passes for the remote boards, in the next few commits I will fix gdbserver, and update the test script to remove the work around.

Commit 7a8de0c ("Remove ALL_BREAKPOINTS_SAFE") introduced a use-after-free in the breakpoints iterations (see below for full ASan report). This makes gdb.base/stale-infcall.exp fail when GDB is build with ASan. check_longjmp_breakpoint_for_call_dummy iterates on all breakpoints, possibly deleting the current breakpoint as well as related breakpoints. The problem arises when a breakpoint in the B->related_breakpoint chain is also B->next. In that case, deleting that related breakpoint frees the breakpoint that all_breakpoints_safe has saved. The old code worked around that by manually changing B_TMP, which was the next breakpoint saved by the "safe iterator": while (b->related_breakpoint != b) { if (b_tmp == b->related_breakpoint) b_tmp = b->related_breakpoint->next; delete_breakpoint (b->related_breakpoint); } (Note that this seemed to assume that b->related_breakpoint->next was the same as b->next->next, not sure this is guaranteed.) The new code kept the B_TMP variable, but it's not useful in that context. We can't go change the next breakpoint as saved by the safe iterator, like we did before. I suggest fixing that by saving the breakpoints to delete in a map and deleting them all at the end. Here's the full ASan report: (gdb) PASS: gdb.base/stale-infcall.exp: continue to breakpoint: break-run1 print infcall () ================================================================= ==47472==ERROR: AddressSanitizer: heap-use-after-free on address 0x611000034980 at pc 0x563f7012c7bc bp 0x7ffdf3804d70 sp 0x7ffdf3804d60 READ of size 8 at 0x611000034980 thread T0 #0 0x563f7012c7bb in next_iterator<breakpoint>::operator++() /home/smarchi/src/binutils-gdb/gdb/../gdbsupport/next-iterator.h:66 #1 0x563f702ce8c0 in basic_safe_iterator<next_iterator<breakpoint> >::operator++() /home/smarchi/src/binutils-gdb/gdb/../gdbsupport/safe-iterator.h:84 #2 0x563f7021522a in check_longjmp_breakpoint_for_call_dummy(thread_info*) /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:7611 #3 0x563f714567b1 in process_event_stop_test /home/smarchi/src/binutils-gdb/gdb/infrun.c:6881 #4 0x563f71454e07 in handle_signal_stop /home/smarchi/src/binutils-gdb/gdb/infrun.c:6769 #5 0x563f7144b680 in handle_inferior_event /home/smarchi/src/binutils-gdb/gdb/infrun.c:6023 #6 0x563f71436165 in fetch_inferior_event() /home/smarchi/src/binutils-gdb/gdb/infrun.c:4387 #7 0x563f7136ff51 in inferior_event_handler(inferior_event_type) /home/smarchi/src/binutils-gdb/gdb/inf-loop.c:42 #8 0x563f7168038d in handle_target_event /home/smarchi/src/binutils-gdb/gdb/linux-nat.c:4219 #9 0x563f72fccb6d in handle_file_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:573 #10 0x563f72fcd503 in gdb_wait_for_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:694 #11 0x563f72fcaf2b in gdb_do_one_event(int) /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:217 #12 0x563f7262b9bb in wait_sync_command_done() /home/smarchi/src/binutils-gdb/gdb/top.c:426 #13 0x563f7137a7c3 in run_inferior_call /home/smarchi/src/binutils-gdb/gdb/infcall.c:650 #14 0x563f71381295 in call_function_by_hand_dummy(value*, type*, gdb::array_view<value*>, void (*)(void*, int), void*) /home/smarchi/src/binutils-gdb/gdb/infcall.c:1332 #15 0x563f7137c0e2 in call_function_by_hand(value*, type*, gdb::array_view<value*>) /home/smarchi/src/binutils-gdb/gdb/infcall.c:780 #16 0x563f70fe5960 in evaluate_subexp_do_call(expression*, noside, value*, gdb::array_view<value*>, char const*, type*) /home/smarchi/src/binutils-gdb/gdb/eval.c:649 #17 0x563f70fe6617 in expr::operation::evaluate_funcall(type*, expression*, noside, char const*, std::__debug::vector<std::unique_ptr<expr::operation, std::default_delete<expr::operation> >, std::allocator<std::unique_ptr<expr::operation, std::default_delete<expr::operation> > > > const&) /home/smarchi/src/binutils-gdb/gdb/eval.c:677 #18 0x563f6fd19668 in expr::operation::evaluate_funcall(type*, expression*, noside, std::__debug::vector<std::unique_ptr<expr::operation, std::default_delete<expr::operation> >, std::allocator<std::unique_ptr<expr::operation, std::default_delete<expr::operation> > > > const&) /home/smarchi/src/binutils-gdb/gdb/expression.h:136 #19 0x563f70fe6bba in expr::var_value_operation::evaluate_funcall(type*, expression*, noside, std::__debug::vector<std::unique_ptr<expr::operation, std::default_delete<expr::operation> >, std::allocator<std::unique_ptr<expr::operation, std::default_delete<expr::operation> > > > const&) /home/smarchi/src/binutils-gdb/gdb/eval.c:689 #20 0x563f704b71dc in expr::funcall_operation::evaluate(type*, expression*, noside) /home/smarchi/src/binutils-gdb/gdb/expop.h:2219 #21 0x563f70fe0f02 in expression::evaluate(type*, noside) /home/smarchi/src/binutils-gdb/gdb/eval.c:110 #22 0x563f71b1373e in process_print_command_args /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1319 #23 0x563f71b1391b in print_command_1 /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1332 #24 0x563f71b147ec in print_command /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1465 #25 0x563f706029b8 in do_simple_func /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:95 #26 0x563f7061972a in cmd_func(cmd_list_element*, char const*, int) /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:2735 #27 0x563f7262d0ef in execute_command(char const*, int) /home/smarchi/src/binutils-gdb/gdb/top.c:572 #28 0x563f7100ed9c in command_handler(char const*) /home/smarchi/src/binutils-gdb/gdb/event-top.c:543 #29 0x563f7101014b in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) /home/smarchi/src/binutils-gdb/gdb/event-top.c:779 #30 0x563f72777942 in tui_command_line_handler /home/smarchi/src/binutils-gdb/gdb/tui/tui-interp.c:104 #31 0x563f7100d059 in gdb_rl_callback_handler /home/smarchi/src/binutils-gdb/gdb/event-top.c:250 #32 0x7f5a80418246 in rl_callback_read_char (/usr/lib/libreadline.so.8+0x3b246) (BuildId: 092e91fc4361b0ef94561e3ae03a75f69398acbb) #33 0x563f7100ca06 in gdb_rl_callback_read_char_wrapper_noexcept /home/smarchi/src/binutils-gdb/gdb/event-top.c:192 #34 0x563f7100cc5e in gdb_rl_callback_read_char_wrapper /home/smarchi/src/binutils-gdb/gdb/event-top.c:225 #35 0x563f728c70db in stdin_event_handler /home/smarchi/src/binutils-gdb/gdb/ui.c:155 #36 0x563f72fccb6d in handle_file_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:573 #37 0x563f72fcd503 in gdb_wait_for_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:694 #38 0x563f72fcb15c in gdb_do_one_event(int) /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:264 #39 0x563f7177ec1c in start_event_loop /home/smarchi/src/binutils-gdb/gdb/main.c:412 #40 0x563f7177f12e in captured_command_loop /home/smarchi/src/binutils-gdb/gdb/main.c:476 #41 0x563f717846e4 in captured_main /home/smarchi/src/binutils-gdb/gdb/main.c:1320 #42 0x563f71784821 in gdb_main(captured_main_args*) /home/smarchi/src/binutils-gdb/gdb/main.c:1339 #43 0x563f6fcedfbd in main /home/smarchi/src/binutils-gdb/gdb/gdb.c:32 #44 0x7f5a7e43984f (/usr/lib/libc.so.6+0x2384f) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e) #45 0x7f5a7e439909 in __libc_start_main (/usr/lib/libc.so.6+0x23909) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e) #46 0x563f6fcedd84 in _start (/home/smarchi/build/binutils-gdb/gdb/gdb+0xafb0d84) (BuildId: 50bd32e6e9d5e84543e9897b8faca34858ca3995) 0x611000034980 is located 0 bytes inside of 208-byte region [0x611000034980,0x611000034a50) freed by thread T0 here: #0 0x7f5a7fce312a in operator delete(void*, unsigned long) /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_new_delete.cpp:164 #1 0x563f702bd1fa in momentary_breakpoint::~momentary_breakpoint() /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:304 #2 0x563f702771c5 in delete_breakpoint(breakpoint*) /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:12404 #3 0x563f702150a7 in check_longjmp_breakpoint_for_call_dummy(thread_info*) /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:7673 #4 0x563f714567b1 in process_event_stop_test /home/smarchi/src/binutils-gdb/gdb/infrun.c:6881 #5 0x563f71454e07 in handle_signal_stop /home/smarchi/src/binutils-gdb/gdb/infrun.c:6769 #6 0x563f7144b680 in handle_inferior_event /home/smarchi/src/binutils-gdb/gdb/infrun.c:6023 #7 0x563f71436165 in fetch_inferior_event() /home/smarchi/src/binutils-gdb/gdb/infrun.c:4387 #8 0x563f7136ff51 in inferior_event_handler(inferior_event_type) /home/smarchi/src/binutils-gdb/gdb/inf-loop.c:42 #9 0x563f7168038d in handle_target_event /home/smarchi/src/binutils-gdb/gdb/linux-nat.c:4219 #10 0x563f72fccb6d in handle_file_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:573 #11 0x563f72fcd503 in gdb_wait_for_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:694 #12 0x563f72fcaf2b in gdb_do_one_event(int) /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:217 #13 0x563f7262b9bb in wait_sync_command_done() /home/smarchi/src/binutils-gdb/gdb/top.c:426 #14 0x563f7137a7c3 in run_inferior_call /home/smarchi/src/binutils-gdb/gdb/infcall.c:650 #15 0x563f71381295 in call_function_by_hand_dummy(value*, type*, gdb::array_view<value*>, void (*)(void*, int), void*) /home/smarchi/src/binutils-gdb/gdb/infcall.c:1332 #16 0x563f7137c0e2 in call_function_by_hand(value*, type*, gdb::array_view<value*>) /home/smarchi/src/binutils-gdb/gdb/infcall.c:780 #17 0x563f70fe5960 in evaluate_subexp_do_call(expression*, noside, value*, gdb::array_view<value*>, char const*, type*) /home/smarchi/src/binutils-gdb/gdb/eval.c:649 #18 0x563f70fe6617 in expr::operation::evaluate_funcall(type*, expression*, noside, char const*, std::__debug::vector<std::unique_ptr<expr::operation, std::default_delete<expr::operation> >, std::allocator<std::unique_ptr<expr::operation, std::default_delete<expr::operation> > > > const&) /home/smarchi/src/binutils-gdb/gdb/eval.c:677 #19 0x563f6fd19668 in expr::operation::evaluate_funcall(type*, expression*, noside, std::__debug::vector<std::unique_ptr<expr::operation, std::default_delete<expr::operation> >, std::allocator<std::unique_ptr<expr::operation, std::default_delete<expr::operation> > > > const&) /home/smarchi/src/binutils-gdb/gdb/expression.h:136 #20 0x563f70fe6bba in expr::var_value_operation::evaluate_funcall(type*, expression*, noside, std::__debug::vector<std::unique_ptr<expr::operation, std::default_delete<expr::operation> >, std::allocator<std::unique_ptr<expr::operation, std::default_delete<expr::operation> > > > const&) /home/smarchi/src/binutils-gdb/gdb/eval.c:689 #21 0x563f704b71dc in expr::funcall_operation::evaluate(type*, expression*, noside) /home/smarchi/src/binutils-gdb/gdb/expop.h:2219 #22 0x563f70fe0f02 in expression::evaluate(type*, noside) /home/smarchi/src/binutils-gdb/gdb/eval.c:110 #23 0x563f71b1373e in process_print_command_args /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1319 #24 0x563f71b1391b in print_command_1 /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1332 #25 0x563f71b147ec in print_command /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1465 #26 0x563f706029b8 in do_simple_func /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:95 #27 0x563f7061972a in cmd_func(cmd_list_element*, char const*, int) /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:2735 #28 0x563f7262d0ef in execute_command(char const*, int) /home/smarchi/src/binutils-gdb/gdb/top.c:572 #29 0x563f7100ed9c in command_handler(char const*) /home/smarchi/src/binutils-gdb/gdb/event-top.c:543 previously allocated by thread T0 here: #0 0x7f5a7fce2012 in operator new(unsigned long) /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_new_delete.cpp:95 #1 0x563f7029a9a3 in new_momentary_breakpoint<program_space*&, frame_id&, int&> /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:8129 #2 0x563f702212f6 in momentary_breakpoint_from_master /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:8169 #3 0x563f70212db1 in set_longjmp_breakpoint_for_call_dummy() /home/smarchi/src/binutils-gdb/gdb/breakpoint.c:7582 #4 0x563f713804db in call_function_by_hand_dummy(value*, type*, gdb::array_view<value*>, void (*)(void*, int), void*) /home/smarchi/src/binutils-gdb/gdb/infcall.c:1260 #5 0x563f7137c0e2 in call_function_by_hand(value*, type*, gdb::array_view<value*>) /home/smarchi/src/binutils-gdb/gdb/infcall.c:780 #6 0x563f70fe5960 in evaluate_subexp_do_call(expression*, noside, value*, gdb::array_view<value*>, char const*, type*) /home/smarchi/src/binutils-gdb/gdb/eval.c:649 #7 0x563f70fe6617 in expr::operation::evaluate_funcall(type*, expression*, noside, char const*, std::__debug::vector<std::unique_ptr<expr::operation, std::default_delete<expr::operation> >, std::allocator<std::unique_ptr<expr::operation, std::default_delete<expr::operation> > > > const&) /home/smarchi/src/binutils-gdb/gdb/eval.c:677 #8 0x563f6fd19668 in expr::operation::evaluate_funcall(type*, expression*, noside, std::__debug::vector<std::unique_ptr<expr::operation, std::default_delete<expr::operation> >, std::allocator<std::unique_ptr<expr::operation, std::default_delete<expr::operation> > > > const&) /home/smarchi/src/binutils-gdb/gdb/expression.h:136 #9 0x563f70fe6bba in expr::var_value_operation::evaluate_funcall(type*, expression*, noside, std::__debug::vector<std::unique_ptr<expr::operation, std::default_delete<expr::operation> >, std::allocator<std::unique_ptr<expr::operation, std::default_delete<expr::operation> > > > const&) /home/smarchi/src/binutils-gdb/gdb/eval.c:689 #10 0x563f704b71dc in expr::funcall_operation::evaluate(type*, expression*, noside) /home/smarchi/src/binutils-gdb/gdb/expop.h:2219 #11 0x563f70fe0f02 in expression::evaluate(type*, noside) /home/smarchi/src/binutils-gdb/gdb/eval.c:110 #12 0x563f71b1373e in process_print_command_args /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1319 #13 0x563f71b1391b in print_command_1 /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1332 #14 0x563f71b147ec in print_command /home/smarchi/src/binutils-gdb/gdb/printcmd.c:1465 #15 0x563f706029b8 in do_simple_func /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:95 #16 0x563f7061972a in cmd_func(cmd_list_element*, char const*, int) /home/smarchi/src/binutils-gdb/gdb/cli/cli-decode.c:2735 #17 0x563f7262d0ef in execute_command(char const*, int) /home/smarchi/src/binutils-gdb/gdb/top.c:572 #18 0x563f7100ed9c in command_handler(char const*) /home/smarchi/src/binutils-gdb/gdb/event-top.c:543 #19 0x563f7101014b in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) /home/smarchi/src/binutils-gdb/gdb/event-top.c:779 #20 0x563f72777942 in tui_command_line_handler /home/smarchi/src/binutils-gdb/gdb/tui/tui-interp.c:104 #21 0x563f7100d059 in gdb_rl_callback_handler /home/smarchi/src/binutils-gdb/gdb/event-top.c:250 #22 0x7f5a80418246 in rl_callback_read_char (/usr/lib/libreadline.so.8+0x3b246) (BuildId: 092e91fc4361b0ef94561e3ae03a75f69398acbb) Change-Id: Id00c17ab677f847fbf4efdf0f4038373668d3d88 Approved-By: Tom Tromey <tom@tromey.com>

Commit b5661ff ("gdb: fix possible use-after-free when executing commands") attempted to fix possible use-after-free in case command redefines itself. Commit 37e5833 ("gdb: fix command lookup in execute_command ()") updated the previous fix to handle subcommands as well by using the original command string to lookup the command again after its execution. This fixed the test in gdb.base/define.exp but it turned out that it does not work (at least) for "target remote" and "target extended-remote". The problem is that the command buffer P passed to execute_command () gets overwritten in dont_repeat () while executing "target remote" command itself: #0 dont_repeat () at top.c:822 #1 0x000055555730982a in target_preopen (from_tty=1) at target.c:2483 #2 0x000055555711e911 in remote_target::open_1 (name=0x55555881c7fe ":1234", from_tty=1, extended_p=0) at remote.c:5946 #3 0x000055555711d577 in remote_target::open (name=0x55555881c7fe ":1234", from_tty=1) at remote.c:5272 #4 0x00005555573062f2 in open_target (args=0x55555881c7fe ":1234", from_tty=1, command=0x5555589d0490) at target.c:853 #5 0x0000555556ad22fa in cmd_func (cmd=0x5555589d0490, args=0x55555881c7fe ":1234", from_tty=1) at cli/cli-decode.c:2737 #6 0x00005555573487fd in execute_command (p=0x55555881c802 "4", from_tty=1) at top.c:688 Therefore the second call to lookup_cmd () at line 697 fails to find command because the original command string is gone. This commit addresses this particular problem by creating a *copy* of original command string for the sole purpose of using it after command execution to lookup the command again. It may not be the most efficient way but it's safer given that command buffer is shared and overwritten in hard-to-foresee situations. Tested on x86_64-linux. PR 30249 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30249 Approved-By: Tom Tromey <tom@tromey.com>

This commit adds a new format for the printf and dprintf commands: '%V'. This new format takes any GDB expression and formats it as a string, just as GDB would for a 'print' command, e.g.: (gdb) print a1 $a = {2, 4, 6, 8, 10, 12, 14, 16, 18, 20} (gdb) printf "%V\n", a1 {2, 4, 6, 8, 10, 12, 14, 16, 18, 20} (gdb) It is also possible to pass the same options to %V as you might pass to the print command, e.g.: (gdb) print -elements 3 -- a1 $4 = {2, 4, 6...} (gdb) printf "%V[-elements 3]\n", a1 {2, 4, 6...} (gdb) This new feature would effectively replace an existing feature of GDB, the $_as_string builtin convenience function. However, the $_as_string function has a few problems which this new feature solves: 1. $_as_string doesn't currently work when the inferior is not running, e.g: (gdb) printf "%s", $_as_string(a1) You can't do that without a process to debug. (gdb) The reason for this is that $_as_string returns a value object with string type. When we try to print this we call value_as_address, which ends up trying to push the string into the inferior's address space. Clearly we could solve this problem, the string data exists in GDB, so there's no reason why we have to push it into the inferior, but this is an existing problem that would need solving. 2. $_as_string suffers from the fact that C degrades arrays to pointers, e.g.: (gdb) printf "%s\n", $_as_string(a1) 0x404260 <a1> (gdb) The implementation of $_as_string is passed a gdb.Value object that is a pointer, it doesn't understand that it's actually an array. Solving this would be harder than issue #1 I think. The whole array to pointer transformation is part of our expression evaluation. And in most cases this is exactly what we want. It's not clear to me how we'd (easily) tell GDB that we didn't want this reduction in _some_ cases. But I'm sure this is solvable if we really wanted to. 3. $_as_string is a gdb.Function sub-class, and as such is passed gdb.Value objects. There's no super convenient way to pass formatting options to $_as_string. By this I mean that the new %V feature supports print formatting options. Ideally, we might want to add this feature to $_as_string, we might imagine it working something like: (gdb) printf "%s\n", $_as_string(a1, elements = 3, array_indexes = True) where the first item is the value to print, while the remaining options are the print formatting options. However, this relies on Python calling syntax, which isn't something that convenience functions handle. We could possibly rely on strictly positional arguments, like: (gdb) printf "%s\n", $_as_string(a1, 3, 1) But that's clearly terrible as there's far more print formatting options, and if you needed to set the 9th option you'd need to fill in all the previous options. And right now, the only way to pass these options to a gdb.Function is to have GDB first convert them all into gdb.Value objects, which is really overkill for what we want. The new %V format solves all these problems: the string is computed and printed entirely on the GDB side, we are able to print arrays as actual arrays rather than pointers, and we can pass named format arguments. Finally, the $_as_string is sold in the manual as allowing users to print the string representation of flag enums, so given: enum flags { FLAG_A = (1 << 0), FLAG_B = (1 << 1), FLAG_C = (1 << 1) }; enum flags ff = FLAG_B; We can: (gdb) printf "%s\n", $_as_string(ff) FLAG_B This works just fine with %V too: (gdb) printf "%V\n", ff FLAG_B So all functionality of $_as_string is replaced by %V. I'm not proposing to remove $_as_string, there might be users currently depending on it, but I am proposing that we don't push $_as_string in the documentation. As %V is a feature of printf, GDB's dprintf breakpoints naturally gain access to this feature too. dprintf breakpoints can be operated in three different styles 'gdb' (use GDB's printf), 'call' (call a function in the inferior), or 'agent' (perform the dprintf on the remote). The use of '%V' will work just fine when dprintf-style is 'gdb'. When dprintf-style is 'call' the format string and arguments are passed to an inferior function (printf by default). In this case GDB doesn't prevent use of '%V', but the documentation makes it clear that support for '%V' will depend on the inferior function being called. I chose this approach because the current implementation doesn't place any restrictions on the format string when operating in 'call' style. That is, the user might already be calling a function that supports custom print format specifiers (maybe including '%V') so, I claim, it would be wrong to block use of '%V' in this case. The documentation does make it clear that users shouldn't expect this to "just work" though. When dprintf-style is 'agent' then GDB does no support the use of '%V' (right now). This is handled at the point when GDB tries to process the format string and send the dprintf command to the remote, here's an example: Reading symbols from /tmp/hello.x... (gdb) dprintf call_me, "%V", a1 Dprintf 1 at 0x401152: file /tmp/hello.c, line 8. (gdb) set sysroot / (gdb) target remote | gdbserver --once - /tmp/hello.x Remote debugging using | gdbserver --once - /tmp/hello.x stdin/stdout redirected Process /tmp/hello.x created; pid = 3088822 Remote debugging using stdio Reading symbols from /lib64/ld-linux-x86-64.so.2... (No debugging symbols found in /lib64/ld-linux-x86-64.so.2) 0x00007ffff7fd3110 in _start () from /lib64/ld-linux-x86-64.so.2 (gdb) set dprintf-style agent (gdb) c Continuing. Unrecognized format specifier 'V' in printf Command aborted. (gdb) This is exactly how GDB would handle any other invalid format specifier, for example: Reading symbols from /tmp/hello.x... (gdb) dprintf call_me, "%Q", a1 Dprintf 1 at 0x401152: file /tmp/hello.c, line 8. (gdb) set sysroot / (gdb) target remote | gdbserver --once - /tmp/hello.x Remote debugging using | gdbserver --once - /tmp/hello.x stdin/stdout redirected Process /tmp/hello.x created; pid = 3089193 Remote debugging using stdio Reading symbols from /lib64/ld-linux-x86-64.so.2... (No debugging symbols found in /lib64/ld-linux-x86-64.so.2) 0x00007ffff7fd3110 in _start () from /lib64/ld-linux-x86-64.so.2 (gdb) set dprintf-style agent (gdb) c Continuing. Unrecognized format specifier 'Q' in printf Command aborted. (gdb) The error message isn't the greatest, but improving that can be put off for another day I hope. Reviewed-By: Eli Zaretskii <eliz@gnu.org> Acked-By: Simon Marchi <simon.marchi@efficios.com>

After this commit: commit baab375 Date: Tue Jul 13 14:44:27 2021 -0400 gdb: building inferior strings from within GDB It was pointed out that a new ASan failure had been introduced which was triggered by gdb.base/internal-string-values.exp: (gdb) PASS: gdb.base/internal-string-values.exp: test_setting: all langs: lang=ada: ptype "foo" print $_gdb_maint_setting("test-settings string") ================================================================= ==80377==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000068034 at pc 0x564785cba682 bp 0x7ffd20644620 sp 0x7ffd20644610 READ of size 1 at 0x603000068034 thread T0 #0 0x564785cba681 in find_command_name_length(char const*) /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:2129 #1 0x564785cbacb2 in lookup_cmd_1(char const**, cmd_list_element*, cmd_list_element**, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, int, bool) /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:2186 #2 0x564785cbb539 in lookup_cmd_1(char const**, cmd_list_element*, cmd_list_element**, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, int, bool) /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:2248 #3 0x564785cbbcf3 in lookup_cmd(char const**, cmd_list_element*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, int, int) /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:2339 #4 0x564785c82df2 in setting_cmd /tmp/src/binutils-gdb/gdb/cli/cli-cmds.c:2219 #5 0x564785c84274 in gdb_maint_setting_internal_fn /tmp/src/binutils-gdb/gdb/cli/cli-cmds.c:2348 #6 0x564788167b3b in call_internal_function(gdbarch*, language_defn const*, value*, int, value**) /tmp/src/binutils-gdb/gdb/value.c:2321 #7 0x5647854b6ebd in expr::ada_funcall_operation::evaluate(type*, expression*, noside) /tmp/src/binutils-gdb/gdb/ada-lang.c:11254 #8 0x564786658266 in expression::evaluate(type*, noside) /tmp/src/binutils-gdb/gdb/eval.c:111 #9 0x5647871242d6 in process_print_command_args /tmp/src/binutils-gdb/gdb/printcmd.c:1322 #10 0x5647871244b3 in print_command_1 /tmp/src/binutils-gdb/gdb/printcmd.c:1335 #11 0x564787125384 in print_command /tmp/src/binutils-gdb/gdb/printcmd.c:1468 #12 0x564785caac44 in do_simple_func /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:95 #13 0x564785cc18f0 in cmd_func(cmd_list_element*, char const*, int) /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:2735 #14 0x564787c70c68 in execute_command(char const*, int) /tmp/src/binutils-gdb/gdb/top.c:574 #15 0x564786686180 in command_handler(char const*) /tmp/src/binutils-gdb/gdb/event-top.c:543 #16 0x56478668752f in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) /tmp/src/binutils-gdb/gdb/event-top.c:779 #17 0x564787dcb29a in tui_command_line_handler /tmp/src/binutils-gdb/gdb/tui/tui-interp.c:104 #18 0x56478668443d in gdb_rl_callback_handler /tmp/src/binutils-gdb/gdb/event-top.c:250 #19 0x7f4efd506246 in rl_callback_read_char (/usr/lib/libreadline.so.8+0x3b246) (BuildId: 092e91fc4361b0ef94561e3ae03a75f69398acbb) #20 0x564786683dea in gdb_rl_callback_read_char_wrapper_noexcept /tmp/src/binutils-gdb/gdb/event-top.c:192 #21 0x564786684042 in gdb_rl_callback_read_char_wrapper /tmp/src/binutils-gdb/gdb/event-top.c:225 #22 0x564787f1b119 in stdin_event_handler /tmp/src/binutils-gdb/gdb/ui.c:155 #23 0x56478862438d in handle_file_event /tmp/src/binutils-gdb/gdbsupport/event-loop.cc:573 #24 0x564788624d23 in gdb_wait_for_event /tmp/src/binutils-gdb/gdbsupport/event-loop.cc:694 #25 0x56478862297c in gdb_do_one_event(int) /tmp/src/binutils-gdb/gdbsupport/event-loop.cc:264 #26 0x564786df99f0 in start_event_loop /tmp/src/binutils-gdb/gdb/main.c:412 #27 0x564786dfa069 in captured_command_loop /tmp/src/binutils-gdb/gdb/main.c:476 #28 0x564786dff61f in captured_main /tmp/src/binutils-gdb/gdb/main.c:1320 #29 0x564786dff75c in gdb_main(captured_main_args*) /tmp/src/binutils-gdb/gdb/main.c:1339 #30 0x564785381b6d in main /tmp/src/binutils-gdb/gdb/gdb.c:32 #31 0x7f4efbc3984f (/usr/lib/libc.so.6+0x2384f) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e) #32 0x7f4efbc39909 in __libc_start_main (/usr/lib/libc.so.6+0x23909) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e) #33 0x564785381934 in _start (/tmp/build/binutils-gdb/gdb/gdb+0xabc5934) (BuildId: 90de353ac158646e7dab501b76a18a76628fca33) 0x603000068034 is located 0 bytes after 20-byte region [0x603000068020,0x603000068034) allocated by thread T0 here: #0 0x7f4efcee0cd1 in __interceptor_calloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x5647856265d8 in xcalloc /tmp/src/binutils-gdb/gdb/alloc.c:97 #2 0x564788610c6b in xzalloc(unsigned long) /tmp/src/binutils-gdb/gdbsupport/common-utils.cc:29 #3 0x56478815721a in value::allocate_contents(bool) /tmp/src/binutils-gdb/gdb/value.c:929 #4 0x564788157285 in value::allocate(type*, bool) /tmp/src/binutils-gdb/gdb/value.c:941 #5 0x56478815733a in value::allocate(type*) /tmp/src/binutils-gdb/gdb/value.c:951 #6 0x5647854ae81c in expr::ada_string_operation::evaluate(type*, expression*, noside) /tmp/src/binutils-gdb/gdb/ada-lang.c:10675 #7 0x5647854b63b8 in expr::ada_funcall_operation::evaluate(type*, expression*, noside) /tmp/src/binutils-gdb/gdb/ada-lang.c:11184 #8 0x564786658266 in expression::evaluate(type*, noside) /tmp/src/binutils-gdb/gdb/eval.c:111 #9 0x5647871242d6 in process_print_command_args /tmp/src/binutils-gdb/gdb/printcmd.c:1322 #10 0x5647871244b3 in print_command_1 /tmp/src/binutils-gdb/gdb/printcmd.c:1335 #11 0x564787125384 in print_command /tmp/src/binutils-gdb/gdb/printcmd.c:1468 #12 0x564785caac44 in do_simple_func /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:95 #13 0x564785cc18f0 in cmd_func(cmd_list_element*, char const*, int) /tmp/src/binutils-gdb/gdb/cli/cli-decode.c:2735 #14 0x564787c70c68 in execute_command(char const*, int) /tmp/src/binutils-gdb/gdb/top.c:574 #15 0x564786686180 in command_handler(char const*) /tmp/src/binutils-gdb/gdb/event-top.c:543 #16 0x56478668752f in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) /tmp/src/binutils-gdb/gdb/event-top.c:779 #17 0x564787dcb29a in tui_command_line_handler /tmp/src/binutils-gdb/gdb/tui/tui-interp.c:104 #18 0x56478668443d in gdb_rl_callback_handler /tmp/src/binutils-gdb/gdb/event-top.c:250 #19 0x7f4efd506246 in rl_callback_read_char (/usr/lib/libreadline.so.8+0x3b246) (BuildId: 092e91fc4361b0ef94561e3ae03a75f69398acbb) The problem is in cli/cli-cmds.c, in the function setting_cmd, where we do this: const char *a0 = (const char *) argv[0]->contents ().data (); Here argv[0] is a value* which we know is either a TYPE_CODE_ARRAY or a TYPE_CODE_STRING. The problem is that the above line is casting the value contents directly to a C-string, i.e. one that is assumed to have a null-terminator at the end. After the above commit this can no longer be assumed to be true. A string value will be represented just as it would be in the current language, so for Ada and Fortran the string will be an array of characters with no null-terminator at the end. My proposed solution is to copy the string contents into a std::string object, and then use the std::string::c_str() value, this will ensure that a null-terminator has been added. I had a check through GDB at places TYPE_CODE_STRING was used and couldn't see any other obvious places where this type of assumption was being made, so hopefully this is the only offender. Running the above test with ASan compiled in no longer gives an error. Reviewed-By: Tom Tromey <tom@tromey.com>

This commit fixes a bug introduced by this commit: commit d8bbae6 Date: Fri Jan 14 15:40:59 2022 -0500 gdb: fix handling of vfork by multi-threaded program (follow-fork-mode=parent, detach-on-fork=on) The problem can be seen in this GDB session: $ gdb -q (gdb) set non-stop on (gdb) file ./gdb/testsuite/outputs/gdb.base/foll-vfork/foll-vfork Reading symbols from ./gdb/testsuite/outputs/gdb.base/foll-vfork/foll-vfork... (gdb) tcatch vfork Catchpoint 1 (vfork) (gdb) run Starting program: /tmp/gdb/testsuite/outputs/gdb.base/foll-vfork/foll-vfork Temporary catchpoint 1 (vforked process 1375914), 0x00007ffff7d5043c in vfork () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff7d5043c in vfork () from /lib64/libc.so.6 #1 0x00000000004011af in main (argc=1, argv=0x7fffffffad88) at .../gdb/testsuite/gdb.base/foll-vfork.c:32 (gdb) finish Run till exit from #0 0x00007ffff7d5043c in vfork () from /lib64/libc.so.6 [Detaching after vfork from child process 1375914] No unwaited-for children left. (gdb) Notice the "No unwaited-for children left." error. This is incorrect, given where we are stopped there's no reason why we shouldn't be able to use "finish" to return to the main frame. When the inferior is stopped as a result of the 'tcatch vfork', the inferior is in the process of performing the vfork, that is, GDB has seen the VFORKED event, but has not yet attached to the new child process, nor has the child process been resumed. However, GDB has seen the VFORKED, and, as we are going to follow the parent process, the inferior for the vfork parent will have its thread_waiting_for_vfork_done member variable set, this will point to the one and only thread that makes up the vfork parent process. When the "finish" command is used GDB eventually ends up in the proceed function (in infrun.c), in here we pass through all the function until we eventually encounter this 'else if' condition: else if (!cur_thr->resumed () && !thread_is_in_step_over_chain (cur_thr) /* In non-stop, forbid resuming a thread if some other thread of that inferior is waiting for a vfork-done event (this means breakpoints are out for this inferior). */ && !(non_stop && cur_thr->inf->thread_waiting_for_vfork_done != nullptr)) { The first two of these conditions will both be true, the thread is not already resumed, and is not in the step-over chain, however, the third condition, this one: && !(non_stop && cur_thr->inf->thread_waiting_for_vfork_done != nullptr)) is false, and this prevents the thread we are trying to finish from being resumed. This condition is false because (a) non_stop is true, and (b) cur_thr->inf->thread_waiting_for_vfork_done is not nullptr (see above for why). Now, if we check the comment embedded within the condition it says: /* In non-stop, forbid resuming a thread if some other thread of that inferior is waiting for a vfork-done event (this means breakpoints are out for this inferior). */ And this makes sense, if we have a vfork parent with two thread, and one thread has performed a vfork, then we shouldn't try to resume the second thread. However, if we are trying to resume the thread that actually performed a vfork, then this is fine. If we never resume the vfork parent then we'll never get a VFORK_DONE event, and so the vfork will never complete. Thus, the condition should actually be: && !(non_stop && cur_thr->inf->thread_waiting_for_vfork_done != nullptr && cur_thr->inf->thread_waiting_for_vfork_done != cur_thr)) This extra check will allow the vfork parent thread to resume, but prevent any other thread in the vfork parent process from resuming. This is the same condition that already exists in the all-stop on a non-stop-target block earlier in the proceed function. My actual fix is slightly different to the above, first, I've chosen to use a nested 'if' check instead of extending the original 'else if' check, this makes it easier to write a longer comment explaining what's going on, and second, instead of checking 'non_stop' I've switched to checking 'target_is_non_stop_p'. In this context this is effectively the same thing, a previous 'else if' block in proceed already handles '!non_stop && target_is_non_stop_p ()', so by the time we get here, if 'target_is_non_stop_p ()' then we must be running in non_stop mode. Both of these tweaks will make the next patch easier, which is a refactor to merge two parts of the proceed function, so this nested 'if' block is not going to exist for long. For testing, there is no test included with this commit. The test was exposed when using a modified version of the gdb.base/foll-vfork.exp test script, however, there are other bugs that are exposed when using the modified test script. These bugs will be addressed in subsequent commits, and then I'll add the updated gdb.base/foll-vfork.exp. If you wish to reproduce this failure then grab the updates to gdb.base/foll-vfork.exp from the later commit and run this test, the failure is always reproducible.

Expect a `.MIPS.options' section alternatively to `.reginfo' and ignore contents of either as irrelevant for all the affected compact EH tests, removing these regressions: mips64-openbsd -FAIL: Compact EH EB #1 with personality ID and FDE data mips64-openbsd -FAIL: Compact EH EB #2 with personality routine and FDE data mips64-openbsd -FAIL: Compact EH EB #3 with personality id and large FDE data mips64-openbsd -FAIL: Compact EH EB #4 with personality id, FDE data and LSDA mips64-openbsd -FAIL: Compact EH EB #5 with personality routine, FDE data and LSDA mips64-openbsd -FAIL: Compact EH EB #6 with personality id, LSDA and large FDE data mips64-openbsd -FAIL: Compact EH EL #1 with personality ID and FDE data mips64-openbsd -FAIL: Compact EH EL #2 with personality routine and FDE data mips64-openbsd -FAIL: Compact EH EL #3 with personality id and large FDE data mips64-openbsd -FAIL: Compact EH EL #4 with personality id, FDE data and LSDA mips64-openbsd -FAIL: Compact EH EL #5 with personality routine, FDE data and LSDA mips64-openbsd -FAIL: Compact EH EL #6 with personality id, LSDA and large FDE data mips64el-openbsd -FAIL: Compact EH EB #1 with personality ID and FDE data mips64el-openbsd -FAIL: Compact EH EB #2 with personality routine and FDE data mips64el-openbsd -FAIL: Compact EH EB #3 with personality id and large FDE data mips64el-openbsd -FAIL: Compact EH EB #4 with personality id, FDE data and LSDA mips64el-openbsd -FAIL: Compact EH EB #5 with personality routine, FDE data and LSDA mips64el-openbsd -FAIL: Compact EH EB #6 with personality id, LSDA and large FDE data mips64el-openbsd -FAIL: Compact EH EL #1 with personality ID and FDE data mips64el-openbsd -FAIL: Compact EH EL #2 with personality routine and FDE data mips64el-openbsd -FAIL: Compact EH EL #3 with personality id and large FDE data mips64el-openbsd -FAIL: Compact EH EL #4 with personality id, FDE data and LSDA mips64el-openbsd -FAIL: Compact EH EL #5 with personality routine, FDE data and LSDA mips64el-openbsd -FAIL: Compact EH EL #6 with personality id, LSDA and large FDE data Co-Authored-By: Maciej W. Rozycki <macro@orcam.me.uk> gas/ * testsuite/gas/mips/compact-eh-eb-1.d: Accept `.MIPS.options' section as an alternative to `.reginfo' and ignore contents of either. * testsuite/gas/mips/compact-eh-eb-2.d: Likewise. * testsuite/gas/mips/compact-eh-eb-3.d: Likewise. * testsuite/gas/mips/compact-eh-eb-4.d: Likewise. * testsuite/gas/mips/compact-eh-eb-5.d: Likewise. * testsuite/gas/mips/compact-eh-eb-6.d: Likewise. * testsuite/gas/mips/compact-eh-el-1.d: Likewise. * testsuite/gas/mips/compact-eh-el-2.d: Likewise. * testsuite/gas/mips/compact-eh-el-3.d: Likewise. * testsuite/gas/mips/compact-eh-el-4.d: Likewise. * testsuite/gas/mips/compact-eh-el-5.d: Likewise. * testsuite/gas/mips/compact-eh-el-6.d: Likewise.

With gdb build with -fsanitize=thread and test-case gdb.base/index-cache.exp I run into: ... (gdb) file build/gdb/testsuite/outputs/gdb.base/index-cache/index-cache Reading symbols from build/gdb/testsuite/outputs/gdb.base/index-cache/index-cache... (gdb) show index-cache enabled The index cache is off. (gdb) PASS: gdb.base/index-cache.exp: test_basic_stuff: index-cache is disabled by default set index-cache enabled on ================== WARNING: ThreadSanitizer: data race (pid=32248) Write of size 1 at 0x00000321f540 by main thread: #0 index_cache::enable() gdb/dwarf2/index-cache.c:76 (gdb+0x82cfdd) #1 set_index_cache_enabled_command gdb/dwarf2/index-cache.c:270 (gdb+0x82d9af) #2 bool setting::set<bool>(bool const&) gdb/command.h:353 (gdb+0x6fe5f2) #3 do_set_command(char const*, int, cmd_list_element*) gdb/cli/cli-setshow.c:414 (gdb+0x6fcd21) #4 execute_command(char const*, int) gdb/top.c:567 (gdb+0xff2e64) #5 command_handler(char const*) gdb/event-top.c:552 (gdb+0x94acc0) #6 command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) gdb/event-top.c:788 (gdb+0x94b37d) #7 tui_command_line_handler gdb/tui/tui-interp.c:104 (gdb+0x103467e) #8 gdb_rl_callback_handler gdb/event-top.c:259 (gdb+0x94a265) #9 rl_callback_read_char readline/readline/callback.c:290 (gdb+0x11bdd3f) #10 gdb_rl_callback_read_char_wrapper_noexcept gdb/event-top.c:195 (gdb+0x94a064) #11 gdb_rl_callback_read_char_wrapper gdb/event-top.c:234 (gdb+0x94a125) #12 stdin_event_handler gdb/ui.c:155 (gdb+0x1074922) #13 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1d94de4) #14 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1d9551c) #15 gdb_do_one_event(int) gdbsupport/event-loop.cc:264 (gdb+0x1d93908) #16 start_event_loop gdb/main.c:412 (gdb+0xb5a256) #17 captured_command_loop gdb/main.c:476 (gdb+0xb5a445) #18 captured_main gdb/main.c:1320 (gdb+0xb5c5c5) #19 gdb_main(captured_main_args*) gdb/main.c:1339 (gdb+0xb5c674) #20 main gdb/gdb.c:32 (gdb+0x416776) Previous read of size 1 at 0x00000321f540 by thread T12: #0 index_cache::enabled() const gdb/dwarf2/index-cache.h:48 (gdb+0x82e1a6) #1 index_cache::store(dwarf2_per_bfd*) gdb/dwarf2/index-cache.c:94 (gdb+0x82d0bc) #2 cooked_index::maybe_write_index(dwarf2_per_bfd*) gdb/dwarf2/cooked-index.c:638 (gdb+0x7f1b97) #3 operator() gdb/dwarf2/cooked-index.c:468 (gdb+0x7f0f24) #4 _M_invoke /usr/include/c++/7/bits/std_function.h:316 (gdb+0x7f285b) #5 std::function<void ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (gdb+0x700952) #6 void std::__invoke_impl<void, std::function<void ()>&>(std::__invoke_other, std::function<void ()>&) /usr/include/c++/7/bits/invoke.h:60 (gdb+0x7381a0) #7 std::__invoke_result<std::function<void ()>&>::type std::__invoke<std::function<void ()>&>(std::function<void ()>&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x737e91) #8 std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}::operator()() const /usr/include/c++/7/future:1421 (gdb+0x737b59) #9 std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void>::operator()() const /usr/include/c++/7/future:1362 (gdb+0x738660) #10 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) /usr/include/c++/7/bits/std_function.h:302 (gdb+0x73825c) #11 std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (gdb+0x733623) #12 std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) /usr/include/c++/7/future:561 (gdb+0x732bdf) #13 void std::__invoke_impl<void, void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::__invoke_memfun_deref, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/bits/invoke.h:73 (gdb+0x734c4f) #14 std::__invoke_result<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>::type std::__invoke<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x733bc5) #15 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#1}::operator()() const /usr/include/c++/7/mutex:672 (gdb+0x73300d) #16 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::operator()() const /usr/include/c++/7/mutex:677 (gdb+0x7330b2) #17 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::_FUN() /usr/include/c++/7/mutex:677 (gdb+0x7330f2) #18 pthread_once <null> (libtsan.so.0+0x4457c) #19 __gthread_once /usr/include/c++/7/x86_64-suse-linux/bits/gthr-default.h:699 (gdb+0x72f5dd) #20 void std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/mutex:684 (gdb+0x733224) #21 std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) /usr/include/c++/7/future:401 (gdb+0x732852) #22 std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run() /usr/include/c++/7/future:1423 (gdb+0x737bef) #23 std::packaged_task<void ()>::operator()() /usr/include/c++/7/future:1556 (gdb+0x1dac492) #24 gdb::thread_pool::thread_function() gdbsupport/thread-pool.cc:242 (gdb+0x1dabdb4) #25 void std::__invoke_impl<void, void (gdb::thread_pool::*)(), gdb::thread_pool*>(std::__invoke_memfun_deref, void (gdb::thread_pool::*&&)(), gdb::thread_pool*&&) /usr/include/c++/7/bits/invoke.h:73 (gdb+0x1dace63) #26 std::__invoke_result<void (gdb::thread_pool::*)(), gdb::thread_pool*>::type std::__invoke<void (gdb::thread_pool::*)(), gdb::thread_pool*>(void (gdb::thread_pool::*&&)(), gdb::thread_pool*&&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x1dac294) #27 decltype (__invoke((_S_declval<0ul>)(), (_S_declval<1ul>)())) std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/include/c++/7/thread:234 (gdb+0x1daf5c6) #28 std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> >::operator()() /usr/include/c++/7/thread:243 (gdb+0x1daf551) #29 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> > >::_M_run() /usr/include/c++/7/thread:186 (gdb+0x1daf506) #30 <null> <null> (libstdc++.so.6+0xdcac2) Location is global 'global_index_cache' of size 48 at 0x00000321f520 (gdb+0x00000321f540) ... SUMMARY: ThreadSanitizer: data race gdb/dwarf2/index-cache.c:76 in index_cache::enable() ... The race happens when issuing a "file $exec" command followed by a "set index-cache enabled on" command. The race is between: - a worker thread reading index_cache::m_enabled to determine whether an index-cache entry for $exec needs to be written (due to command "file $exec"), and - the main thread setting index_cache::m_enabled (due to command "set index-cache enabled on"). Fix this by capturing the value of index_cache::m_enabled in the main thread, and using the captured value in the worker thread. Tested on x86_64-linux. PR symtab/30392 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30392

…s_debug_type} With gdb build with -fsanitize=thread and test-case gdb.base/index-cache.exp and target board debug-types, I run into: ... (gdb) file build/gdb/testsuite/outputs/gdb.base/index-cache/index-cache Reading symbols from build/gdb/testsuite/outputs/gdb.base/index-cache/index-cache... ================== WARNING: ThreadSanitizer: data race (pid=9654) Write of size 1 at 0x7b200000420d by main thread: #0 dwarf2_per_cu_data::get_header() const gdb/dwarf2/read.c:21513 (gdb+0x8d1eee) #1 dwarf2_per_cu_data::addr_size() const gdb/dwarf2/read.c:21524 (gdb+0x8d1f4e) #2 dwarf2_cu::addr_type() const gdb/dwarf2/cu.c:112 (gdb+0x806327) #3 set_die_type gdb/dwarf2/read.c:21932 (gdb+0x8d3870) #4 read_base_type gdb/dwarf2/read.c:15448 (gdb+0x8bcacb) #5 read_type_die_1 gdb/dwarf2/read.c:19832 (gdb+0x8cc0a5) #6 read_type_die gdb/dwarf2/read.c:19767 (gdb+0x8cbe6d) #7 lookup_die_type gdb/dwarf2/read.c:19739 (gdb+0x8cbdc7) #8 die_type gdb/dwarf2/read.c:19593 (gdb+0x8cb68a) #9 read_subroutine_type gdb/dwarf2/read.c:14648 (gdb+0x8b998e) #10 read_type_die_1 gdb/dwarf2/read.c:19792 (gdb+0x8cbf2f) #11 read_type_die gdb/dwarf2/read.c:19767 (gdb+0x8cbe6d) #12 read_func_scope gdb/dwarf2/read.c:10154 (gdb+0x8a4f36) #13 process_die gdb/dwarf2/read.c:6667 (gdb+0x898daa) #14 read_file_scope gdb/dwarf2/read.c:7682 (gdb+0x89bad8) #15 process_die gdb/dwarf2/read.c:6654 (gdb+0x898ced) #16 process_full_comp_unit gdb/dwarf2/read.c:6418 (gdb+0x8981de) #17 process_queue gdb/dwarf2/read.c:5690 (gdb+0x894433) #18 dw2_do_instantiate_symtab gdb/dwarf2/read.c:1770 (gdb+0x88623a) #19 dw2_instantiate_symtab gdb/dwarf2/read.c:1792 (gdb+0x886300) #20 dw2_expand_symtabs_matching_one(dwarf2_per_cu_data*, dwarf2_per_objfile*, gdb::function_view<bool (char const*, bool)>, gdb::function_view<bool (compunit_symtab*)>) gdb/dwarf2/read.c:3042 (gdb+0x88b1f1) #21 cooked_index_functions::expand_symtabs_matching(objfile*, gdb::function_view<bool (char const*, bool)>, lookup_name_info const*, gdb::function_view<bool (char const*)>, gdb::function_view<bool (compunit_symtab*)>, enum_flags<block_search_flag_values>, domain_enum, search_domain) gdb/dwarf2/read.c:16917 (gdb+0x8c228e) #22 objfile::lookup_symbol(block_enum, char const*, domain_enum) gdb/symfile-debug.c:288 (gdb+0xf39055) #23 lookup_symbol_via_quick_fns gdb/symtab.c:2385 (gdb+0xf66ab7) #24 lookup_symbol_in_objfile gdb/symtab.c:2516 (gdb+0xf6711b) #25 operator() gdb/symtab.c:2562 (gdb+0xf67272) #26 operator() gdb/../gdbsupport/function-view.h:305 (gdb+0xf776b1) #27 _FUN gdb/../gdbsupport/function-view.h:299 (gdb+0xf77708) #28 gdb::function_view<bool (objfile*)>::operator()(objfile*) const gdb/../gdbsupport/function-view.h:289 (gdb+0xc3fc97) #29 svr4_iterate_over_objfiles_in_search_order gdb/solib-svr4.c:3455 (gdb+0xecae47) #30 gdbarch_iterate_over_objfiles_in_search_order(gdbarch*, gdb::function_view<bool (objfile*)>, objfile*) gdb/gdbarch.c:5041 (gdb+0x537cad) #31 lookup_global_or_static_symbol gdb/symtab.c:2559 (gdb+0xf674fb) #32 lookup_global_symbol(char const*, block const*, domain_enum) gdb/symtab.c:2615 (gdb+0xf67780) #33 language_defn::lookup_symbol_nonlocal(char const*, block const*, domain_enum) const gdb/symtab.c:2447 (gdb+0xf66d6e) #34 lookup_symbol_aux gdb/symtab.c:2123 (gdb+0xf65cb3) #35 lookup_symbol_in_language(char const*, block const*, domain_enum, language, field_of_this_result*) gdb/symtab.c:1931 (gdb+0xf64dab) #36 set_initial_language() gdb/symfile.c:1708 (gdb+0xf43074) #37 symbol_file_add_main_1 gdb/symfile.c:1212 (gdb+0xf41608) #38 symbol_file_command(char const*, int) gdb/symfile.c:1681 (gdb+0xf42faf) #39 file_command gdb/exec.c:554 (gdb+0x94ff29) #40 do_simple_func gdb/cli/cli-decode.c:95 (gdb+0x6d9528) #41 cmd_func(cmd_list_element*, char const*, int) gdb/cli/cli-decode.c:2735 (gdb+0x6e0f69) #42 execute_command(char const*, int) gdb/top.c:575 (gdb+0xff379c) #43 command_handler(char const*) gdb/event-top.c:552 (gdb+0x94b5bc) #44 command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) gdb/event-top.c:788 (gdb+0x94bc79) #45 tui_command_line_handler gdb/tui/tui-interp.c:104 (gdb+0x1034efc) #46 gdb_rl_callback_handler gdb/event-top.c:259 (gdb+0x94ab61) #47 rl_callback_read_char readline/readline/callback.c:290 (gdb+0x11be4ef) #48 gdb_rl_callback_read_char_wrapper_noexcept gdb/event-top.c:195 (gdb+0x94a960) #49 gdb_rl_callback_read_char_wrapper gdb/event-top.c:234 (gdb+0x94aa21) #50 stdin_event_handler gdb/ui.c:155 (gdb+0x10751a0) #51 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1d95bac) #52 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1d962e4) #53 gdb_do_one_event(int) gdbsupport/event-loop.cc:264 (gdb+0x1d946d0) #54 start_event_loop gdb/main.c:412 (gdb+0xb5ab52) #55 captured_command_loop gdb/main.c:476 (gdb+0xb5ad41) #56 captured_main gdb/main.c:1320 (gdb+0xb5cec1) #57 gdb_main(captured_main_args*) gdb/main.c:1339 (gdb+0xb5cf70) #58 main gdb/gdb.c:32 (gdb+0x416776) Previous read of size 1 at 0x7b200000420d by thread T11: #0 write_gdbindex gdb/dwarf2/index-write.c:1229 (gdb+0x831630) #1 write_dwarf_index(dwarf2_per_bfd*, char const*, char const*, char const*, dw_index_kind) gdb/dwarf2/index-write.c:1484 (gdb+0x832897) #2 index_cache::store(dwarf2_per_bfd*, index_cache_store_context const&) gdb/dwarf2/index-cache.c:173 (gdb+0x82db8d) #3 cooked_index::maybe_write_index(dwarf2_per_bfd*, index_cache_store_context const&) gdb/dwarf2/cooked-index.c:645 (gdb+0x7f1d49) #4 operator() gdb/dwarf2/cooked-index.c:474 (gdb+0x7f0f31) #5 _M_invoke /usr/include/c++/7/bits/std_function.h:316 (gdb+0x7f2a13) #6 std::function<void ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (gdb+0x700952) #7 void std::__invoke_impl<void, std::function<void ()>&>(std::__invoke_other, std::function<void ()>&) /usr/include/c++/7/bits/invoke.h:60 (gdb+0x7381a0) #8 std::__invoke_result<std::function<void ()>&>::type std::__invoke<std::function<void ()>&>(std::function<void ()>&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x737e91) #9 std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}::operator()() const /usr/include/c++/7/future:1421 (gdb+0x737b59) #10 std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void>::operator()() const /usr/include/c++/7/future:1362 (gdb+0x738660) #11 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) /usr/include/c++/7/bits/std_function.h:302 (gdb+0x73825c) #12 std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (gdb+0x733623) #13 std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) /usr/include/c++/7/future:561 (gdb+0x732bdf) #14 void std::__invoke_impl<void, void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::__invoke_memfun_deref, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/bits/invoke.h:73 (gdb+0x734c4f) #15 std::__invoke_result<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>::type std::__invoke<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x733bc5) #16 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#1}::operator()() const /usr/include/c++/7/mutex:672 (gdb+0x73300d) #17 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::operator()() const /usr/include/c++/7/mutex:677 (gdb+0x7330b2) #18 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::_FUN() /usr/include/c++/7/mutex:677 (gdb+0x7330f2) #19 pthread_once <null> (libtsan.so.0+0x4457c) #20 __gthread_once /usr/include/c++/7/x86_64-suse-linux/bits/gthr-default.h:699 (gdb+0x72f5dd) #21 void std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/mutex:684 (gdb+0x733224) #22 std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) /usr/include/c++/7/future:401 (gdb+0x732852) #23 std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run() /usr/include/c++/7/future:1423 (gdb+0x737bef) #24 std::packaged_task<void ()>::operator()() /usr/include/c++/7/future:1556 (gdb+0x1dad25a) #25 gdb::thread_pool::thread_function() gdbsupport/thread-pool.cc:242 (gdb+0x1dacb7c) #26 void std::__invoke_impl<void, void (gdb::thread_pool::*)(), gdb::thread_pool*>(std::__invoke_memfun_deref, void (gdb::thread_pool::*&&)(), gdb::thread_pool*&&) /usr/include/c++/7/bits/invoke.h:73 (gdb+0x1dadc2b) #27 std::__invoke_result<void (gdb::thread_pool::*)(), gdb::thread_pool*>::type std::__invoke<void (gdb::thread_pool::*)(), gdb::thread_pool*>(void (gdb::thread_pool::*&&)(), gdb::thread_pool*&&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x1dad05c) #28 decltype (__invoke((_S_declval<0ul>)(), (_S_declval<1ul>)())) std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/include/c++/7/thread:234 (gdb+0x1db038e) #29 std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> >::operator()() /usr/include/c++/7/thread:243 (gdb+0x1db0319) #30 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> > >::_M_run() /usr/include/c++/7/thread:186 (gdb+0x1db02ce) #31 <null> <null> (libstdc++.so.6+0xdcac2) ... SUMMARY: ThreadSanitizer: data race gdb/dwarf2/read.c:21513 in dwarf2_per_cu_data::get_header() const ... The race happens when issuing the "file $exec" command. The race is between: - a worker thread writing the index cache, and in the process reading dwarf2_per_cu_data::is_debug_type, and - the main thread writing to dwarf2_per_cu_data::m_header_read_in. The two bitfields dwarf2_per_cu_data::m_header_read_in and dwarf2_per_cu_data::is_debug_type share the same bitfield container. Fix this by making dwarf2_per_cu_data::m_header_read_in a packed<bool, 1>. Tested on x86_64-linux. Approved-By: Tom Tromey <tom@tromey.com> PR symtab/30392 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30392

With gdb build with -fsanitize=thread, and the exec from test-case gdb.base/index-cache.exp, I run into: ... $ rm -f ~/.cache/gdb/*; \ gdb -q -batch -iex "set index-cache enabled on" index-cache \ -ex "print foobar" ... WARNING: ThreadSanitizer: data race (pid=23970) Write of size 1 at 0x7b200000410d by main thread: #0 dw_expand_symtabs_matching_file_matcher(dwarf2_per_objfile*, gdb::function_view<bool (char const*, bool)>) gdb/dwarf2/read.c:3077 (gdb+0x7ac54e) #1 cooked_index_functions::expand_symtabs_matching(objfile*, gdb::function_view<bool (char const*, bool)>, lookup_name_info const*, gdb::function_view<bool (char const*)>, gdb::function_view<bool (compunit_symtab*)>, enum_flags<block_search_flag_values>, domain_enum, search_domain) gdb/dwarf2/read.c:16812 (gdb+0x7d039f) #2 objfile::map_symtabs_matching_filename(char const*, char const*, gdb::function_view<bool (symtab*)>) gdb/symfile-debug.c:219 (gdb+0xda5aee) #3 iterate_over_symtabs(char const*, gdb::function_view<bool (symtab*)>) gdb/symtab.c:648 (gdb+0xdc439d) #4 lookup_symtab(char const*) gdb/symtab.c:662 (gdb+0xdc44a2) #5 classify_name gdb/c-exp.y:3083 (gdb+0x61afec) #6 c_yylex gdb/c-exp.y:3251 (gdb+0x61dd13) #7 c_yyparse() build/gdb/c-exp.c.tmp:1988 (gdb+0x61f07e) #8 c_parse(parser_state*) gdb/c-exp.y:3417 (gdb+0x62d864) #9 language_defn::parser(parser_state*) const gdb/language.c:598 (gdb+0x9771c5) #10 parse_exp_in_context gdb/parse.c:414 (gdb+0xb10a9b) #11 parse_expression(char const*, innermost_block_tracker*, enum_flags<parser_flag>) gdb/parse.c:462 (gdb+0xb110ae) #12 process_print_command_args gdb/printcmd.c:1321 (gdb+0xb4bf0c) #13 print_command_1 gdb/printcmd.c:1335 (gdb+0xb4ca2a) #14 print_command gdb/printcmd.c:1468 (gdb+0xb4cd5a) #15 do_simple_func gdb/cli/cli-decode.c:95 (gdb+0x65b078) #16 cmd_func(cmd_list_element*, char const*, int) gdb/cli/cli-decode.c:2735 (gdb+0x65ed53) #17 execute_command(char const*, int) gdb/top.c:575 (gdb+0xe3a76a) #18 catch_command_errors gdb/main.c:518 (gdb+0xa1837d) #19 execute_cmdargs gdb/main.c:617 (gdb+0xa1853f) #20 captured_main_1 gdb/main.c:1289 (gdb+0xa1aa58) #21 captured_main gdb/main.c:1310 (gdb+0xa1b95a) #22 gdb_main(captured_main_args*) gdb/main.c:1339 (gdb+0xa1b95a) #23 main gdb/gdb.c:39 (gdb+0x42506a) Previous read of size 1 at 0x7b200000410d by thread T1: #0 write_gdbindex gdb/dwarf2/index-write.c:1214 (gdb+0x75bb30) #1 write_dwarf_index(dwarf2_per_bfd*, char const*, char const*, char const*, dw_index_kind) gdb/dwarf2/index-write.c:1469 (gdb+0x75f803) #2 index_cache::store(dwarf2_per_bfd*, index_cache_store_context const&) gdb/dwarf2/index-cache.c:173 (gdb+0x755a36) #3 cooked_index::maybe_write_index(dwarf2_per_bfd*, index_cache_store_context const&) gdb/dwarf2/cooked-index.c:642 (gdb+0x71c96d) #4 operator() gdb/dwarf2/cooked-index.c:471 (gdb+0x71c96d) #5 _M_invoke /usr/include/c++/7/bits/std_function.h:316 (gdb+0x71c96d) #6 std::function<void ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (gdb+0x72a57c) #7 void std::__invoke_impl<void, std::function<void ()>&>(std::__invoke_other, std::function<void ()>&) /usr/include/c++/7/bits/invoke.h:60 (gdb+0x72a5db) #8 std::__invoke_result<std::function<void ()>&>::type std::__invoke<std::function<void ()>&>(std::function<void ()>&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x72a5db) #9 std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}::operator()() const /usr/include/c++/7/future:1421 (gdb+0x72a5db) #10 std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void>::operator()() const /usr/include/c++/7/future:1362 (gdb+0x72a5db) #11 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) /usr/include/c++/7/bits/std_function.h:302 (gdb+0x72a5db) #12 std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (gdb+0x724954) #13 std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) /usr/include/c++/7/future:561 (gdb+0x724954) #14 void std::__invoke_impl<void, void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::__invoke_memfun_deref, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/bits/invoke.h:73 (gdb+0x72434a) #15 std::__invoke_result<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>::type std::__invoke<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x72434a) #16 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#1}::operator()() const /usr/include/c++/7/mutex:672 (gdb+0x72434a) #17 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::operator()() const /usr/include/c++/7/mutex:677 (gdb+0x72434a) #18 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::_FUN() /usr/include/c++/7/mutex:677 (gdb+0x72434a) #19 pthread_once <null> (libtsan.so.0+0x4457c) #20 __gthread_once /usr/include/c++/7/x86_64-suse-linux/bits/gthr-default.h:699 (gdb+0x72532b) #21 void std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/mutex:684 (gdb+0x72532b) #22 std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) /usr/include/c++/7/future:401 (gdb+0x174568d) #23 std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run() /usr/include/c++/7/future:1423 (gdb+0x174568d) #24 std::packaged_task<void ()>::operator()() /usr/include/c++/7/future:1556 (gdb+0x174568d) #25 gdb::thread_pool::thread_function() gdbsupport/thread-pool.cc:242 (gdb+0x174568d) #26 void std::__invoke_impl<void, void (gdb::thread_pool::*)(), gdb::thread_pool*>(std::__invoke_memfun_deref, void (gdb::thread_pool::*&&)(), gdb::thread_pool*&&) /usr/include/c++/7/bits/invoke.h:73 (gdb+0x1748040) #27 std::__invoke_result<void (gdb::thread_pool::*)(), gdb::thread_pool*>::type std::__invoke<void (gdb::thread_pool::*)(), gdb::thread_pool*>(void (gdb::thread_pool::*&&)(), gdb::thread_pool*&&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x1748040) #28 decltype (__invoke((_S_declval<0ul>)(), (_S_declval<1ul>)())) std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/include/c++/7/thread:234 (gdb+0x1748040) #29 std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> >::operator()() /usr/include/c++/7/thread:243 (gdb+0x1748040) #30 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> > >::_M_run() /usr/include/c++/7/thread:186 (gdb+0x1748040) #31 <null> <null> (libstdc++.so.6+0xdcac2) ... SUMMARY: ThreadSanitizer: data race gdb/dwarf2/read.c:3077 in dw_expand_symtabs_matching_file_matcher(dwarf2_per_objfile*, gdb::function_view<bool (char const*, bool)>) ... The race happens when issuing the "file $exec" command. The race is between: - a worker thread writing the index cache, and in the process reading dwarf2_per_cu_data::is_debug_type, and - the main thread writing to dwarf2_per_cu_data::mark. The two bitfields dwarf2_per_cu_data::mark and dwarf2_per_cu_data::is_debug_type share the same bitfield container. Fix this by making dwarf2_per_cu_data::mark a packed<unsigned int, 1>. Tested on x86_64-linux. PR symtab/30718 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30718

…g_types} With gdb build with -fsanitize=thread, and the exec from test-case gdb.base/index-cache.exp, I run into: ... $ rm -f ~/.cache/gdb/*; \ gdb -q -batch -iex "set index-cache enabled on" index-cache \ -ex "print foobar" ... WARNING: ThreadSanitizer: data race (pid=25018) Write of size 1 at 0x7b200000410d by main thread: #0 dw2_get_file_names_reader gdb/dwarf2/read.c:2033 (gdb+0x7ab023) #1 dw2_get_file_names gdb/dwarf2/read.c:2130 (gdb+0x7ab023) #2 dw_expand_symtabs_matching_file_matcher(dwarf2_per_objfile*, gdb::function_view<bool (char const*, bool)>) gdb/dwarf2/read.c:3105 (gdb+0x7ac6e9) #3 cooked_index_functions::expand_symtabs_matching(objfile*, gdb::function_view<bool (char const*, bool)>, lookup_name_info const*, gdb::function_view<bool (char const*)>, gdb::function_view<bool (compunit_symtab*)>, enum_flags<block_search_flag_values>, domain_enum, search_domain) gdb/dwarf2/read.c:16812 (gdb+0x7d040f) #4 objfile::map_symtabs_matching_filename(char const*, char const*, gdb::function_view<bool (symtab*)>) gdb/symfile-debug.c:219 (gdb+0xda5b6e) #5 iterate_over_symtabs(char const*, gdb::function_view<bool (symtab*)>) gdb/symtab.c:648 (gdb+0xdc441d) #6 lookup_symtab(char const*) gdb/symtab.c:662 (gdb+0xdc4522) #7 classify_name gdb/c-exp.y:3083 (gdb+0x61afec) #8 c_yylex gdb/c-exp.y:3251 (gdb+0x61dd13) #9 c_yyparse() build/gdb/c-exp.c.tmp:1988 (gdb+0x61f07e) #10 c_parse(parser_state*) gdb/c-exp.y:3417 (gdb+0x62d864) #11 language_defn::parser(parser_state*) const gdb/language.c:598 (gdb+0x977245) #12 parse_exp_in_context gdb/parse.c:414 (gdb+0xb10b1b) #13 parse_expression(char const*, innermost_block_tracker*, enum_flags<parser_flag>) gdb/parse.c:462 (gdb+0xb1112e) #14 process_print_command_args gdb/printcmd.c:1321 (gdb+0xb4bf8c) #15 print_command_1 gdb/printcmd.c:1335 (gdb+0xb4caaa) #16 print_command gdb/printcmd.c:1468 (gdb+0xb4cdda) #17 do_simple_func gdb/cli/cli-decode.c:95 (gdb+0x65b078) #18 cmd_func(cmd_list_element*, char const*, int) gdb/cli/cli-decode.c:2735 (gdb+0x65ed53) #19 execute_command(char const*, int) gdb/top.c:575 (gdb+0xe3a7ea) #20 catch_command_errors gdb/main.c:518 (gdb+0xa183fd) #21 execute_cmdargs gdb/main.c:617 (gdb+0xa185bf) #22 captured_main_1 gdb/main.c:1289 (gdb+0xa1aad8) #23 captured_main gdb/main.c:1310 (gdb+0xa1b9da) #24 gdb_main(captured_main_args*) gdb/main.c:1339 (gdb+0xa1b9da) #25 main gdb/gdb.c:39 (gdb+0x42506a) Previous read of size 1 at 0x7b200000410d by thread T2: #0 write_gdbindex gdb/dwarf2/index-write.c:1214 (gdb+0x75bb30) #1 write_dwarf_index(dwarf2_per_bfd*, char const*, char const*, char const*, dw_index_kind) gdb/dwarf2/index-write.c:1469 (gdb+0x75f803) #2 index_cache::store(dwarf2_per_bfd*, index_cache_store_context const&) gdb/dwarf2/index-cache.c:173 (gdb+0x755a36) #3 cooked_index::maybe_write_index(dwarf2_per_bfd*, index_cache_store_context const&) gdb/dwarf2/cooked-index.c:642 (gdb+0x71c96d) #4 operator() gdb/dwarf2/cooked-index.c:471 (gdb+0x71c96d) #5 _M_invoke /usr/include/c++/7/bits/std_function.h:316 (gdb+0x71c96d) #6 std::function<void ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (gdb+0x72a57c) #7 void std::__invoke_impl<void, std::function<void ()>&>(std::__invoke_other, std::function<void ()>&) /usr/include/c++/7/bits/invoke.h:60 (gdb+0x72a5db) #8 std::__invoke_result<std::function<void ()>&>::type std::__invoke<std::function<void ()>&>(std::function<void ()>&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x72a5db) #9 std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}::operator()() const /usr/include/c++/7/future:1421 (gdb+0x72a5db) #10 std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void>::operator()() const /usr/include/c++/7/future:1362 (gdb+0x72a5db) #11 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) /usr/include/c++/7/bits/std_function.h:302 (gdb+0x72a5db) #12 std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (gdb+0x724954) #13 std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) /usr/include/c++/7/future:561 (gdb+0x724954) #14 void std::__invoke_impl<void, void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::__invoke_memfun_deref, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/bits/invoke.h:73 (gdb+0x72434a) #15 std::__invoke_result<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>::type std::__invoke<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x72434a) #16 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#1}::operator()() const /usr/include/c++/7/mutex:672 (gdb+0x72434a) #17 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::operator()() const /usr/include/c++/7/mutex:677 (gdb+0x72434a) #18 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::_FUN() /usr/include/c++/7/mutex:677 (gdb+0x72434a) #19 pthread_once <null> (libtsan.so.0+0x4457c) #20 __gthread_once /usr/include/c++/7/x86_64-suse-linux/bits/gthr-default.h:699 (gdb+0x72532b) #21 void std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/mutex:684 (gdb+0x72532b) #22 std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) /usr/include/c++/7/future:401 (gdb+0x174570d) #23 std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run() /usr/include/c++/7/future:1423 (gdb+0x174570d) #24 std::packaged_task<void ()>::operator()() /usr/include/c++/7/future:1556 (gdb+0x174570d) #25 gdb::thread_pool::thread_function() gdbsupport/thread-pool.cc:242 (gdb+0x174570d) #26 void std::__invoke_impl<void, void (gdb::thread_pool::*)(), gdb::thread_pool*>(std::__invoke_memfun_deref, void (gdb::thread_pool::*&&)(), gdb::thread_pool*&&) /usr/include/c++/7/bits/invoke.h:73 (gdb+0x17480c0) #27 std::__invoke_result<void (gdb::thread_pool::*)(), gdb::thread_pool*>::type std::__invoke<void (gdb::thread_pool::*)(), gdb::thread_pool*>(void (gdb::thread_pool::*&&)(), gdb::thread_pool*&&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x17480c0) #28 decltype (__invoke((_S_declval<0ul>)(), (_S_declval<1ul>)())) std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/include/c++/7/thread:234 (gdb+0x17480c0) #29 std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> >::operator()() /usr/include/c++/7/thread:243 (gdb+0x17480c0) #30 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> > >::_M_run() /usr/include/c++/7/thread:186 (gdb+0x17480c0) #31 <null> <null> (libstdc++.so.6+0xdcac2) ... SUMMARY: ThreadSanitizer: data race gdb/dwarf2/read.c:2033 in dw2_get_file_names_reader ... The race happens when issuing the "file $exec" command. The race is between: - a worker thread writing the index cache, and in the process reading dwarf2_per_cu_data::is_debug_type, and - the main thread writing to dwarf2_per_cu_data::files_read. The two bitfields dwarf2_per_cu_data::files_read and dwarf2_per_cu_data::is_debug_type share the same bitfield container. Fix this by making dwarf2_per_cu_data::files_read a packed<bool, 1>. Tested on x86_64-linux. PR symtab/30718 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30718

This commit fixes an issue that was discovered while writing the tests for the previous commit. I noticed that, when GDB restarts an inferior, the executable_changed event would trigger twice. The first notification would originate from: #0 exec_file_attach (filename=0x4046680 "/tmp/hello.x", from_tty=0) at ../../src/gdb/exec.c:513 #1 0x00000000006f3adb in reopen_exec_file () at ../../src/gdb/corefile.c:122 openhwgroup#2 0x0000000000e6a3f2 in generic_mourn_inferior () at ../../src/gdb/target.c:3682 openhwgroup#3 0x0000000000995121 in inf_child_target::mourn_inferior (this=0x2fe95c0 <the_amd64_linux_nat_target>) at ../../src/gdb/inf-child.c:192 openhwgroup#4 0x0000000000995cff in inf_ptrace_target::mourn_inferior (this=0x2fe95c0 <the_amd64_linux_nat_target>) at ../../src/gdb/inf-ptrace.c:125 openhwgroup#5 0x0000000000a32472 in linux_nat_target::mourn_inferior (this=0x2fe95c0 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3609 openhwgroup#6 0x0000000000e68a40 in target_mourn_inferior (ptid=...) at ../../src/gdb/target.c:2761 openhwgroup#7 0x0000000000a323ec in linux_nat_target::kill (this=0x2fe95c0 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3593 openhwgroup#8 0x0000000000e64d1c in target_kill () at ../../src/gdb/target.c:924 openhwgroup#9 0x00000000009a19bc in kill_if_already_running (from_tty=1) at ../../src/gdb/infcmd.c:328 openhwgroup#10 0x00000000009a1a6f in run_command_1 (args=0x0, from_tty=1, run_how=RUN_STOP_AT_MAIN) at ../../src/gdb/infcmd.c:381 openhwgroup#11 0x00000000009a20a5 in start_command (args=0x0, from_tty=1) at ../../src/gdb/infcmd.c:527 openhwgroup#12 0x000000000068dc5d in do_simple_func (args=0x0, from_tty=1, c=0x35c7200) at ../../src/gdb/cli/cli-decode.c:95 While the second originates from: #0 exec_file_attach (filename=0x3d7a1d0 "/tmp/hello.x", from_tty=0) at ../../src/gdb/exec.c:513 #1 0x0000000000dfe525 in reread_symbols (from_tty=1) at ../../src/gdb/symfile.c:2517 openhwgroup#2 0x00000000009a1a98 in run_command_1 (args=0x0, from_tty=1, run_how=RUN_STOP_AT_MAIN) at ../../src/gdb/infcmd.c:398 openhwgroup#3 0x00000000009a20a5 in start_command (args=0x0, from_tty=1) at ../../src/gdb/infcmd.c:527 openhwgroup#4 0x000000000068dc5d in do_simple_func (args=0x0, from_tty=1, c=0x35c7200) at ../../src/gdb/cli/cli-decode.c:95 In the first case the call to exec_file_attach first passes through reopen_exec_file. The reopen_exec_file performs a modification time check on the executable file, and only calls exec_file_attach if the executable has changed on disk since it was last loaded. However, in the second case things work a little differently. In this case GDB is really trying to reread the debug symbol. As such, we iterate over the objfiles list, and for each of those we check the modification time, if the file on disk has changed then we reload the debug symbols from that file. However, there is an additional check, if the objfile has the same name as the executable then we will call exec_file_attach, but we do so without checking the cached modification time that indicates when the executable was last reloaded, as a result, we reload the executable twice. In this commit I propose that reread_symbols be changed to unconditionally call reopen_exec_file before performing the objfile iteration. This will ensure that, if the executable has changed, then the executable will be reloaded, however, if the executable has already been recently reloaded, we will not reload it for a second time. After handling the executable, GDB can then iterate over the objfiles list and reload them in the normal way. With this done I now see the executable reloaded only once when GDB restarts an inferior, which means I can remove the kfail that I added to the gdb.python/py-exec-file.exp test in the previous commit. Approved-By: Tom Tromey <tom@tromey.com>

It was pointed out on the mailing list that a recently added test (gdb.python/py-progspace-events.exp) was failing when run with the native-extended-gdbserver board. This test was added with this commit: commit 59912fb Date: Tue Sep 19 11:45:36 2023 +0100 gdb: add Python events for program space addition and removal It turns out though that the test is failing due to a existing bug in GDB, the new test just exposes the problem. Additionally, the failure really doesn't even rely on the new functionality added in the above commit. I reduced the test to a simple set of steps that reproduced the failure and tested against GDB 13, and the test passes; so the bug was introduced since then. In fact, the bug was introduced with this commit: commit a282736 Date: Fri Sep 8 15:48:16 2023 +0100 gdb: remove final user of the executable_changed observer This commit changed how the per-inferior auxv data cache is managed, specifically, when the cache is cleared, and it is this that leads to the failure. This bug is interesting because it exposes a number of issues with GDB, I'll explain all of the problems I see, though ultimately, I only propose fixing one problem in this commit, which is enough to resolve the crash we are currently seeing. The crash that we are seeing manifests like this: ... [Inferior 2 (process 3970384) exited normally] +inferior 1 [Switching to inferior 1 [process 3970383] (/tmp/build/gdb/testsuite/outputs/gdb.python/py-progspace-events/py-progspace-events)] [Switching to thread 1.1 (Thread 3970383.3970383)] #0 breakpt () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.python/py-progspace-events.c:28 28 { /* Nothing. */ } (gdb) step +step terminate called after throwing an instance of 'gdb_exception_error' Fatal signal: Aborted ... etc ... What's happening is that GDB attempts to refill the auxv cache as a result of the gdbarch_has_shared_address_space call in program_space::~program_space, the backtrace looks like this: #0 0x00007fb4f419a9a5 in raise () from /lib64/libpthread.so.0 #1 0x00000000008b635d in handle_fatal_signal (sig=6) at ../../src/gdb/event-top.c:912 openhwgroup#2 <signal handler called> openhwgroup#3 0x00007fb4f38e3625 in raise () from /lib64/libc.so.6 openhwgroup#4 0x00007fb4f38cc8d9 in abort () from /lib64/libc.so.6 openhwgroup#5 0x00007fb4f3c70756 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6 openhwgroup#6 0x00007fb4f3c7c6dc in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6 openhwgroup#7 0x00007fb4f3c7b6e9 in __cxa_call_terminate () from /lib64/libstdc++.so.6 openhwgroup#8 0x00007fb4f3c7c094 in __gxx_personality_v0 () from /lib64/libstdc++.so.6 openhwgroup#9 0x00007fb4f3a80c63 in _Unwind_RaiseException_Phase2 () from /lib64/libgcc_s.so.1 openhwgroup#10 0x00007fb4f3a8154e in _Unwind_Resume () from /lib64/libgcc_s.so.1 openhwgroup#11 0x0000000000e8832d in target_read_alloc_1<unsigned char> (ops=0x408a3a0, object=TARGET_OBJECT_AUXV, annex=0x0) at ../../src/gdb/target.c:2266 openhwgroup#12 0x0000000000e73dea in target_read_alloc (ops=0x408a3a0, object=TARGET_OBJECT_AUXV, annex=0x0) at ../../src/gdb/target.c:2315 openhwgroup#13 0x000000000058248c in target_read_auxv_raw (ops=0x408a3a0) at ../../src/gdb/auxv.c:379 openhwgroup#14 0x000000000058243d in target_read_auxv () at ../../src/gdb/auxv.c:368 openhwgroup#15 0x000000000058255c in target_auxv_search (match=0x0, valp=0x7ffdee17c598) at ../../src/gdb/auxv.c:415 openhwgroup#16 0x0000000000a464bb in linux_is_uclinux () at ../../src/gdb/linux-tdep.c:433 openhwgroup#17 0x0000000000a464f6 in linux_has_shared_address_space (gdbarch=0x409a2d0) at ../../src/gdb/linux-tdep.c:440 openhwgroup#18 0x0000000000510eae in gdbarch_has_shared_address_space (gdbarch=0x409a2d0) at ../../src/gdb/gdbarch.c:4889 openhwgroup#19 0x0000000000bc7558 in program_space::~program_space (this=0x4544aa0, __in_chrg=<optimized out>) at ../../src/gdb/progspace.c:124 openhwgroup#20 0x00000000009b245d in delete_inferior (inf=0x47b3de0) at ../../src/gdb/inferior.c:290 openhwgroup#21 0x00000000009b2c10 in prune_inferiors () at ../../src/gdb/inferior.c:480 openhwgroup#22 0x00000000009c5e3e in fetch_inferior_event () at ../../src/gdb/infrun.c:4558 openhwgroup#23 0x000000000099b4dc in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:42 openhwgroup#24 0x0000000000cbc64f in remote_async_serial_handler (scb=0x4090a30, context=0x408a6b0) at ../../src/gdb/remote.c:14859 openhwgroup#25 0x0000000000d83d3a in run_async_handler_and_reschedule (scb=0x4090a30) at ../../src/gdb/ser-base.c:138 openhwgroup#26 0x0000000000d83e1f in fd_event (error=0, context=0x4090a30) at ../../src/gdb/ser-base.c:189 So this is problem #1, if we throw an exception while deleting a program_space then this is not caught, and is going to crash GDB. Problem openhwgroup#2 becomes evident when we ask why GDB is throwing an error in this case; the error is thrown because the remote target, operating in non-async mode, can't read the auxv data while an inferior is running and GDB is waiting for a stop reply. The problem here then, is why does GDB get into a position where it tries to interact with the remote target in this way, at this time? The problem is caused by the prune_inferiors call which can be seen in the above backtrace. In prune_inferiors we check if the inferior is deletable, and if it is, we delete it. The problem is, I think, we should also check if the target is currently in a state that would allow us to delete the inferior. We don't currently have such a check available, we'd need to add one, but for the remote target, this would return false if the remote is in async mode and the remote is currently waiting for a stop reply. With this change in place GDB would defer deleting the inferior until the remote target has stopped, at which point GDB would be able to refill the auxv cache successfully. And then, problem openhwgroup#3 becomes evident when we ask why GDB is needing to refill the auxv cache now when it didn't need to for GDB 13. This is where the second commit mentioned above (a282736) comes in. Prior to this commit, the auxv cache was cleared by the executable_changed observer, while after that commit the auxv cache was cleared by the new_objfile observer -- but only when the new_objfile observer is used in the special mode that actually means that all objfiles have been unloaded (I know, the overloading of the new_objfile observer is horrible, and unnecessary, but it's not really important for this bug). The difference arises because the new_objfile observer is triggered from clear_symtab_users, which in turn is called from program_space::~program_space. The new_objfile observer for auxv does this: static void auxv_new_objfile_observer (struct objfile *objfile) { if (objfile == nullptr) invalidate_auxv_cache_inf (current_inferior ()); } That is, when all the objfiles are unloaded, we clear the auxv cache for the current inferior. The problem is, then when we look at the prune_inferiors -> delete_inferior -> ~program_space path, we see that the current inferior is not going to be an inferior that exists within the program_space being deleted; delete_inferior removes the deleted inferior from the global inferior list, and then only deletes the program_space if program_space::empty() returns true, which is only the case if the current inferior isn't within the program_space to delete, and no other inferior exists within that program_space either. What this means is that when the new_objfile observer is called we can't rely on the current inferior having any relationship with the program space in which the objfiles were removed. This was an error in the commit a282736, the only thing we can rely on is the current program space. As a result of this mistake, after commit a282736, GDB was sometimes clearing the auxv cache for a random inferior. In the native target case this was harmless as we can always refill the cache when needed, but in the remote target case, if we need to refill the cache when the remote target is executing, then we get the crash we observed. And additionally, if we think about this a little more, we see that commit a282736 made another mistake. When all the objfiles are removed, they are removed from a program_space, a program_space might contain multiple inferiors, so surely, we should clear the auxv cache for all of the matching inferiors? Given these two insights, that the current_inferior is not relevant, only the current_program_space, and that we should be clearing the cache for all inferiors in the current_program_space, we can update auxv_new_objfile_observer to: if (objfile == nullptr) { for (inferior *inf : all_inferiors ()) { if (inf->pspace == current_program_space) invalidate_auxv_cache_inf (inf); } } With this change we now correctly clear the auxv cache for the correct inferiors, and GDB no longer needs to refill the cache at an inconvenient time, this avoids the crash we were seeing. And finally, we reach problem openhwgroup#4. Inspired by the observation that using the current_inferior from within the ~program_space function was not correct, I added some debug to see if current_inferior() was called anywhere else (below ~program_space), and the answer is yes, it's called a often. Mostly the culprit is GDB doing: current_inferior ()->top_target ()-> .... But I think all of these calls are most likely doing the wrong thing, and only work because the top target in all these cases is shared between all inferiors, e.g. it's the native target, or the remote target for all inferiors. But if we had a truly multi-connection setup, then we might start to see odd behaviour. Problem #1 I'm just ignoring for now, I guess at some point we might run into this again, and then we'd need to solve this. But in this case I wasn't sure what a "good" solution would look like. We need the auxv data in order to implement the linux_is_uclinux() function. If we can't get the auxv data then what should we do, assume yes, or assume no? The right answer would probably be to propagate the error back up the stack, but then we reach ~program_space, and throwing exceptions from a destructor is problematic, so we'd need to catch and deal at this point. The linux_is_uclinux() call is made from within gdbarch_has_shared_address_space(), which is used like: if (!gdbarch_has_shared_address_space (target_gdbarch ())) delete this->aspace; So, we would have to choose; delete the address space or not. If we delete it on error, then we might delete an address space that is shared within another program space. If we don't delete the address space, then we might leak it. Neither choice is great. A better solution might be to have the address spaces be reference counted, then we could remove the gdbarch_has_shared_address_space call completely, and just rely on the reference count to auto-delete the address space when appropriate. The solution for problem openhwgroup#2 I already hinted at above, we should have a new target_can_delete_inferiors() call, which should be called from prune_inferiors, this would prevent GDB from trying to delete inferiors when a (remote) target is in a state where we know it can't delete the inferior. Deleting an inferior often (always?) requires sending packets to the remote, and if the remote is waiting for a stop reply then this will never work, so the pruning should be deferred in this case. The solution for problem openhwgroup#3 is included in this commit. And, for problem openhwgroup#4, I'm not sure what the right solution is. Maybe delete_inferior should ensure the inferior to be deleted is in place when ~program_space is called? But that seems a little weird, as the current inferior would, in theory, still be using the current program_space... Anyway, after this commit, the gdb.python/py-progspace-events.exp test now passes when run with the native-extended-remote board. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30935 Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: I41f0e6e2d7ecc1e5e55ec170f37acd4052f46eaf

Overview ======== Consider the following situation, GDB is in non-stop mode, the main thread is running while a second thread is stopped. The user has the second thread selected as the current thread and asks GDB to detach. At the exact moment of detach the main thread exits. This situation currently causes crashes, assertion failures, and unexpected errors to be reported from GDB for both native and remote targets. This commit addresses this situation for native and remote targets. There are a number of different fixes, but all are required in order to get this functionality working correct for native and remote targets. Native Linux Target =================== For the native Linux target, detaching is handled in the function linux_nat_target::detach. In here we call stop_wait_callback for each thread, and it is this callback that will spot that the main thread has exited. GDB then detaches from everything except the main thread by calling detach_callback. After this the first problem is this assert: /* Only the initial process should be left right now. */ gdb_assert (num_lwps (pid) == 1); The num_lwps call will return 0 as the main thread has exited and all of the other threads have now been detached. I fix this by changing the assert to allow for 0 or 1 lwps at this point. As the 0 case can only happen in non-stop mode, the assert becomes: gdb_assert (num_lwps (pid) == 1 || (target_is_non_stop_p () && num_lwps (pid) == 0)); The next problem is that we do: main_lwp = find_lwp_pid (ptid_t (pid)); and then proceed assuming that main_lwp is not nullptr. In the case that the main thread has exited though, main_lwp will be nullptr. However, we only need main_lwp so that GDB can detach from the thread. If the main thread has exited, and GDB has already detached from every other thread, then GDB has finished detaching, GDB can skip the calls that try to detach from the main thread, and then tell the user that the detach was a success. For Remote Targets ================== On remote targets there are two problems. First is that when the exit occurs during the early phase of the detach, we see the stop notification arrive while GDB is removing the breakpoints ahead of the detach. The 'set debug remote on' trace looks like this: [remote] Sending packet: $z0,7f1648fe0241,1#35 [remote] Notification received: Stop:W0;process:2a0ac8 # At this point an unpatched gdbserver segfaults, and the connection # is broken. A patched gdbserver continues as below... [remote] Packet received: E01 [remote] Sending packet: $z0,7f1648ff00a8,1#68 [remote] Packet received: E01 [remote] Sending packet: $z0,7f1648ff132f,1#6b [remote] Packet received: E01 [remote] Sending packet: $D;2a0ac8#3e [remote] Packet received: E01 I was originally running into Segmentation Faults, from within gdbserver/mem-break.cc, in the function find_gdb_breakpoint. This function calls current_process() and then dereferences the result to find the breakpoint list. However, in our case, the current process has already exited, and so the current_process() call returns nullptr. At the point of failure, the gdbserver backtrace looks like this: #0 0x00000000004190e4 in find_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:982 #1 0x000000000041930d in delete_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:1093 openhwgroup#2 0x000000000042d8db in process_serial_event () at ../../src/gdbserver/server.cc:4372 openhwgroup#3 0x000000000042dcab in handle_serial_event (err=0, client_data=0x0) at ../../src/gdbserver/server.cc:4498 ... The problem is that, as a result non-stop being on, the process exiting is only reported back to GDB after the request to remove a breakpoint has been sent. Clearly gdbserver can't actually remove this breakpoint -- the process has already exited -- so I think the best solution is for gdbserver just to report an error, which is what I've done. The second problem I ran into was on the gdb side, as the process has already exited, but GDB has not yet acknowledged the exit event, the detach -- the 'D' packet in the above trace -- fails. This was being reported to the user with a 'Can't detach process' error. As the test actually calls detach from Python code, this error was then becoming a Python exception. Though clearly the detach has returned an error, and so, maybe, having GDB throw an error would be fine, I think in this case, there's a good argument that the remote error can be ignored -- if GDB tries to detach and gets back an error, and if there's a pending exit event for the pid we tried to detach, then just ignore the error and pretend the detach worked fine. We could possibly check for a pending exit event before sending the detach packet, however, I believe that it might be possible (in non-stop mode) for the stop notification to arrive after the detach is sent, but before gdbserver has started processing the detach. In this case we would still need to check for pending stop events after seeing the detach fail, so I figure there's no point having two checks -- we just send the detach request, and if it fails, check to see if the process has already exited. Testing ======= In order to test this issue I needed to ensure that the exit event arrives at the same time as the detach call. The window of opportunity for getting the exit to arrive is so small I've never managed to trigger this in real use -- I originally spotted this issue while working on another patch, which did manage to trigger this issue. However, if we trigger both the exit and the detach from a single Python function then we never return to GDB's event loop, as such GDB never processes the exit event, and so the first time GDB gets a chance to see the exit is during the detach call. And so that is the approach I've taken for testing this patch. Tested-By: Kevin Buettner <kevinb@redhat.com> Approved-By: Kevin Buettner <kevinb@redhat.com>

I noticed that on an Ubuntu 20.04 system, after a following patch ("Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp was passing cleanly, but still, we'd end up with four new unexpected GDB core dumps: === gdb Summary === # of unexpected core files 4 # of expected passes 48 That said patch is making the pre-existing gdb.threads/step-over-exec.exp testcase (almost silently) expose a latent problem in gdb/linux-nat.c, resulting in a GDB crash when: #1 - a non-leader thread execs #2 - the post-exec program stops somewhere #3 - you kill the inferior Instead of #3 directly, the testcase just returns, which ends up in gdb_exit, tearing down GDB, which kills the inferior, and is thus equivalent to #3 above. Vis (after said patch is applied): $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true ... (top-gdb) r ... (gdb) b main ... (gdb) r ... Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69 69 argv0 = argv[0]; (gdb) c Continuing. [New Thread 0x7ffff7d89700 (LWP 2506975)] Other going in exec. Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28 28 foo (); (gdb) k ... Thread 1 "gdb" received signal SIGSEGV, Segmentation fault. 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 393 return m_suspend.waitstatus_pending_p; (top-gdb) bt #0 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 #1 0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345 #2 0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564 #3 0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284 #4 0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278 #5 0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247 #6 0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864 #7 0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590 #8 0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911 ... The root of the problem is that when a non-leader LWP execs, it just changes its tid to the tgid, replacing the pre-exec leader thread, becoming the new leader. There's no thread exit event for the execing thread. It's as if the old pre-exec LWP vanishes without trace. The ptrace man page says: "PTRACE_O_TRACEEXEC (since Linux 2.5.46) Stop the tracee at the next execve(2). A waitpid(2) by the tracer will return a status value such that status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8)) If the execing thread is not a thread group leader, the thread ID is reset to thread group leader's ID before this stop. Since Linux 3.0, the former thread ID can be retrieved with PTRACE_GETEVENTMSG." When the core of GDB processes an exec events, it deletes all the threads of the inferior. But, that is too late -- deleting the thread does not delete the corresponding LWP, so we end leaving the pre-exec non-leader LWP stale in the LWP list. That's what leads to the crash above -- linux_nat_target::kill iterates over all LWPs, and after the patch in question, that code will look for the corresponding thread_info for each LWP. For the pre-exec non-leader LWP still listed, won't find one. This patch fixes it, by deleting the pre-exec non-leader LWP (and thread) from the LWP/thread lists as soon as we get an exec event out of ptrace. GDBserver does not need an equivalent fix, because it is already doing this, as side effect of mourning the pre-exec process, in gdbserver/linux-low.cc: else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events) { ... /* Delete the execing process and all its threads. */ mourn (proc); switch_to_thread (nullptr); The crash with gdb.threads/step-over-exec.exp is not observable on newer systems, which postdate the glibc change to move "libpthread.so" internals to "libc.so.6", because right after the exec, GDB traps a load event for "libc.so.6", which leads to GDB trying to open libthread_db for the post-exec inferior, and, on such systems that succeeds. When we load libthread_db, we call linux_stop_and_wait_all_lwps, which, as the name suggests, stops all lwps, and then waits to see their stops. While doing this, GDB detects that the pre-exec stale LWP is gone, and deletes it. If we use "catch exec" to stop right at the exec before the "libc.so.6" load event ever happens, and issue "kill" right there, then GDB crashes on newer systems as well. So instead of tweaking gdb.threads/step-over-exec.exp to cover the fix, add a new gdb.threads/threads-after-exec.exp testcase that uses "catch exec". The test also uses the new "maint info linux-lwps" command if testing on Linux native, which also exposes the stale LWP problem with an unfixed GDB. Also tweak a comment in infrun.c:follow_exec referring to how linux-nat.c used to behave, as it would become stale otherwise. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643

(A good chunk of the problem statement in the commit log below is Andrew's, adjusted for a different solution, and for covering displaced stepping too. The testcase is mostly Andrew's too.) This commit addresses bugs gdb/19675 and gdb/27830, which are about stepping over a breakpoint set at a clone syscall instruction, one is about displaced stepping, and the other about in-line stepping. Currently, when a new thread is created through a clone syscall, GDB sets the new thread running. With 'continue' this makes sense (assuming no schedlock): - all-stop mode, user issues 'continue', all threads are set running, a newly created thread should also be set running. - non-stop mode, user issues 'continue', other pre-existing threads are not affected, but as the new thread is (sort-of) a child of the thread the user asked to run, it makes sense that the new threads should be created in the running state. Similarly, if we are stopped at the clone syscall, and there's no software breakpoint at this address, then the current behaviour is fine: - all-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). While stepping the thread of interest all the other threads will be allowed to continue. A newly created thread will be set running, and then stopped once the thread of interest has completed its step. - non-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). Other threads might be running or stopped, but as with the continue case above, the new thread will be created running. The only possible issue here is that the new thread will be left running after the initial thread has completed its stepi. The user would need to manually select the thread and interrupt it, this might not be what the user expects. However, this is not something this commit tries to change. The problem then is what happens when we try to step over a clone syscall if there is a breakpoint at the syscall address. - For both all-stop and non-stop modes, with in-line stepping: + user issues 'stepi', + [non-stop mode only] GDB stops all threads. In all-stop mode all threads are already stopped. + GDB removes s/w breakpoint at syscall address, + GDB single steps just the thread of interest, all other threads are left stopped, + New thread is created running, + Initial thread completes its step, + [non-stop mode only] GDB resumes all threads that it previously stopped. There are two problems in the in-line stepping scenario above: 1. The new thread might pass through the same code that the initial thread is in (i.e. the clone syscall code), in which case it will fail to hit the breakpoint in clone as this was removed so the first thread can single step, 2. The new thread might trigger some other stop event before the initial thread reports its step completion. If this happens we end up triggering an assertion as GDB assumes that only the thread being stepped should stop. The assert looks like this: infrun.c:5899: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed. - For both all-stop and non-stop modes, with displaced stepping: + user issues 'stepi', + GDB starts the displaced step, moves thread's PC to the out-of-line scratch pad, maybe adjusts registers, + GDB single steps the thread of interest, [non-stop mode only] all other threads are left as they were, either running or stopped. In all-stop, all other threads are left stopped. + New thread is created running, + Initial thread completes its step, GDB re-adjusts its PC, restores/releases scratchpad, + [non-stop mode only] GDB resumes the thread, now past its breakpoint. + [all-stop mode only] GDB resumes all threads. There is one problem with the displaced stepping scenario above: 3. When the parent thread completed its step, GDB adjusted its PC, but did not adjust the child's PC, thus that new child thread will continue execution in the scratch pad, invoking undefined behavior. If you're lucky, you see a crash. If unlucky, the inferior gets silently corrupted. What is needed is for GDB to have more control over whether the new thread is created running or not. Issue #1 above requires that the new thread not be allowed to run until the breakpoint has been reinserted. The only way to guarantee this is if the new thread is held in a stopped state until the single step has completed. Issue #3 above requires that GDB is informed of when a thread clones itself, and of what is the child's ptid, so that GDB can fixup both the parent and the child. When looking for solutions to this problem I considered how GDB handles fork/vfork as these have some of the same issues. The main difference between fork/vfork and clone is that the clone events are not reported back to core GDB. Instead, the clone event is handled automatically in the target code and the child thread is immediately set running. Note we have support for requesting thread creation events out of the target (TARGET_WAITKIND_THREAD_CREATED). However, those are reported for the new/child thread. That would be sufficient to address in-line stepping (issue #1), but not for displaced-stepping (issue #3). To handle displaced-stepping, we need an event that is reported to the _parent_ of the clone, as the information about the displaced step is associated with the clone parent. TARGET_WAITKIND_THREAD_CREATED includes no indication of which thread is the parent that spawned the new child. In fact, for some targets, like e.g., Windows, it would be impossible to know which thread that was, as thread creation there doesn't work by "cloning". The solution implemented here is to model clone on fork/vfork, and introduce a new TARGET_WAITKIND_THREAD_CLONED event. This event is similar to TARGET_WAITKIND_FORKED and TARGET_WAITKIND_VFORKED, except that we end up with a new thread in the same process, instead of a new thread of a new process. Like FORKED and VFORKED, THREAD_CLONED waitstatuses have a child_ptid property, and the child is held stopped until GDB explicitly resumes it. This addresses the in-line stepping case (issues #1 and #2). The infrun code that handles displaced stepping fixup for the child after a fork/vfork event is thus reused for THREAD_CLONE, with some minimal conditions added, addressing the displaced stepping case (issue #3). The native Linux backend is adjusted to unconditionally report TARGET_WAITKIND_THREAD_CLONED events to the core. Following the follow_fork model in core GDB, we introduce a target_follow_clone target method, which is responsible for making the new clone child visible to the rest of GDB. Subsequent patches will add clone events support to the remote protocol and gdbserver. displaced_step_in_progress_thread becomes unused with this patch, but a new use will reappear later in the series. To avoid deleting it and readding it back, this patch marks it with attribute unused, and the latter patch removes the attribute again. We need to do this because the function is static, and with no callers, the compiler would warn, (error with -Werror), breaking the build. This adds a new gdb.threads/stepi-over-clone.exp testcase, which exercises stepping over a clone syscall, with displaced stepping vs inline stepping, and all-stop vs non-stop. We already test stepping over clone syscalls with gdb.base/step-over-syscall.exp, but this test uses pthreads, while the other test uses raw clone, and this one is more thorough. The testcase passes on native GNU/Linux, but fails against GDBserver. GDBserver will be fixed by a later patch in the series. Co-authored-by: Andrew Burgess <aburgess@redhat.com> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Change-Id: I95c06024736384ae8542a67ed9fdf6534c325c8e Reviewed-By: Andrew Burgess <aburgess@redhat.com>

This commit extends the logic added by these two commits from a while ago: #1 7b96196 (gdbserver: hide fork child threads from GDB), #2 df5ad10 (gdb, gdbserver: detach fork child when detaching from fork parent) ... to handle thread clone events, which are very similar to (v)fork events. For #1, we want to hide clone children as well, so just update the comments. For #2, unlike (v)fork children, pending clone children aren't full processes, they're just threads, so don't detach them in handle_detach. linux-low.cc will take care of detaching them along with all other threads of the process, there's nothing special that needs to be done. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I7f5901d07efda576a2522d03e183994e071b8ffc

Running the gdb.threads/step-over-thread-exit-while-stop-all-threads.exp testcase added later in the series against gdbserver, after the TARGET_WAITKIND_NO_RESUMED fix from the following patch, would run into an infinite loop in stop_all_threads, leading to a timeout: FAIL: gdb.threads/step-over-thread-exit-while-stop-all-threads.exp: displaced-stepping=off: target-non-stop=on: iter 0: continue (timeout) The is really a latent bug, and it is about the fact that stop_all_threads stops listening to events from a target as soon as it sees a TARGET_WAITKIND_NO_RESUMED, ignoring that TARGET_WAITKIND_NO_RESUMED may be delayed. handle_no_resumed knows how to handle delayed no-resumed events, but stop_all_threads was never taught to. In more detail, here's what happens with that testcase: #1 - Multiple threads report breakpoint hits to gdb. #2 - gdb picks one events, and it's for thread 1. All other stops are left pending. thread 1 needs to move past a breakpoint, so gdb stops all threads to start an inline step over for thread 1. While stopping threads, some of the threads that were still running report events that are also left pending. #2 - gdb steps thread 1 #3 - Thread 1 exits while stepping (it steps over an exit syscall), gdbserver reports thread exit for thread 1 #4 - Thread 1 was the last resumed thread, so gdbserver also reports no-resumed: [remote] Notification received: Stop:w0;p3445d0.3445d3 [remote] Sending packet: $vStopped#55 [remote] Packet received: N [remote] Sending packet: $vStopped#55 [remote] Packet received: OK #5 - gdb processes the thread exit for thread 1, finishes the step over and restarts threads. #6 - gdb picks the next event to process out of one of the resumed threads with pending events: [infrun] random_resumed_with_pending_wait_status: Found 32 events, selecting #11 #7 - This is again a breakpoint hit and the breakpoint needs to be stepped over too, so gdb starts a step-over dance again. #8 - We reach stop_all_threads, which finds that some threads need to be stopped. #9 - wait_one finally consumes the no-resumed event queue by #4. Seeing this, wait_one disable target async, to stop listening for events out of the remote target. #10 - We still haven't seen all the stops expected, so stop_all_threads tries another iteration. #11 - Because the remote target is no longer async, and there are no other targets, wait_one return no-resumed immediately without polling the remote target. #12 - We still haven't seen all the stops expected, so stop_all_threads tries another iteration. goto #11, looping forever. Fix this by explicitly enabling/re-enabling target async on targets that can async, before waiting for stops. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: Ie3ffb0df89635585a6631aa842689cecc989e33f

This commit adds a new extension_language_ops hook which allows an extension to handle the case where GDB can't find a separate debug information file for a particular objfile. This commit doesn't actually implement the hook for any of GDB's extension languages, the next commit will do that. This commit just adds support for the hook to extension-priv.h and extension.[ch], and then reworks symfile-debug.c to call the hook. Right now the hook will always return its default value, which means GDB should do nothing different. As such, there should be no user visible changes after this commit. I'll give a brief description of what the hook does here so that we can understand the changes in symfile-debug.c. The next commit adds a Python implementation for this new hook, and gives a fuller description of the new functionality. Currently, when looking for separate debug information GDB tries three things, in this order: 1. Use the build-id to find the required debug information, 2. Check for .gnu_debuglink section and use that to look up the required debug information, 3. Check with debuginfod to see if it can supply the required information. The new extension_language_ops::handle_missing_debuginfo hook is called if all three steps fail to find any debug information. The hook has three possible return values: a. Nothing, no debug information is found, GDB continues without the debug information for this objfile. This matches the current behaviour of GDB, and is the default if nothing is implementing this new hook, b. Install debug information into a location that step #1 or #2 above would normally check, and then request that GDB repeats steps #1 and #2 in the hope that GDB will now find the debug information. If the debug information is still not found then GDB carries on without the debug information. If the debug information is found the GDB loads it and carries on, c. Return a filename for a file containing the required debug information. GDB loads the contents of this file and carries on. The changes in this commit mostly involve placing the core of objfile::find_and_add_separate_symbol_file into a loop which allows for steps #1 and #2 to be repeated. We take care to ensure that debuginfod is only queried once, the first time through. The assumption is that no extension is going to be able to control the replies from debuginfod, so there's no point making a second request -- and as these requests go over the network, they could potentially be slow. The warnings that find_and_add_separate_symbol_file collects are displayed only once assuming that no debug information is found. If debug information is found, even after the extension has operated, then the warnings are not shown; remember, these are warnings from GDB about failure to find any suitable debug information, so it makes sense to hide these if debug information is found. Approved-By: Tom Tromey <tom@tromey.com>

Change gdbarch_pseudo_register_read_value to take a frame instead of a regcache. The frame (and formerly the regcache) is used to read raw registers needed to make up the pseudo register value. The problem with using the regcache is that it always provides raw register values for the current frame (frame 0). Let's say the user wants to read the ebx register on amd64. ebx is a pseudo register, obtained by reading the bottom half (bottom 4 bytes) of the rbx register, which is a raw register. If the currently selected frame is frame 0, it works fine: (gdb) frame 0 #0 break_here_asm () at /home/smarchi/src/binutils-gdb/gdb/testsuite/gdb.arch/amd64-pseudo-unwind-asm.S:36 36 in /home/smarchi/src/binutils-gdb/gdb/testsuite/gdb.arch/amd64-pseudo-unwind-asm.S (gdb) p/x $ebx $1 = 0x24252627 (gdb) p/x $rbx $2 = 0x2021222324252627 But if the user is looking at another frame, and the raw register behind the pseudo register has been saved at some point in the call stack, then we get a wrong answer: (gdb) frame 1 #1 0x000055555555517d in caller () at /home/smarchi/src/binutils-gdb/gdb/testsuite/gdb.arch/amd64-pseudo-unwind-asm.S:56 56 in /home/smarchi/src/binutils-gdb/gdb/testsuite/gdb.arch/amd64-pseudo-unwind-asm.S (gdb) p/x $ebx $3 = 0x24252627 (gdb) p/x $rbx $4 = 0x1011121314151617 Here, the value of ebx was computed using the value of rbx in frame 0 (through the regcache), it should have been computed using the value of rbx in frame 1. In other to make this work properly, make the following changes: - Make dwarf2_frame_prev_register return nullptr if it doesn't know how to unwind a register and that register is a pseudo register. Previously, it returned `frame_unwind_got_register`, meaning, in our example, "the value of ebx in frame 1 is the same as the value of ebx in frame 0", which is obviously false. Return nullptr as a way to say "I don't know". - In frame_unwind_register_value, when prev_register (for instance dwarf2_frame_prev_register) returns nullptr, and we are trying to read a pseudo register, try to get the register value through gdbarch_pseudo_register_read_value or gdbarch_pseudo_register_read. If using gdbarch_pseudo_register_read, the behavior is known to be broken. Implementations should be migrated to use gdbarch_pseudo_register_read_value to fix that. - Change gdbarch_pseudo_register_read_value to take a frame_info instead of a regcache, update implementations (aarch64, amd64, i386). In i386-tdep.c, I made a copy of i386_mmx_regnum_to_fp_regnum that uses a frame instead of a regcache. The version using the regcache is still used by i386_pseudo_register_write. It will get removed in a subsequent patch. - Add some helpers in value.{c,h} to implement the common cases of pseudo registers: taking part of a raw register and concatenating multiple raw registers. - Update readable_regcache::{cooked_read,cooked_read_value} to pass the current frame to gdbarch_pseudo_register_read_value. Passing the current frame will give the same behavior as before: for frame 0, raw registers will be read from the current thread's regcache. Notes: - I do not plan on changing gdbarch_pseudo_register_read to receive a frame instead of a regcache. That method is considered deprecated. Instead, we should be working on migrating implementations to use gdbarch_pseudo_register_read_value instead. - In frame_unwind_register_value, we still ask the unwinder to try to unwind pseudo register values. It's apparently possible for the debug info to provide information about [1] pseudo registers, so we want to try that first, before falling back to computing them ourselves. [1] https://inbox.sourceware.org/gdb-patches/20180528174715.A954AD804AD@oc3748833570.ibm.com/ Change-Id: Id6ef1c64e19090a183dec050e4034d8c2394e7ca Reviewed-by: John Baldwin <jhb@FreeBSD.org>

On aarch64-linux, with gcc 13.2.1, I run into: ... (gdb) backtrace^M #0 break_here () at solib-search.c:30^M openhwgroup#1 0x0000fffff7f20194 in lib2_func4 () at solib-search-lib2.c:50^M openhwgroup#2 0x0000fffff7f70194 in lib1_func3 () at solib-search-lib1.c:50^M openhwgroup#3 0x0000fffff7f20174 in lib2_func2 () at solib-search-lib2.c:30^M openhwgroup#4 0x0000fffff7f70174 in lib1_func1 () at solib-search-lib1.c:30^M openhwgroup#5 0x00000000004101b4 in main () at solib-search.c:23^M (gdb) PASS: gdb.base/solib-search.exp: \ backtrace (with wrong libs) (data collection) FAIL: gdb.base/solib-search.exp: backtrace (with wrong libs) ... The FAIL is generated by this code in the test-case: ... if { $expect_fail } { # If the backtrace output is correct the test isn't sufficiently # testing what it should. if { $count == $total_expected } { set fail 1 } ... The test-case: - builds two versions of two shared libs, a "right" and "wrong" version, the difference being an additional dummy function (called spacer function), - uses the "right" version to generate a core file, - uses the "wrong" version to interpret the core file, and - generates a backtrace. The intent is that the backtrace is incorrect due to using the "wrong" version, but actually it's correct. This is because the spacer functions aren't large enough. Fix this by increasing the size of the spacer functions by adding a dummy loop, after which we have, as expected, an incorrect backtrace: ... (gdb) backtrace^M #0 break_here () at solib-search.c:30^M openhwgroup#1 0x0000fffff7f201c0 in ?? ()^M openhwgroup#2 0x0000fffff7f20174 in lib2_func2 () at solib-search-lib2.c:30^M openhwgroup#3 0x0000fffff7f20174 in lib2_func2 () at solib-search-lib2.c:30^M openhwgroup#4 0x0000fffff7f70174 in lib1_func1 () at solib-search-lib1.c:30^M openhwgroup#5 0x00000000004101b4 in main () at solib-search.c:23^M (gdb) PASS: gdb.base/solib-search.exp: \ backtrace (with wrong libs) (data collection) PASS: gdb.base/solib-search.exp: backtrace (with wrong libs) ... Tested on aarch64-linux.

On ppc64le-linux, I run into: ... (gdb) bt^M #0 0x00000000100006dc in foobar (J=2)^M openhwgroup#1 0x000000001000070c in prog ()^M (gdb) FAIL: gdb.dwarf2/dw2-entry-points.exp: bt foo ... The test-case attemps to emulate additional entry points of a function, with function bar having entry points foo and foobar: ... (gdb) p bar $1 = {void (int, int)} 0x1000064c <bar> (gdb) p foo $2 = {void (int, int)} 0x10000698 <foo> (gdb) p foobar $3 = {void (int)} 0x100006d0 <foobar> ... However, when setting a breakpoint on the entry point foo: ... (gdb) b foo Breakpoint 1 at 0x100006dc ... it ends up in foobar instead of in foo, due to prologue skipping, and consequently the backtrace show foobar instead foo. The problem is that the test-case does not emulate an actual prologue at each entry point. Fix this by disabling the prologue skipping when setting a breakpoint, using "break *foo". Tested on ppc64le-linux and x86_64-linux. Tested-By: Guinevere Larsen <blarsen@redhat.com> Approved-By: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> PR testsuite/31232 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31232

The testsuite for SCFI contains target-specific tests. When a test is executed with --scfi=experimental command line option, the CFI annotations in the test .s files are skipped altogether by the GAS for processing. The CFI directives in the input assembly files are, however, validated by running the assembler one more time without --scfi=experimental. Some testcases are used to highlight those asm constructs that the SCFI machinery in GAS currently does not support: - Only System V AMD64 ABI is supported for now. Using either --32 or --x32 with SCFI results in hard error. See scfi-unsupported-1.s. - Untraceable stack-pointer manipulation in function epilougue and prologue. See scfi-unsupported-2.s. - Using Dynamically Realigned Arguement Pointer (DRAP) register to realign the stack. For SCFI, the CFA must be only REG_SP or REG_FP based. See scfi-unsupported-drap-1.s Some testcases are used to highlight some diagnostics that the SCFI machinery in GAS currently issues, with an intent to help user correct inadvertent errors in their hand-written asm. An error is issued when GAS finds that input asm is not amenable to correct CFI synthesis. - (openhwgroup#1) "Warning: SCFI: Asymetrical register restore" - (openhwgroup#2) "Error: SCFI: usage of REG_FP as scratch not supported" - (openhwgroup#3) "Error: SCFI: unsupported stack manipulation pattern" In case of (openhwgroup#2) and (openhwgroup#3), SCFI generation is skipped for the respective function. Above is a subset of the warnings/errors implemented in the code. gas/testsuite/: * gas/scfi/README: New test. * gas/scfi/x86_64/ginsn-add-1.l: New test. * gas/scfi/x86_64/ginsn-add-1.s: New test. * gas/scfi/x86_64/ginsn-dw2-regnum-1.l: New test. * gas/scfi/x86_64/ginsn-dw2-regnum-1.s: New test. * gas/scfi/x86_64/ginsn-pop-1.l: New test. * gas/scfi/x86_64/ginsn-pop-1.s: New test. * gas/scfi/x86_64/ginsn-push-1.l: New test. * gas/scfi/x86_64/ginsn-push-1.s: New test. * gas/scfi/x86_64/scfi-add-1.d: New test. * gas/scfi/x86_64/scfi-add-1.l: New test. * gas/scfi/x86_64/scfi-add-1.s: New test. * gas/scfi/x86_64/scfi-add-2.d: New test. * gas/scfi/x86_64/scfi-add-2.l: New test. * gas/scfi/x86_64/scfi-add-2.s: New test. * gas/scfi/x86_64/scfi-asm-marker-1.d: New test. * gas/scfi/x86_64/scfi-asm-marker-1.l: New test. * gas/scfi/x86_64/scfi-asm-marker-1.s: New test. * gas/scfi/x86_64/scfi-asm-marker-2.d: New test. * gas/scfi/x86_64/scfi-asm-marker-2.l: New test. * gas/scfi/x86_64/scfi-asm-marker-2.s: New test. * gas/scfi/x86_64/scfi-asm-marker-3.d: New test. * gas/scfi/x86_64/scfi-asm-marker-3.l: New test. * gas/scfi/x86_64/scfi-asm-marker-3.s: New test. * gas/scfi/x86_64/scfi-bp-sp-1.d: New test. * gas/scfi/x86_64/scfi-bp-sp-1.l: New test. * gas/scfi/x86_64/scfi-bp-sp-1.s: New test. * gas/scfi/x86_64/scfi-bp-sp-2.d: New test. * gas/scfi/x86_64/scfi-bp-sp-2.l: New test. * gas/scfi/x86_64/scfi-bp-sp-2.s: New test. * gas/scfi/x86_64/scfi-callee-saved-1.d: New test. * gas/scfi/x86_64/scfi-callee-saved-1.l: New test. * gas/scfi/x86_64/scfi-callee-saved-1.s: New test. * gas/scfi/x86_64/scfi-callee-saved-2.d: New test. * gas/scfi/x86_64/scfi-callee-saved-2.l: New test. * gas/scfi/x86_64/scfi-callee-saved-2.s: New test. * gas/scfi/x86_64/scfi-callee-saved-3.d: New test. * gas/scfi/x86_64/scfi-callee-saved-3.l: New test. * gas/scfi/x86_64/scfi-callee-saved-3.s: New test. * gas/scfi/x86_64/scfi-callee-saved-4.d: New test. * gas/scfi/x86_64/scfi-callee-saved-4.l: New test. * gas/scfi/x86_64/scfi-callee-saved-4.s: New test. * gas/scfi/x86_64/scfi-cfg-1.d: New test. * gas/scfi/x86_64/scfi-cfg-1.l: New test. * gas/scfi/x86_64/scfi-cfg-1.s: New test. * gas/scfi/x86_64/scfi-cfg-2.d: New test. * gas/scfi/x86_64/scfi-cfg-2.l: New test. * gas/scfi/x86_64/scfi-cfg-2.s: New test. * gas/scfi/x86_64/scfi-cfi-label-1.d: New test. * gas/scfi/x86_64/scfi-cfi-label-1.l: New test. * gas/scfi/x86_64/scfi-cfi-label-1.s: New test. * gas/scfi/x86_64/scfi-cfi-sections-1.d: New test. * gas/scfi/x86_64/scfi-cfi-sections-1.l: New test. * gas/scfi/x86_64/scfi-cfi-sections-1.s: New test. * gas/scfi/x86_64/scfi-cofi-1.d: New test. * gas/scfi/x86_64/scfi-cofi-1.l: New test. * gas/scfi/x86_64/scfi-cofi-1.s: New test. * gas/scfi/x86_64/scfi-diag-1.l: New test. * gas/scfi/x86_64/scfi-diag-1.s: New test. * gas/scfi/x86_64/scfi-diag-2.l: New test. * gas/scfi/x86_64/scfi-diag-2.s: New test. * gas/scfi/x86_64/scfi-dyn-stack-1.d: New test. * gas/scfi/x86_64/scfi-dyn-stack-1.l: New test. * gas/scfi/x86_64/scfi-dyn-stack-1.s: New test. * gas/scfi/x86_64/scfi-enter-1.d: New test. * gas/scfi/x86_64/scfi-enter-1.l: New test. * gas/scfi/x86_64/scfi-enter-1.s: New test. * gas/scfi/x86_64/scfi-fp-diag-2.l: New test. * gas/scfi/x86_64/scfi-fp-diag-2.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-1.d: New test. * gas/scfi/x86_64/scfi-indirect-mov-1.l: New test. * gas/scfi/x86_64/scfi-indirect-mov-1.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-2.d: New test. * gas/scfi/x86_64/scfi-indirect-mov-2.l: New test. * gas/scfi/x86_64/scfi-indirect-mov-2.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-3.d: New test. * gas/scfi/x86_64/scfi-indirect-mov-3.l: New test. * gas/scfi/x86_64/scfi-indirect-mov-3.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-4.d: New test. * gas/scfi/x86_64/scfi-indirect-mov-4.l: New test. * gas/scfi/x86_64/scfi-indirect-mov-4.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-5.s: New test. * gas/scfi/x86_64/scfi-lea-1.d: New test. * gas/scfi/x86_64/scfi-lea-1.l: New test. * gas/scfi/x86_64/scfi-lea-1.s: New test. * gas/scfi/x86_64/scfi-leave-1.d: New test. * gas/scfi/x86_64/scfi-leave-1.l: New test. * gas/scfi/x86_64/scfi-leave-1.s: New test. * gas/scfi/x86_64/scfi-pushq-1.d: New test. * gas/scfi/x86_64/scfi-pushq-1.l: New test. * gas/scfi/x86_64/scfi-pushq-1.s: New test. * gas/scfi/x86_64/scfi-pushsection-1.d: New test. * gas/scfi/x86_64/scfi-pushsection-1.l: New test. * gas/scfi/x86_64/scfi-pushsection-1.s: New test. * gas/scfi/x86_64/scfi-pushsection-2.d: New test. * gas/scfi/x86_64/scfi-pushsection-2.l: New test. * gas/scfi/x86_64/scfi-pushsection-2.s: New test. * gas/scfi/x86_64/scfi-selfalign-func-1.d: New test. * gas/scfi/x86_64/scfi-selfalign-func-1.l: New test. * gas/scfi/x86_64/scfi-selfalign-func-1.s: New test. * gas/scfi/x86_64/scfi-simple-1.d: New test. * gas/scfi/x86_64/scfi-simple-1.l: New test. * gas/scfi/x86_64/scfi-simple-1.s: New test. * gas/scfi/x86_64/scfi-simple-2.d: New test. * gas/scfi/x86_64/scfi-simple-2.l: New test. * gas/scfi/x86_64/scfi-simple-2.s: New test. * gas/scfi/x86_64/scfi-sub-1.d: New test. * gas/scfi/x86_64/scfi-sub-1.l: New test. * gas/scfi/x86_64/scfi-sub-1.s: New test. * gas/scfi/x86_64/scfi-sub-2.d: New test. * gas/scfi/x86_64/scfi-sub-2.l: New test. * gas/scfi/x86_64/scfi-sub-2.s: New test. * gas/scfi/x86_64/scfi-unsupported-1.l: New test. * gas/scfi/x86_64/scfi-unsupported-1.s: New test. * gas/scfi/x86_64/scfi-unsupported-2.l: New test. * gas/scfi/x86_64/scfi-unsupported-2.s: New test. * gas/scfi/x86_64/scfi-unsupported-3.l: New test. * gas/scfi/x86_64/scfi-unsupported-3.s: New test. * gas/scfi/x86_64/scfi-unsupported-4.l: New test. * gas/scfi/x86_64/scfi-unsupported-4.s: New test. * gas/scfi/x86_64/scfi-unsupported-cfg-1.l: New test. * gas/scfi/x86_64/scfi-unsupported-cfg-1.s: New test. * gas/scfi/x86_64/scfi-unsupported-cfg-2.l: New test. * gas/scfi/x86_64/scfi-unsupported-cfg-2.s: New test. * gas/scfi/x86_64/scfi-unsupported-drap-1.l: New test. * gas/scfi/x86_64/scfi-unsupported-drap-1.s: New test. * gas/scfi/x86_64/scfi-unsupported-insn-1.l: New test. * gas/scfi/x86_64/scfi-unsupported-insn-1.s: New test. * gas/scfi/x86_64/scfi-x86-64.exp: New file.

A review comment on the SCFI V4 series was to handle ginsn creation for certain lea opcodes more precisely. Specifically, we should preferably handle the following two cases of lea opcodes similarly: - openhwgroup#1 lea with "index register and scale factor of 1, but no base register", - openhwgroup#2 lea with "no index register, but base register present". Currently, a ginsn of type GINSN_TYPE_OTHER is generated for the case of openhwgroup#1 above. For openhwgroup#2, however, the lea insn is translated to either a GINSN_TYPE_ADD or GINSN_TYPE_MOV depending on whether the immediate for displacement is non-zero or not respectively. Change the handling in x86_ginsn_lea so that both of the above lea manifestations are handled similarly. While at it, remove the code paths creating GINSN_TYPE_OTHER altogether from the function. It makes sense to piggy back on the x86_ginsn_unhandled code path to create GINSN_TYPE_OTHER if the destination register is interesting. This was also suggested in one of the previous review rounds; the other functions already follow that model, so this keeps functions symmetrical looking. gas/ * gas/config/tc-i386.c (x86_ginsn_lea): Handle select lea ops with no base register similar to the case of no index register. Remove creation of GINSN_TYPE_OTHER from the function. gas/testsuite/ * gas/scfi/x86_64/ginsn-lea-1.l: New test. * gas/scfi/x86_64/ginsn-lea-1.s: Likewise. * gas/scfi/x86_64/scfi-x86-64.exp: Add new test.

Bug PR gdb/28313 describes attaching to a process when the executable has been deleted. The bug is for S390 and describes how a user sees a message 'PC not saved'. On x86-64 (GNU/Linux) I don't see a 'PC not saved' message, but instead I see this: (gdb) attach 901877 Attaching to process 901877 No executable file now. warning: Could not load vsyscall page because no executable was specified 0x00007fa9d9c121e7 in ?? () (gdb) bt #0 0x00007fa9d9c121e7 in ?? () openhwgroup#1 0x00007fa9d9c1211e in ?? () openhwgroup#2 0x0000000000000007 in ?? () openhwgroup#3 0x000000002dc8b18d in ?? () openhwgroup#4 0x0000000000000000 in ?? () (gdb) Notice that the addresses in the backtrace don't seem right, quickly heading to 0x7 and finally ending at 0x0. What's going on, in both the s390 case and the x86-64 case is that the architecture's prologue scanner is going wrong and causing the stack unwinding to fail. The prologue scanner goes wrong because GDB has no unwind information. And GDB has no unwind information because, of course, the executable has been deleted. Notice in the example session above we get this line in the output: No executable file now. which indicates that GDB failed to find an executable to debug. For GNU/Linux when GDB tries to find an executable for a given pid we end up calling linux_proc_pid_to_exec_file in gdb/nat/linux-procfs.c. Within this function we call `readlink` on /proc/PID/exe to find the path of the actual executable. If the `readlink` call fails then we already fallback on using /proc/PID/exe as the path to the executable to debug. However, when the executable has been deleted the `readlink` call doesn't fail, but the path that is returned points to a non-existent file. I propose that we add an `access` call to linux_proc_pid_to_exec_file to check that the target file exists and can be read. If the target can't be read then we should fall back to /proc/PID/exe (assuming that /proc/PID/exe can be read). Now on x86-64 the output looks like this: (gdb) attach 901877 Attaching to process 901877 Reading symbols from /proc/901877/exe... Reading symbols from /lib64/libc.so.6... (No debugging symbols found in /lib64/libc.so.6) Reading symbols from /lib64/ld-linux-x86-64.so.2... (No debugging symbols found in /lib64/ld-linux-x86-64.so.2) 0x00007fa9d9c121e7 in nanosleep () from /lib64/libc.so.6 (gdb) bt #0 0x00007fa9d9c121e7 in nanosleep () from /lib64/libc.so.6 openhwgroup#1 0x00007fa9d9c1211e in sleep () from /lib64/libc.so.6 openhwgroup#2 0x000000000040117e in spin_forever () at attach-test.c:17 openhwgroup#3 0x0000000000401198 in main () at attach-test.c:24 (gdb) which is much better. I've also tagged the bug PR gdb/29782 which concerns the test gdb.server/connect-with-no-symbol-file.exp. After making this change, when running gdb.server/connect-with-no-symbol-file.exp GDB would now pick up the /proc/PID/exe file as the executable in some cases. As GDB is not restarted for the multiple iterations of this test GDB (or rather BFD) would given a warning/error like: (gdb) PASS: gdb.server/connect-with-no-symbol-file.exp: sysroot=target:: action=permission: setup: disconnect set sysroot target: BFD: reopening /proc/3283001/exe: No such file or directory (gdb) FAIL: gdb.server/connect-with-no-symbol-file.exp: sysroot=target:: action=permission: setup: adjust sysroot What's happening is that an executable found for an earlier iteration of the test is still registered for the inferior when we are setting up for a second iteration of the test. When the sysroot changes, if there's an executable registered GDB tries to reopen it, but in this case the file has disappeared (the previous inferior has exited by this point). I did think about maybe, when the executable is /proc/PID/exe, we should auto-delete the file from the inferior. But in the end I thought this was a bad idea. Not only would this require a lot of special code in GDB just to support this edge case: we'd need to track if the exe file name came from /proc and should be auto-deleted, or we'd need target specific code to check if a path should be auto-deleted..... ... in addition, we'd still want to warn the user when we auto-deleted the file from the inferior, otherwise they might be surprised to find their inferior suddenly has no executable attached, so we wouldn't actually reduce the number of warnings the user sees. So in the end I figured that the best solution is to just update the test to avoid the warning. This is easily done by manually removing the executable from the inferior once each iteration of the test has completed. Now, in bug PR gdb/29782 GDB is clearly managing to pick up an executable from the NFS cache somehow. I guess what's happening is that when the original file is deleted /proc/PID/exe is actually pointing to a file in the NFS cache which is only deleted at some later point, and so when GDB starts up we do manage to associate a file with the inferior, this results in the same message being emitted from BFD as I was seeing. The fix included in this commit should also fix that bug. One final note: On x86-64 GNU/Linux, the gdb.server/connect-with-no-symbol-file.exp test will produce 2 core files. This is due to a bug in gdbserver that is nothing to do with this test. These core files are created before and after this commit. I am working on a fix for the gdbserver issue, but will post that separately. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28313 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29782 Approved-By: Tom Tromey <tom@tromey.com>

Currently, if frame-filters are active, raw-values is used instead of raw-frame-arguments to decide if a pretty-printer should be invoked for frame arguments in a backtrace. In this example, "super struct" is the output of the pretty-printer: (gdb) disable frame-filter global BasicFrameFilter (gdb) bt #0 foo (x=42, ss=super struct = {...}) at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:47 openhwgroup#1 0x004016aa in main () at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:57 If no frame-filter is active, then the raw-values print option does not affect the backtrace output: (gdb) set print raw-values on (gdb) bt #0 foo (x=42, ss=super struct = {...}) at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:47 openhwgroup#1 0x004016aa in main () at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:57 (gdb) set print raw-values off Instead, the raw-frame-arguments option disables the pretty-printer in the backtrace: (gdb) bt -raw-frame-arguments on #0 foo (x=42, ss=...) at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:47 openhwgroup#1 0x004016aa in main () at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:57 But if a frame-filter is active, the same rules don't apply. The option raw-frame-arguments is ignored, but raw-values decides if the pretty-printer is used: (gdb) enable frame-filter global BasicFrameFilter (gdb) bt #0 foo (x=42, ss=super struct = {...}) at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:47 openhwgroup#1 0x004016aa in main () at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:57 (gdb) set print raw-values on (gdb) bt #0 foo (x=42, ss=...) at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:47 openhwgroup#1 0x004016aa in main () at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:57 (gdb) set print raw-values off (gdb) bt -raw-frame-arguments on #0 foo (x=42, ss=super struct = {...}) at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:47 openhwgroup#1 0x004016aa in main () at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:57 So this adds the PRINT_RAW_FRAME_ARGUMENTS flag to frame_filter_flag, which is then used in the frame-filter to override the raw flag in enumerate_args. Then the output is the same if a frame-filter is active, the pretty-printer for backtraces is only disabled with the raw-frame-arguments option: (gdb) enable frame-filter global BasicFrameFilter (gdb) bt #0 foo (x=42, ss=super struct = {...}) at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:47 openhwgroup#1 0x004016aa in main () at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:57 (gdb) set print raw-values on (gdb) bt #0 foo (x=42, ss=super struct = {...}) at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:47 openhwgroup#1 0x004016aa in main () at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:57 (gdb) set print raw-values off (gdb) bt -raw-frame-arguments on #0 foo (x=42, ss=...) at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:47 openhwgroup#1 0x004016aa in main () at C:/src/repos/gdb-testsuite/gdb/testsuite/gdb.python/py-frame-args.c:57 Co-Authored-By: Andrew Burgess <aburgess@redhat.com> Approved-By: Tom Tromey <tom@tromey.com>

When running test-case gdb.dap/eof.exp, it occasionally coredumps. The thread triggering the coredump is: ... #0 0x0000ffff42bb2280 in __pthread_kill_implementation () from /lib64/libc.so.6 openhwgroup#1 0x0000ffff42b65800 [PAC] in raise () from /lib64/libc.so.6 openhwgroup#2 0x00000000007b03e8 [PAC] in handle_fatal_signal (sig=11) at gdb/event-top.c:926 openhwgroup#3 0x00000000007b0470 in handle_sigsegv (sig=11) at gdb/event-top.c:976 openhwgroup#4 <signal handler called> openhwgroup#5 0x0000000000606080 in cli_ui_out::do_message (this=0xffff2f7ed728, style=..., format=0xffff0c002af1 "%s", args=...) at gdb/cli-out.c:232 openhwgroup#6 0x0000000000ce6358 in ui_out::call_do_message (this=0xffff2f7ed728, style=..., format=0xffff0c002af1 "%s") at gdb/ui-out.c:584 openhwgroup#7 0x0000000000ce6610 in ui_out::vmessage (this=0xffff2f7ed728, in_style=..., format=0x16f93ea "", args=...) at gdb/ui-out.c:621 openhwgroup#8 0x0000000000ce3a9c in ui_file::vprintf (this=0xfffffbea1b18, ...) at gdb/ui-file.c:74 openhwgroup#9 0x0000000000d2b148 in gdb_vprintf (stream=0xfffffbea1b18, format=0x16f93e8 "%s", args=...) at gdb/utils.c:1898 openhwgroup#10 0x0000000000d2b23c in gdb_printf (stream=0xfffffbea1b18, format=0x16f93e8 "%s") at gdb/utils.c:1913 openhwgroup#11 0x0000000000ab5208 in gdbpy_write (self=0x33fe35d0, args=0x342ec280, kw=0x345c08b0) at gdb/python/python.c:1464 openhwgroup#12 0x0000ffff434acedc in cfunction_call () from /lib64/libpython3.12.so.1.0 openhwgroup#13 0x0000ffff4347c500 [PAC] in _PyObject_MakeTpCall () from /lib64/libpython3.12.so.1.0 openhwgroup#14 0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault () from /lib64/libpython3.12.so.1.0 openhwgroup#15 0x0000ffff434d8cd0 [PAC] in method_vectorcall () from /lib64/libpython3.12.so.1.0 openhwgroup#16 0x0000ffff434b9824 [PAC] in PyObject_CallOneArg () from /lib64/libpython3.12.so.1.0 openhwgroup#17 0x0000ffff43557674 [PAC] in PyFile_WriteObject () from /lib64/libpython3.12.so.1.0 openhwgroup#18 0x0000ffff435577a0 [PAC] in PyFile_WriteString () from /lib64/libpython3.12.so.1.0 openhwgroup#19 0x0000ffff43465354 [PAC] in thread_excepthook () from /lib64/libpython3.12.so.1.0 openhwgroup#20 0x0000ffff434ac6e0 [PAC] in cfunction_vectorcall_O () from /lib64/libpython3.12.so.1.0 openhwgroup#21 0x0000ffff434a32d8 [PAC] in PyObject_Vectorcall () from /lib64/libpython3.12.so.1.0 openhwgroup#22 0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault () from /lib64/libpython3.12.so.1.0 openhwgroup#23 0x0000ffff434d8d88 [PAC] in method_vectorcall () from /lib64/libpython3.12.so.1.0 openhwgroup#24 0x0000ffff435e0ef4 [PAC] in thread_run () from /lib64/libpython3.12.so.1.0 openhwgroup#25 0x0000ffff43591ec0 [PAC] in pythread_wrapper () from /lib64/libpython3.12.so.1.0 openhwgroup#26 0x0000ffff42bb0584 [PAC] in start_thread () from /lib64/libc.so.6 openhwgroup#27 0x0000ffff42c1fd4c [PAC] in thread_start () from /lib64/libc.so.6 ... The direct cause for the coredump seems to be that cli_ui_out::do_message is trying to write to a stream variable which does not look sound: ... (gdb) p *stream $8 = {_vptr.ui_file = 0x0, m_applied_style = {m_foreground = {m_simple = true, { m_value = 0, {m_red = 0 '\000', m_green = 0 '\000', m_blue = 0 '\000'}}}, m_background = {m_simple = 32, {m_value = 65535, {m_red = 255 '\377', m_green = 255 '\377', m_blue = 0 '\000'}}}, m_intensity = (unknown: 0x438fe710), m_reverse = 255}} ... The string that is being printed is: ... (gdb) p str $9 = "Exception in thread " ... so AFAICT this is a DAP thread running into an exception and trying to print it. If we look at the state of gdb's main thread, we have: ... #0 0x0000ffff42bac914 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6 openhwgroup#1 0x0000ffff42bafb44 [PAC] in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libc.so.6 openhwgroup#2 0x0000ffff43466e9c [PAC] in take_gil () from /lib64/libpython3.12.so.1.0 openhwgroup#3 0x0000ffff43484fe0 [PAC] in PyEval_RestoreThread () from /lib64/libpython3.12.so.1.0 openhwgroup#4 0x0000000000ab8698 [PAC] in gdbpy_allow_threads::~gdbpy_allow_threads ( this=0xfffffbea1cf8, __in_chrg=<optimized out>) at gdb/python/python-internal.h:769 openhwgroup#5 0x0000000000ab2fec in execute_gdb_command (self=0x33fe35d0, args=0x34297b60, kw=0x34553d20) at gdb/python/python.c:681 openhwgroup#6 0x0000ffff434acedc in cfunction_call () from /lib64/libpython3.12.so.1.0 openhwgroup#7 0x0000ffff4347c500 [PAC] in _PyObject_MakeTpCall () from /lib64/libpython3.12.so.1.0 openhwgroup#8 0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault () from /lib64/libpython3.12.so.1.0 openhwgroup#9 0x0000ffff4353bce8 [PAC] in _PyObject_VectorcallTstate.lto_priv.3 () from /lib64/libpython3.12.so.1.0 openhwgroup#10 0x0000000000ab87fc [PAC] in gdbpy_event::operator() (this=0xffff14005900) at gdb/python/python.c:1061 openhwgroup#11 0x0000000000ab93e8 in std::__invoke_impl<void, gdbpy_event&> (__f=...) at /usr/include/c++/13/bits/invoke.h:61 openhwgroup#12 0x0000000000ab9204 in std::__invoke_r<void, gdbpy_event&> (__fn=...) at /usr/include/c++/13/bits/invoke.h:111 openhwgroup#13 0x0000000000ab8e90 in std::_Function_handler<..>::_M_invoke(...) (...) at /usr/include/c++/13/bits/std_function.h:290 openhwgroup#14 0x000000000062e0d0 in std::function<void ()>::operator()() const ( this=0xffff14005830) at /usr/include/c++/13/bits/std_function.h:591 openhwgroup#15 0x0000000000b67f14 in run_events (error=0, client_data=0x0) at gdb/run-on-main-thread.c:76 openhwgroup#16 0x000000000157e290 in handle_file_event (file_ptr=0x33dae3a0, ready_mask=1) at gdbsupport/event-loop.cc:573 openhwgroup#17 0x000000000157e760 in gdb_wait_for_event (block=1) at gdbsupport/event-loop.cc:694 openhwgroup#18 0x000000000157d464 in gdb_do_one_event (mstimeout=-1) at gdbsupport/event-loop.cc:264 openhwgroup#19 0x0000000000943a84 in start_event_loop () at gdb/main.c:401 openhwgroup#20 0x0000000000943bfc in captured_command_loop () at gdb/main.c:465 openhwgroup#21 0x000000000094567c in captured_main (data=0xfffffbea23e8) at gdb/main.c:1335 openhwgroup#22 0x0000000000945700 in gdb_main (args=0xfffffbea23e8) at gdb/main.c:1354 openhwgroup#23 0x0000000000423ab4 in main (argc=14, argv=0xfffffbea2578) at gdb/gdb.c:39 ... AFAIU, there's a race between the two threads on gdb_stderr: - the DAP thread samples the gdb_stderr value, and uses it a bit later to print to - the gdb main thread changes the gdb_stderr value forth and back, using a temporary value for string capture purposes The non-sound stream value is caused by gdb_stderr being sampled while pointing to a str_file object, and used once the str_file object is already destroyed. The error here is that the DAP thread attempts to print to gdb_stderr. Fix this by adding a thread_wrapper that: - catches all exceptions and logs them to dap.log, and - while we're at it, logs when exiting and using the thread_wrapper for each DAP thread. Tested on aarch64-linux. Approved-By: Tom Tromey <tom@tromey.com>

When running test-case gdb.dap/eof.exp, we're likely to get a coredump due to a segfault in new_threadstate. At the point of the core dump, the gdb main thread looks like: ... (gdb) bt #0 0x0000fffee30d2280 in __pthread_kill_implementation () from /lib64/libc.so.6 openhwgroup#1 0x0000fffee3085800 [PAC] in raise () from /lib64/libc.so.6 openhwgroup#2 0x00000000007b03e8 [PAC] in handle_fatal_signal (sig=11) at gdb/event-top.c:926 openhwgroup#3 0x00000000007b0470 in handle_sigsegv (sig=11) at gdb/event-top.c:976 openhwgroup#4 <signal handler called> openhwgroup#5 0x0000fffee3a4db14 in new_threadstate () from /lib64/libpython3.12.so.1.0 openhwgroup#6 0x0000fffee3ab0548 [PAC] in PyGILState_Ensure () from /lib64/libpython3.12.so.1.0 openhwgroup#7 0x0000000000a6d034 [PAC] in gdbpy_gil::gdbpy_gil (this=0xffffcb279738) at gdb/python/python-internal.h:787 openhwgroup#8 0x0000000000ab87ac in gdbpy_event::~gdbpy_event (this=0xfffea8001ee0, __in_chrg=<optimized out>) at gdb/python/python.c:1051 openhwgroup#9 0x0000000000ab9460 in std::_Function_base::_Base_manager<...>::_M_destroy (__victim=...) at /usr/include/c++/13/bits/std_function.h:175 openhwgroup#10 0x0000000000ab92dc in std::_Function_base::_Base_manager<...>::_M_manager (__dest=..., __source=..., __op=std::__destroy_functor) at /usr/include/c++/13/bits/std_function.h:203 openhwgroup#11 0x0000000000ab8f14 in std::_Function_handler<...>::_M_manager(...) (...) at /usr/include/c++/13/bits/std_function.h:282 openhwgroup#12 0x000000000042dd9c in std::_Function_base::~_Function_base (this=0xfffea8001c10, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/std_function.h:244 openhwgroup#13 0x000000000042e654 in std::function<void ()>::~function() (this=0xfffea8001c10, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/std_function.h:334 openhwgroup#14 0x0000000000b68e60 in std::_Destroy<std::function<void ()> >(...) (...) at /usr/include/c++/13/bits/stl_construct.h:151 openhwgroup#15 0x0000000000b68cd0 in std::_Destroy_aux<false>::__destroy<...>(...) (...) at /usr/include/c++/13/bits/stl_construct.h:163 openhwgroup#16 0x0000000000b689d8 in std::_Destroy<...>(...) (...) at /usr/include/c++/13/bits/stl_construct.h:196 openhwgroup#17 0x0000000000b68414 in std::_Destroy<...>(...) (...) at /usr/include/c++/13/bits/alloc_traits.h:948 openhwgroup#18 std::vector<...>::~vector() (this=0x2a183c8 <runnables>) at /usr/include/c++/13/bits/stl_vector.h:732 openhwgroup#19 0x0000fffee3088370 in __run_exit_handlers () from /lib64/libc.so.6 openhwgroup#20 0x0000fffee3088450 [PAC] in exit () from /lib64/libc.so.6 openhwgroup#21 0x0000000000c95600 [PAC] in quit_force (exit_arg=0x0, from_tty=0) at gdb/top.c:1822 openhwgroup#22 0x0000000000609140 in quit_command (args=0x0, from_tty=0) at gdb/cli/cli-cmds.c:508 openhwgroup#23 0x0000000000c926a4 in quit_cover () at gdb/top.c:300 openhwgroup#24 0x00000000007b09d4 in async_disconnect (arg=0x0) at gdb/event-top.c:1230 openhwgroup#25 0x0000000000548acc in invoke_async_signal_handlers () at gdb/async-event.c:234 openhwgroup#26 0x000000000157d2d4 in gdb_do_one_event (mstimeout=-1) at gdbsupport/event-loop.cc:199 openhwgroup#27 0x0000000000943a84 in start_event_loop () at gdb/main.c:401 openhwgroup#28 0x0000000000943bfc in captured_command_loop () at gdb/main.c:465 openhwgroup#29 0x000000000094567c in captured_main (data=0xffffcb279d08) at gdb/main.c:1335 openhwgroup#30 0x0000000000945700 in gdb_main (args=0xffffcb279d08) at gdb/main.c:1354 openhwgroup#31 0x0000000000423ab4 in main (argc=14, argv=0xffffcb279e98) at gdb/gdb.c:39 ... The direct cause of the segfault is calling PyGILState_Ensure after calling Py_Finalize. AFAICT the problem is a race between the gdb main thread and DAP's JSON writer thread. On one side, we have the following events: - DAP's JSON reader thread reads an EOF, and lets DAP's main thread known by writing None into read_queue - DAP's main thread lets DAP's JSON writer thread known by writing None into write_queue - DAP's JSON writer thread sees the None in its queue, and calls send_gdb("quit") - a corresponding gdbpy_event is deposited in the runnables vector, to be run by the gdb main thread On the other side, we have the following events: - the gdb main thread receives a SIGHUP - the corresponding handler calls quit_force, which calls do_final_cleanups - one of the final cleanups is finalize_python, which calls Py_Finalize - quit_force calls exit, which triggers the exit handlers - one of the exit handlers is the destructor of the runnables vector - destruction of the vector triggers destruction of the remaining element - the remaining element is a gdbpy_event, and the destructor (indirectly) calls PyGILState_Ensure It's good to note that both events (EOF and SIGHUP) are caused by this line in the test-case: ... catch "close -i $gdb_spawn_id" ... where "expect close" closes the stdin and stdout file descriptors, which causes the SIGHUP to be send. So, for the system I'm running this on, the send_gdb("quit") is actually not needed. I'm not sure if we support any systems where it's actually needed. Fix this by removing the send_gdb("quit"). Tested on aarch64-linux. PR dap/31306 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31306

When building gdb with -O0 -fsanitize=address, and running test-case gdb.ada/uninitialized_vars.exp, I run into: ... (gdb) info locals a = 0 z = (a => 1, b => false, c => 2.0) ================================================================= ==66372==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000097f58 at pc 0xffff52c0da1c bp 0xffffc90a1d40 sp 0xffffc90a1d80 READ of size 4 at 0x602000097f58 thread T0 #0 0xffff52c0da18 in memmove (/lib64/libasan.so.8+0x6da18) openhwgroup#1 0xbcab24 in unsigned char* std::__copy_move_backward<false, true, std::random_access_iterator_tag>::__copy_move_b<unsigned char const, unsigned char>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:748 openhwgroup#2 0xbc9bf4 in unsigned char* std::__copy_move_backward_a2<false, unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:769 openhwgroup#3 0xbc898c in unsigned char* std::__copy_move_backward_a1<false, unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:778 openhwgroup#4 0xbc715c in unsigned char* std::__copy_move_backward_a<false, unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:807 openhwgroup#5 0xbc4e6c in unsigned char* std::copy_backward<unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:867 openhwgroup#6 0xbc2934 in void gdb::copy<unsigned char const, unsigned char>(gdb::array_view<unsigned char const>, gdb::array_view<unsigned char>) gdb/../gdbsupport/array-view.h:223 openhwgroup#7 0x20e0100 in value::contents_copy_raw(value*, long, long, long) gdb/value.c:1239 openhwgroup#8 0x20e9830 in value::primitive_field(long, int, type*) gdb/value.c:3078 openhwgroup#9 0x20e98f8 in value_field(value*, int) gdb/value.c:3095 openhwgroup#10 0xcafd64 in print_field_values gdb/ada-valprint.c:658 openhwgroup#11 0xcb0fa0 in ada_val_print_struct_union gdb/ada-valprint.c:857 openhwgroup#12 0xcb1bb4 in ada_value_print_inner(value*, ui_file*, int, value_print_options const*) gdb/ada-valprint.c:1042 openhwgroup#13 0xc66e04 in ada_language::value_print_inner(value*, ui_file*, int, value_print_options const*) const (/home/vries/gdb/build/gdb/gdb+0xc66e04) openhwgroup#14 0x20ca1e8 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) gdb/valprint.c:1092 openhwgroup#15 0x20caabc in common_val_print_checked(value*, ui_file*, int, value_print_options const*, language_defn const*) gdb/valprint.c:1184 openhwgroup#16 0x196c524 in print_variable_and_value(char const*, symbol*, frame_info_ptr, ui_file*, int) gdb/printcmd.c:2355 openhwgroup#17 0x1d99ca0 in print_variable_and_value_data::operator()(char const*, symbol*) gdb/stack.c:2308 openhwgroup#18 0x1dabca0 in gdb::function_view<void (char const*, symbol*)>::bind<print_variable_and_value_data>(print_variable_and_value_data&)::{lambda(gdb::fv_detail::erased_callable, char const*, symbol*)openhwgroup#1}::operator()(gdb::fv_detail::erased_callable, char const*, symbol*) const gdb/../gdbsupport/function-view.h:305 openhwgroup#19 0x1dabd14 in gdb::function_view<void (char const*, symbol*)>::bind<print_variable_and_value_data>(print_variable_and_value_data&)::{lambda(gdb::fv_detail::erased_callable, char const*, symbol*)openhwgroup#1}::_FUN(gdb::fv_detail::erased_callable, char const*, symbol*) gdb/../gdbsupport/function-view.h:299 openhwgroup#20 0x1dab34c in gdb::function_view<void (char const*, symbol*)>::operator()(char const*, symbol*) const gdb/../gdbsupport/function-view.h:289 openhwgroup#21 0x1d9963c in iterate_over_block_locals gdb/stack.c:2240 openhwgroup#22 0x1d99790 in iterate_over_block_local_vars(block const*, gdb::function_view<void (char const*, symbol*)>) gdb/stack.c:2259 openhwgroup#23 0x1d9a598 in print_frame_local_vars gdb/stack.c:2380 openhwgroup#24 0x1d9afac in info_locals_command(char const*, int) gdb/stack.c:2458 openhwgroup#25 0xfd7b30 in do_simple_func gdb/cli/cli-decode.c:95 openhwgroup#26 0xfe5a2c in cmd_func(cmd_list_element*, char const*, int) gdb/cli/cli-decode.c:2735 openhwgroup#27 0x1f03790 in execute_command(char const*, int) gdb/top.c:575 openhwgroup#28 0x1384080 in command_handler(char const*) gdb/event-top.c:566 openhwgroup#29 0x1384e2c in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) gdb/event-top.c:802 openhwgroup#30 0x1f731e4 in tui_command_line_handler gdb/tui/tui-interp.c:104 openhwgroup#31 0x1382a58 in gdb_rl_callback_handler gdb/event-top.c:259 openhwgroup#32 0x21dbb80 in rl_callback_read_char readline/readline/callback.c:290 openhwgroup#33 0x1382510 in gdb_rl_callback_read_char_wrapper_noexcept gdb/event-top.c:195 openhwgroup#34 0x138277c in gdb_rl_callback_read_char_wrapper gdb/event-top.c:234 openhwgroup#35 0x1fe9b40 in stdin_event_handler gdb/ui.c:155 openhwgroup#36 0x35ff1bc in handle_file_event gdbsupport/event-loop.cc:573 openhwgroup#37 0x35ff9d8 in gdb_wait_for_event gdbsupport/event-loop.cc:694 openhwgroup#38 0x35fd284 in gdb_do_one_event(int) gdbsupport/event-loop.cc:264 openhwgroup#39 0x1768080 in start_event_loop gdb/main.c:408 openhwgroup#40 0x17684c4 in captured_command_loop gdb/main.c:472 openhwgroup#41 0x176cfc8 in captured_main gdb/main.c:1342 openhwgroup#42 0x176d088 in gdb_main(captured_main_args*) gdb/main.c:1361 openhwgroup#43 0xb73edc in main gdb/gdb.c:39 openhwgroup#44 0xffff519b09d8 in __libc_start_call_main (/lib64/libc.so.6+0x309d8) openhwgroup#45 0xffff519b0aac in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x30aac) openhwgroup#46 0xb73c2c in _start (/home/vries/gdb/build/gdb/gdb+0xb73c2c) 0x602000097f58 is located 0 bytes after 8-byte region [0x602000097f50,0x602000097f58) allocated by thread T0 here: #0 0xffff52c65218 in calloc (/lib64/libasan.so.8+0xc5218) openhwgroup#1 0xcbc278 in xcalloc gdb/alloc.c:97 openhwgroup#2 0x35f21e8 in xzalloc(unsigned long) gdbsupport/common-utils.cc:29 openhwgroup#3 0x20de270 in value::allocate_contents(bool) gdb/value.c:937 openhwgroup#4 0x20edc08 in value::fetch_lazy() gdb/value.c:4033 openhwgroup#5 0x20dadc0 in value::entirely_covered_by_range_vector(std::vector<range, std::allocator<range> > const&) gdb/value.c:229 openhwgroup#6 0xcb2298 in value::entirely_optimized_out() gdb/value.h:560 openhwgroup#7 0x20ca6fc in value_check_printable gdb/valprint.c:1133 openhwgroup#8 0x20caa8c in common_val_print_checked(value*, ui_file*, int, value_print_options const*, language_defn const*) gdb/valprint.c:1182 openhwgroup#9 0x196c524 in print_variable_and_value(char const*, symbol*, frame_info_ptr, ui_file*, int) gdb/printcmd.c:2355 openhwgroup#10 0x1d99ca0 in print_variable_and_value_data::operator()(char const*, symbol*) gdb/stack.c:2308 openhwgroup#11 0x1dabca0 in gdb::function_view<void (char const*, symbol*)>::bind<print_variable_and_value_data>(print_variable_and_value_data&)::{lambda(gdb::fv_detail::erased_callable, char const*, symbol*)openhwgroup#1}::operator()(gdb::fv_detail::erased_callable, char const*, symbol*) const gdb/../gdbsupport/function-view.h:305 openhwgroup#12 0x1dabd14 in gdb::function_view<void (char const*, symbol*)>::bind<print_variable_and_value_data>(print_variable_and_value_data&)::{lambda(gdb::fv_detail::erased_callable, char const*, symbol*)openhwgroup#1}::_FUN(gdb::fv_detail::erased_callable, char const*, symbol*) gdb/../gdbsupport/function-view.h:299 openhwgroup#13 0x1dab34c in gdb::function_view<void (char const*, symbol*)>::operator()(char const*, symbol*) const gdb/../gdbsupport/function-view.h:289 openhwgroup#14 0x1d9963c in iterate_over_block_locals gdb/stack.c:2240 openhwgroup#15 0x1d99790 in iterate_over_block_local_vars(block const*, gdb::function_view<void (char const*, symbol*)>) gdb/stack.c:2259 openhwgroup#16 0x1d9a598 in print_frame_local_vars gdb/stack.c:2380 openhwgroup#17 0x1d9afac in info_locals_command(char const*, int) gdb/stack.c:2458 openhwgroup#18 0xfd7b30 in do_simple_func gdb/cli/cli-decode.c:95 openhwgroup#19 0xfe5a2c in cmd_func(cmd_list_element*, char const*, int) gdb/cli/cli-decode.c:2735 openhwgroup#20 0x1f03790 in execute_command(char const*, int) gdb/top.c:575 openhwgroup#21 0x1384080 in command_handler(char const*) gdb/event-top.c:566 openhwgroup#22 0x1384e2c in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) gdb/event-top.c:802 openhwgroup#23 0x1f731e4 in tui_command_line_handler gdb/tui/tui-interp.c:104 openhwgroup#24 0x1382a58 in gdb_rl_callback_handler gdb/event-top.c:259 openhwgroup#25 0x21dbb80 in rl_callback_read_char readline/readline/callback.c:290 openhwgroup#26 0x1382510 in gdb_rl_callback_read_char_wrapper_noexcept gdb/event-top.c:195 openhwgroup#27 0x138277c in gdb_rl_callback_read_char_wrapper gdb/event-top.c:234 openhwgroup#28 0x1fe9b40 in stdin_event_handler gdb/ui.c:155 openhwgroup#29 0x35ff1bc in handle_file_event gdbsupport/event-loop.cc:573 SUMMARY: AddressSanitizer: heap-buffer-overflow (/lib64/libasan.so.8+0x6da18) in memmove ... The error happens when trying to print either variable y or y2: ... type Variable_Record (A : Boolean := True) is record case A is when True => B : Integer; when False => C : Float; D : Integer; end case; end record; Y : Variable_Record := (A => True, B => 1); Y2 : Variable_Record := (A => False, C => 1.0, D => 2); ... when the variables are uninitialized. The error happens only when printing the entire variable: ... (gdb) p y.a $2 = 216 (gdb) p y.b There is no member named b. (gdb) p y.c $3 = 9.18340949e-41 (gdb) p y.d $4 = 1 (gdb) p y <AddressSanitizer: heap-buffer-overflow> ... The error happens as follows: - field a functions as discriminant, choosing either the b, or c+d variant. - when y.a happens to be set to 216, as above, gdb interprets this as the variable having the c+d variant (which is why trying to print y.b fails). - when printing y, gdb allocates a value, copies the bytes into it from the target, and then prints the value. - gdb allocates the value using the type size, which is 8. It's 8 because that's what the DW_AT_byte_size indicates. Note that for valid values of a, it gives correct results: if a is 0 (c+d variant), size is 12, if a is 1 (b variant), size is 8. - gdb tries to print field d, which is at an 8 byte offset, and that results in a out-of-bounds access for the allocated 8-byte value. Fix this by handling this case in value::contents_copy_raw, such that we have: ... (gdb) p y $1 = (a => 24, c => 9.18340949e-41, d => <error reading variable: access outside bounds of object>) ... An alternative (additional) fix could be this: in compute_variant_fields_inner gdb reads the discriminant y.a to decide which variant is active. It would be nice to detect that the value (y.a == 24) is not a valid Boolean, and give up on choosing a variant altoghether. However, the situation regarding the internal type CODE_TYPE_BOOL is currently ambiguous (see PR31282) and it's not possible to reliably decide what valid values are. The test-case source file gdb.ada/uninitialized-variable-record/parse.adb is a reduced version of gdb.ada/uninitialized_vars/parse.adb, so it copies the copyright years. Note that the test-case needs gcc-12 or newer, it's unsupported for older gcc versions. [ So, it would be nice to rewrite it into a dwarf assembly test-case. ] The test-case loops over all languages. This is inherited from an earlier attempt to fix this, which had language-specific fixes (in print_field_values, cp_print_value_fields, pascal_object_print_value_fields and f_language::value_print_inner). I've left this in, but I suppose it's not strictly necessary anymore. Tested on x86_64-linux. PR exp/31258 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31258

From the Python API, we can execute GDB commands via gdb.execute. If the command gives an exception, however, we need to recover the GDB prompt and enable stdin, because the exception does not reach top-level GDB or normal_stop. This was done in commit commit 1ba1ac8 Author: Andrew Burgess <andrew.burgess@embecosm.com> Date: Tue Nov 19 11:17:20 2019 +0000 gdb: Enable stdin on exception in execute_gdb_command with the following code: catch (const gdb_exception &except) { /* If an exception occurred then we won't hit normal_stop (), or have an exception reach the top level of the event loop, which are the two usual places in which stdin would be re-enabled. So, before we convert the exception and continue back in Python, we should re-enable stdin here. */ async_enable_stdin (); GDB_PY_HANDLE_EXCEPTION (except); } In this patch, we explain what happens when we run a GDB command in the context of a synchronous command, e.g. via Python observer notifications. As an example, suppose we have the following objfile event listener, specified in a file named file.py: ~~~ import gdb class MyListener: def __init__(self): gdb.events.new_objfile.connect(self.handle_new_objfile_event) self.processed_objfile = False def handle_new_objfile_event(self, event): if self.processed_objfile: return print("loading " + event.new_objfile.filename) self.processed_objfile = True gdb.execute('add-inferior -no-connection') gdb.execute('inferior 2') gdb.execute('target remote | gdbserver - /tmp/a.out') gdb.execute('inferior 1') the_listener = MyListener() ~~~ Using this Python file, we see the behavior below: $ gdb -q -ex "source file.py" -ex "run" --args a.out Reading symbols from a.out... Starting program: /tmp/a.out loading /lib64/ld-linux-x86-64.so.2 [New inferior 2] Added inferior 2 [Switching to inferior 2 [<null>] (<noexec>)] stdin/stdout redirected Process /tmp/a.out created; pid = 3075406 Remote debugging using stdio Reading /tmp/a.out from remote target... ... [Switching to inferior 1 [process 3075400] (/tmp/a.out)] [Switching to thread 1.1 (process 3075400)] #0 0x00007ffff7fe3290 in ?? () from /lib64/ld-linux-x86-64.so.2 (gdb) [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [Inferior 1 (process 3075400) exited normally] Note how the GDB prompt comes in-between the debugger output. We have this obscure behavior, because the executed command, "target remote", triggers an invocation of `normal_stop` that enables stdin. After that, however, the Python notification context completes and GDB continues with its normal flow of executing the 'run' command. This can be seen in the call stack below: (top-gdb) bt #0 async_enable_stdin () at src/gdb/event-top.c:523 openhwgroup#1 0x00005555561c3acd in normal_stop () at src/gdb/infrun.c:9432 openhwgroup#2 0x00005555561b328e in start_remote (from_tty=0) at src/gdb/infrun.c:3801 openhwgroup#3 0x0000555556441224 in remote_target::start_remote_1 (this=0x5555587882e0, from_tty=0, extended_p=0) at src/gdb/remote.c:5225 openhwgroup#4 0x000055555644166c in remote_target::start_remote (this=0x5555587882e0, from_tty=0, extended_p=0) at src/gdb/remote.c:5316 openhwgroup#5 0x00005555564430cf in remote_target::open_1 (name=0x55555878525e "| gdbserver - /tmp/a.out", from_tty=0, extended_p=0) at src/gdb/remote.c:6175 openhwgroup#6 0x0000555556441707 in remote_target::open (name=0x55555878525e "| gdbserver - /tmp/a.out", from_tty=0) at src/gdb/remote.c:5338 openhwgroup#7 0x00005555565ea63f in open_target (args=0x55555878525e "| gdbserver - /tmp/a.out", from_tty=0, command=0x555558589280) at src/gdb/target.c:824 openhwgroup#8 0x0000555555f0d89a in cmd_func (cmd=0x555558589280, args=0x55555878525e "| gdbserver - /tmp/a.out", from_tty=0) at src/gdb/cli/cli-decode.c:2735 openhwgroup#9 0x000055555661fb42 in execute_command (p=0x55555878529e "t", from_tty=0) at src/gdb/top.c:575 openhwgroup#10 0x0000555555f1a506 in execute_control_command_1 (cmd=0x555558756f00, from_tty=0) at src/gdb/cli/cli-script.c:529 openhwgroup#11 0x0000555555f1abea in execute_control_command (cmd=0x555558756f00, from_tty=0) at src/gdb/cli/cli-script.c:701 openhwgroup#12 0x0000555555f19fc7 in execute_control_commands (cmdlines=0x555558756f00, from_tty=0) at src/gdb/cli/cli-script.c:411 openhwgroup#13 0x0000555556400d91 in execute_gdb_command (self=0x7ffff43b5d00, args=0x7ffff440ab60, kw=0x0) at src/gdb/python/python.c:700 openhwgroup#14 0x00007ffff7a96023 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 openhwgroup#15 0x00007ffff7a4dadc in _PyObject_MakeTpCall () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 openhwgroup#16 0x00007ffff79e9a1c in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 openhwgroup#17 0x00007ffff7b303af in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 openhwgroup#18 0x00007ffff7a50358 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 openhwgroup#19 0x00007ffff7a4f3f4 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 openhwgroup#20 0x00007ffff7a4f883 in PyObject_CallFunctionObjArgs () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 openhwgroup#21 0x00005555563a9758 in evpy_emit_event (event=0x7ffff42b5430, registry=0x7ffff42b4690) at src/gdb/python/py-event.c:104 openhwgroup#22 0x00005555563cb874 in emit_new_objfile_event (objfile=0x555558761700) at src/gdb/python/py-newobjfileevent.c:52 openhwgroup#23 0x00005555563b53bc in python_new_objfile (objfile=0x555558761700) at src/gdb/python/py-inferior.c:195 openhwgroup#24 0x0000555555d6dff0 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x5555585b5860: 0x5555563b5360 <python_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:61 openhwgroup#25 0x0000555555d6be18 in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x5555585b5860: 0x5555563b5360 <python_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:111 openhwgroup#26 0x0000555555d69661 in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffd080: 0x555558761700) at /usr/include/c++/11/bits/std_function.h:290 openhwgroup#27 0x0000555556314caf in std::function<void (objfile*)>::operator()(objfile*) const (this=0x5555585b5860, __args#0=0x555558761700) at /usr/include/c++/11/bits/std_function.h:590 openhwgroup#28 0x000055555631444e in gdb::observers::observable<objfile*>::notify (this=0x55555836eea0 <gdb::observers::new_objfile>, args#0=0x555558761700) at src/gdb/../gdbsupport/observable.h:166 openhwgroup#29 0x0000555556599b3f in symbol_file_add_with_addrs (abfd=..., name=0x55555875d310 "/lib64/ld-linux-x86-64.so.2", add_flags=..., addrs=0x7fffffffd2f0, flags=..., parent=0x0) at src/gdb/symfile.c:1125 openhwgroup#30 0x0000555556599ca4 in symbol_file_add_from_bfd (abfd=..., name=0x55555875d310 "/lib64/ld-linux-x86-64.so.2", add_flags=..., addrs=0x7fffffffd2f0, flags=..., parent=0x0) at src/gdb/symfile.c:1160 openhwgroup#31 0x0000555556546371 in solib_read_symbols (so=..., flags=...) at src/gdb/solib.c:692 openhwgroup#32 0x0000555556546f0f in solib_add (pattern=0x0, from_tty=0, readsyms=1) at src/gdb/solib.c:1015 openhwgroup#33 0x0000555556539891 in enable_break (info=0x55555874e180, from_tty=0) at src/gdb/solib-svr4.c:2416 openhwgroup#34 0x000055555653b305 in svr4_solib_create_inferior_hook (from_tty=0) at src/gdb/solib-svr4.c:3058 openhwgroup#35 0x0000555556547cee in solib_create_inferior_hook (from_tty=0) at src/gdb/solib.c:1217 openhwgroup#36 0x0000555556196f6a in post_create_inferior (from_tty=0) at src/gdb/infcmd.c:275 openhwgroup#37 0x0000555556197670 in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at src/gdb/infcmd.c:486 openhwgroup#38 0x000055555619783f in run_command (args=0x0, from_tty=1) at src/gdb/infcmd.c:512 openhwgroup#39 0x0000555555f0798d in do_simple_func (args=0x0, from_tty=1, c=0x555558567510) at src/gdb/cli/cli-decode.c:95 openhwgroup#40 0x0000555555f0d89a in cmd_func (cmd=0x555558567510, args=0x0, from_tty=1) at src/gdb/cli/cli-decode.c:2735 openhwgroup#41 0x000055555661fb42 in execute_command (p=0x7fffffffe2c4 "", from_tty=1) at src/gdb/top.c:575 openhwgroup#42 0x000055555626303b in catch_command_errors (command=0x55555661f4ab <execute_command(char const*, int)>, arg=0x7fffffffe2c1 "run", from_tty=1, do_bp_actions=true) at src/gdb/main.c:513 openhwgroup#43 0x000055555626328a in execute_cmdargs (cmdarg_vec=0x7fffffffdaf0, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND, ret=0x7fffffffda3c) at src/gdb/main.c:612 openhwgroup#44 0x0000555556264849 in captured_main_1 (context=0x7fffffffdd40) at src/gdb/main.c:1293 openhwgroup#45 0x0000555556264a7f in captured_main (data=0x7fffffffdd40) at src/gdb/main.c:1314 openhwgroup#46 0x0000555556264b2e in gdb_main (args=0x7fffffffdd40) at src/gdb/main.c:1343 openhwgroup#47 0x0000555555ceccab in main (argc=9, argv=0x7fffffffde78) at src/gdb/gdb.c:39 (top-gdb) The use of the "target remote" command here is just an example. In principle, we would reproduce the problem with any command that triggers an invocation of `normal_stop`. To omit enabling the stdin in `normal_stop`, we would have to check the context we are in. Since we cannot do that, we add a new field to `struct ui` to track whether the prompt was already blocked, and set the tracker flag in the Python context before executing a GDB command. After applying this patch, the output becomes ... Reading symbols from a.out... Starting program: /tmp/a.out loading /lib64/ld-linux-x86-64.so.2 [New inferior 2] Added inferior 2 [Switching to inferior 2 [<null>] (<noexec>)] stdin/stdout redirected Process /tmp/a.out created; pid = 3032261 Remote debugging using stdio Reading /tmp/a.out from remote target... ... [Switching to inferior 1 [process 3032255] (/tmp/a.out)] [Switching to thread 1.1 (process 3032255)] #0 0x00007ffff7fe3290 in ?? () from /lib64/ld-linux-x86-64.so.2 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [Inferior 1 (process 3032255) exited normally] (gdb) Let's now consider a secondary scenario, where the command executed from the Python raises an error. As an example, suppose we have the Python file below: def handle_new_objfile_event(self, event): ... print("loading " + event.new_objfile.filename) self.processed_objfile = True gdb.execute('print a') The executed command, "print a", gives an error because "a" is not defined. Without this patch, we see the behavior below, where the prompt is again placed incorrectly: ... Reading symbols from /tmp/a.out... Starting program: /tmp/a.out loading /lib64/ld-linux-x86-64.so.2 Python Exception <class 'gdb.error'>: No symbol "a" in current context. (gdb) [Inferior 1 (process 3980401) exited normally] This time, `async_enable_stdin` is called from the 'catch' block in `execute_gdb_command`: (top-gdb) bt #0 async_enable_stdin () at src/gdb/event-top.c:523 openhwgroup#1 0x0000555556400f0a in execute_gdb_command (self=0x7ffff43b5d00, args=0x7ffff440ab60, kw=0x0) at src/gdb/python/python.c:713 openhwgroup#2 0x00007ffff7a96023 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 openhwgroup#3 0x00007ffff7a4dadc in _PyObject_MakeTpCall () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 openhwgroup#4 0x00007ffff79e9a1c in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 openhwgroup#5 0x00007ffff7b303af in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 openhwgroup#6 0x00007ffff7a50358 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 openhwgroup#7 0x00007ffff7a4f3f4 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 openhwgroup#8 0x00007ffff7a4f883 in PyObject_CallFunctionObjArgs () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 openhwgroup#9 0x00005555563a9758 in evpy_emit_event (event=0x7ffff42b5430, registry=0x7ffff42b4690) at src/gdb/python/py-event.c:104 openhwgroup#10 0x00005555563cb874 in emit_new_objfile_event (objfile=0x555558761410) at src/gdb/python/py-newobjfileevent.c:52 openhwgroup#11 0x00005555563b53bc in python_new_objfile (objfile=0x555558761410) at src/gdb/python/py-inferior.c:195 openhwgroup#12 0x0000555555d6dff0 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x5555585b5860: 0x5555563b5360 <python_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:61 openhwgroup#13 0x0000555555d6be18 in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x5555585b5860: 0x5555563b5360 <python_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:111 openhwgroup#14 0x0000555555d69661 in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffd080: 0x555558761410) at /usr/include/c++/11/bits/std_function.h:290 openhwgroup#15 0x0000555556314caf in std::function<void (objfile*)>::operator()(objfile*) const (this=0x5555585b5860, __args#0=0x555558761410) at /usr/include/c++/11/bits/std_function.h:590 openhwgroup#16 0x000055555631444e in gdb::observers::observable<objfile*>::notify (this=0x55555836eea0 <gdb::observers::new_objfile>, args#0=0x555558761410) at src/gdb/../gdbsupport/observable.h:166 openhwgroup#17 0x0000555556599b3f in symbol_file_add_with_addrs (abfd=..., name=0x55555875d020 "/lib64/ld-linux-x86-64.so.2", add_flags=..., addrs=0x7fffffffd2f0, flags=..., parent=0x0) at src/gdb/symfile.c:1125 openhwgroup#18 0x0000555556599ca4 in symbol_file_add_from_bfd (abfd=..., name=0x55555875d020 "/lib64/ld-linux-x86-64.so.2", add_flags=..., addrs=0x7fffffffd2f0, flags=..., parent=0x0) at src/gdb/symfile.c:1160 openhwgroup#19 0x0000555556546371 in solib_read_symbols (so=..., flags=...) at src/gdb/solib.c:692 openhwgroup#20 0x0000555556546f0f in solib_add (pattern=0x0, from_tty=0, readsyms=1) at src/gdb/solib.c:1015 openhwgroup#21 0x0000555556539891 in enable_break (info=0x55555874a670, from_tty=0) at src/gdb/solib-svr4.c:2416 openhwgroup#22 0x000055555653b305 in svr4_solib_create_inferior_hook (from_tty=0) at src/gdb/solib-svr4.c:3058 openhwgroup#23 0x0000555556547cee in solib_create_inferior_hook (from_tty=0) at src/gdb/solib.c:1217 openhwgroup#24 0x0000555556196f6a in post_create_inferior (from_tty=0) at src/gdb/infcmd.c:275 openhwgroup#25 0x0000555556197670 in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at src/gdb/infcmd.c:486 openhwgroup#26 0x000055555619783f in run_command (args=0x0, from_tty=1) at src/gdb/infcmd.c:512 openhwgroup#27 0x0000555555f0798d in do_simple_func (args=0x0, from_tty=1, c=0x555558567510) at src/gdb/cli/cli-decode.c:95 openhwgroup#28 0x0000555555f0d89a in cmd_func (cmd=0x555558567510, args=0x0, from_tty=1) at src/gdb/cli/cli-decode.c:2735 openhwgroup#29 0x000055555661fb42 in execute_command (p=0x7fffffffe2c4 "", from_tty=1) at src/gdb/top.c:575 openhwgroup#30 0x000055555626303b in catch_command_errors (command=0x55555661f4ab <execute_command(char const*, int)>, arg=0x7fffffffe2c1 "run", from_tty=1, do_bp_actions=true) at src/gdb/main.c:513 openhwgroup#31 0x000055555626328a in execute_cmdargs (cmdarg_vec=0x7fffffffdaf0, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND, ret=0x7fffffffda3c) at src/gdb/main.c:612 openhwgroup#32 0x0000555556264849 in captured_main_1 (context=0x7fffffffdd40) at src/gdb/main.c:1293 openhwgroup#33 0x0000555556264a7f in captured_main (data=0x7fffffffdd40) at src/gdb/main.c:1314 openhwgroup#34 0x0000555556264b2e in gdb_main (args=0x7fffffffdd40) at src/gdb/main.c:1343 openhwgroup#35 0x0000555555ceccab in main (argc=9, argv=0x7fffffffde78) at src/gdb/gdb.c:39 (top-gdb) Again, after we enable stdin, GDB continues with its normal flow of the 'run' command and receives the inferior's exit event, where it would have enabled stdin, if we had not done it prematurely. (top-gdb) bt #0 async_enable_stdin () at src/gdb/event-top.c:523 openhwgroup#1 0x00005555561c3acd in normal_stop () at src/gdb/infrun.c:9432 openhwgroup#2 0x00005555561b5bf1 in fetch_inferior_event () at src/gdb/infrun.c:4700 openhwgroup#3 0x000055555618d6a7 in inferior_event_handler (event_type=INF_REG_EVENT) at src/gdb/inf-loop.c:42 openhwgroup#4 0x000055555620ecdb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4316 openhwgroup#5 0x0000555556f33035 in handle_file_event (file_ptr=0x5555587024e0, ready_mask=1) at src/gdbsupport/event-loop.cc:573 openhwgroup#6 0x0000555556f3362f in gdb_wait_for_event (block=0) at src/gdbsupport/event-loop.cc:694 openhwgroup#7 0x0000555556f322cd in gdb_do_one_event (mstimeout=-1) at src/gdbsupport/event-loop.cc:217 openhwgroup#8 0x0000555556262df8 in start_event_loop () at src/gdb/main.c:407 openhwgroup#9 0x0000555556262f85 in captured_command_loop () at src/gdb/main.c:471 openhwgroup#10 0x0000555556264a84 in captured_main (data=0x7fffffffdd40) at src/gdb/main.c:1324 openhwgroup#11 0x0000555556264b2e in gdb_main (args=0x7fffffffdd40) at src/gdb/main.c:1343 openhwgroup#12 0x0000555555ceccab in main (argc=9, argv=0x7fffffffde78) at src/gdb/gdb.c:39 (top-gdb) The solution implemented by this patch addresses the problem. After applying the patch, the output becomes $ gdb -q -ex "source file.py" -ex "run" --args a.out Reading symbols from /tmp/a.out... Starting program: /tmp/a.out loading /lib64/ld-linux-x86-64.so.2 Python Exception <class 'gdb.error'>: No symbol "a" in current context. [Inferior 1 (process 3984511) exited normally] (gdb) Regression-tested on X86_64 Linux using the default board file (i.e. unix). Co-Authored-By: Oguzhan Karakaya <oguzhan.karakaya@intel.com> Reviewed-By: Guinevere Larsen <blarsen@redhat.com> Approved-By: Tom Tromey <tom@tromey.com>

This started with a Red Hat bug report which can be seen here: https://bugzilla.redhat.com/show_bug.cgi?id=1850710 The problem reported here was using GDB on GNU/Linux for S390, the user stepped into JIT generated code. As they enter the JIT code GDB would report 'PC not saved', and this same message would be reported after each step/stepi. Additionally, the user had 'set disassemble-next-line on', and once they entered the JIT code this output was not displayed, nor were any 'display' directives displayed. The user is not making use of the JIT plugin API to provide debug information. But that's OK, they aren't expecting any source level debug here, they are happy to use 'stepi', but the missing 'display' directives are a problem, as is the constant 'PC not saved' (error) message. What is happening here is that as GDB is failing to find any debug information for the JIT generated code, it is falling back on to the S390 prologue unwinder to try and unwind frame #0. Unfortunately, without being able to identify the function boundaries, the S390 prologue scanner can't help much, in fact, it doesn't even suggest an arbitrary previous $pc value (some targets that use a link-register will, by default, assume the link-register contains the previous $pc), instead the S390 will just say, "sorry, I have no previous $pc value". The result of this is that when GDB tries to find frame openhwgroup#1 we end throwing an error from frame_unwind_pc (the 'PC not saved' error). This error is not caught anywhere except at the top-level interpreter loop, and so we end up skipping all the 'display' directive handling. While thinking about this, I wondered, could I trigger the same error using the Python Unwinder API? What happens if a Python unwinder claims a frame, but then fails to provide a previous $pc value? Turns out that exactly the same thing happens, which is great, as that means we now have a way to reproduce this bug on any target. And so the test included with this patch does just this. I have a Python unwinder that claims a frame, but doesn't provide any previous register values. I then do two tests, first I stop in the claimed frame (i.e. frame #0 is the frame that can't be unwound), I perform a few steps, and check the backtrace. And second, I stop in a child of the problem frame (i.e. frame openhwgroup#1 is the frame that can't be unwound), and from here I check the backtrace. While all this is going on I have a 'display' directive in place, and each time GDB stops I check that the display directive triggers. Additionally, when checking the backtrace, I am checking that the backtrace finishes with the message 'Backtrace stopped: frame did not save the PC'. As for the fix I chose to add a call to frame_unwind_pc directly to get_prev_frame_always_1. Calling frame_unwind_pc will cache the unwound $pc value, so this doesn't add much additional work as immediately after the new frame_unwind_pc call, we call get_prev_frame_maybe_check_cycle, which actually generates the previous frame, which will always (I think) require a call to frame_unwind_pc anyway. The reason for adding the frame_unwind_pc call into get_prev_frame_always_1, is that if the frame_unwind_pc call fails we want to set the frames 'stop_reason', and get_prev_frame_always_1 seems to be the place where this is done, so I wanted to keep the new stop_reason setting code next to all the existing stop_reason setting code. Additionally, once we enter get_prev_frame_maybe_check_cycle we actually create the previous frame, then, if it turns out that the previous frame can't be created we need to remove the frame .. this seemed more complex than just making the check in get_prev_frame_always_1. With this fix in place the original S390 bug is fixed, and also the test added in this commit, that uses the Python API, is also fixed. Reviewed-By: Kevin Buettner <kevinb@redhat.com>

This commit fixes bug PR 28942, that is, creating a conditional breakpoint in a multi-threaded inferior, where the breakpoint condition includes an inferior function call. Currently, when a user tries to create such a breakpoint, then GDB will fail with: (gdb) break infcall-from-bp-cond-single.c:61 if (return_true ()) Breakpoint 2 at 0x4011fa: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-single.c, line 61. (gdb) continue Continuing. [New Thread 0x7ffff7c5d700 (LWP 2460150)] [New Thread 0x7ffff745c700 (LWP 2460151)] [New Thread 0x7ffff6c5b700 (LWP 2460152)] [New Thread 0x7ffff645a700 (LWP 2460153)] [New Thread 0x7ffff5c59700 (LWP 2460154)] Error in testing breakpoint condition: Couldn't get registers: No such process. An error occurred while in a function called from GDB. Evaluation of the expression containing the function (return_true) will be abandoned. When the function is done executing, GDB will silently stop. Selected thread is running. (gdb) Or, in some cases, like this: (gdb) break infcall-from-bp-cond-simple.c:56 if (is_matching_tid (arg, 1)) Breakpoint 2 at 0x401194: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-simple.c, line 56. (gdb) continue Continuing. [New Thread 0x7ffff7c5d700 (LWP 2461106)] [New Thread 0x7ffff745c700 (LWP 2461107)] ../../src.release/gdb/nat/x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. The precise error depends on the exact thread state; so there's race conditions depending on which threads have fully started, and which have not. But the underlying problem is always the same; when GDB tries to execute the inferior function call from within the breakpoint condition, GDB will, incorrectly, try to resume threads that are already running - GDB doesn't realise that some threads might already be running. The solution proposed in this patch requires an additional member variable thread_info::in_cond_eval. This flag is set to true (in breakpoint.c) when GDB is evaluating a breakpoint condition. In user_visible_resume_ptid (infrun.c), when the in_cond_eval flag is true, then GDB will only try to resume the current thread, that is, the thread for which the breakpoint condition is being evaluated. This solves the problem of GDB trying to resume threads that are already running. The next problem is that inferior function calls are assumed to be synchronous, that is, GDB doesn't expect to start an inferior function call in thread #1, then receive a stop from thread #2 for some other, unrelated reason. To prevent GDB responding to an event from another thread, we update fetch_inferior_event and do_target_wait in infrun.c, so that, when an inferior function call (on behalf of a breakpoint condition) is in progress, we only wait for events from the current thread (the one evaluating the condition). In do_target_wait I had to change the inferior_matches lambda function, which is used to select which inferior to wait on. Previously the logic was this: auto inferior_matches = [&wait_ptid] (inferior *inf) { return (inf->process_target () != nullptr && ptid_t (inf->pid).matches (wait_ptid)); }; This compares the pid of the inferior against the complete ptid we want to wait on. Before this commit wait_ptid was only ever minus_one_ptid (which is special, and means any process), and so every inferior would match. After this commit though wait_ptid might represent a specific thread in a specific inferior. If we compare the pid of the inferior to a specific ptid then these will not match. The fix is to compare against the pid extracted from the wait_ptid, not against the complete wait_ptid itself. In fetch_inferior_event, after receiving the event, we only want to stop all the other threads, and call inferior_event_handler with INF_EXEC_COMPLETE, if we are not evaluating a conditional breakpoint. If we are, then all the other threads should be left doing whatever they were before. The inferior_event_handler call will be performed once the breakpoint condition has finished being evaluated, and GDB decides to stop or not. The final problem that needs solving relates to GDB's commit-resume mechanism, which allows GDB to collect resume requests into a single packet in order to reduce traffic to a remote target. The problem is that the commit-resume mechanism will not send any resume requests for an inferior if there are already events pending on the GDB side. Imagine an inferior with two threads. Both threads hit a breakpoint, maybe the same conditional breakpoint. At this point there are two pending events, one for each thread. GDB selects one of the events and spots that this is a conditional breakpoint, GDB evaluates the condition. The condition includes an inferior function call, so GDB sets up for the call and resumes the one thread, the resume request is added to the commit-resume queue. When the commit-resume queue is committed GDB sees that there is a pending event from another thread, and so doesn't send any resume requests to the actual target, GDB is assuming that when we wait we will select the event from the other thread. However, as this is an inferior function call for a condition evaluation, we will not select the event from the other thread, we only care about events from the thread that is evaluating the condition - and the resume for this thread was never sent to the target. And so, GDB hangs, waiting for an event from a thread that was never fully resumed. To fix this issue I have added the concept of "forcing" the commit-resume queue. When enabling commit resume, if the force flag is true, then any resumes will be committed to the target, even if there are other threads with pending events. A note on authorship: this patch was based on some work done by Natalia Saiapova and Tankut Baris Aktemur from Intel[1]. I have made some changes to their work in this version. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28942 [1] https://sourceware.org/pipermail/gdb-patches/2020-October/172454.html Co-authored-by: Natalia Saiapova <natalia.saiapova@intel.com> Co-authored-by: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com> Reviewed-By: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com> Tested-By: Luis Machado <luis.machado@arm.com> Tested-By: Keith Seitz <keiths@redhat.com>

…ro linux When running test-case gdb.threads/attach-stopped.exp on aarch64-linux, using the manjaro linux distro, I get: ... (gdb) thread apply all bt^M ^M Thread 2 (Thread 0xffff8d8af120 (LWP 278116) "attach-stopped"):^M #0 0x0000ffff8d964864 in clock_nanosleep () from /usr/lib/libc.so.6^M #1 0x0000ffff8d969cac in nanosleep () from /usr/lib/libc.so.6^M #2 0x0000ffff8d969b68 in sleep () from /usr/lib/libc.so.6^M #3 0x0000aaaade370828 in func (arg=0x0) at attach-stopped.c:29^M #4 0x0000ffff8d930aec in ?? () from /usr/lib/libc.so.6^M #5 0x0000ffff8d99a5dc in ?? () from /usr/lib/libc.so.6^M ^M Thread 1 (Thread 0xffff8db62020 (LWP 278111) "attach-stopped"):^M #0 0x0000ffff8d92d2d8 in ?? () from /usr/lib/libc.so.6^M #1 0x0000ffff8d9324b8 in ?? () from /usr/lib/libc.so.6^M #2 0x0000aaaade37086c in main () at attach-stopped.c:45^M (gdb) FAIL: gdb.threads/attach-stopped.exp: threaded: attach2 to stopped bt ... The problem is that the test-case expects to see start_thread: ... gdb_test "thread apply all bt" ".*sleep.*start_thread.*" \ "$threadtype: attach2 to stopped bt" ... but lack of symbols makes that impossible. Fix this by allowing " in ?? () from " as well. Tested on aarch64-linux. PR testsuite/31451 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31451

When -fsanitize=address,undefined is used to build, the mmap configure check failed with ================================================================= ==231796==ERROR: LeakSanitizer: detected memory leaks Direct leak of 4096 byte(s) in 1 object(s) allocated from: #0 0x7cdd3d0defdf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x5750c7f6d72b in main /home/alan/build/gas-san/all/bfd/conftest.c:239 Direct leak of 4096 byte(s) in 1 object(s) allocated from: #0 0x7cdd3d0defdf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x5750c7f6d2e1 in main /home/alan/build/gas-san/all/bfd/conftest.c:190 SUMMARY: AddressSanitizer: 8192 byte(s) leaked in 2 allocation(s). Define GCC_AC_FUNC_MMAP with export ASAN_OPTIONS=detect_leaks=0 to avoid the sanitizer configure check failure. config/ * mmap.m4 (GCC_AC_FUNC_MMAP): New. * no-executables.m4 (AC_FUNC_MMAP): Renamed to GCC_AC_FUNC_MMAP. Change AC_FUNC_MMAP to GCC_AC_FUNC_MMAP. libiberty/ * Makefile.in (aclocal_deps): Add $(srcdir)/../config/mmap.m4. * acinclude.m4: Change AC_FUNC_MMAP to GCC_AC_FUNC_MMAP. * aclocal.m4: Regenerated. * configure: Likewise. zlib/ * acinclude.m4: Include ../config/mmap.m4. * Makefile.in: Regenerated. * configure: Likewise.

When -fsanitize=address,undefined is used to build, the mmap configure check failed with ================================================================= ==231796==ERROR: LeakSanitizer: detected memory leaks Direct leak of 4096 byte(s) in 1 object(s) allocated from: #0 0x7cdd3d0defdf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x5750c7f6d72b in main /home/alan/build/gas-san/all/bfd/conftest.c:239 Direct leak of 4096 byte(s) in 1 object(s) allocated from: #0 0x7cdd3d0defdf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69 #1 0x5750c7f6d2e1 in main /home/alan/build/gas-san/all/bfd/conftest.c:190 SUMMARY: AddressSanitizer: 8192 byte(s) leaked in 2 allocation(s). Replace AC_FUNC_MMAP with GCC_AC_FUNC_MMAP to avoid the sanitizer configure check failure. bfd/ * configure.ac: Replace AC_FUNC_MMAP with GCC_AC_FUNC_MMAP. * Makefile.in: Regenerated. * aclocal.m4: Likewise. * configure: Likewise. binutils/ * configure.ac: Replace AC_FUNC_MMAP with GCC_AC_FUNC_MMAP. * Makefile.in: Regenerated. * aclocal.m4: Likewise. * configure: Likewise. ld/ * configure.ac: Replace AC_FUNC_MMAP with GCC_AC_FUNC_MMAP. * Makefile.in: Regenerated. * aclocal.m4: Likewise. * configure: Likewise. libctf/ * configure.ac: Replace AC_FUNC_MMAP with GCC_AC_FUNC_MMAP. * Makefile.in: Regenerated. * aclocal.m4: Likewise. * configure: Likewise. libsframe/ * configure.ac: Replace AC_FUNC_MMAP with GCC_AC_FUNC_MMAP. * Makefile.in: Regenerated. * aclocal.m4: Likewise. * configure: Likewise.

After installing glibc debuginfo, I ran into: ... FAIL: gdb.threads/threadcrash.exp: test_live_inferior: \ $thread_count == [llength $test_list] ... This happens because the clause: ... -re "^\r\n${hs}main$hs$eol" { ... which is intended to match only: ... #1 <hex> in main () at threadcrash.c:423^M ... also matches "remaining" in: ... #1 <hex> in __GI___nanosleep (requested_time=<hex>, remaining=<hex>) at \ nanosleep.c:27^M ... Fix this by checking for "in main" instead. Tested on x86_64-linux.

When running test-case gdb.server/connect-with-no-symbol-file.exp on aarch64-linux (specifically, an opensuse leap 15.5 container on a fedora asahi 39 system), I run into: ... (gdb) detach^M Detaching from program: target:connect-with-no-symbol-file, process 185104^M Ending remote debugging.^M terminate called after throwing an instance of 'gdb_exception_error'^M ... The detailed backtrace of the corefile is: ... (gdb) bt #0 0x0000ffff75504f54 in raise () from /lib64/libpthread.so.0 #1 0x00000000007a86b4 in handle_fatal_signal (sig=6) at gdb/event-top.c:926 #2 <signal handler called> #3 0x0000ffff74b977b4 in raise () from /lib64/libc.so.6 #4 0x0000ffff74b98c18 in abort () from /lib64/libc.so.6 #5 0x0000ffff74ea26f4 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6 #6 0x0000ffff74ea011c in ?? () from /usr/lib64/libstdc++.so.6 #7 0x0000ffff74ea0180 in std::terminate() () from /usr/lib64/libstdc++.so.6 #8 0x0000ffff74ea0464 in __cxa_throw () from /usr/lib64/libstdc++.so.6 #9 0x0000000001548870 in throw_it (reason=RETURN_ERROR, error=TARGET_CLOSE_ERROR, fmt=0x16c7810 "Remote connection closed", ap=...) at gdbsupport/common-exceptions.cc:203 #10 0x0000000001548920 in throw_verror (error=TARGET_CLOSE_ERROR, fmt=0x16c7810 "Remote connection closed", ap=...) at gdbsupport/common-exceptions.cc:211 #11 0x0000000001548a00 in throw_error (error=TARGET_CLOSE_ERROR, fmt=0x16c7810 "Remote connection closed") at gdbsupport/common-exceptions.cc:226 #12 0x0000000000ac8f2c in remote_target::readchar (this=0x233d3d90, timeout=2) at gdb/remote.c:9856 #13 0x0000000000ac9f04 in remote_target::getpkt (this=0x233d3d90, buf=0x233d40a8, forever=false, is_notif=0x0) at gdb/remote.c:10326 #14 0x0000000000acf3d0 in remote_target::remote_hostio_send_command (this=0x233d3d90, command_bytes=13, which_packet=17, remote_errno=0xfffff1a3cf38, attachment=0xfffff1a3ce88, attachment_len=0xfffff1a3ce90) at gdb/remote.c:12567 #15 0x0000000000ad03bc in remote_target::fileio_fstat (this=0x233d3d90, fd=3, st=0xfffff1a3d020, remote_errno=0xfffff1a3cf38) at gdb/remote.c:12979 #16 0x0000000000c39878 in target_fileio_fstat (fd=0, sb=0xfffff1a3d020, target_errno=0xfffff1a3cf38) at gdb/target.c:3315 #17 0x00000000007eee5c in target_fileio_stream::stat (this=0x233d4400, abfd=0x2323fc40, sb=0xfffff1a3d020) at gdb/gdb_bfd.c:467 #18 0x00000000007f012c in <lambda(bfd*, void*, stat*)>::operator()(bfd *, void *, stat *) const (__closure=0x0, abfd=0x2323fc40, stream=0x233d4400, sb=0xfffff1a3d020) at gdb/gdb_bfd.c:955 #19 0x00000000007f015c in <lambda(bfd*, void*, stat*)>::_FUN(bfd *, void *, stat *) () at gdb/gdb_bfd.c:956 #20 0x0000000000f9b838 in opncls_bstat (abfd=0x2323fc40, sb=0xfffff1a3d020) at bfd/opncls.c:665 #21 0x0000000000f90adc in bfd_stat (abfd=0x2323fc40, statbuf=0xfffff1a3d020) at bfd/bfdio.c:431 #22 0x000000000065fe20 in reopen_exec_file () at gdb/corefile.c:52 #23 0x0000000000c3a3e8 in generic_mourn_inferior () at gdb/target.c:3642 #24 0x0000000000abf3f0 in remote_unpush_target (target=0x233d3d90) at gdb/remote.c:6067 #25 0x0000000000aca8b0 in remote_target::mourn_inferior (this=0x233d3d90) at gdb/remote.c:10587 #26 0x0000000000c387cc in target_mourn_inferior ( ptid=<error reading variable: Cannot access memory at address 0x2d310>) at gdb/target.c:2738 #27 0x0000000000abfff0 in remote_target::remote_detach_1 (this=0x233d3d90, inf=0x22fce540, from_tty=1) at gdb/remote.c:6421 #28 0x0000000000ac0094 in remote_target::detach (this=0x233d3d90, inf=0x22fce540, from_tty=1) at gdb/remote.c:6436 #29 0x0000000000c37c3c in target_detach (inf=0x22fce540, from_tty=1) at gdb/target.c:2526 #30 0x0000000000860424 in detach_command (args=0x0, from_tty=1) at gdb/infcmd.c:2817 #31 0x000000000060b594 in do_simple_func (args=0x0, from_tty=1, c=0x231431a0) at gdb/cli/cli-decode.c:94 #32 0x00000000006108c8 in cmd_func (cmd=0x231431a0, args=0x0, from_tty=1) at gdb/cli/cli-decode.c:2741 #33 0x0000000000c65a94 in execute_command (p=0x232e52f6 "", from_tty=1) at gdb/top.c:570 #34 0x00000000007a7d2c in command_handler (command=0x232e52f0 "") at gdb/event-top.c:566 #35 0x00000000007a8290 in command_line_handler (rl=...) at gdb/event-top.c:802 #36 0x0000000000c9092c in tui_command_line_handler (rl=...) at gdb/tui/tui-interp.c:103 #37 0x00000000007a750c in gdb_rl_callback_handler (rl=0x23385330 "detach") at gdb/event-top.c:258 #38 0x0000000000d910f4 in rl_callback_read_char () at readline/readline/callback.c:290 #39 0x00000000007a7338 in gdb_rl_callback_read_char_wrapper_noexcept () at gdb/event-top.c:194 #40 0x00000000007a73f0 in gdb_rl_callback_read_char_wrapper (client_data=0x22fbf640) at gdb/event-top.c:233 #41 0x0000000000cbee1c in stdin_event_handler (error=0, client_data=0x22fbf640) at gdb/ui.c:154 #42 0x000000000154ed60 in handle_file_event (file_ptr=0x232be730, ready_mask=1) at gdbsupport/event-loop.cc:572 #43 0x000000000154f21c in gdb_wait_for_event (block=1) at gdbsupport/event-loop.cc:693 #44 0x000000000154dec4 in gdb_do_one_event (mstimeout=-1) at gdbsupport/event-loop.cc:263 #45 0x0000000000910f98 in start_event_loop () at gdb/main.c:400 #46 0x0000000000911130 in captured_command_loop () at gdb/main.c:464 #47 0x0000000000912b5c in captured_main (data=0xfffff1a3db58) at gdb/main.c:1338 #48 0x0000000000912bf4 in gdb_main (args=0xfffff1a3db58) at gdb/main.c:1357 #49 0x00000000004170f4 in main (argc=10, argv=0xfffff1a3dcc8) at gdb/gdb.c:38 (gdb) ... The abort happens because a c++ exception escapes to c code, specifically opncls_bstat in bfd/opncls.c. Compiling with -fexceptions works around this. Fix this by catching the exception just before it escapes, in stat_trampoline and likewise in few similar spot. Add a new template catch_exceptions to do so in a consistent way. Tested on aarch64-linux. Approved-by: Pedro Alves <pedro@palves.net> PR remote/31577 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31577

If threads are disabled, either by --disable-threading explicitely, or by missing std::thread support, you get the following ASAN error when loading symbols: ==7310==ERROR: AddressSanitizer: heap-use-after-free on address 0x614000002128 at pc 0x00000098794a bp 0x7ffe37e6af70 sp 0x7ffe37e6af68 READ of size 1 at 0x614000002128 thread T0 #0 0x987949 in index_cache_store_context::store() const ../../gdb/dwarf2/index-cache.c:163 #1 0x943467 in cooked_index_worker::write_to_cache(cooked_index const*, deferred_warnings*) const ../../gdb/dwarf2/cooked-index.c:601 #2 0x1705e39 in std::function<void ()>::operator()() const /gcc/9/include/c++/9.2.0/bits/std_function.h:690 #3 0x1705e39 in gdb::task_group::impl::~impl() ../../gdbsupport/task-group.cc:38 0x614000002128 is located 232 bytes inside of 408-byte region [0x614000002040,0x6140000021d8) freed by thread T0 here: #0 0x7fd75ccf8ea5 in operator delete(void*, unsigned long) ../../.././libsanitizer/asan/asan_new_delete.cc:177 #1 0x9462e5 in cooked_index::index_for_writing() ../../gdb/dwarf2/cooked-index.h:689 #2 0x9462e5 in operator() ../../gdb/dwarf2/cooked-index.c:657 #3 0x9462e5 in _M_invoke /gcc/9/include/c++/9.2.0/bits/std_function.h:300 It's happening because cooked_index_worker::wait always returns true in this case, which tells cooked_index::wait it can delete the m_state cooked_index_worker member, but cooked_index_worker::write_to_cache tries to access it immediately afterwards. Fixed by making cooked_index_worker::wait only return true if desired_state is CACHE_DONE, same as if threading was enabled, so m_state will not be prematurely deleted. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31694 Approved-By: Tom Tromey <tom@tromey.com>

jeremybennett added the bug Something isn't working label Aug 17, 2020

MaryBennett referenced this issue in MaryBennett/corev-binutils-gdb Sep 7, 2020

Merge pull request #1 from MaryBennett/mb-development-wip

bdf8ee8

WIP: Added support for PULP hardware loop

pz9115 referenced this issue in plctlab/corev-binutils-gdb Jan 25, 2022

Merge pull request #1 from linsinan1995/development

f41ccb7

zcee: add option flag & testcases

jeremybennett closed this as completed Apr 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GDB needs a sane default set of CSRs in the absence of a target description #1

GDB needs a sane default set of CSRs in the absence of a target description #1

jeremybennett commented Aug 17, 2020

jeremybennett commented Apr 28, 2023

jeremybennett commented Apr 28, 2023 •

edited

Loading

GDB needs a sane default set of CSRs in the absence of a target description #1

GDB needs a sane default set of CSRs in the absence of a target description #1

Comments

jeremybennett commented Aug 17, 2020

jeremybennett commented Apr 28, 2023

jeremybennett commented Apr 28, 2023 • edited Loading

jeremybennett commented Apr 28, 2023 •

edited

Loading