-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement method calls #3
Comments
Simple method calls are done. Trait calls require some debuginfo fixes in rustc. |
immediate_quit used to be necessary back when prompt_for_continue used blocking fread, but nowadays it uses gdb_readline_wrapper, which is implemented in terms of a nested event loop, which already knows how to react to SIGINT: #0 throw_it (reason=RETURN_QUIT, error=GDB_NO_ERROR, fmt=0x9d6d7e "Quit", ap=0x7fffffffcb88) at .../src/gdb/common/common-exceptions.c:324 #1 0x00000000007bab5d in throw_vquit (fmt=0x9d6d7e "Quit", ap=0x7fffffffcb88) at .../src/gdb/common/common-exceptions.c:366 #2 0x00000000007bac9f in throw_quit (fmt=0x9d6d7e "Quit") at .../src/gdb/common/common-exceptions.c:385 #3 0x0000000000773a2d in quit () at .../src/gdb/utils.c:1039 #4 0x000000000065d81b in async_request_quit (arg=0x0) at .../src/gdb/event-top.c:893 #5 0x000000000065c27b in invoke_async_signal_handlers () at .../src/gdb/event-loop.c:949 #6 0x000000000065aeef in gdb_do_one_event () at .../src/gdb/event-loop.c:280 #7 0x0000000000770838 in gdb_readline_wrapper (prompt=0x7fffffffcd40 "---Type <return> to continue, or q <return> to quit---") at .../src/gdb/top.c:873 The need for the QUIT in stdin_event_handler is then exposed by the gdb.base/double-prompt-target-event-error.exp test, which has: # We're now stopped in a pagination query while handling a # target event (printing where the program stopped). Quitting # the pagination should result in only one prompt being # output. send_gdb "\003p 1\n" Without that change we'd get: Continuing. ---Type <return> to continue, or q <return> to quit---PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: continue to pagination ^CpQuit (gdb) 1 Undefined command: "1". Try "help". (gdb) PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: first prompt ERROR: Undefined command "". UNRESOLVED: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: no double prompt Vs: Continuing. ---Type <return> to continue, or q <return> to quit---PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: continue to pagination ^CQuit (gdb) p 1 $1 = 1 (gdb) PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: first prompt PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: no double prompt gdb/ChangeLog: 2016-04-12 Pedro Alves <palves@redhat.com> * event-top.c (stdin_event_handler): Call QUIT; (prompt_for_continue): Don't run with immediate_quit set.
I can't seem to get simple method calls to work. For a simple method call I would expect that Here is the struct+impl: pub struct Strtab<'a> {
bytes: &'a [u8],
}
// get_str is here
impl<'a> Strtab<'a> {
pub fn get(&self, idx: usize) -> &'a str {
get_str(idx, self.bytes)
} Here is the gdb transcript:
Or perhaps I'm misunderstanding? |
What version of rustc are you using? |
I'm on nightly rustc 1.10 in this case. |
Nowadays, read_memory may throw NOT_AVAILABLE_ERROR (it is done by patch http://sourceware.org/ml/gdb-patches/2013-08/msg00625.html) however, read_stack and read_code still throws MEMORY_ERROR only. This causes PR 19947, that is prologue unwinder is unable unwind because code memory isn't available, but MEMORY_ERROR is thrown, while unwinder catches NOT_AVAILABLE_ERROR. #0 memory_error (err=err@entry=TARGET_XFER_E_IO, memaddr=memaddr@entry=140737349781158) at /home/yao/SourceCode/gnu/gdb/git/gdb/corefile.c:217 tromey#1 0x000000000065f5ba in read_code (memaddr=memaddr@entry=140737349781158, myaddr=myaddr@entry=0x7fffffffd7b0 "\340\023<\001", len=len@entry=1) at /home/yao/SourceCode/gnu/gdb/git/gdb/corefile.c:288 tromey#2 0x000000000065f7b5 in read_code_unsigned_integer (memaddr=memaddr@entry=140737349781158, len=len@entry=1, byte_order=byte_order@entry=BFD_ENDIAN_LITTLE) at /home/yao/SourceCode/gnu/gdb/git/gdb/corefile.c:363 tromey#3 0x00000000004717e0 in amd64_analyze_prologue (gdbarch=gdbarch@entry=0x13c13e0, pc=140737349781158, current_pc=140737349781165, cache=cache@entry=0xda0cb0) at /home/yao/SourceCode/gnu/gdb/git/gdb/amd64-tdep.c:2267 tromey#4 0x0000000000471f6d in amd64_frame_cache_1 (cache=0xda0cb0, this_frame=0xda0bf0) at /home/yao/SourceCode/gnu/gdb/git/gdb/amd64-tdep.c:2437 tromey#5 amd64_frame_cache (this_frame=0xda0bf0, this_cache=<optimised out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/amd64-tdep.c:2508 tromey#6 0x000000000047214d in amd64_frame_this_id (this_frame=<optimised out>, this_cache=<optimised out>, this_id=0xda0c50) at /home/yao/SourceCode/gnu/gdb/git/gdb/amd64-tdep.c:2541 tromey#7 0x00000000006b94c4 in compute_frame_id (fi=0xda0bf0) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:481 tromey#8 get_prev_frame_if_no_cycle (this_frame=this_frame@entry=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1809 tromey#9 0x00000000006bb6c9 in get_prev_frame_always_1 (this_frame=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1983 tromey#10 get_prev_frame_always (this_frame=this_frame@entry=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1999 tromey#11 0x00000000006bbe11 in get_prev_frame (this_frame=this_frame@entry=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:2241 tromey#12 0x00000000006bc13c in unwind_to_current_frame (ui_out=<optimised out>, args=args@entry=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1485 The fix is to let read_stack and read_code throw NOT_AVAILABLE_ERROR too, in order to align with read_memory. gdb: 2016-05-04 Yao Qi <yao.qi@linaro.org> PR gdb/19947 * corefile.c (read_memory): Rename it to ... (read_memory_object): ... it. Add parameter object. (read_memory): Call read_memory_object. (read_stack): Likewise. (read_code): Likewise.
As reported in PR 19998, after type ctrl-c, GDB hang there and does not send interrupt. It causes a fail in gdb.base/interrupt.exp. All targets support remote fileio should be affected. When we type ctrc-c, SIGINT is handled by remote_fileio_sig_set, as shown below, #0 remote_fileio_sig_set (sigint_func=0x4495d0 <remote_fileio_ctrl_c_signal_handler(int)>) at /home/yao/SourceCode/gnu/gdb/git/gdb/remote-fileio.c:325 tromey#1 0x00000000004495de in remote_fileio_ctrl_c_signal_handler (signo=<optimised out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/remote-fileio.c:349 tromey#2 <signal handler called> tromey#3 0x00007ffff647ed83 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81 tromey#4 0x00000000005530ce in interruptible_select (n=10, readfds=readfds@entry=0x7fffffffd730, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x0) at /home/yao/SourceCode/gnu/gdb/git/gdb/event-top.c:1017 tromey#5 0x000000000061ab20 in stdio_file_read (file=<optimised out>, buf=0x12d02e0 "\n\022-\001", length_buf=16383) at /home/yao/SourceCode/gnu/gdb/git/gdb/ui-file.c:577 tromey#6 0x000000000044a4dc in remote_fileio_func_read (buf=0x12c0360 "") at /home/yao/SourceCode/gnu/gdb/git/gdb/remote-fileio.c:583 tromey#7 0x0000000000449598 in do_remote_fileio_request (uiout=<optimised out>, buf_arg=buf_arg@entry=0x12c0340) at /home/yao/SourceCode/gnu/gdb/git/gdb/remote-fileio.c:1179 we don't set quit_serial_event, do { res = gdb_select (n, readfds, writefds, exceptfds, timeout); } while (res == -1 && errno == EINTR); if (res == 1 && FD_ISSET (fd, readfds)) { errno = EINTR; return -1; } return res; we can't go out of the loop above, and that is why GDB can't send interrupt. Recently, we stop throwing exception from SIGINT handler (remote_fileio_ctrl_c_signal_handler) https://sourceware.org/ml/gdb-patches/2016-03/msg00372.html, which is correct, because gdb_select is interruptible. However, in the same patch series, we add interruptible_select later as a wrapper to gdb_select, https://sourceware.org/ml/gdb-patches/2016-03/msg00375.html and it is not interruptible (because of the loop in it) unless select/poll-able file descriptors are marked. This fix in this patch is to call quit_serial_event_set, so that we can go out of the loop above, return -1 and set errno to EINTR. 2016-06-01 Yao Qi <yao.qi@linaro.org> PR remote/19998 * remote-fileio.c (remote_fileio_ctrl_c_signal_handler): Call quit_serial_event_set.
I see the following GDBserver internal error in two cases, gdb/gdbserver/linux-low.c:1922: A problem internal to GDBserver has been detected. unsuspend LWP 17200, suspended=-1 1. step over a breakpoint on fork/vfork syscall instruction, 2. step over a breakpoint on clone syscall instruction and child threads hits a breakpoint, the stack backtrace is #0 internal_error (file=file@entry=0x44c4c0 "gdb/gdbserver/linux-low.c", line=line@entry=1922, fmt=fmt@entry=0x44c7d0 "unsuspend LWP %ld, suspended=%d\n") at gdb/gdbserver/../common/errors.c:51 tromey#1 0x0000000000424014 in lwp_suspended_decr (lwp=<optimised out>, lwp=<optimised out>) at gdb/gdbserver/linux-low.c:1922 tromey#2 0x000000000042403a in unsuspend_one_lwp (entry=<optimised out>, except=0x66e8c0) at gdb/gdbserver/linux-low.c:2885 tromey#3 0x0000000000405f45 in find_inferior (list=<optimised out>, func=func@entry=0x424020 <unsuspend_one_lwp>, arg=arg@entry=0x66e8c0) at gdb/gdbserver/inferiors.c:243 tromey#4 0x00000000004297de in unsuspend_all_lwps (except=0x66e8c0) at gdb/gdbserver/linux-low.c:2895 tromey#5 linux_wait_1 (ptid=..., ourstatus=ourstatus@entry=0x665ec0 <last_status>, target_options=target_options@entry=0) at gdb/gdbserver/linux-low.c:3632 tromey#6 0x000000000042a764 in linux_wait (ptid=..., ourstatus=0x665ec0 <last_status>, target_options=0) at gdb/gdbserver/linux-low.c:3770 tromey#7 0x0000000000411163 in mywait (ptid=..., ourstatus=ourstatus@entry=0x665ec0 <last_status>, options=options@entry=0, connected_wait=connected_wait@entry=1) at gdb/gdbserver/target.c:214 tromey#8 0x000000000040b1f2 in resume (actions=0x66f800, num_actions=1) at gdb/gdbserver/server.c:2757 tromey#9 0x000000000040f660 in handle_v_cont (own_buf=0x66a630 "vCont;c:p45e9.-1") at gdb/gdbserver/server.c:2719 when GDBserver steps over a thread, other threads have been suspended, the "stepping" thread may create new thread, but GDBserver doesn't set it suspend count to 1. When GDBserver unsuspend threads, the child's suspend count goes to -1, and the assert is triggered. In fact, GDBserver has already taken care of suspend count of new thread when GDBserver is suspending all threads except the one GDBserver wants to step over by https://sourceware.org/ml/gdb-patches/2015-07/msg00946.html + /* If we're suspending all threads, leave this one suspended + too. */ + if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS) + { + if (debug_threads) + debug_printf ("HEW: leaving child suspended\n"); + child_lwp->suspended = 1; + } but that is not enough, because new thread is still can be spawned in the thread which is being stepped over. This patch extends the condition that GDBserver set child's suspend count to one if it is suspending threads or stepping over the thread. gdb/gdbserver: 2016-03-03 Yao Qi <yao.qi@linaro.org> PR server/19736 * linux-low.c (handle_extended_wait): Set child suspended if event_lwp->bp_reinsert isn't zero.
Fix this GDB crash: $ gdb -ex "set architecture mips:10000" Segmentation fault (core dumped) Backtrace: Program received signal SIGSEGV, Segmentation fault. 0x0000000000495b1b in mips_gdbarch_init (info=..., arches=0x0) at /home/pedro/gdb/mygit/cxx-convertion/src/gdb/mips-tdep.c:8436 8436 if (bfd_get_flavour (info.abfd) == bfd_target_elf_flavour (top-gdb) bt #0 0x0000000000495b1b in mips_gdbarch_init (info=..., arches=0x0) at .../src/gdb/mips-tdep.c:8436 tromey#1 0x00000000007348a6 in gdbarch_find_by_info (info=...) at .../src/gdb/gdbarch.c:5155 tromey#2 0x000000000073563c in gdbarch_update_p (info=...) at .../src/gdb/arch-utils.c:522 tromey#3 0x0000000000735585 in set_architecture (ignore_args=0x0, from_tty=1, c=0x26bc870) at .../src/gdb/arch-utils.c:496 tromey#4 0x00000000005f29fd in do_sfunc (c=0x26bc870, args=0x0, from_tty=1) at .../src/gdb/cli/cli-decode.c:121 tromey#5 0x00000000005fd3f3 in do_set_command (arg=0x7fffffffdcdd "mips:10000", from_tty=1, c=0x26bc870) at .../src/gdb/cli/cli-setshow.c:455 tromey#6 0x0000000000836157 in execute_command (p=0x7fffffffdcdd "mips:10000", from_tty=1) at .../src/gdb/top.c:460 tromey#7 0x000000000071abfb in catch_command_errors (command=0x835f6b <execute_command>, arg=0x7fffffffdccc "set architecture mips:10000", from_tty=1) at .../src/gdb/main.c:368 tromey#8 0x000000000071bf4f in captured_main (data=0x7fffffffd750) at .../src/gdb/main.c:1132 tromey#9 0x0000000000716737 in catch_errors (func=0x71af44 <captured_main>, func_args=0x7fffffffd750, errstring=0x106b9a1 "", mask=RETURN_MASK_ALL) at .../src/gdb/exceptions.c:240 tromey#10 0x000000000071bfe6 in gdb_main (args=0x7fffffffd750) at .../src/gdb/main.c:1164 tromey#11 0x000000000040a6ad in main (argc=4, argv=0x7fffffffd858) at .../src/gdb/gdb.c:32 (top-gdb) We already check whether info.abfd is NULL before all other bfd_get_flavour calls in the same function. Just this one case was missing. (This was exposed by a WIP test that tries all "set architecture ARCH" values.) gdb/ChangeLog: 2016-03-07 Pedro Alves <palves@redhat.com> * mips-tdep.c (mips_gdbarch_init): Check whether info.abfd is NULL before calling bfd_get_flavour.
This patch adds some sanity check that reinsert breakpoints must be there when doing step-over on software single step target. The check triggers an assert when running forking-threads-plus-breakpoint.exp on arm-linux target, gdb/gdbserver/linux-low.c:4714: A problem internal to GDBserver has been detected.^M int finish_step_over(lwp_info*): Assertion `has_reinsert_breakpoints ()' failed. the error happens when GDBserver has already resumed a thread of process A for step-over (and wait for it hitting reinsert breakpoint), but receives detach request for process B from GDB, which is shown in the backtrace below, (gdb) bt tromey#2 0x000228aa in finish_step_over (lwp=0x12bbd98) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4703 tromey#3 0x00025a50 in finish_step_over (lwp=0x12bbd98) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4749 tromey#4 complete_ongoing_step_over () at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4760 tromey#5 linux_detach (pid=25228) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:1503 tromey#6 0x00012bae in process_serial_event () at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/server.c:3974 tromey#7 handle_serial_event (err=<optimized out>, client_data=<optimized out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/server.c:4347 tromey#8 0x00016d68 in handle_file_event (event_file_desc=<optimized out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/event-loop.c:429 tromey#9 0x000173ea in process_event () at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/event-loop.c:184 tromey#10 start_event_loop () at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/event-loop.c:547 tromey#11 0x0000aa2c in captured_main (argv=<optimized out>, argc=<optimized out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/server.c:3719 tromey#12 main (argc=<optimized out>, argv=<optimized out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/server.c:3804 the sanity check tries to find the reinsert breakpoint from process B, but nothing is found. It is wrong, we need to search in process A, since we started step-over of a thread of process A. (gdb) p lwp->thread->entry.id $3 = {pid = 25120, lwp = 25131, tid = 0} (gdb) p current_thread->entry.id $4 = {pid = 25228, lwp = 25228, tid = 0} This patch switched current_thread to the thread we are doing step-over in finish_step_over. gdb/gdbserver: 2016-06-17 Yao Qi <yao.qi@linaro.org> * linux-low.c (maybe_hw_step): New function. (linux_resume_one_lwp_throw): Call maybe_hw_step. (finish_step_over): Switch current_thread to lwp temporarily, and assert has_reinsert_breakpoints returns true. (proceed_one_lwp): Call maybe_hw_step. * mem-break.c (has_reinsert_breakpoints): New function. * mem-break.h (has_reinsert_breakpoints): Declare.
When I run process-dies-while-detaching.exp with GDBserver, I see many warnings printed by GDBserver, ptrace(regsets_fetch_inferior_registers) PID=26183: No such process ptrace(regsets_fetch_inferior_registers) PID=26183: No such process ptrace(regsets_fetch_inferior_registers) PID=26184: No such process ptrace(regsets_fetch_inferior_registers) PID=26184: No such process regsets_fetch_inferior_registers is called when GDBserver resumes each lwp. tromey#2 0x0000000000428260 in regsets_fetch_inferior_registers (regsets_info=0x4690d0 <aarch64_regsets_info>, regcache=0x31832020) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:5412 tromey#3 0x00000000004070e8 in get_thread_regcache (thread=0x31832940, fetch=fetch@entry=1) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/regcache.c:58 tromey#4 0x0000000000429c40 in linux_resume_one_lwp_throw (info=<optimized out>, signal=0, step=0, lwp=0x31832830) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4463 tromey#5 linux_resume_one_lwp (lwp=0x31832830, step=<optimized out>, signal=<optimized out>, info=<optimized out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4573 The is the case that threads are disappeared when GDB/GDBserver resumes them. We check errno for ESRCH, and don't print error messages, like what we are doing in regsets_store_inferior_registers. gdb/gdbserver: 2016-08-04 Yao Qi <yao.qi@linaro.org> * linux-low.c (regsets_fetch_inferior_registers): Check errno is ESRCH or not.
… out value With something like: struct A { int bitfield:4; } var; If 'var' ends up wholly-optimized out, printing 'var.bitfield' crashes gdb here: (top-gdb) bt #0 0x000000000058b89f in extract_unsigned_integer (addr=0x2 <error: Cannot access memory at address 0x2>, len=2, byte_order=BFD_ENDIAN_LITTLE) at /home/pedro/gdb/mygit/src/gdb/findvar.c:109 tromey#1 0x00000000005a187a in unpack_bits_as_long (field_type=0x16cff70, valaddr=0x0, bitpos=16, bitsize=12) at /home/pedro/gdb/mygit/src/gdb/value.c:3347 tromey#2 0x00000000005a1b9d in unpack_value_bitfield (dest_val=0x1b5d9d0, bitpos=16, bitsize=12, valaddr=0x0, embedded_offset=0, val=0x1b5d8d0) at /home/pedro/gdb/mygit/src/gdb/value.c:3441 tromey#3 0x00000000005a2a5f in value_fetch_lazy (val=0x1b5d9d0) at /home/pedro/gdb/mygit/src/gdb/value.c:3958 tromey#4 0x00000000005a10a7 in value_primitive_field (arg1=0x1b5d8d0, offset=0, fieldno=0, arg_type=0x16d04c0) at /home/pedro/gdb/mygit/src/gdb/value.c:3161 tromey#5 0x00000000005b01e5 in do_search_struct_field (name=0x1727c60 "bitfield", arg1=0x1b5d8d0, offset=0, type=0x16d04c0, looking_for_baseclass=0, result_ptr=0x7fffffffcaf8, [...] unpack_value_bitfield is already optimized-out/unavailable -aware: (...) VALADDR points to the contents of VAL. If the VAL's contents required to extract the bitfield from are unavailable/optimized out, DEST_VAL is correspondingly marked unavailable/optimized out. however, it is not considering the case of the value having no contents buffer at all, as can happen through allocate_optimized_out_value. gdb/ChangeLog: 2016-08-09 Pedro Alves <palves@redhat.com> * value.c (unpack_value_bitfield): Skip unpacking if the parent has no contents buffer to begin with. gdb/testsuite/ChangeLog: 2016-08-09 Pedro Alves <palves@redhat.com> * gdb.dwarf2/bitfield-parent-optimized-out.exp: New file.
If I build gdb with -fsanitize=address and run tests, I get error, malformed linespec error: unexpected colon^M (gdb) PASS: gdb.linespec/ls-errs.exp: lang=C: break : break :=================================================================^M ==3266==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000051451 at pc 0x2b5797a972a8 bp 0x7fffd8e0f3c0 sp 0x7fffd8e0f398^M READ of size 2 at 0x602000051451 thread T0 #0 0x2b5797a972a7 in __interceptor_strlen (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x322a7)^M tromey#1 0x7bd004 in compare_filenames_for_search(char const*, char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:316^M tromey#2 0x7bd310 in iterate_over_some_symtabs(char const*, char const*, int (*)(symtab*, void*), void*, compunit_symtab*, compunit_symtab*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:411^M tromey#3 0x7bd775 in iterate_over_symtabs(char const*, int (*)(symtab*, void*), void*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:481^M tromey#4 0x7bda15 in lookup_symtab(char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:527^M tromey#5 0x7d5e2a in make_file_symbol_completion_list_1 /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:5635^M tromey#6 0x7d61e1 in make_file_symbol_completion_list(char const*, char const*, char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:5684^M tromey#7 0x88dc06 in linespec_location_completer /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:288 .... 0x602000051451 is located 0 bytes to the right of 1-byte region [0x602000051450,0x602000051451)^M mallocated by thread T0 here: #0 0x2b5797ab97ef in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x547ef)^M tromey#1 0xbbfb8d in xmalloc /home/yao/SourceCode/gnu/gdb/git/gdb/common/common-utils.c:43^M tromey#2 0x88dabd in linespec_location_completer /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:273^M tromey#3 0x88e5ef in location_completer(cmd_list_element*, char const*, char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:531^M tromey#4 0x8902e7 in complete_line_internal /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:964^ The code in question is here file_to_match = (char *) xmalloc (colon - text + 1); strncpy (file_to_match, text, colon - text + 1); it is likely that file_to_match is not null-terminated. The patch is to strncpy 'colon - text' bytes and explicitly set '\0'. gdb: 2016-08-19 Yao Qi <yao.qi@linaro.org> * completer.c (linespec_location_completer): Make file_to_match null-terminated.
This test case verifies that GDB will not attempt to invoke a python unwinder recursively. At the moment, the behavior exhibited by GDB looks like this: (gdb) source py-recurse-unwind.py Python script imported (gdb) b ccc Breakpoint 1 at 0x4004bd: file py-recurse-unwind.c, line 23. (gdb) run Starting program: py-recurse-unwind TestUnwinder: Recursion detected - returning early. TestUnwinder: Recursion detected - returning early. TestUnwinder: Recursion detected - returning early. TestUnwinder: Recursion detected - returning early. Breakpoint 1, ccc (arg=<unavailable>) at py-recurse-unwind.c:23 23 } (gdb) bt #-1 ccc (arg=<unavailable>) at py-recurse-unwind.c:23 Backtrace stopped: previous frame identical to this frame (corrupt stack?) [I've shortened pathnames for easier reading.] The desired / expected behavior looks like this: (gdb) source py-recurse-unwind.py Python script imported (gdb) b ccc Breakpoint 1 at 0x4004bd: file py-recurse-unwind.c, line 23. (gdb) run Starting program: py-recurse-unwind Breakpoint 1, ccc (arg=789) at py-recurse-unwind.c:23 23 } (gdb) bt #0 ccc (arg=789) at py-recurse-unwind.c:23 tromey#1 0x00000000004004d5 in bbb (arg=456) at py-recurse-unwind.c:28 tromey#2 0x00000000004004ed in aaa (arg=123) at py-recurse-unwind.c:34 tromey#3 0x00000000004004fe in main () at py-recurse-unwind.c:40 Note that GDB's problems go well beyond the fact that it invokes the unwinder recursively. In the process it messes up some internal state (the frame stash) leading to display of (only) the sentinel frame in the backtrace. gdb/testsuite/ChangeLog: * gdb.python/py-recurse-unwind.c: New file. * gdb.python/py-recurse-unwind.py: New file. * gdb.python/py-recurse-unwind.exp: New file.
Even though this was supposedly in the gdb 7.2 timeframe, the testcase in PR11094 crashes current GDB with a segfault: Program received signal SIGSEGV, Segmentation fault. 0x00000000005ee894 in event_location_to_string (location=0x0) at src/gdb/location.c:412 412 if (EL_STRING (location) == NULL) (top-gdb) bt #0 0x00000000005ee894 in event_location_to_string (location=0x0) at src/gdb/location.c:412 tromey#1 0x000000000057411a in print_breakpoint_location (b=0x18288e0, loc=0x0) at src/gdb/breakpoint.c:6201 tromey#2 0x000000000057483f in print_one_breakpoint_location (b=0x18288e0, loc=0x182cf10, loc_number=0, last_loc=0x7fffffffd258, allflag=1) at src/gdb/breakpoint.c:6473 tromey#3 0x00000000005751e1 in print_one_breakpoint (b=0x18288e0, last_loc=0x7fffffffd258, allflag=1) at src/gdb/breakpoint.c:6707 tromey#4 0x000000000057589c in breakpoint_1 (args=0x0, allflag=1, filter=0x0) at src/gdb/breakpoint.c:6947 tromey#5 0x0000000000575aa8 in maintenance_info_breakpoints (args=0x0, from_tty=0) at src/gdb/breakpoint.c:7026 [...] This is GDB trying to print the location spec of the JIT event breakpoint, but that's an internal breakpoint without one. If I add a NULL check, then we see that the JIT breakpoint is now pending (because its location has shlib_disabled set): (gdb) maint info breakpoints Num Type Disp Enb Address What [...] -8 jit events keep y <PENDING> inf 1 [...] But that's incorrect. GDB should have managed to recreate the JIT breakpoint's location for the second run. So the problem is elsewhere. The problem is that if the JIT loads at the same address on the second run, we never recreate the JIT breakpoint, because we hit this early return: static int jit_breakpoint_re_set_internal (struct gdbarch *gdbarch, struct jit_program_space_data *ps_data) { [...] if (ps_data->cached_code_address == addr) return 0; [...] delete_breakpoint (ps_data->jit_breakpoint); [...] ps_data->jit_breakpoint = create_jit_event_breakpoint (gdbarch, addr); Fix this by deleting the breakpoint and discarding the cached code address when the objfile where the previous JIT breakpoint was found is deleted/unloaded in the first place. The test that was originally added for PR11094 doesn't trip on this because: tromey#1 - It doesn't test the case of the JIT descriptor's address _not_ changing between reruns. tromey#2 - And then it doesn't do "maint info breakpoints", or really anything with the JIT at all. tromey#3 - and even then, to trigger the problem the JIT descriptor needs to be in a separate library, while the current test puts it in the main program. The patch extends the test to cover all combinations of these scenarios. gdb/ChangeLog: 2016-10-06 Pedro Alves <palves@redhat.com> * jit.c (free_objfile_data): Delete the JIT breakpoint and clear the cached code address. gdb/testsuite/ChangeLog: 2016-10-06 Pedro Alves <palves@redhat.com> * gdb.base/jit-simple-dl.c: New file. * gdb.base/jit-simple-jit.c: New file, factored out from ... * gdb.base/jit-simple.c: ... this. * gdb.base/jit-simple.exp (jit_run): Delete. (build_jit): New proc. (jit_test_reread): Recompile either the main program or the shared library, depending on what is being tested. Skip changing address if caller wants to. Compare before/after addresses. If testing standalone, explicitly load the binary. Test "maint info breakpoints". (top level): Add "standalone vs shared lib" and "change address" vs "same address" axes.
Nowadays, if we build GDB with -fsanitize=address, we can get the asan error below, (gdb) quit ================================================================= ==9723==ERROR: AddressSanitizer: alloc-dealloc-mismatch (malloc vs operator delete) on 0x60200003bf70 #0 0x7f88f3837527 in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x55527) tromey#1 0xac8e13 in __gnu_cxx::new_allocator<void (*)()>::deallocate(void (**)(), unsigned long) /usr/include/c++/4.9/ext/new_allocator.h:110 tromey#2 0xac8cc2 in __gnu_cxx::__alloc_traits<std::allocator<void (*)()> >::deallocate(std::allocator<void (*)()>&, void (**)(), unsigned long) /usr/include/c++/4.9/ext/alloc_traits.h:185 .... 0x60200003bf70 is located 0 bytes inside of 8-byte region [0x60200003bf70,0x60200003bf78) allocated by thread T0 here: #0 0x7f88f38367ef in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x547ef) tromey#1 0xbd2762 in operator new(unsigned long) /home/yao/SourceCode/gnu/gdb/git/gdb/common/new-op.c:42 tromey#2 0xac8edc in __gnu_cxx::new_allocator<void (*)()>::allocate(unsigned long, void const*) /usr/include/c++/4.9/ext/new_allocator.h:104 tromey#3 0xac8d81 in __gnu_cxx::__alloc_traits<std::allocator<void (*)()> >::allocate(std::allocator<void (*)()>&, unsigned long) /usr/include/c++/4.9/ext/alloc_traits.h:182 The reason for this is that we override operator new but don't override operator delete. This patch does the override if the code is NOT compiled with asan. gdb: 2016-10-25 Yao Qi <yao.qi@linaro.org> PR gdb/20716 * common/new-op.c (__has_feature): New macro. Don't override operator new if asan is used.
Currently GDB never sends more than one action per vCont packet, when connected in non-stop mode. A follow up patch will change that, and it exposed a gdbserver problem with the vCont handling. For example, this in non-stop mode: => vCont;s:p1.1;c <= OK Should be equivalent to: => vCont;s:p1.1 <= OK => vCont;c <= OK But gdbserver currently doesn't handle this. In the latter case, "vCont;c" makes gdbserver clobber the previous step request. This patch fixes that. Note the server side must ignore resume actions for the thread that has a pending %Stopped notification (and any other threads with events pending), until GDB acks the notification with vStopped. Otherwise, e.g., the following case is mishandled: tromey#1 => g (or any other packet) tromey#2 <= [registers] tromey#3 <= %Stopped T05 thread:p1.2 tromey#4 => vCont s:p1.1;c tromey#5 <= OK Above, the server must not resume thread p1.2 when it processes the vCont. GDB can't know that p1.2 stopped until it acks the %Stopped notification. (Otherwise it wouldn't send a default "c" action.) (The vCont documentation already specifies this.) Finally, special care must also be given to handling fork/vfork events. A (v)fork event actually tells us that two processes stopped -- the parent and the child. Until we follow the fork, we must not resume the child. Therefore, if we have a pending fork follow, we must not send a global wildcard resume action (vCont;c). We can still send process-wide wildcards though. (The comments above will be added as code comments to gdb in a follow up patch.) gdb/gdbserver/ChangeLog: 2016-10-26 Pedro Alves <palves@redhat.com> * linux-low.c (handle_extended_wait): Link parent/child fork threads. (linux_wait_1): Unlink them. (linux_set_resume_request): Ignore resume requests for already-resumed and unhandled fork child threads. * linux-low.h (struct lwp_info) <fork_relative>: New field. * server.c (in_queued_stop_replies_ptid, in_queued_stop_replies): New functions. (handle_v_requests) <vCont>: Don't call require_running. * server.h (in_queued_stop_replies): New declaration.
Most of the time, the trace should be in one piece. This case is handled fine by GDB. In some cases, however, there may be gaps in the trace. They result from trace decode errors or from overflows. A gap in the trace means we lost an unknown amount of trace. Gaps can be very small, such as a few instructions in the same function, or they can be rather big. We may, for example, lose a few function calls or returns. The trace may continue in a different function and we likely don't know how we got there. Even though we can't say how the program executed across a gap, higher levels may not be impacted too much by it. Let's assume we have functions a-e and a trace that looks roughly like this: a \ b b \ / c <gap> c / d d \ / e Even though we can't say for sure, it is likely that b and c are the same function instance before and after the gap. This patch is trying to connect the c and b function segments across the gap. This will add a to the back trace of b on the right hand side. The changes are reflected in GDB's internal representation of the trace and will improve: - the output of "record function-call-history /c" - the output of "backtrace" in replay mode - source stepping in replay mode will be improved indirectly via the improved back trace I don't have an automated test for this patch; decode errors will be fixed and overflows occur sporadically and are quite rare. I tested it by hacking GDB to provoke a decode error and on the expected gap in the gdb.btrace/dlopen.exp test. The issue is that we can't predict where we will be able to re-sync in case of errors. For the expected decode error in gdb.btrace/dlopen.exp, for example, we may be able to re-sync somewhere in dlclose, in test, in main, or not at all. Here's one example run of gdb.btrace/dlopen.exp with and without this patch. (gdb) info record Active record target: record-btrace Recording format: Intel Processor Trace. Buffer size: 16kB. warning: Non-contiguous trace at instruction 66608 (offset = 0xa83, pc = 0xb7fdcc31). warning: Non-contiguous trace at instruction 66652 (offset = 0xa9b, pc = 0xb7fdcc31). warning: Non-contiguous trace at instruction 66770 (offset = 0xacb, pc = 0xb7fdcc31). warning: Non-contiguous trace at instruction 66966 (offset = 0xb60, pc = 0xb7ff5ee4). warning: Non-contiguous trace at instruction 66994 (offset = 0xb74, pc = 0xb7ff5f24). warning: Non-contiguous trace at instruction 67334 (offset = 0xbac, pc = 0xb7ff5e6d). warning: Non-contiguous trace at instruction 69022 (offset = 0xc04, pc = 0xb7ff60b3). warning: Non-contiguous trace at instruction 69116 (offset = 0xc1c, pc = 0xb7ff60b3). warning: Non-contiguous trace at instruction 69504 (offset = 0xc74, pc = 0xb7ff605d). warning: Non-contiguous trace at instruction 83648 (offset = 0xecc, pc = 0xb7ff6134). warning: Decode error (-13) at instruction 83876 (offset = 0xf48, pc = 0xb7fd6380): no memory mapped at this address. warning: Non-contiguous trace at instruction 83876 (offset = 0x11b7, pc = 0xb7ff1c70). Recorded 83948 instructions in 912 functions (12 gaps) for thread 1 (process 12996). (gdb) record instruction-history 83876, +2 83876 => 0xb7fec46f <call_init.part.0+95>: call *%eax [decode error (-13): no memory mapped at this address] [disabled] 83877 0xb7ff1c70 <_dl_close_worker.part.0+1584>: nop Without the patch, the trace is disconnected and the backtrace is short: (gdb) record goto 83876 #0 0xb7fec46f in call_init.part () from /lib/ld-linux.so.2 (gdb) backtrace #0 0xb7fec46f in call_init.part () from /lib/ld-linux.so.2 #1 0xb7fec5d0 in _dl_init () from /lib/ld-linux.so.2 #2 0xb7ff0fe3 in dl_open_worker () from /lib/ld-linux.so.2 Backtrace stopped: not enough registers or memory available to unwind further (gdb) record goto 83877 #0 0xb7ff1c70 in _dl_close_worker.part.0 () from /lib/ld-linux.so.2 (gdb) backtrace #0 0xb7ff1c70 in _dl_close_worker.part.0 () from /lib/ld-linux.so.2 #1 0xb7ff287a in _dl_close () from /lib/ld-linux.so.2 #2 0xb7fc3d5d in dlclose_doit () from /lib/libdl.so.2 #3 0xb7fec354 in _dl_catch_error () from /lib/ld-linux.so.2 #4 0xb7fc43dd in _dlerror_run () from /lib/libdl.so.2 #5 0xb7fc3d98 in dlclose () from /lib/libdl.so.2 #6 0x0804860a in test () #7 0x08048628 in main () With the patch, GDB is able to connect the trace pieces and we get a full backtrace. (gdb) record goto 83876 #0 0xb7fec46f in call_init.part () from /lib/ld-linux.so.2 (gdb) backtrace #0 0xb7fec46f in call_init.part () from /lib/ld-linux.so.2 #1 0xb7fec5d0 in _dl_init () from /lib/ld-linux.so.2 #2 0xb7ff0fe3 in dl_open_worker () from /lib/ld-linux.so.2 #3 0xb7fec354 in _dl_catch_error () from /lib/ld-linux.so.2 #4 0xb7ff02e2 in _dl_open () from /lib/ld-linux.so.2 #5 0xb7fc3c65 in dlopen_doit () from /lib/libdl.so.2 #6 0xb7fec354 in _dl_catch_error () from /lib/ld-linux.so.2 #7 0xb7fc43dd in _dlerror_run () from /lib/libdl.so.2 #8 0xb7fc3d0e in dlopen@@GLIBC_2.1 () from /lib/libdl.so.2 #9 0xb7ff28ee in _dl_runtime_resolve () from /lib/ld-linux.so.2 #10 0x0804841c in ?? () #11 0x08048470 in dlopen@plt () #12 0x080485a3 in test () #13 0x08048628 in main () (gdb) record goto 83877 #0 0xb7ff1c70 in _dl_close_worker.part.0 () from /lib/ld-linux.so.2 (gdb) backtrace #0 0xb7ff1c70 in _dl_close_worker.part.0 () from /lib/ld-linux.so.2 #1 0xb7ff287a in _dl_close () from /lib/ld-linux.so.2 #2 0xb7fc3d5d in dlclose_doit () from /lib/libdl.so.2 #3 0xb7fec354 in _dl_catch_error () from /lib/ld-linux.so.2 #4 0xb7fc43dd in _dlerror_run () from /lib/libdl.so.2 #5 0xb7fc3d98 in dlclose () from /lib/libdl.so.2 #6 0x0804860a in test () #7 0x08048628 in main () It worked nicely in this case but it may, of course, also lead to weird connections; it is a heuristic, after all. It works best when the gap is small and the trace pieces are long. gdb/ * btrace.c (bfun_s): New typedef. (ftrace_update_caller): Print caller in debug dump. (ftrace_get_caller, ftrace_match_backtrace, ftrace_fixup_level) (ftrace_compute_global_level_offset, ftrace_connect_bfun) (ftrace_connect_backtrace, ftrace_bridge_gap, btrace_bridge_gaps): New. (btrace_compute_ftrace_bts): Pass vector of gaps. Collect gaps. (btrace_compute_ftrace_pt): Likewise. (btrace_compute_ftrace): Split into this, ... (btrace_compute_ftrace_1): ... this, and ... (btrace_finalize_ftrace): ... this. Call btrace_bridge_gaps.
While adding new tests to gdb.base/commands.exp, I noticed that the file includes a bunch of individual testcases split into their own procedures, and that none have ever been adjusted to use with_test_prefix. Instead, each gdb_test/gdb_test_multiple/etc invocation takes care of including the procedure name in the test message, in order to make sure test messages are unique. Simon convinced me that using the procedure name as prefix is not that bad of an idea: https://sourceware.org/ml/gdb-patches/2016-10/msg00020.html This commit adds an IMO simpler alternative to with_test_prefix_procname added by that patch -- a new "proc_with_prefix" convenience proc that is meant to be used in place of "proc", and then uses it in commands.exp. Procedures defined with this automatically run their bodies under with_test_prefix $proc_name. Here's a sample of the resulting gdb.sum diff: [...] -PASS: gdb.base/commands.exp: break factorial #3 -PASS: gdb.base/commands.exp: set value to 5 in test_command_prompt_position -PASS: gdb.base/commands.exp: if test in test_command_prompt_position -PASS: gdb.base/commands.exp: > OK in test_command_prompt_position +PASS: gdb.base/commands.exp: test_command_prompt_position: break factorial +PASS: gdb.base/commands.exp: test_command_prompt_position: set value to 5 +PASS: gdb.base/commands.exp: test_command_prompt_position: if test +PASS: gdb.base/commands.exp: test_command_prompt_position: > OK [...] gdb/testsuite/ChangeLog: 2016-11-09 Pedro Alves <palves@redhat.com> * gdb.base/commands.exp (gdbvar_simple_if_test) (gdbvar_simple_while_test, gdbvar_complex_if_while_test) (progvar_simple_if_test, progvar_simple_while_test) (progvar_complex_if_while_test, if_while_breakpoint_command_test) (infrun_breakpoint_command_test, breakpoint_command_test) (user_defined_command_test, watchpoint_command_test) (test_command_prompt_position, deprecated_command_test) (bp_deleted_in_command, temporary_breakpoint_commands) (stray_arg0_test, source_file_with_indented_comment) (recursive_source_test, if_commands_test) (error_clears_commands_left, redefine_hook_test) (redefine_backtrace_test): Use proc_with_prefix. * lib/gdb.exp (proc_with_prefix): New proc.
… frame This patch ensures that the frame id for the current frame is stashed before that of the previous frame (to the current frame). First, it should be noted that the frame id for the current frame is not stashed by get_current_frame(). The current frame's frame id is lazily computed and stashed via calls to get_frame_id(). However, it's possible for get_prev_frame() to be called without first stashing the current frame. The frame stash is used not only to speed up frame lookups, but also to detect cycles. When attempting to compute the frame id for a "previous" frame (in get_prev_frame_if_no_cycle), a cycle is detected if the computed frame id is already in the stash. If it should happen that a previous frame id is stashed which should represent a cycle for the current frame, then an assertion failure will trigger should get_frame_id() be later called to determine the frame id for the current frame. As of late 2016, with the "Tweak meaning of VALUE_FRAME_ID" patch in place, this actually occurs when running the gdb.dwarf2/dw2-dup-frame.exp test. While attempting to generate a backtrace, the python frame filter code is invoked, leading to frame_info_to_frame_object() (in python/py-frame.c) being called. That function will potentially call get_prev_frame() before get_frame_id() is called. The call to get_prev_frame() can eventually end up in get_prev_frame_if_no_cycle() which, in turn, calls compute_frame_id(), after which the frame id is stashed for the previous frame. If the frame id for the current frame is stashed, the cycle detection code (which relies on the frame stash) in get_prev_frame_if_no_cycle() will be triggered for a cycle starting with the current frame. If the current frame's id is not stashed, the cycle detecting code can't operate as designed. Instead, when get_frame_id() is called on the current frame at some later point, the current frame's id will found to be already in the stash, triggering an assertion failure. Below is an in depth examination of the failure which lead to this change. I've shortened pathnames for brevity and readability. Here's the portion of the log file showing the failure/internal error: (gdb) break stop_frame Breakpoint 1 at 0x40059a: file dw2-dup-frame.c, line 22. (gdb) run Starting program: testsuite/outputs/gdb.dwarf2/dw2-dup-frame/dw2-dup-frame Breakpoint 1, stop_frame () at dw2-dup-frame.c:22 22 } (gdb) bt gdb/frame.c:544: internal-error: frame_id get_frame_id(frame_info*): Assertion `stashed' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) FAIL: gdb.dwarf2/dw2-dup-frame.exp: backtrace from stop_frame (GDB internal error) Here's a partial backtrace from the internal error, showing the frames which I think are relevant, plus several extra to provide context: #0 internal_error ( file=0x932b98 "gdb/frame.c", line=544, fmt=0x932b20 "%s: Assertion `%s' failed.") at gdb/common/errors.c:54 #1 0x000000000072207e in get_frame_id (fi=0xe5a760) at gdb/frame.c:544 #2 0x00000000004eb50d in frame_info_to_frame_object (frame=0xe5a760) at gdb/python/py-frame.c:390 #3 0x00000000004ef5be in bootstrap_python_frame_filters (frame=0xe5a760, frame_low=0, frame_high=-1) at gdb/python/py-framefilter.c:1453 #4 0x00000000004ef7a9 in gdbpy_apply_frame_filter ( extlang=0x8857e0 <extension_language_python>, frame=0xe5a760, flags=7, args_type=CLI_SCALAR_VALUES, out=0xf6def0, frame_low=0, frame_high=-1) at gdb/python/py-framefilter.c:1548 #5 0x00000000005f2c5a in apply_ext_lang_frame_filter (frame=0xe5a760, flags=7, args_type=CLI_SCALAR_VALUES, out=0xf6def0, frame_low=0, frame_high=-1) at gdb/extension.c:572 #6 0x00000000005ea896 in backtrace_command_1 (count_exp=0x0, show_locals=0, no_filters=0, from_tty=1) at gdb/stack.c:1834 Examination of the code in frame_info_to_frame_object(), which is in python/py-frame.c, is key to understanding this problem: if (get_prev_frame (frame) == NULL && get_frame_unwind_stop_reason (frame) != UNWIND_NO_REASON && get_next_frame (frame) != NULL) { frame_obj->frame_id = get_frame_id (get_next_frame (frame)); frame_obj->frame_id_is_next = 1; } else { frame_obj->frame_id = get_frame_id (frame); frame_obj->frame_id_is_next = 0; } I will first note that the frame id for frame has not been computed yet. (This was verified by placing a breakpoint on compute_frame_id().) The call to get_prev_frame() causes the the frame id to (eventually) be computed for the previous frame. Here's a backtrace showing how we get there: #0 compute_frame_id (fi=0x10e2810) at gdb/frame.c:496 #1 0x0000000000724a67 in get_prev_frame_if_no_cycle (this_frame=0xe5a760) at gdb/frame.c:1871 #2 0x0000000000725136 in get_prev_frame_always_1 (this_frame=0xe5a760) at gdb/frame.c:2045 #3 0x000000000072516b in get_prev_frame_always (this_frame=0xe5a760) at gdb/frame.c:2061 #4 0x000000000072570f in get_prev_frame (this_frame=0xe5a760) at gdb/frame.c:2303 #5 0x00000000004eb471 in frame_info_to_frame_object (frame=0xe5a760) at gdb/python/py-frame.c:381 For this particular case, we end up in the else clause of the code above which calls get_frame_id (frame). It's at this point that the frame id for frame is computed. Again, here's a backtrace: #0 compute_frame_id (fi=0xe5a760) at gdb/frame.c:496 #1 0x000000000072203d in get_frame_id (fi=0xe5a760) at gdb/frame.c:539 #2 0x00000000004eb50d in frame_info_to_frame_object (frame=0xe5a760) at gdb/python/py-frame.c:390 The test in question, dw2-dup-frame.exp, deliberately creates a broken (cyclic) stack. So, in this instance, the frame id for the prev `frame' will be the same as that for `frame'. But that particular frame id ended up in the stash during the previous frame operation. When, just a few lines later, we compute the frame id for `frame', the id in question is already in the stash, thus triggering the assertion failure. I considered two other solutions to solving this problem: We could prevent get_prev_frame() from being called before get_frame_id() in frame_info_to_frame_object(). (See above for the snippet of code where this happens.) A call to get_frame_id (frame) could be placed ahead of that code snippet above. I have tested this approach and, while it does work, I can't be certain that get_prev_frame() isn't called ahead of stashing the current frame somewhere else in GDB, but in a less obvious way. Another approach is to stash the current frame's id by calling get_frame_id() in get_current_frame(). This approach is conceptually simpler, but when importing a python unwinder, has the unwelcome side effect of causing the unwinder to be called during import. A cleaner looking fix would be to place this code after code corresponding to the "Don't compute the frame id of the current frame yet..." comment in get_prev_frame_if_no_cycle(). Sadly, this does not work though; by the time we get to this point, the frame state for the prev frame has been modified just enough to cause an internal error to occur when attempting to compute the (current) frame id for inline frames. (The unexpected failure count increases by roughly 130 failures.) Therefore, I decided to place it as early as possible in get_prev_frame(). gdb/ChangeLog: * frame.c (get_prev_frame): Stash frame id for current frame prior to computing frame id for previous frame.
…binations This adds a test that exposes several problems fixed by earlier patches: #1 - Buffer overrun when host/target formats match, but sizes don't. https://sourceware.org/ml/gdb-patches/2016-03/msg00125.html #2 - Missing handling for FR-V FR300. https://sourceware.org/ml/gdb-patches/2016-03/msg00117.html #3 - BFD architectures with spaces in their names (v850). https://sourceware.org/ml/binutils/2016-03/msg00108.html #4 - The OS ABI names with spaces issue. https://sourceware.org/ml/gdb-patches/2016-03/msg00116.html #5 - Bogus HP/PA long double format. https://sourceware.org/ml/gdb-patches/2016-03/msg00122.html #6 - Cris big endian internal error. https://sourceware.org/ml/gdb-patches/2016-03/msg00126.html #7 - Several PowerPC bfd archs/machines not handled by gdb. https://sourceware.org/bugzilla/show_bug.cgi?id=19797 And hopefully helps catch others in the future. This started out as a test that simply did, gdb -ex "print 1.0L" to exercise #1 above. Then to cover both 32-bit target / 64-bit host and the converse, I thought of having the testcase print the floats twice, once with the architecture set to "i386" and then to "i386:x86-64". This way it wouldn't matter whether gdb was built as 32-bit or a 64-bit program. Then I thought that other archs might have similar host/target floatformat conversion issues as well. Instead of hardcoding some architectures in the test file, I thought we could just iterate over all bfd architectures and OS ABIs supported by the gdb build being tested. This is what then exposed all the other problems listed above... With an --enable-targets=all, this exercises over 14 thousand combinations. If left in a single test file, it all consistenly runs in under a minute on my machine (An Intel i7-4810MQ @ 2.8 MHZ running Fedora 23). Split in 8 chunks, as in this commit, it runs in around 25 seconds, with make -j8. To avoid flooding the gdb.sum file, it avoids calling "pass" on each tested combination/iteration. I'm explicitly not implementing that by passing an empty message to gdb_test / gdb_test_multiple, because I still want a FAIL to be logged in gdb.sum. So instead this puts the internal passes in the gdb.log file, only, prefixed "IPASS:", for internal pass. TBC, if some iteration fails, it'll still show up as FAIL in gdb.sum. If this is an approach that takes on, I can see us extending the common bits to support it for all testcases. gdb/testsuite/ChangeLog: 2016-12-09 Pedro Alves <palves@redhat.com> * gdb.base/all-architectures-0.exp: New file. * gdb.base/all-architectures-1.exp: New file. * gdb.base/all-architectures-2.exp: New file. * gdb.base/all-architectures-3.exp: New file. * gdb.base/all-architectures-4.exp: New file. * gdb.base/all-architectures-5.exp: New file. * gdb.base/all-architectures-6.exp: New file. * gdb.base/all-architectures-7.exp: New file. * gdb.base/all-architectures.exp.in: New file.
I build GDB for all targets enabled. When I "set architecture rl78", GDB crashes, (gdb) set architecture rl78 Program received signal SIGSEGV, Segmentation fault. append_flags_type_flag (type=0x20cc0e0, bitpos=bitpos@entry=0, name=name@entry=0x11dba3f "CY") at ../../binutils-gdb/gdb/gdbtypes.c:4926 4926 name); (gdb) bt 10 #0 append_flags_type_flag (type=0x20cc0e0, bitpos=bitpos@entry=0, name=name@entry=0x11dba3f "CY") at ../../binutils-gdb/gdb/gdbtypes.c:4926 #1 0x00000000004aaca8 in rl78_gdbarch_init (info=..., arches=<optimized out>) at ../../binutils-gdb/gdb/rl78-tdep.c:1410 #2 0x00000000006b05a4 in gdbarch_find_by_info (info=...) at ../../binutils-gdb/gdb/gdbarch.c:5269 #3 0x000000000060eee4 in gdbarch_update_p (info=...) at ../../binutils-gdb/gdb/arch-utils.c:557 #4 0x000000000060f8a8 in set_architecture (ignore_args=<optimized out>, from_tty=1, c=<optimized out>) at ../../binutils-gdb/gdb/arch-utils.c:531 #5 0x0000000000593d0b in do_set_command (arg=<optimized out>, arg@entry=0x20be851 "rl78", from_tty=from_tty@entry=1, c=c@entry=0x20b1540) at ../../binutils-gdb/gdb/cli/cli-setshow.c:455 #6 0x00000000007665c3 in execute_command (p=<optimized out>, p@entry=0x20be840 "set architecture rl78", from_tty=1) at ../../binutils-gdb/gdb/top.c:666 #7 0x00000000006935f4 in command_handler (command=0x20be840 "set architecture rl78") at ../../binutils-gdb/gdb/event-top.c:577 #8 0x00000000006938d8 in command_line_handler (rl=<optimized out>) at ../../binutils-gdb/gdb/event-top.c:767 #9 0x0000000000692c2c in gdb_rl_callback_handler (rl=0x20be890 "") at ../../binutils-gdb/gdb/event-top.c:200 The cause is that we want to access some builtin types in gdbarch init, but it is not initialized yet. I fix it by creating the type when it is to be used. We've already done this in sparc, sparc64 and m68k. gdb: 2016-12-09 Yao Qi <yao.qi@linaro.org> PR tdep/20953 * rl78-tdep.c (rl78_psw_type): New function. (rl78_register_type): Call rl78_psw_type. (rl78_gdbarch_init): Move code to rl78_psw_type. gdb/testsuite: 2016-12-09 Yao Qi <yao.qi@linaro.org> * gdb.base/all-architectures.exp.in: Remove kfail for rl78.
I build GDB with all targets enabled, and "set architecture rx", GDB crashes, (gdb) set architecture rx Program received signal SIGSEGV, Segmentation fault. append_flags_type_flag (type=0x20cc360, bitpos=bitpos@entry=0, name=name@entry=0xd27529 "C") at ../../binutils-gdb/gdb/gdbtypes.c:4926 4926 name); (gdb) bt 10 #0 append_flags_type_flag (type=0x20cc360, bitpos=bitpos@entry=0, name=name@entry=0xd27529 "C") at ../../binutils-gdb/gdb/gdbtypes.c:4926 #1 0x00000000004ce725 in rx_gdbarch_init (info=..., arches=<optimized out>) at ../../binutils-gdb/gdb/rx-tdep.c:1051 #2 0x00000000006b05a4 in gdbarch_find_by_info (info=...) at ../../binutils-gdb/gdb/gdbarch.c:5269 #3 0x000000000060eee4 in gdbarch_update_p (info=...) at ../../binutils-gdb/gdb/arch-utils.c:557 #4 0x000000000060f8a8 in set_architecture (ignore_args=<optimized out>, from_tty=1, c=<optimized out>) at ../../binutils-gdb/gdb/arch-utils.c:531 #5 0x0000000000593d0b in do_set_command (arg=<optimized out>, arg@entry=0x20bee81 "rx ", from_tty=from_tty@entry=1, c=c@entry=0x20b1540) at ../../binutils-gdb/gdb/cli/cli-setshow.c:455 #6 0x00000000007665c3 in execute_command (p=<optimized out>, p@entry=0x20bee70 "set architecture rx ", from_tty=1) at ../../binutils-gdb/gdb/top.c:666 #7 0x00000000006935f4 in command_handler (command=0x20bee70 "set architecture rx ") at ../../binutils-gdb/gdb/event-top.c:577 #8 0x00000000006938d8 in command_line_handler (rl=<optimized out>) at ../../binutils-gdb/gdb/event-top.c:767 #9 0x0000000000692c2c in gdb_rl_callback_handler (rl=0x20be7f0 "") at ../../binutils-gdb/gdb/event-top.c:200 The cause is that we want to access some builtin types in gdbarch init, but it is not initialized yet. I fix it by creating the type when it is to be used. We've already done this in sparc, sparc64 and m68k. gdb: 2016-12-09 Yao Qi <yao.qi@linaro.org> PR tdep/20954 * rx-tdep.c (rx_psw_type): New function. (rx_fpsw_type): New function. (rx_register_type): Call rx_psw_type and rx_fpsw_type. (rx_gdbarch_init): Move code to rx_psw_type and rx_fpsw_type. gdb/testsuite: 2016-12-09 Yao Qi <yao.qi@linaro.org> * gdb.base/all-architectures.exp.in: Remove kfail for "rx".
Nowadays, GDB propagates C++ exceptions across readline using setjmp/longjmp 8952576 ("Propagate GDB/C++ exceptions across readline using sj/lj-based TRY/CATCH") because DWARF-based unwinding can't cross C functions compiled without -fexceptions (see details from the commit above). Unfortunately, toolchains that use SjLj-based C++ exceptions got broken with that fix, because _Unwind_SjLj_Unregister, which is put at the exit of a function, is not executed due to the longjmp added by that commit. (gdb) [New Thread 2936.0xb80] kill Thread 1 received signal SIGSEGV, Segmentation fault. 0x03ff662b in ?? () top?bt 15 #0 0x03ff662b in ?? () #1 0x00526b92 in stdin_event_handler (error=0, client_data=0x172ed8) at ../../binutils-gdb/gdb/event-top.c:555 #2 0x00525a94 in handle_file_event (ready_mask=<optimized out>, file_ptr=0x3ff5cb8) at ../../binutils-gdb/gdb/event-loop.c:733 #3 gdb_wait_for_event (block=block@entry=1) at ../../binutils-gdb/gdb/event-loop.c:884 #4 0x00525bfb in gdb_do_one_event () at ../../binutils-gdb/gdb/event-loop.c:347 #5 0x00525ce5 in start_event_loop () at ../../binutils-gdb/gdb/event-loop.c:371 #6 0x0051fada in captured_command_loop (data=0x0) at ../../binutils-gdb/gdb/main.c:324 #7 0x0051cf5d in catch_errors ( func=func@entry=0x51fab0 <captured_command_loop(void*)>, func_args=func_args@entry=0x0, errstring=errstring@entry=0x7922bf <VEC_interp_factory_p_quick_push(VEC_inte rp_factory_p*, interp_factory*, char const*, unsigned int)::__PRETTY_FUNCTION__+351> "", mask=mask@entry=RETURN_MASK_ALL) at ../../binutils-gdb/gdb/exceptions.c:236 #8 0x00520f0c in captured_main (data=0x328feb4) at ../../binutils-gdb/gdb/main.c:1149 #9 gdb_main (args=args@entry=0x328feb4) at ../../binutils-gdb/gdb/main.c:1159 #10 0x0071e400 in main (argc=1, argv=0x171220) at ../../binutils-gdb/gdb/gdb.c:32 Fix this by making the functions involved in setjmp/longjmp as noexcept, so that the compiler knows it doesn't need to emit the _Unwind_SjLj_Register / _Unwind_SjLj_Unregister calls for C++ exceptions. Tested on x86_64 Fedora 23 with: - GCC 5.3.1 w/ DWARF-based exceptions. - GCC 7 built with --enable-sjlj-exceptions. gdb/ChangeLog: 2016-12-20 Pedro Alves <palves@redhat.com> Yao Qi <yao.qi@linaro.org> PR gdb/20977 * event-top.c (gdb_rl_callback_read_char_wrapper_noexcept): New noexcept function, factored out from ... (gdb_rl_callback_read_char_wrapper): ... this. (gdb_rl_callback_handler): Mark noexcept.
New in v2: - Define PyMem_RawMalloc as PyMem_Malloc for Python < 3.4 and use PyMem_RawMalloc in the code. Since Python 3.4, the callback installed in PyOS_ReadlineFunctionPointer should return a value allocated with PyMem_RawMalloc instead of PyMem_Malloc. The reason is that PyMem_Malloc must be called with the Python Global Interpreter Lock (GIL) held, which is not the case in the context where this function is called. PyMem_RawMalloc was introduced for cases like this. In Python 3.6, it looks like they added an assert to verify that PyMem_Malloc was not called without the GIL. The consequence is that typing anything in the python-interactive mode of gdb crashes the process. The same behavior was observed with the official package on Arch Linux as well as with a manual Python build on Ubuntu 14.04. This is what is shown with a debug build of Python 3.6 (the error with a non-debug build is far less clear): (gdb) pi >>> print(1) Fatal Python error: Python memory allocator called without holding the GIL Current thread 0x00007f1459af8780 (most recent call first): [1] 21326 abort ./gdb and the backtrace: #0 0x00007ffff618bc37 in raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007ffff618f028 in abort () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007ffff6b104d6 in Py_FatalError (msg=msg@entry=0x7ffff6ba15b8 "Python memory allocator called without holding the GIL") at Python/pylifecycle.c:1457 #3 0x00007ffff6a37a68 in _PyMem_DebugCheckGIL () at Objects/obmalloc.c:1972 #4 0x00007ffff6a3804e in _PyMem_DebugFree (ctx=0x7ffff6e65290 <_PyMem_Debug+48>, ptr=0x24f8830) at Objects/obmalloc.c:1994 #5 0x00007ffff6a38e1d in PyMem_Free (ptr=<optimized out>) at Objects/obmalloc.c:442 #6 0x00007ffff6b866c6 in _PyFaulthandler_Fini () at ./Modules/faulthandler.c:1369 #7 0x00007ffff6b104bd in Py_FatalError (msg=msg@entry=0x7ffff6ba15b8 "Python memory allocator called without holding the GIL") at Python/pylifecycle.c:1431 #8 0x00007ffff6a37a68 in _PyMem_DebugCheckGIL () at Objects/obmalloc.c:1972 #9 0x00007ffff6a37aa3 in _PyMem_DebugMalloc (ctx=0x7ffff6e65290 <_PyMem_Debug+48>, nbytes=5) at Objects/obmalloc.c:1980 #10 0x00007ffff6a38d91 in PyMem_Malloc (size=<optimized out>) at Objects/obmalloc.c:418 #11 0x000000000064dbe2 in gdbpy_readline_wrapper (sys_stdin=0x7ffff6514640 <_IO_2_1_stdin_>, sys_stdout=0x7ffff6514400 <_IO_2_1_stdout_>, prompt=0x7ffff4d4f7d0 ">>> ") at /home/emaisin/src/binutils-gdb/gdb/python/py-gdb-readline.c:75 The documentation is very clear about it [1] and it was also mentioned in the "What's New In Python 3.4" page [2]. [1] https://docs.python.org/3/c-api/veryhigh.html#c.PyOS_ReadlineFunctionPointer [2] https://docs.python.org/3/whatsnew/3.4.html#changes-in-the-c-api gdb/ChangeLog: * python/python-internal.h (PyMem_RawMalloc): Define for Python < 3.4. * python/py-gdb-readline.c (gdbpy_readline_wrapper): Use PyMem_RawMalloc instead of PyMem_Malloc.
When the gdbpy_ref objects get destroyed, they call Py_DECREF to decrement the reference counter of the python object they hold a reference to. Any time we call into the Python API, we should be holding the GIL. The gdbpy_enter object does that for us in an RAII-fashion. However, if gdbpy_enter is declared after a gdbpy_ref object in a function, gdbpy_enter's destructor will be called (and the GIL will be released) before gdbpy_ref's destructor is called. Therefore, we will end up calling Py_DECREF without holding the GIL. This became obvious with Python 3.6, where memory management functions have asserts to make sure that the GIL is held. This was exposed by tests py-as-string.exp, py-function.exp and py-xmethods. For example: (gdb) p $_as_string(enum_valid) Fatal Python error: Python memory allocator called without holding the GIL Current thread 0x00007f7f7b21c780 (most recent call first): [1] 18678 abort (core dumped) ./gdb -nx testsuite/outputs/gdb.python/py-as-string/py-as-string #0 0x00007ffff618bc37 in raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007ffff618f028 in abort () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007ffff6b104d6 in Py_FatalError (msg=msg@entry=0x7ffff6ba15b8 "Python memory allocator called without holding the GIL") at Python/pylifecycle.c:1457 #3 0x00007ffff6a37a68 in _PyMem_DebugCheckGIL () at Objects/obmalloc.c:1972 #4 0x00007ffff6a3804e in _PyMem_DebugFree (ctx=0x7ffff6e65290 <_PyMem_Debug+48>, ptr=0x24f8830) at Objects/obmalloc.c:1994 #5 0x00007ffff6a38e1d in PyMem_Free (ptr=<optimized out>) at Objects/obmalloc.c:442 #6 0x00007ffff6b866c6 in _PyFaulthandler_Fini () at ./Modules/faulthandler.c:1369 #7 0x00007ffff6b104bd in Py_FatalError (msg=msg@entry=0x7ffff6ba15b8 "Python memory allocator called without holding the GIL") at Python/pylifecycle.c:1431 #8 0x00007ffff6a37a68 in _PyMem_DebugCheckGIL () at Objects/obmalloc.c:1972 #9 0x00007ffff6a3804e in _PyMem_DebugFree (ctx=0x7ffff6e652c0 <_PyMem_Debug+96>, ptr=0x7ffff46b6040) at Objects/obmalloc.c:1994 #10 0x00007ffff6a38f55 in PyObject_Free (ptr=<optimized out>) at Objects/obmalloc.c:503 #11 0x00007ffff6a5f27e in unicode_dealloc (unicode=unicode@entry=0x7ffff46b6040) at Objects/unicodeobject.c:1794 #12 0x00007ffff6a352a9 in _Py_Dealloc (op=0x7ffff46b6040) at Objects/object.c:1786 #13 0x000000000063f28b in gdb_Py_DECREF (op=0x7ffff46b6040) at /home/emaisin/src/binutils-gdb/gdb/python/python-internal.h:192 #14 0x000000000063fa33 in gdbpy_ref_policy::decref (ptr=0x7ffff46b6040) at /home/emaisin/src/binutils-gdb/gdb/python/py-ref.h:35 #15 0x000000000063fa77 in gdb::ref_ptr<_object, gdbpy_ref_policy>::~ref_ptr (this=0x7fffffffcdf0, __in_chrg=<optimized out>) at /home/emaisin/src/binutils-gdb/gdb/common/gdb_ref_ptr.h:91 #16 0x000000000064d8b8 in fnpy_call (gdbarch=0x2b50010, language=0x115d2c0 <c_language_defn>, cookie=0x7ffff46b7468, argc=1, argv=0x7fffffffcf48) at /home/emaisin/src/binutils-gdb/gdb/python/py-function.c:145 The fix is to place the gdbpy_enter first in the function. I also cleaned up the comments a bit and removed the unnecessary initialization of the value variable. gdb/ChangeLog: * python/py-function.c (fnpy_call): Reorder declarations to have the gdbpy_enter object declared first. * python/py-xmethods.c (gdbpy_get_xmethod_arg_types): Likewise.
This commit fixes a "-gdb-set logging redirect on" crash by not handling "logging redirect on" on the fly. Previous discussion here: https://sourceware.org/ml/gdb-patches/2017-01/msg00467.html Code for handling "logging redirect on" on the fly was added here: https://sourceware.org/ml/gdb-patches/2010-08/msg00202.html Meanwhile, MI gained support for logging, but flipping redirect "on" on the fly was not considered. The result is that this sequence of commands crashes GDB: -gdb-set logging on -gdb-set logging redirect on Program received signal SIGSEGV, Segmentation fault. 0x00000000008dd7bc in gdb_flush (file=0x2a097f0) at /home/pedro/gdb/mygit/cxx-convertion/src/gdb/ui-file.c:95 194 file->to_flush (file); (top-gdb) bt #0 0x00000000008dd7bc in gdb_flush(ui_file*) (file=0x2a097f0) at /home/pedro/gdb/mygit/cxx-convertion/src/gdb/ui-file.c:95 #1 0x00000000007b5f34 in gdb_wait_for_event(int) (block=0) at /home/pedro/gdb/mygit/cxx-convertion/src/gdb/event-loop.c:752 #2 0x00000000007b52b6 in gdb_do_one_event() () at /home/pedro/gdb/mygit/cxx-convertion/src/gdb/event-loop.c:322 #3 0x00000000007b5362 in start_event_loop() () at /home/pedro/gdb/mygit/cxx-convertion/src/gdb/event-loop.c:371 #4 0x000000000082704a in captured_command_loop(void*) (data=0x0) at /home/pedro/gdb/mygit/cxx-convertion/src/gdb/main.c:325 #5 0x00000000007b8d7c in catch_errors(int (*)(void*), void*, char*, return_mask) (func=0x827008 <captured_command_loop(void*)>, func_args=0x0, errstring=0x11dee51 "", mask=RETURN_MASK_ALL) at /home/pedro/gdb/mygit/cxx-convertion/src/gdb/exceptions.c:236 #6 0x000000000082839b in captured_main(void*) (data=0x7fffffffd820) at /home/pedro/gdb/mygit/cxx-convertion/src/gdb/main.c:1148 During symbol reading, cannot get low and high bounds for subprogram DIE at 24065. #7 0x00000000008283c4 in gdb_main(captured_main_args*) (args=0x7fffffffd820) at /home/pedro/gdb/mygit/cxx-convertion/src/gdb/main.c:1158 #8 0x0000000000412d4d in main(int, char**) (argc=4, argv=0x7fffffffd928) at /home/pedro/gdb/mygit/cxx-convertion/src/gdb/gdb.c:32 The handling of redirect on the fly is not really a use case we need to handle, IMO. Its inconsistent (other "set logging foo" commands aren't handled on the fly), and complicates the code significantly. Instead of complicating it further for MI, go back to the original idea of warning, only: https://sourceware.org/ml/gdb-patches/2010-08/msg00083.html New test included. gdb/ChangeLog: 2017-02-02 Pedro Alves <palves@redhat.com> * cli/cli-logging.c (maybe_warn_already_logging): New factored out from ... (set_logging_overwrite): ... here. (logging_no_redirect_file): Delete. (set_logging_redirect): Don't handle redirection on the fly. Instead warn that "logging off" / "logging on" is necessary. (pop_output_files): Delete references to logging_no_redirect_file. (show_logging_command): Always speak in terms of what will happen once logging is reenabled. gdb/testsuite/ChangeLog: 2017-02-02 Pedro Alves <palves@redhat.com> * gdb.mi/mi-logging.exp: Add "redirect while already logging" tests.
…s_debug_type} With gdb build with -fsanitize=thread and test-case gdb.base/index-cache.exp and target board debug-types, I run into: ... (gdb) file build/gdb/testsuite/outputs/gdb.base/index-cache/index-cache Reading symbols from build/gdb/testsuite/outputs/gdb.base/index-cache/index-cache... ================== WARNING: ThreadSanitizer: data race (pid=9654) Write of size 1 at 0x7b200000420d by main thread: #0 dwarf2_per_cu_data::get_header() const gdb/dwarf2/read.c:21513 (gdb+0x8d1eee) #1 dwarf2_per_cu_data::addr_size() const gdb/dwarf2/read.c:21524 (gdb+0x8d1f4e) #2 dwarf2_cu::addr_type() const gdb/dwarf2/cu.c:112 (gdb+0x806327) #3 set_die_type gdb/dwarf2/read.c:21932 (gdb+0x8d3870) #4 read_base_type gdb/dwarf2/read.c:15448 (gdb+0x8bcacb) #5 read_type_die_1 gdb/dwarf2/read.c:19832 (gdb+0x8cc0a5) #6 read_type_die gdb/dwarf2/read.c:19767 (gdb+0x8cbe6d) #7 lookup_die_type gdb/dwarf2/read.c:19739 (gdb+0x8cbdc7) #8 die_type gdb/dwarf2/read.c:19593 (gdb+0x8cb68a) #9 read_subroutine_type gdb/dwarf2/read.c:14648 (gdb+0x8b998e) #10 read_type_die_1 gdb/dwarf2/read.c:19792 (gdb+0x8cbf2f) #11 read_type_die gdb/dwarf2/read.c:19767 (gdb+0x8cbe6d) #12 read_func_scope gdb/dwarf2/read.c:10154 (gdb+0x8a4f36) #13 process_die gdb/dwarf2/read.c:6667 (gdb+0x898daa) #14 read_file_scope gdb/dwarf2/read.c:7682 (gdb+0x89bad8) #15 process_die gdb/dwarf2/read.c:6654 (gdb+0x898ced) #16 process_full_comp_unit gdb/dwarf2/read.c:6418 (gdb+0x8981de) #17 process_queue gdb/dwarf2/read.c:5690 (gdb+0x894433) #18 dw2_do_instantiate_symtab gdb/dwarf2/read.c:1770 (gdb+0x88623a) #19 dw2_instantiate_symtab gdb/dwarf2/read.c:1792 (gdb+0x886300) #20 dw2_expand_symtabs_matching_one(dwarf2_per_cu_data*, dwarf2_per_objfile*, gdb::function_view<bool (char const*, bool)>, gdb::function_view<bool (compunit_symtab*)>) gdb/dwarf2/read.c:3042 (gdb+0x88b1f1) #21 cooked_index_functions::expand_symtabs_matching(objfile*, gdb::function_view<bool (char const*, bool)>, lookup_name_info const*, gdb::function_view<bool (char const*)>, gdb::function_view<bool (compunit_symtab*)>, enum_flags<block_search_flag_values>, domain_enum, search_domain) gdb/dwarf2/read.c:16917 (gdb+0x8c228e) #22 objfile::lookup_symbol(block_enum, char const*, domain_enum) gdb/symfile-debug.c:288 (gdb+0xf39055) #23 lookup_symbol_via_quick_fns gdb/symtab.c:2385 (gdb+0xf66ab7) #24 lookup_symbol_in_objfile gdb/symtab.c:2516 (gdb+0xf6711b) #25 operator() gdb/symtab.c:2562 (gdb+0xf67272) #26 operator() gdb/../gdbsupport/function-view.h:305 (gdb+0xf776b1) #27 _FUN gdb/../gdbsupport/function-view.h:299 (gdb+0xf77708) #28 gdb::function_view<bool (objfile*)>::operator()(objfile*) const gdb/../gdbsupport/function-view.h:289 (gdb+0xc3fc97) #29 svr4_iterate_over_objfiles_in_search_order gdb/solib-svr4.c:3455 (gdb+0xecae47) #30 gdbarch_iterate_over_objfiles_in_search_order(gdbarch*, gdb::function_view<bool (objfile*)>, objfile*) gdb/gdbarch.c:5041 (gdb+0x537cad) #31 lookup_global_or_static_symbol gdb/symtab.c:2559 (gdb+0xf674fb) #32 lookup_global_symbol(char const*, block const*, domain_enum) gdb/symtab.c:2615 (gdb+0xf67780) #33 language_defn::lookup_symbol_nonlocal(char const*, block const*, domain_enum) const gdb/symtab.c:2447 (gdb+0xf66d6e) #34 lookup_symbol_aux gdb/symtab.c:2123 (gdb+0xf65cb3) #35 lookup_symbol_in_language(char const*, block const*, domain_enum, language, field_of_this_result*) gdb/symtab.c:1931 (gdb+0xf64dab) #36 set_initial_language() gdb/symfile.c:1708 (gdb+0xf43074) #37 symbol_file_add_main_1 gdb/symfile.c:1212 (gdb+0xf41608) #38 symbol_file_command(char const*, int) gdb/symfile.c:1681 (gdb+0xf42faf) #39 file_command gdb/exec.c:554 (gdb+0x94ff29) #40 do_simple_func gdb/cli/cli-decode.c:95 (gdb+0x6d9528) #41 cmd_func(cmd_list_element*, char const*, int) gdb/cli/cli-decode.c:2735 (gdb+0x6e0f69) #42 execute_command(char const*, int) gdb/top.c:575 (gdb+0xff379c) #43 command_handler(char const*) gdb/event-top.c:552 (gdb+0x94b5bc) #44 command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) gdb/event-top.c:788 (gdb+0x94bc79) #45 tui_command_line_handler gdb/tui/tui-interp.c:104 (gdb+0x1034efc) #46 gdb_rl_callback_handler gdb/event-top.c:259 (gdb+0x94ab61) #47 rl_callback_read_char readline/readline/callback.c:290 (gdb+0x11be4ef) #48 gdb_rl_callback_read_char_wrapper_noexcept gdb/event-top.c:195 (gdb+0x94a960) #49 gdb_rl_callback_read_char_wrapper gdb/event-top.c:234 (gdb+0x94aa21) #50 stdin_event_handler gdb/ui.c:155 (gdb+0x10751a0) #51 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1d95bac) #52 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1d962e4) #53 gdb_do_one_event(int) gdbsupport/event-loop.cc:264 (gdb+0x1d946d0) #54 start_event_loop gdb/main.c:412 (gdb+0xb5ab52) #55 captured_command_loop gdb/main.c:476 (gdb+0xb5ad41) #56 captured_main gdb/main.c:1320 (gdb+0xb5cec1) #57 gdb_main(captured_main_args*) gdb/main.c:1339 (gdb+0xb5cf70) #58 main gdb/gdb.c:32 (gdb+0x416776) Previous read of size 1 at 0x7b200000420d by thread T11: #0 write_gdbindex gdb/dwarf2/index-write.c:1229 (gdb+0x831630) #1 write_dwarf_index(dwarf2_per_bfd*, char const*, char const*, char const*, dw_index_kind) gdb/dwarf2/index-write.c:1484 (gdb+0x832897) #2 index_cache::store(dwarf2_per_bfd*, index_cache_store_context const&) gdb/dwarf2/index-cache.c:173 (gdb+0x82db8d) #3 cooked_index::maybe_write_index(dwarf2_per_bfd*, index_cache_store_context const&) gdb/dwarf2/cooked-index.c:645 (gdb+0x7f1d49) #4 operator() gdb/dwarf2/cooked-index.c:474 (gdb+0x7f0f31) #5 _M_invoke /usr/include/c++/7/bits/std_function.h:316 (gdb+0x7f2a13) #6 std::function<void ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (gdb+0x700952) #7 void std::__invoke_impl<void, std::function<void ()>&>(std::__invoke_other, std::function<void ()>&) /usr/include/c++/7/bits/invoke.h:60 (gdb+0x7381a0) #8 std::__invoke_result<std::function<void ()>&>::type std::__invoke<std::function<void ()>&>(std::function<void ()>&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x737e91) #9 std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}::operator()() const /usr/include/c++/7/future:1421 (gdb+0x737b59) #10 std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void>::operator()() const /usr/include/c++/7/future:1362 (gdb+0x738660) #11 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) /usr/include/c++/7/bits/std_function.h:302 (gdb+0x73825c) #12 std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (gdb+0x733623) #13 std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) /usr/include/c++/7/future:561 (gdb+0x732bdf) #14 void std::__invoke_impl<void, void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::__invoke_memfun_deref, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/bits/invoke.h:73 (gdb+0x734c4f) #15 std::__invoke_result<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>::type std::__invoke<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x733bc5) #16 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#1}::operator()() const /usr/include/c++/7/mutex:672 (gdb+0x73300d) #17 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::operator()() const /usr/include/c++/7/mutex:677 (gdb+0x7330b2) #18 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::_FUN() /usr/include/c++/7/mutex:677 (gdb+0x7330f2) #19 pthread_once <null> (libtsan.so.0+0x4457c) #20 __gthread_once /usr/include/c++/7/x86_64-suse-linux/bits/gthr-default.h:699 (gdb+0x72f5dd) #21 void std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/mutex:684 (gdb+0x733224) #22 std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) /usr/include/c++/7/future:401 (gdb+0x732852) #23 std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run() /usr/include/c++/7/future:1423 (gdb+0x737bef) #24 std::packaged_task<void ()>::operator()() /usr/include/c++/7/future:1556 (gdb+0x1dad25a) #25 gdb::thread_pool::thread_function() gdbsupport/thread-pool.cc:242 (gdb+0x1dacb7c) #26 void std::__invoke_impl<void, void (gdb::thread_pool::*)(), gdb::thread_pool*>(std::__invoke_memfun_deref, void (gdb::thread_pool::*&&)(), gdb::thread_pool*&&) /usr/include/c++/7/bits/invoke.h:73 (gdb+0x1dadc2b) #27 std::__invoke_result<void (gdb::thread_pool::*)(), gdb::thread_pool*>::type std::__invoke<void (gdb::thread_pool::*)(), gdb::thread_pool*>(void (gdb::thread_pool::*&&)(), gdb::thread_pool*&&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x1dad05c) #28 decltype (__invoke((_S_declval<0ul>)(), (_S_declval<1ul>)())) std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/include/c++/7/thread:234 (gdb+0x1db038e) #29 std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> >::operator()() /usr/include/c++/7/thread:243 (gdb+0x1db0319) #30 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> > >::_M_run() /usr/include/c++/7/thread:186 (gdb+0x1db02ce) #31 <null> <null> (libstdc++.so.6+0xdcac2) ... SUMMARY: ThreadSanitizer: data race gdb/dwarf2/read.c:21513 in dwarf2_per_cu_data::get_header() const ... The race happens when issuing the "file $exec" command. The race is between: - a worker thread writing the index cache, and in the process reading dwarf2_per_cu_data::is_debug_type, and - the main thread writing to dwarf2_per_cu_data::m_header_read_in. The two bitfields dwarf2_per_cu_data::m_header_read_in and dwarf2_per_cu_data::is_debug_type share the same bitfield container. Fix this by making dwarf2_per_cu_data::m_header_read_in a packed<bool, 1>. Tested on x86_64-linux. Approved-By: Tom Tromey <tom@tromey.com> PR symtab/30392 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30392
With gdb build with -fsanitize=thread, and the exec from test-case gdb.base/index-cache.exp, I run into: ... $ rm -f ~/.cache/gdb/*; \ gdb -q -batch -iex "set index-cache enabled on" index-cache \ -ex "print foobar" ... WARNING: ThreadSanitizer: data race (pid=23970) Write of size 1 at 0x7b200000410d by main thread: #0 dw_expand_symtabs_matching_file_matcher(dwarf2_per_objfile*, gdb::function_view<bool (char const*, bool)>) gdb/dwarf2/read.c:3077 (gdb+0x7ac54e) #1 cooked_index_functions::expand_symtabs_matching(objfile*, gdb::function_view<bool (char const*, bool)>, lookup_name_info const*, gdb::function_view<bool (char const*)>, gdb::function_view<bool (compunit_symtab*)>, enum_flags<block_search_flag_values>, domain_enum, search_domain) gdb/dwarf2/read.c:16812 (gdb+0x7d039f) #2 objfile::map_symtabs_matching_filename(char const*, char const*, gdb::function_view<bool (symtab*)>) gdb/symfile-debug.c:219 (gdb+0xda5aee) #3 iterate_over_symtabs(char const*, gdb::function_view<bool (symtab*)>) gdb/symtab.c:648 (gdb+0xdc439d) #4 lookup_symtab(char const*) gdb/symtab.c:662 (gdb+0xdc44a2) #5 classify_name gdb/c-exp.y:3083 (gdb+0x61afec) #6 c_yylex gdb/c-exp.y:3251 (gdb+0x61dd13) #7 c_yyparse() build/gdb/c-exp.c.tmp:1988 (gdb+0x61f07e) #8 c_parse(parser_state*) gdb/c-exp.y:3417 (gdb+0x62d864) #9 language_defn::parser(parser_state*) const gdb/language.c:598 (gdb+0x9771c5) #10 parse_exp_in_context gdb/parse.c:414 (gdb+0xb10a9b) #11 parse_expression(char const*, innermost_block_tracker*, enum_flags<parser_flag>) gdb/parse.c:462 (gdb+0xb110ae) #12 process_print_command_args gdb/printcmd.c:1321 (gdb+0xb4bf0c) #13 print_command_1 gdb/printcmd.c:1335 (gdb+0xb4ca2a) #14 print_command gdb/printcmd.c:1468 (gdb+0xb4cd5a) #15 do_simple_func gdb/cli/cli-decode.c:95 (gdb+0x65b078) #16 cmd_func(cmd_list_element*, char const*, int) gdb/cli/cli-decode.c:2735 (gdb+0x65ed53) #17 execute_command(char const*, int) gdb/top.c:575 (gdb+0xe3a76a) #18 catch_command_errors gdb/main.c:518 (gdb+0xa1837d) #19 execute_cmdargs gdb/main.c:617 (gdb+0xa1853f) #20 captured_main_1 gdb/main.c:1289 (gdb+0xa1aa58) #21 captured_main gdb/main.c:1310 (gdb+0xa1b95a) #22 gdb_main(captured_main_args*) gdb/main.c:1339 (gdb+0xa1b95a) #23 main gdb/gdb.c:39 (gdb+0x42506a) Previous read of size 1 at 0x7b200000410d by thread T1: #0 write_gdbindex gdb/dwarf2/index-write.c:1214 (gdb+0x75bb30) #1 write_dwarf_index(dwarf2_per_bfd*, char const*, char const*, char const*, dw_index_kind) gdb/dwarf2/index-write.c:1469 (gdb+0x75f803) #2 index_cache::store(dwarf2_per_bfd*, index_cache_store_context const&) gdb/dwarf2/index-cache.c:173 (gdb+0x755a36) #3 cooked_index::maybe_write_index(dwarf2_per_bfd*, index_cache_store_context const&) gdb/dwarf2/cooked-index.c:642 (gdb+0x71c96d) #4 operator() gdb/dwarf2/cooked-index.c:471 (gdb+0x71c96d) #5 _M_invoke /usr/include/c++/7/bits/std_function.h:316 (gdb+0x71c96d) #6 std::function<void ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (gdb+0x72a57c) #7 void std::__invoke_impl<void, std::function<void ()>&>(std::__invoke_other, std::function<void ()>&) /usr/include/c++/7/bits/invoke.h:60 (gdb+0x72a5db) #8 std::__invoke_result<std::function<void ()>&>::type std::__invoke<std::function<void ()>&>(std::function<void ()>&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x72a5db) #9 std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}::operator()() const /usr/include/c++/7/future:1421 (gdb+0x72a5db) #10 std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void>::operator()() const /usr/include/c++/7/future:1362 (gdb+0x72a5db) #11 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) /usr/include/c++/7/bits/std_function.h:302 (gdb+0x72a5db) #12 std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (gdb+0x724954) #13 std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) /usr/include/c++/7/future:561 (gdb+0x724954) #14 void std::__invoke_impl<void, void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::__invoke_memfun_deref, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/bits/invoke.h:73 (gdb+0x72434a) #15 std::__invoke_result<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>::type std::__invoke<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x72434a) #16 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#1}::operator()() const /usr/include/c++/7/mutex:672 (gdb+0x72434a) #17 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::operator()() const /usr/include/c++/7/mutex:677 (gdb+0x72434a) #18 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::_FUN() /usr/include/c++/7/mutex:677 (gdb+0x72434a) #19 pthread_once <null> (libtsan.so.0+0x4457c) #20 __gthread_once /usr/include/c++/7/x86_64-suse-linux/bits/gthr-default.h:699 (gdb+0x72532b) #21 void std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/mutex:684 (gdb+0x72532b) #22 std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) /usr/include/c++/7/future:401 (gdb+0x174568d) #23 std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run() /usr/include/c++/7/future:1423 (gdb+0x174568d) #24 std::packaged_task<void ()>::operator()() /usr/include/c++/7/future:1556 (gdb+0x174568d) #25 gdb::thread_pool::thread_function() gdbsupport/thread-pool.cc:242 (gdb+0x174568d) #26 void std::__invoke_impl<void, void (gdb::thread_pool::*)(), gdb::thread_pool*>(std::__invoke_memfun_deref, void (gdb::thread_pool::*&&)(), gdb::thread_pool*&&) /usr/include/c++/7/bits/invoke.h:73 (gdb+0x1748040) #27 std::__invoke_result<void (gdb::thread_pool::*)(), gdb::thread_pool*>::type std::__invoke<void (gdb::thread_pool::*)(), gdb::thread_pool*>(void (gdb::thread_pool::*&&)(), gdb::thread_pool*&&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x1748040) #28 decltype (__invoke((_S_declval<0ul>)(), (_S_declval<1ul>)())) std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/include/c++/7/thread:234 (gdb+0x1748040) #29 std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> >::operator()() /usr/include/c++/7/thread:243 (gdb+0x1748040) #30 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> > >::_M_run() /usr/include/c++/7/thread:186 (gdb+0x1748040) #31 <null> <null> (libstdc++.so.6+0xdcac2) ... SUMMARY: ThreadSanitizer: data race gdb/dwarf2/read.c:3077 in dw_expand_symtabs_matching_file_matcher(dwarf2_per_objfile*, gdb::function_view<bool (char const*, bool)>) ... The race happens when issuing the "file $exec" command. The race is between: - a worker thread writing the index cache, and in the process reading dwarf2_per_cu_data::is_debug_type, and - the main thread writing to dwarf2_per_cu_data::mark. The two bitfields dwarf2_per_cu_data::mark and dwarf2_per_cu_data::is_debug_type share the same bitfield container. Fix this by making dwarf2_per_cu_data::mark a packed<unsigned int, 1>. Tested on x86_64-linux. PR symtab/30718 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30718
…g_types} With gdb build with -fsanitize=thread, and the exec from test-case gdb.base/index-cache.exp, I run into: ... $ rm -f ~/.cache/gdb/*; \ gdb -q -batch -iex "set index-cache enabled on" index-cache \ -ex "print foobar" ... WARNING: ThreadSanitizer: data race (pid=25018) Write of size 1 at 0x7b200000410d by main thread: #0 dw2_get_file_names_reader gdb/dwarf2/read.c:2033 (gdb+0x7ab023) #1 dw2_get_file_names gdb/dwarf2/read.c:2130 (gdb+0x7ab023) #2 dw_expand_symtabs_matching_file_matcher(dwarf2_per_objfile*, gdb::function_view<bool (char const*, bool)>) gdb/dwarf2/read.c:3105 (gdb+0x7ac6e9) #3 cooked_index_functions::expand_symtabs_matching(objfile*, gdb::function_view<bool (char const*, bool)>, lookup_name_info const*, gdb::function_view<bool (char const*)>, gdb::function_view<bool (compunit_symtab*)>, enum_flags<block_search_flag_values>, domain_enum, search_domain) gdb/dwarf2/read.c:16812 (gdb+0x7d040f) #4 objfile::map_symtabs_matching_filename(char const*, char const*, gdb::function_view<bool (symtab*)>) gdb/symfile-debug.c:219 (gdb+0xda5b6e) #5 iterate_over_symtabs(char const*, gdb::function_view<bool (symtab*)>) gdb/symtab.c:648 (gdb+0xdc441d) #6 lookup_symtab(char const*) gdb/symtab.c:662 (gdb+0xdc4522) #7 classify_name gdb/c-exp.y:3083 (gdb+0x61afec) #8 c_yylex gdb/c-exp.y:3251 (gdb+0x61dd13) #9 c_yyparse() build/gdb/c-exp.c.tmp:1988 (gdb+0x61f07e) #10 c_parse(parser_state*) gdb/c-exp.y:3417 (gdb+0x62d864) #11 language_defn::parser(parser_state*) const gdb/language.c:598 (gdb+0x977245) #12 parse_exp_in_context gdb/parse.c:414 (gdb+0xb10b1b) #13 parse_expression(char const*, innermost_block_tracker*, enum_flags<parser_flag>) gdb/parse.c:462 (gdb+0xb1112e) #14 process_print_command_args gdb/printcmd.c:1321 (gdb+0xb4bf8c) #15 print_command_1 gdb/printcmd.c:1335 (gdb+0xb4caaa) #16 print_command gdb/printcmd.c:1468 (gdb+0xb4cdda) #17 do_simple_func gdb/cli/cli-decode.c:95 (gdb+0x65b078) #18 cmd_func(cmd_list_element*, char const*, int) gdb/cli/cli-decode.c:2735 (gdb+0x65ed53) #19 execute_command(char const*, int) gdb/top.c:575 (gdb+0xe3a7ea) #20 catch_command_errors gdb/main.c:518 (gdb+0xa183fd) #21 execute_cmdargs gdb/main.c:617 (gdb+0xa185bf) #22 captured_main_1 gdb/main.c:1289 (gdb+0xa1aad8) #23 captured_main gdb/main.c:1310 (gdb+0xa1b9da) #24 gdb_main(captured_main_args*) gdb/main.c:1339 (gdb+0xa1b9da) #25 main gdb/gdb.c:39 (gdb+0x42506a) Previous read of size 1 at 0x7b200000410d by thread T2: #0 write_gdbindex gdb/dwarf2/index-write.c:1214 (gdb+0x75bb30) #1 write_dwarf_index(dwarf2_per_bfd*, char const*, char const*, char const*, dw_index_kind) gdb/dwarf2/index-write.c:1469 (gdb+0x75f803) #2 index_cache::store(dwarf2_per_bfd*, index_cache_store_context const&) gdb/dwarf2/index-cache.c:173 (gdb+0x755a36) #3 cooked_index::maybe_write_index(dwarf2_per_bfd*, index_cache_store_context const&) gdb/dwarf2/cooked-index.c:642 (gdb+0x71c96d) #4 operator() gdb/dwarf2/cooked-index.c:471 (gdb+0x71c96d) #5 _M_invoke /usr/include/c++/7/bits/std_function.h:316 (gdb+0x71c96d) #6 std::function<void ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (gdb+0x72a57c) #7 void std::__invoke_impl<void, std::function<void ()>&>(std::__invoke_other, std::function<void ()>&) /usr/include/c++/7/bits/invoke.h:60 (gdb+0x72a5db) #8 std::__invoke_result<std::function<void ()>&>::type std::__invoke<std::function<void ()>&>(std::function<void ()>&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x72a5db) #9 std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}::operator()() const /usr/include/c++/7/future:1421 (gdb+0x72a5db) #10 std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void>::operator()() const /usr/include/c++/7/future:1362 (gdb+0x72a5db) #11 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) /usr/include/c++/7/bits/std_function.h:302 (gdb+0x72a5db) #12 std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (gdb+0x724954) #13 std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) /usr/include/c++/7/future:561 (gdb+0x724954) #14 void std::__invoke_impl<void, void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::__invoke_memfun_deref, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/bits/invoke.h:73 (gdb+0x72434a) #15 std::__invoke_result<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>::type std::__invoke<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x72434a) #16 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#1}::operator()() const /usr/include/c++/7/mutex:672 (gdb+0x72434a) #17 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::operator()() const /usr/include/c++/7/mutex:677 (gdb+0x72434a) #18 std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#2}::_FUN() /usr/include/c++/7/mutex:677 (gdb+0x72434a) #19 pthread_once <null> (libtsan.so.0+0x4457c) #20 __gthread_once /usr/include/c++/7/x86_64-suse-linux/bits/gthr-default.h:699 (gdb+0x72532b) #21 void std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) /usr/include/c++/7/mutex:684 (gdb+0x72532b) #22 std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) /usr/include/c++/7/future:401 (gdb+0x174570d) #23 std::__future_base::_Task_state<std::function<void ()>, std::allocator<int>, void ()>::_M_run() /usr/include/c++/7/future:1423 (gdb+0x174570d) #24 std::packaged_task<void ()>::operator()() /usr/include/c++/7/future:1556 (gdb+0x174570d) #25 gdb::thread_pool::thread_function() gdbsupport/thread-pool.cc:242 (gdb+0x174570d) #26 void std::__invoke_impl<void, void (gdb::thread_pool::*)(), gdb::thread_pool*>(std::__invoke_memfun_deref, void (gdb::thread_pool::*&&)(), gdb::thread_pool*&&) /usr/include/c++/7/bits/invoke.h:73 (gdb+0x17480c0) #27 std::__invoke_result<void (gdb::thread_pool::*)(), gdb::thread_pool*>::type std::__invoke<void (gdb::thread_pool::*)(), gdb::thread_pool*>(void (gdb::thread_pool::*&&)(), gdb::thread_pool*&&) /usr/include/c++/7/bits/invoke.h:95 (gdb+0x17480c0) #28 decltype (__invoke((_S_declval<0ul>)(), (_S_declval<1ul>)())) std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/include/c++/7/thread:234 (gdb+0x17480c0) #29 std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> >::operator()() /usr/include/c++/7/thread:243 (gdb+0x17480c0) #30 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (gdb::thread_pool::*)(), gdb::thread_pool*> > >::_M_run() /usr/include/c++/7/thread:186 (gdb+0x17480c0) #31 <null> <null> (libstdc++.so.6+0xdcac2) ... SUMMARY: ThreadSanitizer: data race gdb/dwarf2/read.c:2033 in dw2_get_file_names_reader ... The race happens when issuing the "file $exec" command. The race is between: - a worker thread writing the index cache, and in the process reading dwarf2_per_cu_data::is_debug_type, and - the main thread writing to dwarf2_per_cu_data::files_read. The two bitfields dwarf2_per_cu_data::files_read and dwarf2_per_cu_data::is_debug_type share the same bitfield container. Fix this by making dwarf2_per_cu_data::files_read a packed<bool, 1>. Tested on x86_64-linux. PR symtab/30718 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30718
It was pointed out on the mailing list[1] that after this commit: commit b1e0126 Date: Wed Jun 21 14:18:54 2023 +0100 gdb: don't resume vfork parent while child is still running the test gdb.base/vfork-follow-parent.exp now has some failures when run with the native-gdbserver or native-extended-gdbserver boards: FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: continue to end of inferior 2 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: inferior 1 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: print unblock_parent = 1 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: continue to break_parent (timeout) The reason that these failures don't show up when run on the standard unix board is that the test is only run in the default operating mode, so for Linux this will be all-stop on top of non-stop. If we adjust the test script so that it runs in the default mode and with target-non-stop turned off, then we see the same failures on the unix board. This commit includes this change. The way that the test is written means that it is not (currently) possible to turn on non-stop mode and have the test still work, so this commit does not do that. I have also updated the test script so that the vfork child performs an exec as well as the current exit. Exec and exit are the two ways in which a vfork child can release the vfork parent, so testing both of these cases is useful I think. In this test the inferior performs a vfork and the vfork-child immediately exits. The vfork-parent will wait for the vfork-child and then blocks waiting for gdb. Once gdb has released the vfork-parent, the vfork-parent also exits. In the test that fails, GDB sets 'detach-on-fork off' and then runs to the vfork. At this point the test tries to just "continue", but this fails as the vfork-parent is still selected, and the parent can't continue until the vfork-child completes. As the vfork-child is stopped by GDB the parent will never stop once resumed, so GDB refuses to resume it. The test script then sets 'schedule-multiple on' and once again continues. This time GDB, in theory, resumes both the parent and the child, the parent will be held blocked by the kernel, but the child will run until it exits, and which point GDB stops again, this time with inferior 2, the newly exited vfork-child, selected. What happens after this in the test script is irrelevant as far as this failure is concerned. To understand why the test started failing we should consider the behaviour of four different cases: 1. All-stop-on-non-stop before commit b1e0126, 2. All-stop-on-non-stop after commit b1e0126, 3. All-stop-on-all-stop before commit b1e0126, and 4. All-stop-on-all-stop after commit b1e0126. Only case #4 is failing after commit b1e0126, but I think the other cases are interesting because, (a) they inform how we might fix the regression, and (b) it turns out the behaviour of #2 changed too with the commit, but the change was harmless. For #1 All-stop-on-non-stop before commit b1e0126, what happens is: 1. GDB calls proceed with the vfork-parent selected, as schedule multiple is on user_visible_resume_ptid returns -1 (everything) as the resume_ptid (see proceed function), 2. As this is all-stop-on-non-stop, every thread is resumed individually, so GDB tries to resume both the vfork-parent and the vfork-child, both of which succeed, 3. The vfork-parent is held stopped by the kernel, 4. The vfork-child completes (exits) at which point the GDB sees the EXITED event for the vfork-child and the VFORK_DONE event for the vfork-parent, 5. At this point we might take two paths depending on which event GDB handles first, if GDB handles the VFORK_DONE first then: (a) As GDB is controlling both parent and child the VFORK_DONE is ignored (see handle_vfork_done), the vfork-parent will be resumed, (b) GDB processes the EXITED event, selects the (now defunct) vfork-child, and stops, returning control to the user. Alternatively, if GDB selects the EXITED event first then: (c) GDB processes the EXITED event, selects the (now defunct) vfork-child, and stops, returning control to the user. (d) At some future time the user resumes the vfork-parent, at which point the VFORK_DONE is reported to GDB, however, GDB is ignoring the VFORK_DONE (see handle_vfork_done), so the parent is resumed. For case #2, all-stop-on-non-stop after commit b1e0126, the important difference is in step (2) above, now, instead of resuming both the vfork-parent and the vfork-child, only the vfork-child is resumed. As such, when we get to step (5), only a single event, the EXITED event is reported. GDB handles the EXITED just as in (5)(c), then, later, when the user resumes the vfork-parent, the VFORKED_DONE is immediately delivered from the kernel, but this is ignored just as in (5)(d), and so, though the pattern of when the vfork-parent is resumed changes, the overall pattern of which events are reported and when, doesn't actually change. In fact, by not resuming the vfork-parent, the order of events (in this test) is now deterministic, which (maybe?) is a good thing. If we now consider case #3, all-stop-on-all-stop before commit b1e0126, then what happens is: 1. GDB calls proceed with the vfork-parent selected, as schedule multiple is on user_visible_resume_ptid returns -1 (everything) as the resume_ptid (see proceed function), 2. As this is all-stop-on-all-stop, the resume is passed down to the linux-nat target, the vfork-parent is the event thread, while the vfork-child is a sibling of the event thread, 3. In linux_nat_target::resume, GDB calls linux_nat_resume_callback for all threads, this causes the vfork-child to be resumed. Then in linux_nat_target::resume, the event thread, the vfork-parent, is also resumed. 4. The vfork-parent is held stopped by the kernel, 5. The vfork-child completes (exits) at which point the GDB sees the EXITED event for the vfork-child and the VFORK_DONE event for the vfork-parent, 6. We are now in a situation identical to step (5) as for all-stop-on-non-stop above, GDB selects one of the events to handle, and whichever we select the user sees the correct behaviour. And so, finally, we can consider #4, all-stop-on-all-stop after commit b1e0126, this is the case that started failing. We start out just like above, in proceed, the resume_ptid is -1 (resume everything), due to schedule multiple being on. And just like above, due to the target being all-stop, we call proceed_resume_thread_checked just once, for the current thread, which, remember, is the vfork-parent thread. The change in commit b1e0126 was to avoid resuming a vfork-parent thread, read the commit message for the justification for this change. However, this means that GDB now rejects resuming the vfork-parent in this case, which means that nothing gets resumed! Obviously, if nothing resumes, then nothing will ever stop, and so GDB appears to hang. I considered a couple of solutions which, in the end, I didn't go with, these were: 1. Move the vfork-parent check out of proceed_resume_thread_checked, and place it in proceed, but only on the all-stop-on-non-stop path, this should still address the issue seen in b1e0126, but would avoid the issue seen here. I rejected this just because it didn't feel great to split the checks that exist in proceed_resume_thread_checked like this, 2. Extend the condition in proceed_resume_thread_checked by adding a target_is_non_stop_p check. This would have the same effect as idea 1, but leaves all the checks in the same place, which I think would be better, but this still just didn't feel right to me, and so, What I noticed was that for the all-stop-on-non-stop, after commit b1e0126, we only resumed the vfork-child, and this seems fine. The vfork-parent isn't going to run anyway (the kernel will hold it back), so if feels like we there's no harm in just waiting for the child to complete, and then resuming the parent. So then I started looking at follow_fork, which is called from the top of proceed. This function already has the task of switching between the parent and child based on which the user wishes to follow. So, I wondered, could we use this to switch to the vfork-child in the case that we are attached to both? Turns out this is pretty simple to do. Having done that, now the process is for all-stop-on-all-stop after commit b1e0126, and with this new fix is: 1. GDB calls proceed with the vfork-parent selected, but, 2. In follow_fork, and follow_fork_inferior, GDB switches the selected thread to be that of the vfork-child, 3. Back in proceed user_visible_resume_ptid returns -1 (everything) as the resume_ptid still, but now, 4. When GDB calls proceed_resume_thread_checked, the vfork-child is the current selected thread, this is not a vfork-parent, and so GDB allows the proceed to continue to the linux-nat target, 5. In linux_nat_target::resume, GDB calls linux_nat_resume_callback for all threads, this does not resume the vfork-parent (because it is a vfork-parent), and then the vfork-child is resumed as this is the event thread, At this point we are back in the same situation as for all-stop-on-non-stop after commit b1e0126, that is, the vfork-child is resumed, while the vfork-parent is held stopped by GDB. Eventually the vfork-child will exit or exec, at which point the vfork-parent will be resumed. [1] https://inbox.sourceware.org/gdb-patches/3e1e1db0-13d9-dd32-b4bb-051149ae6e76@simark.ca/
After running a number of programs under Windows gdb and detaching them, I typed run in gdb, and got a hang, here: (top-gdb) bt #0 sharing_input_terminal (pid=4672) at /home/pedro/gdb/src/gdb/mingw-hdep.c:388 #1 0x00007ff71a2d8678 in sharing_input_terminal (inf=0x23bf23dafb0) at /home/pedro/gdb/src/gdb/inflow.c:269 #2 0x00007ff71a2d887b in child_terminal_save_inferior (self=0x23bf23de060) at /home/pedro/gdb/src/gdb/inflow.c:423 #3 0x00007ff71a2c80c0 in inf_child_target::terminal_save_inferior (this=0x23bf23de060) at /home/pedro/gdb/src/gdb/inf-child.c:111 #4 0x00007ff71a429c0f in target_terminal_is_ours_kind (desired_state=target_terminal_state::is_ours_for_output) at /home/pedro/gdb/src/gdb/target.c:1037 #5 0x00007ff71a429e02 in target_terminal::ours_for_output () at /home/pedro/gdb/src/gdb/target.c:1094 #6 0x00007ff71a2ccc8e in post_create_inferior (from_tty=0) at /home/pedro/gdb/src/gdb/infcmd.c:245 #7 0x00007ff71a2cd431 in run_command_1 (args=0x0, from_tty=0, run_how=RUN_NORMAL) at /home/pedro/gdb/src/gdb/infcmd.c:502 #8 0x00007ff71a2cd58b in run_command (args=0x0, from_tty=0) at /home/pedro/gdb/src/gdb/infcmd.c:527 The problem is that the loop around GetConsoleProcessList looped forever, because there were exactly 10 processes to return. GetConsoleProcessList's documentation says: If the buffer is too small to hold all the valid process identifiers, the return value is the required number of array elements. The function will have stored no identifiers in the buffer. In this situation, use the return value to allocate a buffer that is large enough to store the entire list and call the function again. In this case, the buffer wasn't too small, it was exactly the right size, so we should have broken out of the loop. We didn't due to a "<" check that should have been "<=". That is fixed by this patch. Approved-By: Tom Tromey <tom@tromey.com> Reviewed-By: Eli Zaretskii <eliz@gnu.org> Change-Id: I14e4909f2ac2fa83d0d9b6e64418b5831ac4e4e3
This commit fixes an issue that was discovered while writing the tests for the previous commit. I noticed that, when GDB restarts an inferior, the executable_changed event would trigger twice. The first notification would originate from: #0 exec_file_attach (filename=0x4046680 "/tmp/hello.x", from_tty=0) at ../../src/gdb/exec.c:513 #1 0x00000000006f3adb in reopen_exec_file () at ../../src/gdb/corefile.c:122 #2 0x0000000000e6a3f2 in generic_mourn_inferior () at ../../src/gdb/target.c:3682 #3 0x0000000000995121 in inf_child_target::mourn_inferior (this=0x2fe95c0 <the_amd64_linux_nat_target>) at ../../src/gdb/inf-child.c:192 #4 0x0000000000995cff in inf_ptrace_target::mourn_inferior (this=0x2fe95c0 <the_amd64_linux_nat_target>) at ../../src/gdb/inf-ptrace.c:125 #5 0x0000000000a32472 in linux_nat_target::mourn_inferior (this=0x2fe95c0 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3609 #6 0x0000000000e68a40 in target_mourn_inferior (ptid=...) at ../../src/gdb/target.c:2761 #7 0x0000000000a323ec in linux_nat_target::kill (this=0x2fe95c0 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3593 #8 0x0000000000e64d1c in target_kill () at ../../src/gdb/target.c:924 #9 0x00000000009a19bc in kill_if_already_running (from_tty=1) at ../../src/gdb/infcmd.c:328 #10 0x00000000009a1a6f in run_command_1 (args=0x0, from_tty=1, run_how=RUN_STOP_AT_MAIN) at ../../src/gdb/infcmd.c:381 #11 0x00000000009a20a5 in start_command (args=0x0, from_tty=1) at ../../src/gdb/infcmd.c:527 #12 0x000000000068dc5d in do_simple_func (args=0x0, from_tty=1, c=0x35c7200) at ../../src/gdb/cli/cli-decode.c:95 While the second originates from: #0 exec_file_attach (filename=0x3d7a1d0 "/tmp/hello.x", from_tty=0) at ../../src/gdb/exec.c:513 #1 0x0000000000dfe525 in reread_symbols (from_tty=1) at ../../src/gdb/symfile.c:2517 #2 0x00000000009a1a98 in run_command_1 (args=0x0, from_tty=1, run_how=RUN_STOP_AT_MAIN) at ../../src/gdb/infcmd.c:398 #3 0x00000000009a20a5 in start_command (args=0x0, from_tty=1) at ../../src/gdb/infcmd.c:527 #4 0x000000000068dc5d in do_simple_func (args=0x0, from_tty=1, c=0x35c7200) at ../../src/gdb/cli/cli-decode.c:95 In the first case the call to exec_file_attach first passes through reopen_exec_file. The reopen_exec_file performs a modification time check on the executable file, and only calls exec_file_attach if the executable has changed on disk since it was last loaded. However, in the second case things work a little differently. In this case GDB is really trying to reread the debug symbol. As such, we iterate over the objfiles list, and for each of those we check the modification time, if the file on disk has changed then we reload the debug symbols from that file. However, there is an additional check, if the objfile has the same name as the executable then we will call exec_file_attach, but we do so without checking the cached modification time that indicates when the executable was last reloaded, as a result, we reload the executable twice. In this commit I propose that reread_symbols be changed to unconditionally call reopen_exec_file before performing the objfile iteration. This will ensure that, if the executable has changed, then the executable will be reloaded, however, if the executable has already been recently reloaded, we will not reload it for a second time. After handling the executable, GDB can then iterate over the objfiles list and reload them in the normal way. With this done I now see the executable reloaded only once when GDB restarts an inferior, which means I can remove the kfail that I added to the gdb.python/py-exec-file.exp test in the previous commit. Approved-By: Tom Tromey <tom@tromey.com>
It was pointed out on the mailing list that a recently added test (gdb.python/py-progspace-events.exp) was failing when run with the native-extended-gdbserver board. This test was added with this commit: commit 59912fb Date: Tue Sep 19 11:45:36 2023 +0100 gdb: add Python events for program space addition and removal It turns out though that the test is failing due to a existing bug in GDB, the new test just exposes the problem. Additionally, the failure really doesn't even rely on the new functionality added in the above commit. I reduced the test to a simple set of steps that reproduced the failure and tested against GDB 13, and the test passes; so the bug was introduced since then. In fact, the bug was introduced with this commit: commit a282736 Date: Fri Sep 8 15:48:16 2023 +0100 gdb: remove final user of the executable_changed observer This commit changed how the per-inferior auxv data cache is managed, specifically, when the cache is cleared, and it is this that leads to the failure. This bug is interesting because it exposes a number of issues with GDB, I'll explain all of the problems I see, though ultimately, I only propose fixing one problem in this commit, which is enough to resolve the crash we are currently seeing. The crash that we are seeing manifests like this: ... [Inferior 2 (process 3970384) exited normally] +inferior 1 [Switching to inferior 1 [process 3970383] (/tmp/build/gdb/testsuite/outputs/gdb.python/py-progspace-events/py-progspace-events)] [Switching to thread 1.1 (Thread 3970383.3970383)] #0 breakpt () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.python/py-progspace-events.c:28 28 { /* Nothing. */ } (gdb) step +step terminate called after throwing an instance of 'gdb_exception_error' Fatal signal: Aborted ... etc ... What's happening is that GDB attempts to refill the auxv cache as a result of the gdbarch_has_shared_address_space call in program_space::~program_space, the backtrace looks like this: #0 0x00007fb4f419a9a5 in raise () from /lib64/libpthread.so.0 #1 0x00000000008b635d in handle_fatal_signal (sig=6) at ../../src/gdb/event-top.c:912 #2 <signal handler called> #3 0x00007fb4f38e3625 in raise () from /lib64/libc.so.6 #4 0x00007fb4f38cc8d9 in abort () from /lib64/libc.so.6 #5 0x00007fb4f3c70756 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6 #6 0x00007fb4f3c7c6dc in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6 #7 0x00007fb4f3c7b6e9 in __cxa_call_terminate () from /lib64/libstdc++.so.6 #8 0x00007fb4f3c7c094 in __gxx_personality_v0 () from /lib64/libstdc++.so.6 #9 0x00007fb4f3a80c63 in _Unwind_RaiseException_Phase2 () from /lib64/libgcc_s.so.1 #10 0x00007fb4f3a8154e in _Unwind_Resume () from /lib64/libgcc_s.so.1 #11 0x0000000000e8832d in target_read_alloc_1<unsigned char> (ops=0x408a3a0, object=TARGET_OBJECT_AUXV, annex=0x0) at ../../src/gdb/target.c:2266 #12 0x0000000000e73dea in target_read_alloc (ops=0x408a3a0, object=TARGET_OBJECT_AUXV, annex=0x0) at ../../src/gdb/target.c:2315 #13 0x000000000058248c in target_read_auxv_raw (ops=0x408a3a0) at ../../src/gdb/auxv.c:379 #14 0x000000000058243d in target_read_auxv () at ../../src/gdb/auxv.c:368 #15 0x000000000058255c in target_auxv_search (match=0x0, valp=0x7ffdee17c598) at ../../src/gdb/auxv.c:415 #16 0x0000000000a464bb in linux_is_uclinux () at ../../src/gdb/linux-tdep.c:433 #17 0x0000000000a464f6 in linux_has_shared_address_space (gdbarch=0x409a2d0) at ../../src/gdb/linux-tdep.c:440 #18 0x0000000000510eae in gdbarch_has_shared_address_space (gdbarch=0x409a2d0) at ../../src/gdb/gdbarch.c:4889 #19 0x0000000000bc7558 in program_space::~program_space (this=0x4544aa0, __in_chrg=<optimized out>) at ../../src/gdb/progspace.c:124 #20 0x00000000009b245d in delete_inferior (inf=0x47b3de0) at ../../src/gdb/inferior.c:290 #21 0x00000000009b2c10 in prune_inferiors () at ../../src/gdb/inferior.c:480 #22 0x00000000009c5e3e in fetch_inferior_event () at ../../src/gdb/infrun.c:4558 #23 0x000000000099b4dc in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:42 #24 0x0000000000cbc64f in remote_async_serial_handler (scb=0x4090a30, context=0x408a6b0) at ../../src/gdb/remote.c:14859 #25 0x0000000000d83d3a in run_async_handler_and_reschedule (scb=0x4090a30) at ../../src/gdb/ser-base.c:138 #26 0x0000000000d83e1f in fd_event (error=0, context=0x4090a30) at ../../src/gdb/ser-base.c:189 So this is problem #1, if we throw an exception while deleting a program_space then this is not caught, and is going to crash GDB. Problem #2 becomes evident when we ask why GDB is throwing an error in this case; the error is thrown because the remote target, operating in non-async mode, can't read the auxv data while an inferior is running and GDB is waiting for a stop reply. The problem here then, is why does GDB get into a position where it tries to interact with the remote target in this way, at this time? The problem is caused by the prune_inferiors call which can be seen in the above backtrace. In prune_inferiors we check if the inferior is deletable, and if it is, we delete it. The problem is, I think, we should also check if the target is currently in a state that would allow us to delete the inferior. We don't currently have such a check available, we'd need to add one, but for the remote target, this would return false if the remote is in async mode and the remote is currently waiting for a stop reply. With this change in place GDB would defer deleting the inferior until the remote target has stopped, at which point GDB would be able to refill the auxv cache successfully. And then, problem #3 becomes evident when we ask why GDB is needing to refill the auxv cache now when it didn't need to for GDB 13. This is where the second commit mentioned above (a282736) comes in. Prior to this commit, the auxv cache was cleared by the executable_changed observer, while after that commit the auxv cache was cleared by the new_objfile observer -- but only when the new_objfile observer is used in the special mode that actually means that all objfiles have been unloaded (I know, the overloading of the new_objfile observer is horrible, and unnecessary, but it's not really important for this bug). The difference arises because the new_objfile observer is triggered from clear_symtab_users, which in turn is called from program_space::~program_space. The new_objfile observer for auxv does this: static void auxv_new_objfile_observer (struct objfile *objfile) { if (objfile == nullptr) invalidate_auxv_cache_inf (current_inferior ()); } That is, when all the objfiles are unloaded, we clear the auxv cache for the current inferior. The problem is, then when we look at the prune_inferiors -> delete_inferior -> ~program_space path, we see that the current inferior is not going to be an inferior that exists within the program_space being deleted; delete_inferior removes the deleted inferior from the global inferior list, and then only deletes the program_space if program_space::empty() returns true, which is only the case if the current inferior isn't within the program_space to delete, and no other inferior exists within that program_space either. What this means is that when the new_objfile observer is called we can't rely on the current inferior having any relationship with the program space in which the objfiles were removed. This was an error in the commit a282736, the only thing we can rely on is the current program space. As a result of this mistake, after commit a282736, GDB was sometimes clearing the auxv cache for a random inferior. In the native target case this was harmless as we can always refill the cache when needed, but in the remote target case, if we need to refill the cache when the remote target is executing, then we get the crash we observed. And additionally, if we think about this a little more, we see that commit a282736 made another mistake. When all the objfiles are removed, they are removed from a program_space, a program_space might contain multiple inferiors, so surely, we should clear the auxv cache for all of the matching inferiors? Given these two insights, that the current_inferior is not relevant, only the current_program_space, and that we should be clearing the cache for all inferiors in the current_program_space, we can update auxv_new_objfile_observer to: if (objfile == nullptr) { for (inferior *inf : all_inferiors ()) { if (inf->pspace == current_program_space) invalidate_auxv_cache_inf (inf); } } With this change we now correctly clear the auxv cache for the correct inferiors, and GDB no longer needs to refill the cache at an inconvenient time, this avoids the crash we were seeing. And finally, we reach problem #4. Inspired by the observation that using the current_inferior from within the ~program_space function was not correct, I added some debug to see if current_inferior() was called anywhere else (below ~program_space), and the answer is yes, it's called a often. Mostly the culprit is GDB doing: current_inferior ()->top_target ()-> .... But I think all of these calls are most likely doing the wrong thing, and only work because the top target in all these cases is shared between all inferiors, e.g. it's the native target, or the remote target for all inferiors. But if we had a truly multi-connection setup, then we might start to see odd behaviour. Problem #1 I'm just ignoring for now, I guess at some point we might run into this again, and then we'd need to solve this. But in this case I wasn't sure what a "good" solution would look like. We need the auxv data in order to implement the linux_is_uclinux() function. If we can't get the auxv data then what should we do, assume yes, or assume no? The right answer would probably be to propagate the error back up the stack, but then we reach ~program_space, and throwing exceptions from a destructor is problematic, so we'd need to catch and deal at this point. The linux_is_uclinux() call is made from within gdbarch_has_shared_address_space(), which is used like: if (!gdbarch_has_shared_address_space (target_gdbarch ())) delete this->aspace; So, we would have to choose; delete the address space or not. If we delete it on error, then we might delete an address space that is shared within another program space. If we don't delete the address space, then we might leak it. Neither choice is great. A better solution might be to have the address spaces be reference counted, then we could remove the gdbarch_has_shared_address_space call completely, and just rely on the reference count to auto-delete the address space when appropriate. The solution for problem #2 I already hinted at above, we should have a new target_can_delete_inferiors() call, which should be called from prune_inferiors, this would prevent GDB from trying to delete inferiors when a (remote) target is in a state where we know it can't delete the inferior. Deleting an inferior often (always?) requires sending packets to the remote, and if the remote is waiting for a stop reply then this will never work, so the pruning should be deferred in this case. The solution for problem #3 is included in this commit. And, for problem #4, I'm not sure what the right solution is. Maybe delete_inferior should ensure the inferior to be deleted is in place when ~program_space is called? But that seems a little weird, as the current inferior would, in theory, still be using the current program_space... Anyway, after this commit, the gdb.python/py-progspace-events.exp test now passes when run with the native-extended-remote board. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30935 Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: I41f0e6e2d7ecc1e5e55ec170f37acd4052f46eaf
Overview ======== Consider the following situation, GDB is in non-stop mode, the main thread is running while a second thread is stopped. The user has the second thread selected as the current thread and asks GDB to detach. At the exact moment of detach the main thread exits. This situation currently causes crashes, assertion failures, and unexpected errors to be reported from GDB for both native and remote targets. This commit addresses this situation for native and remote targets. There are a number of different fixes, but all are required in order to get this functionality working correct for native and remote targets. Native Linux Target =================== For the native Linux target, detaching is handled in the function linux_nat_target::detach. In here we call stop_wait_callback for each thread, and it is this callback that will spot that the main thread has exited. GDB then detaches from everything except the main thread by calling detach_callback. After this the first problem is this assert: /* Only the initial process should be left right now. */ gdb_assert (num_lwps (pid) == 1); The num_lwps call will return 0 as the main thread has exited and all of the other threads have now been detached. I fix this by changing the assert to allow for 0 or 1 lwps at this point. As the 0 case can only happen in non-stop mode, the assert becomes: gdb_assert (num_lwps (pid) == 1 || (target_is_non_stop_p () && num_lwps (pid) == 0)); The next problem is that we do: main_lwp = find_lwp_pid (ptid_t (pid)); and then proceed assuming that main_lwp is not nullptr. In the case that the main thread has exited though, main_lwp will be nullptr. However, we only need main_lwp so that GDB can detach from the thread. If the main thread has exited, and GDB has already detached from every other thread, then GDB has finished detaching, GDB can skip the calls that try to detach from the main thread, and then tell the user that the detach was a success. For Remote Targets ================== On remote targets there are two problems. First is that when the exit occurs during the early phase of the detach, we see the stop notification arrive while GDB is removing the breakpoints ahead of the detach. The 'set debug remote on' trace looks like this: [remote] Sending packet: $z0,7f1648fe0241,1#35 [remote] Notification received: Stop:W0;process:2a0ac8 # At this point an unpatched gdbserver segfaults, and the connection # is broken. A patched gdbserver continues as below... [remote] Packet received: E01 [remote] Sending packet: $z0,7f1648ff00a8,1#68 [remote] Packet received: E01 [remote] Sending packet: $z0,7f1648ff132f,1#6b [remote] Packet received: E01 [remote] Sending packet: $D;2a0ac8#3e [remote] Packet received: E01 I was originally running into Segmentation Faults, from within gdbserver/mem-break.cc, in the function find_gdb_breakpoint. This function calls current_process() and then dereferences the result to find the breakpoint list. However, in our case, the current process has already exited, and so the current_process() call returns nullptr. At the point of failure, the gdbserver backtrace looks like this: #0 0x00000000004190e4 in find_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:982 #1 0x000000000041930d in delete_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:1093 #2 0x000000000042d8db in process_serial_event () at ../../src/gdbserver/server.cc:4372 #3 0x000000000042dcab in handle_serial_event (err=0, client_data=0x0) at ../../src/gdbserver/server.cc:4498 ... The problem is that, as a result non-stop being on, the process exiting is only reported back to GDB after the request to remove a breakpoint has been sent. Clearly gdbserver can't actually remove this breakpoint -- the process has already exited -- so I think the best solution is for gdbserver just to report an error, which is what I've done. The second problem I ran into was on the gdb side, as the process has already exited, but GDB has not yet acknowledged the exit event, the detach -- the 'D' packet in the above trace -- fails. This was being reported to the user with a 'Can't detach process' error. As the test actually calls detach from Python code, this error was then becoming a Python exception. Though clearly the detach has returned an error, and so, maybe, having GDB throw an error would be fine, I think in this case, there's a good argument that the remote error can be ignored -- if GDB tries to detach and gets back an error, and if there's a pending exit event for the pid we tried to detach, then just ignore the error and pretend the detach worked fine. We could possibly check for a pending exit event before sending the detach packet, however, I believe that it might be possible (in non-stop mode) for the stop notification to arrive after the detach is sent, but before gdbserver has started processing the detach. In this case we would still need to check for pending stop events after seeing the detach fail, so I figure there's no point having two checks -- we just send the detach request, and if it fails, check to see if the process has already exited. Testing ======= In order to test this issue I needed to ensure that the exit event arrives at the same time as the detach call. The window of opportunity for getting the exit to arrive is so small I've never managed to trigger this in real use -- I originally spotted this issue while working on another patch, which did manage to trigger this issue. However, if we trigger both the exit and the detach from a single Python function then we never return to GDB's event loop, as such GDB never processes the exit event, and so the first time GDB gets a chance to see the exit is during the detach call. And so that is the approach I've taken for testing this patch. Tested-By: Kevin Buettner <kevinb@redhat.com> Approved-By: Kevin Buettner <kevinb@redhat.com>
I noticed that on an Ubuntu 20.04 system, after a following patch ("Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp was passing cleanly, but still, we'd end up with four new unexpected GDB core dumps: === gdb Summary === # of unexpected core files 4 # of expected passes 48 That said patch is making the pre-existing gdb.threads/step-over-exec.exp testcase (almost silently) expose a latent problem in gdb/linux-nat.c, resulting in a GDB crash when: #1 - a non-leader thread execs #2 - the post-exec program stops somewhere #3 - you kill the inferior Instead of #3 directly, the testcase just returns, which ends up in gdb_exit, tearing down GDB, which kills the inferior, and is thus equivalent to #3 above. Vis (after said patch is applied): $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true ... (top-gdb) r ... (gdb) b main ... (gdb) r ... Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69 69 argv0 = argv[0]; (gdb) c Continuing. [New Thread 0x7ffff7d89700 (LWP 2506975)] Other going in exec. Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28 28 foo (); (gdb) k ... Thread 1 "gdb" received signal SIGSEGV, Segmentation fault. 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 393 return m_suspend.waitstatus_pending_p; (top-gdb) bt #0 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 #1 0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345 #2 0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564 #3 0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284 #4 0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278 #5 0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247 #6 0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864 #7 0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590 #8 0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911 ... The root of the problem is that when a non-leader LWP execs, it just changes its tid to the tgid, replacing the pre-exec leader thread, becoming the new leader. There's no thread exit event for the execing thread. It's as if the old pre-exec LWP vanishes without trace. The ptrace man page says: "PTRACE_O_TRACEEXEC (since Linux 2.5.46) Stop the tracee at the next execve(2). A waitpid(2) by the tracer will return a status value such that status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8)) If the execing thread is not a thread group leader, the thread ID is reset to thread group leader's ID before this stop. Since Linux 3.0, the former thread ID can be retrieved with PTRACE_GETEVENTMSG." When the core of GDB processes an exec events, it deletes all the threads of the inferior. But, that is too late -- deleting the thread does not delete the corresponding LWP, so we end leaving the pre-exec non-leader LWP stale in the LWP list. That's what leads to the crash above -- linux_nat_target::kill iterates over all LWPs, and after the patch in question, that code will look for the corresponding thread_info for each LWP. For the pre-exec non-leader LWP still listed, won't find one. This patch fixes it, by deleting the pre-exec non-leader LWP (and thread) from the LWP/thread lists as soon as we get an exec event out of ptrace. GDBserver does not need an equivalent fix, because it is already doing this, as side effect of mourning the pre-exec process, in gdbserver/linux-low.cc: else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events) { ... /* Delete the execing process and all its threads. */ mourn (proc); switch_to_thread (nullptr); The crash with gdb.threads/step-over-exec.exp is not observable on newer systems, which postdate the glibc change to move "libpthread.so" internals to "libc.so.6", because right after the exec, GDB traps a load event for "libc.so.6", which leads to GDB trying to open libthread_db for the post-exec inferior, and, on such systems that succeeds. When we load libthread_db, we call linux_stop_and_wait_all_lwps, which, as the name suggests, stops all lwps, and then waits to see their stops. While doing this, GDB detects that the pre-exec stale LWP is gone, and deletes it. If we use "catch exec" to stop right at the exec before the "libc.so.6" load event ever happens, and issue "kill" right there, then GDB crashes on newer systems as well. So instead of tweaking gdb.threads/step-over-exec.exp to cover the fix, add a new gdb.threads/threads-after-exec.exp testcase that uses "catch exec". The test also uses the new "maint info linux-lwps" command if testing on Linux native, which also exposes the stale LWP problem with an unfixed GDB. Also tweak a comment in infrun.c:follow_exec referring to how linux-nat.c used to behave, as it would become stale otherwise. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
(A good chunk of the problem statement in the commit log below is Andrew's, adjusted for a different solution, and for covering displaced stepping too. The testcase is mostly Andrew's too.) This commit addresses bugs gdb/19675 and gdb/27830, which are about stepping over a breakpoint set at a clone syscall instruction, one is about displaced stepping, and the other about in-line stepping. Currently, when a new thread is created through a clone syscall, GDB sets the new thread running. With 'continue' this makes sense (assuming no schedlock): - all-stop mode, user issues 'continue', all threads are set running, a newly created thread should also be set running. - non-stop mode, user issues 'continue', other pre-existing threads are not affected, but as the new thread is (sort-of) a child of the thread the user asked to run, it makes sense that the new threads should be created in the running state. Similarly, if we are stopped at the clone syscall, and there's no software breakpoint at this address, then the current behaviour is fine: - all-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). While stepping the thread of interest all the other threads will be allowed to continue. A newly created thread will be set running, and then stopped once the thread of interest has completed its step. - non-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). Other threads might be running or stopped, but as with the continue case above, the new thread will be created running. The only possible issue here is that the new thread will be left running after the initial thread has completed its stepi. The user would need to manually select the thread and interrupt it, this might not be what the user expects. However, this is not something this commit tries to change. The problem then is what happens when we try to step over a clone syscall if there is a breakpoint at the syscall address. - For both all-stop and non-stop modes, with in-line stepping: + user issues 'stepi', + [non-stop mode only] GDB stops all threads. In all-stop mode all threads are already stopped. + GDB removes s/w breakpoint at syscall address, + GDB single steps just the thread of interest, all other threads are left stopped, + New thread is created running, + Initial thread completes its step, + [non-stop mode only] GDB resumes all threads that it previously stopped. There are two problems in the in-line stepping scenario above: 1. The new thread might pass through the same code that the initial thread is in (i.e. the clone syscall code), in which case it will fail to hit the breakpoint in clone as this was removed so the first thread can single step, 2. The new thread might trigger some other stop event before the initial thread reports its step completion. If this happens we end up triggering an assertion as GDB assumes that only the thread being stepped should stop. The assert looks like this: infrun.c:5899: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed. - For both all-stop and non-stop modes, with displaced stepping: + user issues 'stepi', + GDB starts the displaced step, moves thread's PC to the out-of-line scratch pad, maybe adjusts registers, + GDB single steps the thread of interest, [non-stop mode only] all other threads are left as they were, either running or stopped. In all-stop, all other threads are left stopped. + New thread is created running, + Initial thread completes its step, GDB re-adjusts its PC, restores/releases scratchpad, + [non-stop mode only] GDB resumes the thread, now past its breakpoint. + [all-stop mode only] GDB resumes all threads. There is one problem with the displaced stepping scenario above: 3. When the parent thread completed its step, GDB adjusted its PC, but did not adjust the child's PC, thus that new child thread will continue execution in the scratch pad, invoking undefined behavior. If you're lucky, you see a crash. If unlucky, the inferior gets silently corrupted. What is needed is for GDB to have more control over whether the new thread is created running or not. Issue #1 above requires that the new thread not be allowed to run until the breakpoint has been reinserted. The only way to guarantee this is if the new thread is held in a stopped state until the single step has completed. Issue #3 above requires that GDB is informed of when a thread clones itself, and of what is the child's ptid, so that GDB can fixup both the parent and the child. When looking for solutions to this problem I considered how GDB handles fork/vfork as these have some of the same issues. The main difference between fork/vfork and clone is that the clone events are not reported back to core GDB. Instead, the clone event is handled automatically in the target code and the child thread is immediately set running. Note we have support for requesting thread creation events out of the target (TARGET_WAITKIND_THREAD_CREATED). However, those are reported for the new/child thread. That would be sufficient to address in-line stepping (issue #1), but not for displaced-stepping (issue #3). To handle displaced-stepping, we need an event that is reported to the _parent_ of the clone, as the information about the displaced step is associated with the clone parent. TARGET_WAITKIND_THREAD_CREATED includes no indication of which thread is the parent that spawned the new child. In fact, for some targets, like e.g., Windows, it would be impossible to know which thread that was, as thread creation there doesn't work by "cloning". The solution implemented here is to model clone on fork/vfork, and introduce a new TARGET_WAITKIND_THREAD_CLONED event. This event is similar to TARGET_WAITKIND_FORKED and TARGET_WAITKIND_VFORKED, except that we end up with a new thread in the same process, instead of a new thread of a new process. Like FORKED and VFORKED, THREAD_CLONED waitstatuses have a child_ptid property, and the child is held stopped until GDB explicitly resumes it. This addresses the in-line stepping case (issues #1 and #2). The infrun code that handles displaced stepping fixup for the child after a fork/vfork event is thus reused for THREAD_CLONE, with some minimal conditions added, addressing the displaced stepping case (issue #3). The native Linux backend is adjusted to unconditionally report TARGET_WAITKIND_THREAD_CLONED events to the core. Following the follow_fork model in core GDB, we introduce a target_follow_clone target method, which is responsible for making the new clone child visible to the rest of GDB. Subsequent patches will add clone events support to the remote protocol and gdbserver. displaced_step_in_progress_thread becomes unused with this patch, but a new use will reappear later in the series. To avoid deleting it and readding it back, this patch marks it with attribute unused, and the latter patch removes the attribute again. We need to do this because the function is static, and with no callers, the compiler would warn, (error with -Werror), breaking the build. This adds a new gdb.threads/stepi-over-clone.exp testcase, which exercises stepping over a clone syscall, with displaced stepping vs inline stepping, and all-stop vs non-stop. We already test stepping over clone syscalls with gdb.base/step-over-syscall.exp, but this test uses pthreads, while the other test uses raw clone, and this one is more thorough. The testcase passes on native GNU/Linux, but fails against GDBserver. GDBserver will be fixed by a later patch in the series. Co-authored-by: Andrew Burgess <aburgess@redhat.com> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Change-Id: I95c06024736384ae8542a67ed9fdf6534c325c8e Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Running the gdb.threads/step-over-thread-exit-while-stop-all-threads.exp testcase added later in the series against gdbserver, after the TARGET_WAITKIND_NO_RESUMED fix from the following patch, would run into an infinite loop in stop_all_threads, leading to a timeout: FAIL: gdb.threads/step-over-thread-exit-while-stop-all-threads.exp: displaced-stepping=off: target-non-stop=on: iter 0: continue (timeout) The is really a latent bug, and it is about the fact that stop_all_threads stops listening to events from a target as soon as it sees a TARGET_WAITKIND_NO_RESUMED, ignoring that TARGET_WAITKIND_NO_RESUMED may be delayed. handle_no_resumed knows how to handle delayed no-resumed events, but stop_all_threads was never taught to. In more detail, here's what happens with that testcase: #1 - Multiple threads report breakpoint hits to gdb. #2 - gdb picks one events, and it's for thread 1. All other stops are left pending. thread 1 needs to move past a breakpoint, so gdb stops all threads to start an inline step over for thread 1. While stopping threads, some of the threads that were still running report events that are also left pending. #2 - gdb steps thread 1 #3 - Thread 1 exits while stepping (it steps over an exit syscall), gdbserver reports thread exit for thread 1 #4 - Thread 1 was the last resumed thread, so gdbserver also reports no-resumed: [remote] Notification received: Stop:w0;p3445d0.3445d3 [remote] Sending packet: $vStopped#55 [remote] Packet received: N [remote] Sending packet: $vStopped#55 [remote] Packet received: OK #5 - gdb processes the thread exit for thread 1, finishes the step over and restarts threads. #6 - gdb picks the next event to process out of one of the resumed threads with pending events: [infrun] random_resumed_with_pending_wait_status: Found 32 events, selecting #11 #7 - This is again a breakpoint hit and the breakpoint needs to be stepped over too, so gdb starts a step-over dance again. #8 - We reach stop_all_threads, which finds that some threads need to be stopped. #9 - wait_one finally consumes the no-resumed event queue by #4. Seeing this, wait_one disable target async, to stop listening for events out of the remote target. #10 - We still haven't seen all the stops expected, so stop_all_threads tries another iteration. #11 - Because the remote target is no longer async, and there are no other targets, wait_one return no-resumed immediately without polling the remote target. #12 - We still haven't seen all the stops expected, so stop_all_threads tries another iteration. goto #11, looping forever. Fix this by explicitly enabling/re-enabling target async on targets that can async, before waiting for stops. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: Ie3ffb0df89635585a6631aa842689cecc989e33f
On aarch64-linux, with gcc 13.2.1, I run into: ... (gdb) backtrace^M #0 break_here () at solib-search.c:30^M #1 0x0000fffff7f20194 in lib2_func4 () at solib-search-lib2.c:50^M #2 0x0000fffff7f70194 in lib1_func3 () at solib-search-lib1.c:50^M #3 0x0000fffff7f20174 in lib2_func2 () at solib-search-lib2.c:30^M #4 0x0000fffff7f70174 in lib1_func1 () at solib-search-lib1.c:30^M #5 0x00000000004101b4 in main () at solib-search.c:23^M (gdb) PASS: gdb.base/solib-search.exp: \ backtrace (with wrong libs) (data collection) FAIL: gdb.base/solib-search.exp: backtrace (with wrong libs) ... The FAIL is generated by this code in the test-case: ... if { $expect_fail } { # If the backtrace output is correct the test isn't sufficiently # testing what it should. if { $count == $total_expected } { set fail 1 } ... The test-case: - builds two versions of two shared libs, a "right" and "wrong" version, the difference being an additional dummy function (called spacer function), - uses the "right" version to generate a core file, - uses the "wrong" version to interpret the core file, and - generates a backtrace. The intent is that the backtrace is incorrect due to using the "wrong" version, but actually it's correct. This is because the spacer functions aren't large enough. Fix this by increasing the size of the spacer functions by adding a dummy loop, after which we have, as expected, an incorrect backtrace: ... (gdb) backtrace^M #0 break_here () at solib-search.c:30^M #1 0x0000fffff7f201c0 in ?? ()^M #2 0x0000fffff7f20174 in lib2_func2 () at solib-search-lib2.c:30^M #3 0x0000fffff7f20174 in lib2_func2 () at solib-search-lib2.c:30^M #4 0x0000fffff7f70174 in lib1_func1 () at solib-search-lib1.c:30^M #5 0x00000000004101b4 in main () at solib-search.c:23^M (gdb) PASS: gdb.base/solib-search.exp: \ backtrace (with wrong libs) (data collection) PASS: gdb.base/solib-search.exp: backtrace (with wrong libs) ... Tested on aarch64-linux.
The testsuite for SCFI contains target-specific tests. When a test is executed with --scfi=experimental command line option, the CFI annotations in the test .s files are skipped altogether by the GAS for processing. The CFI directives in the input assembly files are, however, validated by running the assembler one more time without --scfi=experimental. Some testcases are used to highlight those asm constructs that the SCFI machinery in GAS currently does not support: - Only System V AMD64 ABI is supported for now. Using either --32 or --x32 with SCFI results in hard error. See scfi-unsupported-1.s. - Untraceable stack-pointer manipulation in function epilougue and prologue. See scfi-unsupported-2.s. - Using Dynamically Realigned Arguement Pointer (DRAP) register to realign the stack. For SCFI, the CFA must be only REG_SP or REG_FP based. See scfi-unsupported-drap-1.s Some testcases are used to highlight some diagnostics that the SCFI machinery in GAS currently issues, with an intent to help user correct inadvertent errors in their hand-written asm. An error is issued when GAS finds that input asm is not amenable to correct CFI synthesis. - (#1) "Warning: SCFI: Asymetrical register restore" - (#2) "Error: SCFI: usage of REG_FP as scratch not supported" - (#3) "Error: SCFI: unsupported stack manipulation pattern" In case of (#2) and (#3), SCFI generation is skipped for the respective function. Above is a subset of the warnings/errors implemented in the code. gas/testsuite/: * gas/scfi/README: New test. * gas/scfi/x86_64/ginsn-add-1.l: New test. * gas/scfi/x86_64/ginsn-add-1.s: New test. * gas/scfi/x86_64/ginsn-dw2-regnum-1.l: New test. * gas/scfi/x86_64/ginsn-dw2-regnum-1.s: New test. * gas/scfi/x86_64/ginsn-pop-1.l: New test. * gas/scfi/x86_64/ginsn-pop-1.s: New test. * gas/scfi/x86_64/ginsn-push-1.l: New test. * gas/scfi/x86_64/ginsn-push-1.s: New test. * gas/scfi/x86_64/scfi-add-1.d: New test. * gas/scfi/x86_64/scfi-add-1.l: New test. * gas/scfi/x86_64/scfi-add-1.s: New test. * gas/scfi/x86_64/scfi-add-2.d: New test. * gas/scfi/x86_64/scfi-add-2.l: New test. * gas/scfi/x86_64/scfi-add-2.s: New test. * gas/scfi/x86_64/scfi-asm-marker-1.d: New test. * gas/scfi/x86_64/scfi-asm-marker-1.l: New test. * gas/scfi/x86_64/scfi-asm-marker-1.s: New test. * gas/scfi/x86_64/scfi-asm-marker-2.d: New test. * gas/scfi/x86_64/scfi-asm-marker-2.l: New test. * gas/scfi/x86_64/scfi-asm-marker-2.s: New test. * gas/scfi/x86_64/scfi-asm-marker-3.d: New test. * gas/scfi/x86_64/scfi-asm-marker-3.l: New test. * gas/scfi/x86_64/scfi-asm-marker-3.s: New test. * gas/scfi/x86_64/scfi-bp-sp-1.d: New test. * gas/scfi/x86_64/scfi-bp-sp-1.l: New test. * gas/scfi/x86_64/scfi-bp-sp-1.s: New test. * gas/scfi/x86_64/scfi-bp-sp-2.d: New test. * gas/scfi/x86_64/scfi-bp-sp-2.l: New test. * gas/scfi/x86_64/scfi-bp-sp-2.s: New test. * gas/scfi/x86_64/scfi-callee-saved-1.d: New test. * gas/scfi/x86_64/scfi-callee-saved-1.l: New test. * gas/scfi/x86_64/scfi-callee-saved-1.s: New test. * gas/scfi/x86_64/scfi-callee-saved-2.d: New test. * gas/scfi/x86_64/scfi-callee-saved-2.l: New test. * gas/scfi/x86_64/scfi-callee-saved-2.s: New test. * gas/scfi/x86_64/scfi-callee-saved-3.d: New test. * gas/scfi/x86_64/scfi-callee-saved-3.l: New test. * gas/scfi/x86_64/scfi-callee-saved-3.s: New test. * gas/scfi/x86_64/scfi-callee-saved-4.d: New test. * gas/scfi/x86_64/scfi-callee-saved-4.l: New test. * gas/scfi/x86_64/scfi-callee-saved-4.s: New test. * gas/scfi/x86_64/scfi-cfg-1.d: New test. * gas/scfi/x86_64/scfi-cfg-1.l: New test. * gas/scfi/x86_64/scfi-cfg-1.s: New test. * gas/scfi/x86_64/scfi-cfg-2.d: New test. * gas/scfi/x86_64/scfi-cfg-2.l: New test. * gas/scfi/x86_64/scfi-cfg-2.s: New test. * gas/scfi/x86_64/scfi-cfi-label-1.d: New test. * gas/scfi/x86_64/scfi-cfi-label-1.l: New test. * gas/scfi/x86_64/scfi-cfi-label-1.s: New test. * gas/scfi/x86_64/scfi-cfi-sections-1.d: New test. * gas/scfi/x86_64/scfi-cfi-sections-1.l: New test. * gas/scfi/x86_64/scfi-cfi-sections-1.s: New test. * gas/scfi/x86_64/scfi-cofi-1.d: New test. * gas/scfi/x86_64/scfi-cofi-1.l: New test. * gas/scfi/x86_64/scfi-cofi-1.s: New test. * gas/scfi/x86_64/scfi-diag-1.l: New test. * gas/scfi/x86_64/scfi-diag-1.s: New test. * gas/scfi/x86_64/scfi-diag-2.l: New test. * gas/scfi/x86_64/scfi-diag-2.s: New test. * gas/scfi/x86_64/scfi-dyn-stack-1.d: New test. * gas/scfi/x86_64/scfi-dyn-stack-1.l: New test. * gas/scfi/x86_64/scfi-dyn-stack-1.s: New test. * gas/scfi/x86_64/scfi-enter-1.d: New test. * gas/scfi/x86_64/scfi-enter-1.l: New test. * gas/scfi/x86_64/scfi-enter-1.s: New test. * gas/scfi/x86_64/scfi-fp-diag-2.l: New test. * gas/scfi/x86_64/scfi-fp-diag-2.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-1.d: New test. * gas/scfi/x86_64/scfi-indirect-mov-1.l: New test. * gas/scfi/x86_64/scfi-indirect-mov-1.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-2.d: New test. * gas/scfi/x86_64/scfi-indirect-mov-2.l: New test. * gas/scfi/x86_64/scfi-indirect-mov-2.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-3.d: New test. * gas/scfi/x86_64/scfi-indirect-mov-3.l: New test. * gas/scfi/x86_64/scfi-indirect-mov-3.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-4.d: New test. * gas/scfi/x86_64/scfi-indirect-mov-4.l: New test. * gas/scfi/x86_64/scfi-indirect-mov-4.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-5.s: New test. * gas/scfi/x86_64/scfi-lea-1.d: New test. * gas/scfi/x86_64/scfi-lea-1.l: New test. * gas/scfi/x86_64/scfi-lea-1.s: New test. * gas/scfi/x86_64/scfi-leave-1.d: New test. * gas/scfi/x86_64/scfi-leave-1.l: New test. * gas/scfi/x86_64/scfi-leave-1.s: New test. * gas/scfi/x86_64/scfi-pushq-1.d: New test. * gas/scfi/x86_64/scfi-pushq-1.l: New test. * gas/scfi/x86_64/scfi-pushq-1.s: New test. * gas/scfi/x86_64/scfi-pushsection-1.d: New test. * gas/scfi/x86_64/scfi-pushsection-1.l: New test. * gas/scfi/x86_64/scfi-pushsection-1.s: New test. * gas/scfi/x86_64/scfi-pushsection-2.d: New test. * gas/scfi/x86_64/scfi-pushsection-2.l: New test. * gas/scfi/x86_64/scfi-pushsection-2.s: New test. * gas/scfi/x86_64/scfi-selfalign-func-1.d: New test. * gas/scfi/x86_64/scfi-selfalign-func-1.l: New test. * gas/scfi/x86_64/scfi-selfalign-func-1.s: New test. * gas/scfi/x86_64/scfi-simple-1.d: New test. * gas/scfi/x86_64/scfi-simple-1.l: New test. * gas/scfi/x86_64/scfi-simple-1.s: New test. * gas/scfi/x86_64/scfi-simple-2.d: New test. * gas/scfi/x86_64/scfi-simple-2.l: New test. * gas/scfi/x86_64/scfi-simple-2.s: New test. * gas/scfi/x86_64/scfi-sub-1.d: New test. * gas/scfi/x86_64/scfi-sub-1.l: New test. * gas/scfi/x86_64/scfi-sub-1.s: New test. * gas/scfi/x86_64/scfi-sub-2.d: New test. * gas/scfi/x86_64/scfi-sub-2.l: New test. * gas/scfi/x86_64/scfi-sub-2.s: New test. * gas/scfi/x86_64/scfi-unsupported-1.l: New test. * gas/scfi/x86_64/scfi-unsupported-1.s: New test. * gas/scfi/x86_64/scfi-unsupported-2.l: New test. * gas/scfi/x86_64/scfi-unsupported-2.s: New test. * gas/scfi/x86_64/scfi-unsupported-3.l: New test. * gas/scfi/x86_64/scfi-unsupported-3.s: New test. * gas/scfi/x86_64/scfi-unsupported-4.l: New test. * gas/scfi/x86_64/scfi-unsupported-4.s: New test. * gas/scfi/x86_64/scfi-unsupported-cfg-1.l: New test. * gas/scfi/x86_64/scfi-unsupported-cfg-1.s: New test. * gas/scfi/x86_64/scfi-unsupported-cfg-2.l: New test. * gas/scfi/x86_64/scfi-unsupported-cfg-2.s: New test. * gas/scfi/x86_64/scfi-unsupported-drap-1.l: New test. * gas/scfi/x86_64/scfi-unsupported-drap-1.s: New test. * gas/scfi/x86_64/scfi-unsupported-insn-1.l: New test. * gas/scfi/x86_64/scfi-unsupported-insn-1.s: New test. * gas/scfi/x86_64/scfi-x86-64.exp: New file.
Bug PR gdb/28313 describes attaching to a process when the executable has been deleted. The bug is for S390 and describes how a user sees a message 'PC not saved'. On x86-64 (GNU/Linux) I don't see a 'PC not saved' message, but instead I see this: (gdb) attach 901877 Attaching to process 901877 No executable file now. warning: Could not load vsyscall page because no executable was specified 0x00007fa9d9c121e7 in ?? () (gdb) bt #0 0x00007fa9d9c121e7 in ?? () #1 0x00007fa9d9c1211e in ?? () #2 0x0000000000000007 in ?? () #3 0x000000002dc8b18d in ?? () #4 0x0000000000000000 in ?? () (gdb) Notice that the addresses in the backtrace don't seem right, quickly heading to 0x7 and finally ending at 0x0. What's going on, in both the s390 case and the x86-64 case is that the architecture's prologue scanner is going wrong and causing the stack unwinding to fail. The prologue scanner goes wrong because GDB has no unwind information. And GDB has no unwind information because, of course, the executable has been deleted. Notice in the example session above we get this line in the output: No executable file now. which indicates that GDB failed to find an executable to debug. For GNU/Linux when GDB tries to find an executable for a given pid we end up calling linux_proc_pid_to_exec_file in gdb/nat/linux-procfs.c. Within this function we call `readlink` on /proc/PID/exe to find the path of the actual executable. If the `readlink` call fails then we already fallback on using /proc/PID/exe as the path to the executable to debug. However, when the executable has been deleted the `readlink` call doesn't fail, but the path that is returned points to a non-existent file. I propose that we add an `access` call to linux_proc_pid_to_exec_file to check that the target file exists and can be read. If the target can't be read then we should fall back to /proc/PID/exe (assuming that /proc/PID/exe can be read). Now on x86-64 the output looks like this: (gdb) attach 901877 Attaching to process 901877 Reading symbols from /proc/901877/exe... Reading symbols from /lib64/libc.so.6... (No debugging symbols found in /lib64/libc.so.6) Reading symbols from /lib64/ld-linux-x86-64.so.2... (No debugging symbols found in /lib64/ld-linux-x86-64.so.2) 0x00007fa9d9c121e7 in nanosleep () from /lib64/libc.so.6 (gdb) bt #0 0x00007fa9d9c121e7 in nanosleep () from /lib64/libc.so.6 #1 0x00007fa9d9c1211e in sleep () from /lib64/libc.so.6 #2 0x000000000040117e in spin_forever () at attach-test.c:17 #3 0x0000000000401198 in main () at attach-test.c:24 (gdb) which is much better. I've also tagged the bug PR gdb/29782 which concerns the test gdb.server/connect-with-no-symbol-file.exp. After making this change, when running gdb.server/connect-with-no-symbol-file.exp GDB would now pick up the /proc/PID/exe file as the executable in some cases. As GDB is not restarted for the multiple iterations of this test GDB (or rather BFD) would given a warning/error like: (gdb) PASS: gdb.server/connect-with-no-symbol-file.exp: sysroot=target:: action=permission: setup: disconnect set sysroot target: BFD: reopening /proc/3283001/exe: No such file or directory (gdb) FAIL: gdb.server/connect-with-no-symbol-file.exp: sysroot=target:: action=permission: setup: adjust sysroot What's happening is that an executable found for an earlier iteration of the test is still registered for the inferior when we are setting up for a second iteration of the test. When the sysroot changes, if there's an executable registered GDB tries to reopen it, but in this case the file has disappeared (the previous inferior has exited by this point). I did think about maybe, when the executable is /proc/PID/exe, we should auto-delete the file from the inferior. But in the end I thought this was a bad idea. Not only would this require a lot of special code in GDB just to support this edge case: we'd need to track if the exe file name came from /proc and should be auto-deleted, or we'd need target specific code to check if a path should be auto-deleted..... ... in addition, we'd still want to warn the user when we auto-deleted the file from the inferior, otherwise they might be surprised to find their inferior suddenly has no executable attached, so we wouldn't actually reduce the number of warnings the user sees. So in the end I figured that the best solution is to just update the test to avoid the warning. This is easily done by manually removing the executable from the inferior once each iteration of the test has completed. Now, in bug PR gdb/29782 GDB is clearly managing to pick up an executable from the NFS cache somehow. I guess what's happening is that when the original file is deleted /proc/PID/exe is actually pointing to a file in the NFS cache which is only deleted at some later point, and so when GDB starts up we do manage to associate a file with the inferior, this results in the same message being emitted from BFD as I was seeing. The fix included in this commit should also fix that bug. One final note: On x86-64 GNU/Linux, the gdb.server/connect-with-no-symbol-file.exp test will produce 2 core files. This is due to a bug in gdbserver that is nothing to do with this test. These core files are created before and after this commit. I am working on a fix for the gdbserver issue, but will post that separately. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28313 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29782 Approved-By: Tom Tromey <tom@tromey.com>
On arm-linux the linaro CI occasionally reports: ... (gdb) up 10 #4 0x0001b864 in pthread_join () (gdb) FAIL: gdb.threads/staticthreads.exp: up 10 ... while this is expected: ... (gdb) up 10 #3 0x00010568 in main (argc=1, argv=0xfffeede4) at staticthreads.c:76 76 pthread_join (thread, NULL); (gdb) PASS: gdb.threads/staticthreads.exp: up 10 ... Thiago investigated the problem, and using valgrind found an invalid read in arm_exidx_fill_cache. The problem happens as follows: - an objfile and corresponding per_bfd are allocated - some memory is allocated in arm_exidx_new_objfile using objfile->objfile_obstack, for the "exception table entry cache". - a symbol reread is triggered, and the objfile, including the objfile_obstack, is destroyed - a new objfile is allocated, using the same per_bfd - again arm_exidx_new_objfile is called, but since the same per_bfd is used, it doesn't allocate any new memory for the "exception table entry cache". - the "exception table entry cache" is accessed by arm_exidx_fill_cache, and we have a use-after-free. This is a regression since commit a2726d4 ("[ARM] Store exception handling information per-bfd instead of per-objfile"), which changed the "exception table entry cache" from per-objfile to per-bfd, but failed to update the obstack_alloc. Fix this by using objfile->per_bfd->storage_obstack instead of objfile->objfile_obstack. I couldn't reproduce the FAIL myself, but Thiago confirmed that the patch fixes it. Tested on arm-linux. Approved-By: Luis Machado <luis.machado@arm.com> PR tdep/31254 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31254
When running test-case gdb.dap/eof.exp, it occasionally coredumps. The thread triggering the coredump is: ... #0 0x0000ffff42bb2280 in __pthread_kill_implementation () from /lib64/libc.so.6 #1 0x0000ffff42b65800 [PAC] in raise () from /lib64/libc.so.6 #2 0x00000000007b03e8 [PAC] in handle_fatal_signal (sig=11) at gdb/event-top.c:926 #3 0x00000000007b0470 in handle_sigsegv (sig=11) at gdb/event-top.c:976 #4 <signal handler called> #5 0x0000000000606080 in cli_ui_out::do_message (this=0xffff2f7ed728, style=..., format=0xffff0c002af1 "%s", args=...) at gdb/cli-out.c:232 #6 0x0000000000ce6358 in ui_out::call_do_message (this=0xffff2f7ed728, style=..., format=0xffff0c002af1 "%s") at gdb/ui-out.c:584 #7 0x0000000000ce6610 in ui_out::vmessage (this=0xffff2f7ed728, in_style=..., format=0x16f93ea "", args=...) at gdb/ui-out.c:621 #8 0x0000000000ce3a9c in ui_file::vprintf (this=0xfffffbea1b18, ...) at gdb/ui-file.c:74 #9 0x0000000000d2b148 in gdb_vprintf (stream=0xfffffbea1b18, format=0x16f93e8 "%s", args=...) at gdb/utils.c:1898 #10 0x0000000000d2b23c in gdb_printf (stream=0xfffffbea1b18, format=0x16f93e8 "%s") at gdb/utils.c:1913 #11 0x0000000000ab5208 in gdbpy_write (self=0x33fe35d0, args=0x342ec280, kw=0x345c08b0) at gdb/python/python.c:1464 #12 0x0000ffff434acedc in cfunction_call () from /lib64/libpython3.12.so.1.0 #13 0x0000ffff4347c500 [PAC] in _PyObject_MakeTpCall () from /lib64/libpython3.12.so.1.0 #14 0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault () from /lib64/libpython3.12.so.1.0 #15 0x0000ffff434d8cd0 [PAC] in method_vectorcall () from /lib64/libpython3.12.so.1.0 #16 0x0000ffff434b9824 [PAC] in PyObject_CallOneArg () from /lib64/libpython3.12.so.1.0 #17 0x0000ffff43557674 [PAC] in PyFile_WriteObject () from /lib64/libpython3.12.so.1.0 #18 0x0000ffff435577a0 [PAC] in PyFile_WriteString () from /lib64/libpython3.12.so.1.0 #19 0x0000ffff43465354 [PAC] in thread_excepthook () from /lib64/libpython3.12.so.1.0 #20 0x0000ffff434ac6e0 [PAC] in cfunction_vectorcall_O () from /lib64/libpython3.12.so.1.0 #21 0x0000ffff434a32d8 [PAC] in PyObject_Vectorcall () from /lib64/libpython3.12.so.1.0 #22 0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault () from /lib64/libpython3.12.so.1.0 #23 0x0000ffff434d8d88 [PAC] in method_vectorcall () from /lib64/libpython3.12.so.1.0 #24 0x0000ffff435e0ef4 [PAC] in thread_run () from /lib64/libpython3.12.so.1.0 #25 0x0000ffff43591ec0 [PAC] in pythread_wrapper () from /lib64/libpython3.12.so.1.0 #26 0x0000ffff42bb0584 [PAC] in start_thread () from /lib64/libc.so.6 #27 0x0000ffff42c1fd4c [PAC] in thread_start () from /lib64/libc.so.6 ... The direct cause for the coredump seems to be that cli_ui_out::do_message is trying to write to a stream variable which does not look sound: ... (gdb) p *stream $8 = {_vptr.ui_file = 0x0, m_applied_style = {m_foreground = {m_simple = true, { m_value = 0, {m_red = 0 '\000', m_green = 0 '\000', m_blue = 0 '\000'}}}, m_background = {m_simple = 32, {m_value = 65535, {m_red = 255 '\377', m_green = 255 '\377', m_blue = 0 '\000'}}}, m_intensity = (unknown: 0x438fe710), m_reverse = 255}} ... The string that is being printed is: ... (gdb) p str $9 = "Exception in thread " ... so AFAICT this is a DAP thread running into an exception and trying to print it. If we look at the state of gdb's main thread, we have: ... #0 0x0000ffff42bac914 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6 #1 0x0000ffff42bafb44 [PAC] in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libc.so.6 #2 0x0000ffff43466e9c [PAC] in take_gil () from /lib64/libpython3.12.so.1.0 #3 0x0000ffff43484fe0 [PAC] in PyEval_RestoreThread () from /lib64/libpython3.12.so.1.0 #4 0x0000000000ab8698 [PAC] in gdbpy_allow_threads::~gdbpy_allow_threads ( this=0xfffffbea1cf8, __in_chrg=<optimized out>) at gdb/python/python-internal.h:769 #5 0x0000000000ab2fec in execute_gdb_command (self=0x33fe35d0, args=0x34297b60, kw=0x34553d20) at gdb/python/python.c:681 #6 0x0000ffff434acedc in cfunction_call () from /lib64/libpython3.12.so.1.0 #7 0x0000ffff4347c500 [PAC] in _PyObject_MakeTpCall () from /lib64/libpython3.12.so.1.0 #8 0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault () from /lib64/libpython3.12.so.1.0 #9 0x0000ffff4353bce8 [PAC] in _PyObject_VectorcallTstate.lto_priv.3 () from /lib64/libpython3.12.so.1.0 #10 0x0000000000ab87fc [PAC] in gdbpy_event::operator() (this=0xffff14005900) at gdb/python/python.c:1061 #11 0x0000000000ab93e8 in std::__invoke_impl<void, gdbpy_event&> (__f=...) at /usr/include/c++/13/bits/invoke.h:61 #12 0x0000000000ab9204 in std::__invoke_r<void, gdbpy_event&> (__fn=...) at /usr/include/c++/13/bits/invoke.h:111 #13 0x0000000000ab8e90 in std::_Function_handler<..>::_M_invoke(...) (...) at /usr/include/c++/13/bits/std_function.h:290 #14 0x000000000062e0d0 in std::function<void ()>::operator()() const ( this=0xffff14005830) at /usr/include/c++/13/bits/std_function.h:591 #15 0x0000000000b67f14 in run_events (error=0, client_data=0x0) at gdb/run-on-main-thread.c:76 #16 0x000000000157e290 in handle_file_event (file_ptr=0x33dae3a0, ready_mask=1) at gdbsupport/event-loop.cc:573 #17 0x000000000157e760 in gdb_wait_for_event (block=1) at gdbsupport/event-loop.cc:694 #18 0x000000000157d464 in gdb_do_one_event (mstimeout=-1) at gdbsupport/event-loop.cc:264 #19 0x0000000000943a84 in start_event_loop () at gdb/main.c:401 #20 0x0000000000943bfc in captured_command_loop () at gdb/main.c:465 #21 0x000000000094567c in captured_main (data=0xfffffbea23e8) at gdb/main.c:1335 #22 0x0000000000945700 in gdb_main (args=0xfffffbea23e8) at gdb/main.c:1354 #23 0x0000000000423ab4 in main (argc=14, argv=0xfffffbea2578) at gdb/gdb.c:39 ... AFAIU, there's a race between the two threads on gdb_stderr: - the DAP thread samples the gdb_stderr value, and uses it a bit later to print to - the gdb main thread changes the gdb_stderr value forth and back, using a temporary value for string capture purposes The non-sound stream value is caused by gdb_stderr being sampled while pointing to a str_file object, and used once the str_file object is already destroyed. The error here is that the DAP thread attempts to print to gdb_stderr. Fix this by adding a thread_wrapper that: - catches all exceptions and logs them to dap.log, and - while we're at it, logs when exiting and using the thread_wrapper for each DAP thread. Tested on aarch64-linux. Approved-By: Tom Tromey <tom@tromey.com>
When running test-case gdb.dap/eof.exp, we're likely to get a coredump due to a segfault in new_threadstate. At the point of the core dump, the gdb main thread looks like: ... (gdb) bt #0 0x0000fffee30d2280 in __pthread_kill_implementation () from /lib64/libc.so.6 #1 0x0000fffee3085800 [PAC] in raise () from /lib64/libc.so.6 #2 0x00000000007b03e8 [PAC] in handle_fatal_signal (sig=11) at gdb/event-top.c:926 #3 0x00000000007b0470 in handle_sigsegv (sig=11) at gdb/event-top.c:976 #4 <signal handler called> #5 0x0000fffee3a4db14 in new_threadstate () from /lib64/libpython3.12.so.1.0 #6 0x0000fffee3ab0548 [PAC] in PyGILState_Ensure () from /lib64/libpython3.12.so.1.0 #7 0x0000000000a6d034 [PAC] in gdbpy_gil::gdbpy_gil (this=0xffffcb279738) at gdb/python/python-internal.h:787 #8 0x0000000000ab87ac in gdbpy_event::~gdbpy_event (this=0xfffea8001ee0, __in_chrg=<optimized out>) at gdb/python/python.c:1051 #9 0x0000000000ab9460 in std::_Function_base::_Base_manager<...>::_M_destroy (__victim=...) at /usr/include/c++/13/bits/std_function.h:175 #10 0x0000000000ab92dc in std::_Function_base::_Base_manager<...>::_M_manager (__dest=..., __source=..., __op=std::__destroy_functor) at /usr/include/c++/13/bits/std_function.h:203 #11 0x0000000000ab8f14 in std::_Function_handler<...>::_M_manager(...) (...) at /usr/include/c++/13/bits/std_function.h:282 #12 0x000000000042dd9c in std::_Function_base::~_Function_base (this=0xfffea8001c10, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/std_function.h:244 #13 0x000000000042e654 in std::function<void ()>::~function() (this=0xfffea8001c10, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/std_function.h:334 #14 0x0000000000b68e60 in std::_Destroy<std::function<void ()> >(...) (...) at /usr/include/c++/13/bits/stl_construct.h:151 #15 0x0000000000b68cd0 in std::_Destroy_aux<false>::__destroy<...>(...) (...) at /usr/include/c++/13/bits/stl_construct.h:163 #16 0x0000000000b689d8 in std::_Destroy<...>(...) (...) at /usr/include/c++/13/bits/stl_construct.h:196 #17 0x0000000000b68414 in std::_Destroy<...>(...) (...) at /usr/include/c++/13/bits/alloc_traits.h:948 #18 std::vector<...>::~vector() (this=0x2a183c8 <runnables>) at /usr/include/c++/13/bits/stl_vector.h:732 #19 0x0000fffee3088370 in __run_exit_handlers () from /lib64/libc.so.6 #20 0x0000fffee3088450 [PAC] in exit () from /lib64/libc.so.6 #21 0x0000000000c95600 [PAC] in quit_force (exit_arg=0x0, from_tty=0) at gdb/top.c:1822 #22 0x0000000000609140 in quit_command (args=0x0, from_tty=0) at gdb/cli/cli-cmds.c:508 #23 0x0000000000c926a4 in quit_cover () at gdb/top.c:300 #24 0x00000000007b09d4 in async_disconnect (arg=0x0) at gdb/event-top.c:1230 #25 0x0000000000548acc in invoke_async_signal_handlers () at gdb/async-event.c:234 #26 0x000000000157d2d4 in gdb_do_one_event (mstimeout=-1) at gdbsupport/event-loop.cc:199 #27 0x0000000000943a84 in start_event_loop () at gdb/main.c:401 #28 0x0000000000943bfc in captured_command_loop () at gdb/main.c:465 #29 0x000000000094567c in captured_main (data=0xffffcb279d08) at gdb/main.c:1335 #30 0x0000000000945700 in gdb_main (args=0xffffcb279d08) at gdb/main.c:1354 #31 0x0000000000423ab4 in main (argc=14, argv=0xffffcb279e98) at gdb/gdb.c:39 ... The direct cause of the segfault is calling PyGILState_Ensure after calling Py_Finalize. AFAICT the problem is a race between the gdb main thread and DAP's JSON writer thread. On one side, we have the following events: - DAP's JSON reader thread reads an EOF, and lets DAP's main thread known by writing None into read_queue - DAP's main thread lets DAP's JSON writer thread known by writing None into write_queue - DAP's JSON writer thread sees the None in its queue, and calls send_gdb("quit") - a corresponding gdbpy_event is deposited in the runnables vector, to be run by the gdb main thread On the other side, we have the following events: - the gdb main thread receives a SIGHUP - the corresponding handler calls quit_force, which calls do_final_cleanups - one of the final cleanups is finalize_python, which calls Py_Finalize - quit_force calls exit, which triggers the exit handlers - one of the exit handlers is the destructor of the runnables vector - destruction of the vector triggers destruction of the remaining element - the remaining element is a gdbpy_event, and the destructor (indirectly) calls PyGILState_Ensure It's good to note that both events (EOF and SIGHUP) are caused by this line in the test-case: ... catch "close -i $gdb_spawn_id" ... where "expect close" closes the stdin and stdout file descriptors, which causes the SIGHUP to be send. So, for the system I'm running this on, the send_gdb("quit") is actually not needed. I'm not sure if we support any systems where it's actually needed. Fix this by removing the send_gdb("quit"). Tested on aarch64-linux. PR dap/31306 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31306
When building gdb with -O0 -fsanitize=address, and running test-case gdb.ada/uninitialized_vars.exp, I run into: ... (gdb) info locals a = 0 z = (a => 1, b => false, c => 2.0) ================================================================= ==66372==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000097f58 at pc 0xffff52c0da1c bp 0xffffc90a1d40 sp 0xffffc90a1d80 READ of size 4 at 0x602000097f58 thread T0 #0 0xffff52c0da18 in memmove (/lib64/libasan.so.8+0x6da18) #1 0xbcab24 in unsigned char* std::__copy_move_backward<false, true, std::random_access_iterator_tag>::__copy_move_b<unsigned char const, unsigned char>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:748 #2 0xbc9bf4 in unsigned char* std::__copy_move_backward_a2<false, unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:769 #3 0xbc898c in unsigned char* std::__copy_move_backward_a1<false, unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:778 #4 0xbc715c in unsigned char* std::__copy_move_backward_a<false, unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:807 #5 0xbc4e6c in unsigned char* std::copy_backward<unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:867 #6 0xbc2934 in void gdb::copy<unsigned char const, unsigned char>(gdb::array_view<unsigned char const>, gdb::array_view<unsigned char>) gdb/../gdbsupport/array-view.h:223 #7 0x20e0100 in value::contents_copy_raw(value*, long, long, long) gdb/value.c:1239 #8 0x20e9830 in value::primitive_field(long, int, type*) gdb/value.c:3078 #9 0x20e98f8 in value_field(value*, int) gdb/value.c:3095 #10 0xcafd64 in print_field_values gdb/ada-valprint.c:658 #11 0xcb0fa0 in ada_val_print_struct_union gdb/ada-valprint.c:857 #12 0xcb1bb4 in ada_value_print_inner(value*, ui_file*, int, value_print_options const*) gdb/ada-valprint.c:1042 #13 0xc66e04 in ada_language::value_print_inner(value*, ui_file*, int, value_print_options const*) const (/home/vries/gdb/build/gdb/gdb+0xc66e04) #14 0x20ca1e8 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) gdb/valprint.c:1092 #15 0x20caabc in common_val_print_checked(value*, ui_file*, int, value_print_options const*, language_defn const*) gdb/valprint.c:1184 #16 0x196c524 in print_variable_and_value(char const*, symbol*, frame_info_ptr, ui_file*, int) gdb/printcmd.c:2355 #17 0x1d99ca0 in print_variable_and_value_data::operator()(char const*, symbol*) gdb/stack.c:2308 #18 0x1dabca0 in gdb::function_view<void (char const*, symbol*)>::bind<print_variable_and_value_data>(print_variable_and_value_data&)::{lambda(gdb::fv_detail::erased_callable, char const*, symbol*)#1}::operator()(gdb::fv_detail::erased_callable, char const*, symbol*) const gdb/../gdbsupport/function-view.h:305 #19 0x1dabd14 in gdb::function_view<void (char const*, symbol*)>::bind<print_variable_and_value_data>(print_variable_and_value_data&)::{lambda(gdb::fv_detail::erased_callable, char const*, symbol*)#1}::_FUN(gdb::fv_detail::erased_callable, char const*, symbol*) gdb/../gdbsupport/function-view.h:299 #20 0x1dab34c in gdb::function_view<void (char const*, symbol*)>::operator()(char const*, symbol*) const gdb/../gdbsupport/function-view.h:289 #21 0x1d9963c in iterate_over_block_locals gdb/stack.c:2240 #22 0x1d99790 in iterate_over_block_local_vars(block const*, gdb::function_view<void (char const*, symbol*)>) gdb/stack.c:2259 #23 0x1d9a598 in print_frame_local_vars gdb/stack.c:2380 #24 0x1d9afac in info_locals_command(char const*, int) gdb/stack.c:2458 #25 0xfd7b30 in do_simple_func gdb/cli/cli-decode.c:95 #26 0xfe5a2c in cmd_func(cmd_list_element*, char const*, int) gdb/cli/cli-decode.c:2735 #27 0x1f03790 in execute_command(char const*, int) gdb/top.c:575 #28 0x1384080 in command_handler(char const*) gdb/event-top.c:566 #29 0x1384e2c in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) gdb/event-top.c:802 #30 0x1f731e4 in tui_command_line_handler gdb/tui/tui-interp.c:104 #31 0x1382a58 in gdb_rl_callback_handler gdb/event-top.c:259 #32 0x21dbb80 in rl_callback_read_char readline/readline/callback.c:290 #33 0x1382510 in gdb_rl_callback_read_char_wrapper_noexcept gdb/event-top.c:195 #34 0x138277c in gdb_rl_callback_read_char_wrapper gdb/event-top.c:234 #35 0x1fe9b40 in stdin_event_handler gdb/ui.c:155 #36 0x35ff1bc in handle_file_event gdbsupport/event-loop.cc:573 #37 0x35ff9d8 in gdb_wait_for_event gdbsupport/event-loop.cc:694 #38 0x35fd284 in gdb_do_one_event(int) gdbsupport/event-loop.cc:264 #39 0x1768080 in start_event_loop gdb/main.c:408 #40 0x17684c4 in captured_command_loop gdb/main.c:472 #41 0x176cfc8 in captured_main gdb/main.c:1342 #42 0x176d088 in gdb_main(captured_main_args*) gdb/main.c:1361 #43 0xb73edc in main gdb/gdb.c:39 #44 0xffff519b09d8 in __libc_start_call_main (/lib64/libc.so.6+0x309d8) #45 0xffff519b0aac in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x30aac) #46 0xb73c2c in _start (/home/vries/gdb/build/gdb/gdb+0xb73c2c) 0x602000097f58 is located 0 bytes after 8-byte region [0x602000097f50,0x602000097f58) allocated by thread T0 here: #0 0xffff52c65218 in calloc (/lib64/libasan.so.8+0xc5218) #1 0xcbc278 in xcalloc gdb/alloc.c:97 #2 0x35f21e8 in xzalloc(unsigned long) gdbsupport/common-utils.cc:29 #3 0x20de270 in value::allocate_contents(bool) gdb/value.c:937 #4 0x20edc08 in value::fetch_lazy() gdb/value.c:4033 #5 0x20dadc0 in value::entirely_covered_by_range_vector(std::vector<range, std::allocator<range> > const&) gdb/value.c:229 #6 0xcb2298 in value::entirely_optimized_out() gdb/value.h:560 #7 0x20ca6fc in value_check_printable gdb/valprint.c:1133 #8 0x20caa8c in common_val_print_checked(value*, ui_file*, int, value_print_options const*, language_defn const*) gdb/valprint.c:1182 #9 0x196c524 in print_variable_and_value(char const*, symbol*, frame_info_ptr, ui_file*, int) gdb/printcmd.c:2355 #10 0x1d99ca0 in print_variable_and_value_data::operator()(char const*, symbol*) gdb/stack.c:2308 #11 0x1dabca0 in gdb::function_view<void (char const*, symbol*)>::bind<print_variable_and_value_data>(print_variable_and_value_data&)::{lambda(gdb::fv_detail::erased_callable, char const*, symbol*)#1}::operator()(gdb::fv_detail::erased_callable, char const*, symbol*) const gdb/../gdbsupport/function-view.h:305 #12 0x1dabd14 in gdb::function_view<void (char const*, symbol*)>::bind<print_variable_and_value_data>(print_variable_and_value_data&)::{lambda(gdb::fv_detail::erased_callable, char const*, symbol*)#1}::_FUN(gdb::fv_detail::erased_callable, char const*, symbol*) gdb/../gdbsupport/function-view.h:299 #13 0x1dab34c in gdb::function_view<void (char const*, symbol*)>::operator()(char const*, symbol*) const gdb/../gdbsupport/function-view.h:289 #14 0x1d9963c in iterate_over_block_locals gdb/stack.c:2240 #15 0x1d99790 in iterate_over_block_local_vars(block const*, gdb::function_view<void (char const*, symbol*)>) gdb/stack.c:2259 #16 0x1d9a598 in print_frame_local_vars gdb/stack.c:2380 #17 0x1d9afac in info_locals_command(char const*, int) gdb/stack.c:2458 #18 0xfd7b30 in do_simple_func gdb/cli/cli-decode.c:95 #19 0xfe5a2c in cmd_func(cmd_list_element*, char const*, int) gdb/cli/cli-decode.c:2735 #20 0x1f03790 in execute_command(char const*, int) gdb/top.c:575 #21 0x1384080 in command_handler(char const*) gdb/event-top.c:566 #22 0x1384e2c in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) gdb/event-top.c:802 #23 0x1f731e4 in tui_command_line_handler gdb/tui/tui-interp.c:104 #24 0x1382a58 in gdb_rl_callback_handler gdb/event-top.c:259 #25 0x21dbb80 in rl_callback_read_char readline/readline/callback.c:290 #26 0x1382510 in gdb_rl_callback_read_char_wrapper_noexcept gdb/event-top.c:195 #27 0x138277c in gdb_rl_callback_read_char_wrapper gdb/event-top.c:234 #28 0x1fe9b40 in stdin_event_handler gdb/ui.c:155 #29 0x35ff1bc in handle_file_event gdbsupport/event-loop.cc:573 SUMMARY: AddressSanitizer: heap-buffer-overflow (/lib64/libasan.so.8+0x6da18) in memmove ... The error happens when trying to print either variable y or y2: ... type Variable_Record (A : Boolean := True) is record case A is when True => B : Integer; when False => C : Float; D : Integer; end case; end record; Y : Variable_Record := (A => True, B => 1); Y2 : Variable_Record := (A => False, C => 1.0, D => 2); ... when the variables are uninitialized. The error happens only when printing the entire variable: ... (gdb) p y.a $2 = 216 (gdb) p y.b There is no member named b. (gdb) p y.c $3 = 9.18340949e-41 (gdb) p y.d $4 = 1 (gdb) p y <AddressSanitizer: heap-buffer-overflow> ... The error happens as follows: - field a functions as discriminant, choosing either the b, or c+d variant. - when y.a happens to be set to 216, as above, gdb interprets this as the variable having the c+d variant (which is why trying to print y.b fails). - when printing y, gdb allocates a value, copies the bytes into it from the target, and then prints the value. - gdb allocates the value using the type size, which is 8. It's 8 because that's what the DW_AT_byte_size indicates. Note that for valid values of a, it gives correct results: if a is 0 (c+d variant), size is 12, if a is 1 (b variant), size is 8. - gdb tries to print field d, which is at an 8 byte offset, and that results in a out-of-bounds access for the allocated 8-byte value. Fix this by handling this case in value::contents_copy_raw, such that we have: ... (gdb) p y $1 = (a => 24, c => 9.18340949e-41, d => <error reading variable: access outside bounds of object>) ... An alternative (additional) fix could be this: in compute_variant_fields_inner gdb reads the discriminant y.a to decide which variant is active. It would be nice to detect that the value (y.a == 24) is not a valid Boolean, and give up on choosing a variant altoghether. However, the situation regarding the internal type CODE_TYPE_BOOL is currently ambiguous (see PR31282) and it's not possible to reliably decide what valid values are. The test-case source file gdb.ada/uninitialized-variable-record/parse.adb is a reduced version of gdb.ada/uninitialized_vars/parse.adb, so it copies the copyright years. Note that the test-case needs gcc-12 or newer, it's unsupported for older gcc versions. [ So, it would be nice to rewrite it into a dwarf assembly test-case. ] The test-case loops over all languages. This is inherited from an earlier attempt to fix this, which had language-specific fixes (in print_field_values, cp_print_value_fields, pascal_object_print_value_fields and f_language::value_print_inner). I've left this in, but I suppose it's not strictly necessary anymore. Tested on x86_64-linux. PR exp/31258 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31258
From the Python API, we can execute GDB commands via gdb.execute. If the command gives an exception, however, we need to recover the GDB prompt and enable stdin, because the exception does not reach top-level GDB or normal_stop. This was done in commit commit 1ba1ac8 Author: Andrew Burgess <andrew.burgess@embecosm.com> Date: Tue Nov 19 11:17:20 2019 +0000 gdb: Enable stdin on exception in execute_gdb_command with the following code: catch (const gdb_exception &except) { /* If an exception occurred then we won't hit normal_stop (), or have an exception reach the top level of the event loop, which are the two usual places in which stdin would be re-enabled. So, before we convert the exception and continue back in Python, we should re-enable stdin here. */ async_enable_stdin (); GDB_PY_HANDLE_EXCEPTION (except); } In this patch, we explain what happens when we run a GDB command in the context of a synchronous command, e.g. via Python observer notifications. As an example, suppose we have the following objfile event listener, specified in a file named file.py: ~~~ import gdb class MyListener: def __init__(self): gdb.events.new_objfile.connect(self.handle_new_objfile_event) self.processed_objfile = False def handle_new_objfile_event(self, event): if self.processed_objfile: return print("loading " + event.new_objfile.filename) self.processed_objfile = True gdb.execute('add-inferior -no-connection') gdb.execute('inferior 2') gdb.execute('target remote | gdbserver - /tmp/a.out') gdb.execute('inferior 1') the_listener = MyListener() ~~~ Using this Python file, we see the behavior below: $ gdb -q -ex "source file.py" -ex "run" --args a.out Reading symbols from a.out... Starting program: /tmp/a.out loading /lib64/ld-linux-x86-64.so.2 [New inferior 2] Added inferior 2 [Switching to inferior 2 [<null>] (<noexec>)] stdin/stdout redirected Process /tmp/a.out created; pid = 3075406 Remote debugging using stdio Reading /tmp/a.out from remote target... ... [Switching to inferior 1 [process 3075400] (/tmp/a.out)] [Switching to thread 1.1 (process 3075400)] #0 0x00007ffff7fe3290 in ?? () from /lib64/ld-linux-x86-64.so.2 (gdb) [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [Inferior 1 (process 3075400) exited normally] Note how the GDB prompt comes in-between the debugger output. We have this obscure behavior, because the executed command, "target remote", triggers an invocation of `normal_stop` that enables stdin. After that, however, the Python notification context completes and GDB continues with its normal flow of executing the 'run' command. This can be seen in the call stack below: (top-gdb) bt #0 async_enable_stdin () at src/gdb/event-top.c:523 #1 0x00005555561c3acd in normal_stop () at src/gdb/infrun.c:9432 #2 0x00005555561b328e in start_remote (from_tty=0) at src/gdb/infrun.c:3801 #3 0x0000555556441224 in remote_target::start_remote_1 (this=0x5555587882e0, from_tty=0, extended_p=0) at src/gdb/remote.c:5225 #4 0x000055555644166c in remote_target::start_remote (this=0x5555587882e0, from_tty=0, extended_p=0) at src/gdb/remote.c:5316 #5 0x00005555564430cf in remote_target::open_1 (name=0x55555878525e "| gdbserver - /tmp/a.out", from_tty=0, extended_p=0) at src/gdb/remote.c:6175 #6 0x0000555556441707 in remote_target::open (name=0x55555878525e "| gdbserver - /tmp/a.out", from_tty=0) at src/gdb/remote.c:5338 #7 0x00005555565ea63f in open_target (args=0x55555878525e "| gdbserver - /tmp/a.out", from_tty=0, command=0x555558589280) at src/gdb/target.c:824 #8 0x0000555555f0d89a in cmd_func (cmd=0x555558589280, args=0x55555878525e "| gdbserver - /tmp/a.out", from_tty=0) at src/gdb/cli/cli-decode.c:2735 #9 0x000055555661fb42 in execute_command (p=0x55555878529e "t", from_tty=0) at src/gdb/top.c:575 #10 0x0000555555f1a506 in execute_control_command_1 (cmd=0x555558756f00, from_tty=0) at src/gdb/cli/cli-script.c:529 #11 0x0000555555f1abea in execute_control_command (cmd=0x555558756f00, from_tty=0) at src/gdb/cli/cli-script.c:701 #12 0x0000555555f19fc7 in execute_control_commands (cmdlines=0x555558756f00, from_tty=0) at src/gdb/cli/cli-script.c:411 #13 0x0000555556400d91 in execute_gdb_command (self=0x7ffff43b5d00, args=0x7ffff440ab60, kw=0x0) at src/gdb/python/python.c:700 #14 0x00007ffff7a96023 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #15 0x00007ffff7a4dadc in _PyObject_MakeTpCall () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #16 0x00007ffff79e9a1c in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #17 0x00007ffff7b303af in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #18 0x00007ffff7a50358 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #19 0x00007ffff7a4f3f4 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #20 0x00007ffff7a4f883 in PyObject_CallFunctionObjArgs () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #21 0x00005555563a9758 in evpy_emit_event (event=0x7ffff42b5430, registry=0x7ffff42b4690) at src/gdb/python/py-event.c:104 #22 0x00005555563cb874 in emit_new_objfile_event (objfile=0x555558761700) at src/gdb/python/py-newobjfileevent.c:52 #23 0x00005555563b53bc in python_new_objfile (objfile=0x555558761700) at src/gdb/python/py-inferior.c:195 #24 0x0000555555d6dff0 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x5555585b5860: 0x5555563b5360 <python_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:61 #25 0x0000555555d6be18 in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x5555585b5860: 0x5555563b5360 <python_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:111 #26 0x0000555555d69661 in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffd080: 0x555558761700) at /usr/include/c++/11/bits/std_function.h:290 #27 0x0000555556314caf in std::function<void (objfile*)>::operator()(objfile*) const (this=0x5555585b5860, __args#0=0x555558761700) at /usr/include/c++/11/bits/std_function.h:590 #28 0x000055555631444e in gdb::observers::observable<objfile*>::notify (this=0x55555836eea0 <gdb::observers::new_objfile>, args#0=0x555558761700) at src/gdb/../gdbsupport/observable.h:166 #29 0x0000555556599b3f in symbol_file_add_with_addrs (abfd=..., name=0x55555875d310 "/lib64/ld-linux-x86-64.so.2", add_flags=..., addrs=0x7fffffffd2f0, flags=..., parent=0x0) at src/gdb/symfile.c:1125 #30 0x0000555556599ca4 in symbol_file_add_from_bfd (abfd=..., name=0x55555875d310 "/lib64/ld-linux-x86-64.so.2", add_flags=..., addrs=0x7fffffffd2f0, flags=..., parent=0x0) at src/gdb/symfile.c:1160 #31 0x0000555556546371 in solib_read_symbols (so=..., flags=...) at src/gdb/solib.c:692 #32 0x0000555556546f0f in solib_add (pattern=0x0, from_tty=0, readsyms=1) at src/gdb/solib.c:1015 #33 0x0000555556539891 in enable_break (info=0x55555874e180, from_tty=0) at src/gdb/solib-svr4.c:2416 #34 0x000055555653b305 in svr4_solib_create_inferior_hook (from_tty=0) at src/gdb/solib-svr4.c:3058 #35 0x0000555556547cee in solib_create_inferior_hook (from_tty=0) at src/gdb/solib.c:1217 #36 0x0000555556196f6a in post_create_inferior (from_tty=0) at src/gdb/infcmd.c:275 #37 0x0000555556197670 in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at src/gdb/infcmd.c:486 #38 0x000055555619783f in run_command (args=0x0, from_tty=1) at src/gdb/infcmd.c:512 #39 0x0000555555f0798d in do_simple_func (args=0x0, from_tty=1, c=0x555558567510) at src/gdb/cli/cli-decode.c:95 #40 0x0000555555f0d89a in cmd_func (cmd=0x555558567510, args=0x0, from_tty=1) at src/gdb/cli/cli-decode.c:2735 #41 0x000055555661fb42 in execute_command (p=0x7fffffffe2c4 "", from_tty=1) at src/gdb/top.c:575 #42 0x000055555626303b in catch_command_errors (command=0x55555661f4ab <execute_command(char const*, int)>, arg=0x7fffffffe2c1 "run", from_tty=1, do_bp_actions=true) at src/gdb/main.c:513 #43 0x000055555626328a in execute_cmdargs (cmdarg_vec=0x7fffffffdaf0, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND, ret=0x7fffffffda3c) at src/gdb/main.c:612 #44 0x0000555556264849 in captured_main_1 (context=0x7fffffffdd40) at src/gdb/main.c:1293 #45 0x0000555556264a7f in captured_main (data=0x7fffffffdd40) at src/gdb/main.c:1314 #46 0x0000555556264b2e in gdb_main (args=0x7fffffffdd40) at src/gdb/main.c:1343 #47 0x0000555555ceccab in main (argc=9, argv=0x7fffffffde78) at src/gdb/gdb.c:39 (top-gdb) The use of the "target remote" command here is just an example. In principle, we would reproduce the problem with any command that triggers an invocation of `normal_stop`. To omit enabling the stdin in `normal_stop`, we would have to check the context we are in. Since we cannot do that, we add a new field to `struct ui` to track whether the prompt was already blocked, and set the tracker flag in the Python context before executing a GDB command. After applying this patch, the output becomes ... Reading symbols from a.out... Starting program: /tmp/a.out loading /lib64/ld-linux-x86-64.so.2 [New inferior 2] Added inferior 2 [Switching to inferior 2 [<null>] (<noexec>)] stdin/stdout redirected Process /tmp/a.out created; pid = 3032261 Remote debugging using stdio Reading /tmp/a.out from remote target... ... [Switching to inferior 1 [process 3032255] (/tmp/a.out)] [Switching to thread 1.1 (process 3032255)] #0 0x00007ffff7fe3290 in ?? () from /lib64/ld-linux-x86-64.so.2 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [Inferior 1 (process 3032255) exited normally] (gdb) Let's now consider a secondary scenario, where the command executed from the Python raises an error. As an example, suppose we have the Python file below: def handle_new_objfile_event(self, event): ... print("loading " + event.new_objfile.filename) self.processed_objfile = True gdb.execute('print a') The executed command, "print a", gives an error because "a" is not defined. Without this patch, we see the behavior below, where the prompt is again placed incorrectly: ... Reading symbols from /tmp/a.out... Starting program: /tmp/a.out loading /lib64/ld-linux-x86-64.so.2 Python Exception <class 'gdb.error'>: No symbol "a" in current context. (gdb) [Inferior 1 (process 3980401) exited normally] This time, `async_enable_stdin` is called from the 'catch' block in `execute_gdb_command`: (top-gdb) bt #0 async_enable_stdin () at src/gdb/event-top.c:523 #1 0x0000555556400f0a in execute_gdb_command (self=0x7ffff43b5d00, args=0x7ffff440ab60, kw=0x0) at src/gdb/python/python.c:713 #2 0x00007ffff7a96023 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #3 0x00007ffff7a4dadc in _PyObject_MakeTpCall () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #4 0x00007ffff79e9a1c in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #5 0x00007ffff7b303af in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #6 0x00007ffff7a50358 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #7 0x00007ffff7a4f3f4 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #8 0x00007ffff7a4f883 in PyObject_CallFunctionObjArgs () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #9 0x00005555563a9758 in evpy_emit_event (event=0x7ffff42b5430, registry=0x7ffff42b4690) at src/gdb/python/py-event.c:104 #10 0x00005555563cb874 in emit_new_objfile_event (objfile=0x555558761410) at src/gdb/python/py-newobjfileevent.c:52 #11 0x00005555563b53bc in python_new_objfile (objfile=0x555558761410) at src/gdb/python/py-inferior.c:195 #12 0x0000555555d6dff0 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x5555585b5860: 0x5555563b5360 <python_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:61 #13 0x0000555555d6be18 in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x5555585b5860: 0x5555563b5360 <python_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:111 #14 0x0000555555d69661 in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffd080: 0x555558761410) at /usr/include/c++/11/bits/std_function.h:290 #15 0x0000555556314caf in std::function<void (objfile*)>::operator()(objfile*) const (this=0x5555585b5860, __args#0=0x555558761410) at /usr/include/c++/11/bits/std_function.h:590 #16 0x000055555631444e in gdb::observers::observable<objfile*>::notify (this=0x55555836eea0 <gdb::observers::new_objfile>, args#0=0x555558761410) at src/gdb/../gdbsupport/observable.h:166 #17 0x0000555556599b3f in symbol_file_add_with_addrs (abfd=..., name=0x55555875d020 "/lib64/ld-linux-x86-64.so.2", add_flags=..., addrs=0x7fffffffd2f0, flags=..., parent=0x0) at src/gdb/symfile.c:1125 #18 0x0000555556599ca4 in symbol_file_add_from_bfd (abfd=..., name=0x55555875d020 "/lib64/ld-linux-x86-64.so.2", add_flags=..., addrs=0x7fffffffd2f0, flags=..., parent=0x0) at src/gdb/symfile.c:1160 #19 0x0000555556546371 in solib_read_symbols (so=..., flags=...) at src/gdb/solib.c:692 #20 0x0000555556546f0f in solib_add (pattern=0x0, from_tty=0, readsyms=1) at src/gdb/solib.c:1015 #21 0x0000555556539891 in enable_break (info=0x55555874a670, from_tty=0) at src/gdb/solib-svr4.c:2416 #22 0x000055555653b305 in svr4_solib_create_inferior_hook (from_tty=0) at src/gdb/solib-svr4.c:3058 #23 0x0000555556547cee in solib_create_inferior_hook (from_tty=0) at src/gdb/solib.c:1217 #24 0x0000555556196f6a in post_create_inferior (from_tty=0) at src/gdb/infcmd.c:275 #25 0x0000555556197670 in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at src/gdb/infcmd.c:486 #26 0x000055555619783f in run_command (args=0x0, from_tty=1) at src/gdb/infcmd.c:512 #27 0x0000555555f0798d in do_simple_func (args=0x0, from_tty=1, c=0x555558567510) at src/gdb/cli/cli-decode.c:95 #28 0x0000555555f0d89a in cmd_func (cmd=0x555558567510, args=0x0, from_tty=1) at src/gdb/cli/cli-decode.c:2735 #29 0x000055555661fb42 in execute_command (p=0x7fffffffe2c4 "", from_tty=1) at src/gdb/top.c:575 #30 0x000055555626303b in catch_command_errors (command=0x55555661f4ab <execute_command(char const*, int)>, arg=0x7fffffffe2c1 "run", from_tty=1, do_bp_actions=true) at src/gdb/main.c:513 #31 0x000055555626328a in execute_cmdargs (cmdarg_vec=0x7fffffffdaf0, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND, ret=0x7fffffffda3c) at src/gdb/main.c:612 #32 0x0000555556264849 in captured_main_1 (context=0x7fffffffdd40) at src/gdb/main.c:1293 #33 0x0000555556264a7f in captured_main (data=0x7fffffffdd40) at src/gdb/main.c:1314 #34 0x0000555556264b2e in gdb_main (args=0x7fffffffdd40) at src/gdb/main.c:1343 #35 0x0000555555ceccab in main (argc=9, argv=0x7fffffffde78) at src/gdb/gdb.c:39 (top-gdb) Again, after we enable stdin, GDB continues with its normal flow of the 'run' command and receives the inferior's exit event, where it would have enabled stdin, if we had not done it prematurely. (top-gdb) bt #0 async_enable_stdin () at src/gdb/event-top.c:523 #1 0x00005555561c3acd in normal_stop () at src/gdb/infrun.c:9432 #2 0x00005555561b5bf1 in fetch_inferior_event () at src/gdb/infrun.c:4700 #3 0x000055555618d6a7 in inferior_event_handler (event_type=INF_REG_EVENT) at src/gdb/inf-loop.c:42 #4 0x000055555620ecdb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4316 #5 0x0000555556f33035 in handle_file_event (file_ptr=0x5555587024e0, ready_mask=1) at src/gdbsupport/event-loop.cc:573 #6 0x0000555556f3362f in gdb_wait_for_event (block=0) at src/gdbsupport/event-loop.cc:694 #7 0x0000555556f322cd in gdb_do_one_event (mstimeout=-1) at src/gdbsupport/event-loop.cc:217 #8 0x0000555556262df8 in start_event_loop () at src/gdb/main.c:407 #9 0x0000555556262f85 in captured_command_loop () at src/gdb/main.c:471 #10 0x0000555556264a84 in captured_main (data=0x7fffffffdd40) at src/gdb/main.c:1324 #11 0x0000555556264b2e in gdb_main (args=0x7fffffffdd40) at src/gdb/main.c:1343 #12 0x0000555555ceccab in main (argc=9, argv=0x7fffffffde78) at src/gdb/gdb.c:39 (top-gdb) The solution implemented by this patch addresses the problem. After applying the patch, the output becomes $ gdb -q -ex "source file.py" -ex "run" --args a.out Reading symbols from /tmp/a.out... Starting program: /tmp/a.out loading /lib64/ld-linux-x86-64.so.2 Python Exception <class 'gdb.error'>: No symbol "a" in current context. [Inferior 1 (process 3984511) exited normally] (gdb) Regression-tested on X86_64 Linux using the default board file (i.e. unix). Co-Authored-By: Oguzhan Karakaya <oguzhan.karakaya@intel.com> Reviewed-By: Guinevere Larsen <blarsen@redhat.com> Approved-By: Tom Tromey <tom@tromey.com>
…ro linux When running test-case gdb.threads/attach-stopped.exp on aarch64-linux, using the manjaro linux distro, I get: ... (gdb) thread apply all bt^M ^M Thread 2 (Thread 0xffff8d8af120 (LWP 278116) "attach-stopped"):^M #0 0x0000ffff8d964864 in clock_nanosleep () from /usr/lib/libc.so.6^M #1 0x0000ffff8d969cac in nanosleep () from /usr/lib/libc.so.6^M #2 0x0000ffff8d969b68 in sleep () from /usr/lib/libc.so.6^M #3 0x0000aaaade370828 in func (arg=0x0) at attach-stopped.c:29^M #4 0x0000ffff8d930aec in ?? () from /usr/lib/libc.so.6^M #5 0x0000ffff8d99a5dc in ?? () from /usr/lib/libc.so.6^M ^M Thread 1 (Thread 0xffff8db62020 (LWP 278111) "attach-stopped"):^M #0 0x0000ffff8d92d2d8 in ?? () from /usr/lib/libc.so.6^M #1 0x0000ffff8d9324b8 in ?? () from /usr/lib/libc.so.6^M #2 0x0000aaaade37086c in main () at attach-stopped.c:45^M (gdb) FAIL: gdb.threads/attach-stopped.exp: threaded: attach2 to stopped bt ... The problem is that the test-case expects to see start_thread: ... gdb_test "thread apply all bt" ".*sleep.*start_thread.*" \ "$threadtype: attach2 to stopped bt" ... but lack of symbols makes that impossible. Fix this by allowing " in ?? () from " as well. Tested on aarch64-linux. PR testsuite/31451 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31451
When running test-case gdb.server/connect-with-no-symbol-file.exp on aarch64-linux (specifically, an opensuse leap 15.5 container on a fedora asahi 39 system), I run into: ... (gdb) detach^M Detaching from program: target:connect-with-no-symbol-file, process 185104^M Ending remote debugging.^M terminate called after throwing an instance of 'gdb_exception_error'^M ... The detailed backtrace of the corefile is: ... (gdb) bt #0 0x0000ffff75504f54 in raise () from /lib64/libpthread.so.0 #1 0x00000000007a86b4 in handle_fatal_signal (sig=6) at gdb/event-top.c:926 #2 <signal handler called> #3 0x0000ffff74b977b4 in raise () from /lib64/libc.so.6 #4 0x0000ffff74b98c18 in abort () from /lib64/libc.so.6 #5 0x0000ffff74ea26f4 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6 #6 0x0000ffff74ea011c in ?? () from /usr/lib64/libstdc++.so.6 #7 0x0000ffff74ea0180 in std::terminate() () from /usr/lib64/libstdc++.so.6 #8 0x0000ffff74ea0464 in __cxa_throw () from /usr/lib64/libstdc++.so.6 #9 0x0000000001548870 in throw_it (reason=RETURN_ERROR, error=TARGET_CLOSE_ERROR, fmt=0x16c7810 "Remote connection closed", ap=...) at gdbsupport/common-exceptions.cc:203 #10 0x0000000001548920 in throw_verror (error=TARGET_CLOSE_ERROR, fmt=0x16c7810 "Remote connection closed", ap=...) at gdbsupport/common-exceptions.cc:211 #11 0x0000000001548a00 in throw_error (error=TARGET_CLOSE_ERROR, fmt=0x16c7810 "Remote connection closed") at gdbsupport/common-exceptions.cc:226 #12 0x0000000000ac8f2c in remote_target::readchar (this=0x233d3d90, timeout=2) at gdb/remote.c:9856 #13 0x0000000000ac9f04 in remote_target::getpkt (this=0x233d3d90, buf=0x233d40a8, forever=false, is_notif=0x0) at gdb/remote.c:10326 #14 0x0000000000acf3d0 in remote_target::remote_hostio_send_command (this=0x233d3d90, command_bytes=13, which_packet=17, remote_errno=0xfffff1a3cf38, attachment=0xfffff1a3ce88, attachment_len=0xfffff1a3ce90) at gdb/remote.c:12567 #15 0x0000000000ad03bc in remote_target::fileio_fstat (this=0x233d3d90, fd=3, st=0xfffff1a3d020, remote_errno=0xfffff1a3cf38) at gdb/remote.c:12979 #16 0x0000000000c39878 in target_fileio_fstat (fd=0, sb=0xfffff1a3d020, target_errno=0xfffff1a3cf38) at gdb/target.c:3315 #17 0x00000000007eee5c in target_fileio_stream::stat (this=0x233d4400, abfd=0x2323fc40, sb=0xfffff1a3d020) at gdb/gdb_bfd.c:467 #18 0x00000000007f012c in <lambda(bfd*, void*, stat*)>::operator()(bfd *, void *, stat *) const (__closure=0x0, abfd=0x2323fc40, stream=0x233d4400, sb=0xfffff1a3d020) at gdb/gdb_bfd.c:955 #19 0x00000000007f015c in <lambda(bfd*, void*, stat*)>::_FUN(bfd *, void *, stat *) () at gdb/gdb_bfd.c:956 #20 0x0000000000f9b838 in opncls_bstat (abfd=0x2323fc40, sb=0xfffff1a3d020) at bfd/opncls.c:665 #21 0x0000000000f90adc in bfd_stat (abfd=0x2323fc40, statbuf=0xfffff1a3d020) at bfd/bfdio.c:431 #22 0x000000000065fe20 in reopen_exec_file () at gdb/corefile.c:52 #23 0x0000000000c3a3e8 in generic_mourn_inferior () at gdb/target.c:3642 #24 0x0000000000abf3f0 in remote_unpush_target (target=0x233d3d90) at gdb/remote.c:6067 #25 0x0000000000aca8b0 in remote_target::mourn_inferior (this=0x233d3d90) at gdb/remote.c:10587 #26 0x0000000000c387cc in target_mourn_inferior ( ptid=<error reading variable: Cannot access memory at address 0x2d310>) at gdb/target.c:2738 #27 0x0000000000abfff0 in remote_target::remote_detach_1 (this=0x233d3d90, inf=0x22fce540, from_tty=1) at gdb/remote.c:6421 #28 0x0000000000ac0094 in remote_target::detach (this=0x233d3d90, inf=0x22fce540, from_tty=1) at gdb/remote.c:6436 #29 0x0000000000c37c3c in target_detach (inf=0x22fce540, from_tty=1) at gdb/target.c:2526 #30 0x0000000000860424 in detach_command (args=0x0, from_tty=1) at gdb/infcmd.c:2817 #31 0x000000000060b594 in do_simple_func (args=0x0, from_tty=1, c=0x231431a0) at gdb/cli/cli-decode.c:94 #32 0x00000000006108c8 in cmd_func (cmd=0x231431a0, args=0x0, from_tty=1) at gdb/cli/cli-decode.c:2741 #33 0x0000000000c65a94 in execute_command (p=0x232e52f6 "", from_tty=1) at gdb/top.c:570 #34 0x00000000007a7d2c in command_handler (command=0x232e52f0 "") at gdb/event-top.c:566 #35 0x00000000007a8290 in command_line_handler (rl=...) at gdb/event-top.c:802 #36 0x0000000000c9092c in tui_command_line_handler (rl=...) at gdb/tui/tui-interp.c:103 #37 0x00000000007a750c in gdb_rl_callback_handler (rl=0x23385330 "detach") at gdb/event-top.c:258 #38 0x0000000000d910f4 in rl_callback_read_char () at readline/readline/callback.c:290 #39 0x00000000007a7338 in gdb_rl_callback_read_char_wrapper_noexcept () at gdb/event-top.c:194 #40 0x00000000007a73f0 in gdb_rl_callback_read_char_wrapper (client_data=0x22fbf640) at gdb/event-top.c:233 #41 0x0000000000cbee1c in stdin_event_handler (error=0, client_data=0x22fbf640) at gdb/ui.c:154 #42 0x000000000154ed60 in handle_file_event (file_ptr=0x232be730, ready_mask=1) at gdbsupport/event-loop.cc:572 #43 0x000000000154f21c in gdb_wait_for_event (block=1) at gdbsupport/event-loop.cc:693 #44 0x000000000154dec4 in gdb_do_one_event (mstimeout=-1) at gdbsupport/event-loop.cc:263 #45 0x0000000000910f98 in start_event_loop () at gdb/main.c:400 #46 0x0000000000911130 in captured_command_loop () at gdb/main.c:464 #47 0x0000000000912b5c in captured_main (data=0xfffff1a3db58) at gdb/main.c:1338 #48 0x0000000000912bf4 in gdb_main (args=0xfffff1a3db58) at gdb/main.c:1357 #49 0x00000000004170f4 in main (argc=10, argv=0xfffff1a3dcc8) at gdb/gdb.c:38 (gdb) ... The abort happens because a c++ exception escapes to c code, specifically opncls_bstat in bfd/opncls.c. Compiling with -fexceptions works around this. Fix this by catching the exception just before it escapes, in stat_trampoline and likewise in few similar spot. Add a new template catch_exceptions to do so in a consistent way. Tested on aarch64-linux. Approved-by: Pedro Alves <pedro@palves.net> PR remote/31577 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31577
If threads are disabled, either by --disable-threading explicitely, or by missing std::thread support, you get the following ASAN error when loading symbols: ==7310==ERROR: AddressSanitizer: heap-use-after-free on address 0x614000002128 at pc 0x00000098794a bp 0x7ffe37e6af70 sp 0x7ffe37e6af68 READ of size 1 at 0x614000002128 thread T0 #0 0x987949 in index_cache_store_context::store() const ../../gdb/dwarf2/index-cache.c:163 #1 0x943467 in cooked_index_worker::write_to_cache(cooked_index const*, deferred_warnings*) const ../../gdb/dwarf2/cooked-index.c:601 #2 0x1705e39 in std::function<void ()>::operator()() const /gcc/9/include/c++/9.2.0/bits/std_function.h:690 #3 0x1705e39 in gdb::task_group::impl::~impl() ../../gdbsupport/task-group.cc:38 0x614000002128 is located 232 bytes inside of 408-byte region [0x614000002040,0x6140000021d8) freed by thread T0 here: #0 0x7fd75ccf8ea5 in operator delete(void*, unsigned long) ../../.././libsanitizer/asan/asan_new_delete.cc:177 #1 0x9462e5 in cooked_index::index_for_writing() ../../gdb/dwarf2/cooked-index.h:689 #2 0x9462e5 in operator() ../../gdb/dwarf2/cooked-index.c:657 #3 0x9462e5 in _M_invoke /gcc/9/include/c++/9.2.0/bits/std_function.h:300 It's happening because cooked_index_worker::wait always returns true in this case, which tells cooked_index::wait it can delete the m_state cooked_index_worker member, but cooked_index_worker::write_to_cache tries to access it immediately afterwards. Fixed by making cooked_index_worker::wait only return true if desired_state is CACHE_DONE, same as if threading was enabled, so m_state will not be prematurely deleted. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31694 Approved-By: Tom Tromey <tom@tromey.com>
Similar to the x86_64 testcases, some .s files contain the corresponding CFI directives. This helps in validating the synthesized CFI by running those tests with and without the --scfi=experimental command line option. GAS issues some diagnostics, enabled by default, with --scfi=experimental. The diagnostics have been added with an intent to help user correct inadvertent errors in their hand-written asm. An error is issued when GAS finds that input asm is not amenable to accurate CFI synthesis. The existing scfi-diag-*.s tests in the gas/testsuite/gas/scfi/x86_64 directory test some SCFI diagnostics already: - (#1) "Warning: SCFI: Asymetrical register restore" - (#2) "Error: SCFI: usage of REG_FP as scratch not supported" - (#3) "Error: SCFI: unsupported stack manipulation pattern" - (#4) "Error: untraceable control flow for func 'XXX'" In the newly added aarch64 testsuite, further tests for additional diagnostics have been added: - scfi-diag-1.s in this patch highlights an aarch64-specific diagnostic: (#5) "Warning: SCFI: ignored probable save/restore op with reg offset" Additionally, some testcases are added to showcase the (currently) unsupported patterns, e.g., scfi-unsupported-1.s mov x16, 4384 sub sp, sp, x16 gas/testsuite/: * gas/scfi/README: Update comment to include aarch64. * gas/scfi/aarch64/scfi-aarch64.exp: New file. * gas/scfi/aarch64/ginsn-arith-1.l: New test. * gas/scfi/aarch64/ginsn-arith-1.s: New test. * gas/scfi/aarch64/ginsn-cofi-1.l: New test. * gas/scfi/aarch64/ginsn-cofi-1.s: New test. * gas/scfi/aarch64/ginsn-ldst-1.l: New test. * gas/scfi/aarch64/ginsn-ldst-1.s: New test. * gas/scfi/aarch64/scfi-callee-saved-fp-1.d: New test. * gas/scfi/aarch64/scfi-callee-saved-fp-1.l: New test. * gas/scfi/aarch64/scfi-callee-saved-fp-1.s: New test. * gas/scfi/aarch64/scfi-callee-saved-fp-2.d: New test. * gas/scfi/aarch64/scfi-callee-saved-fp-2.l: New test. * gas/scfi/aarch64/scfi-callee-saved-fp-2.s: New test. * gas/scfi/aarch64/scfi-cb-1.d: New test. * gas/scfi/aarch64/scfi-cb-1.l: New test. * gas/scfi/aarch64/scfi-cb-1.s: New test. * gas/scfi/aarch64/scfi-cfg-1.d: New test. * gas/scfi/aarch64/scfi-cfg-1.l: New test. * gas/scfi/aarch64/scfi-cfg-1.s: New test. * gas/scfi/aarch64/scfi-cfg-2.d: New test. * gas/scfi/aarch64/scfi-cfg-2.l: New test. * gas/scfi/aarch64/scfi-cfg-2.s: New test. * gas/scfi/aarch64/scfi-cfg-3.d: New test. * gas/scfi/aarch64/scfi-cfg-3.l: New test. * gas/scfi/aarch64/scfi-cfg-3.s: New test. * gas/scfi/aarch64/scfi-cfg-4.l: New test. * gas/scfi/aarch64/scfi-cfg-4.s: New test. * gas/scfi/aarch64/scfi-cond-br-1.d: New test. * gas/scfi/aarch64/scfi-cond-br-1.l: New test. * gas/scfi/aarch64/scfi-cond-br-1.s: New test. * gas/scfi/aarch64/scfi-diag-1.l: New test. * gas/scfi/aarch64/scfi-diag-1.s: New test. * gas/scfi/aarch64/scfi-diag-2.l: New test. * gas/scfi/aarch64/scfi-diag-2.s: New test. * gas/scfi/aarch64/scfi-diag-3.l: New test. * gas/scfi/aarch64/scfi-diag-3.s: New test. * gas/scfi/aarch64/scfi-ldrp-1.d: New test. * gas/scfi/aarch64/scfi-ldrp-1.l: New test. * gas/scfi/aarch64/scfi-ldrp-1.s: New test. * gas/scfi/aarch64/scfi-ldrp-2.d: New test. * gas/scfi/aarch64/scfi-ldrp-2.l: New test. * gas/scfi/aarch64/scfi-ldrp-2.s: New test. * gas/scfi/aarch64/scfi-ldstnap-1.d: New test. * gas/scfi/aarch64/scfi-ldstnap-1.l: New test. * gas/scfi/aarch64/scfi-ldstnap-1.s: New test. * gas/scfi/aarch64/scfi-strp-1.d: New test. * gas/scfi/aarch64/scfi-strp-1.l: New test. * gas/scfi/aarch64/scfi-strp-1.s: New test. * gas/scfi/aarch64/scfi-strp-2.d: New test. * gas/scfi/aarch64/scfi-strp-2.l: New test. * gas/scfi/aarch64/scfi-strp-2.s: New test. * gas/scfi/aarch64/scfi-unsupported-1.l: New test. * gas/scfi/aarch64/scfi-unsupported-1.s: New test. * gas/scfi/aarch64/scfi-unsupported-2.l: New test. * gas/scfi/aarch64/scfi-unsupported-2.s: New test.
Since commit b1da98a ("gdb: remove use of alloca in new_macro_definition"), if cached_argv is empty, we call macro_bcache with a nullptr data. This ends up caught by UBSan deep down in the bcache code: $ ./gdb -nx -q --data-directory=data-directory /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/macscp/macscp -readnow Reading symbols from /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/macscp/macscp... Expanding full symbols from /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/macscp/macscp... /home/smarchi/src/binutils-gdb/gdb/bcache.c:195:12: runtime error: null pointer passed as argument 2, which is declared to never be null The backtrace: #1 0x00007ffff619a05d in __ubsan::__ubsan_handle_nonnull_arg_abort (Data=<optimized out>) at ../../../../src/libsanitizer/ubsan/ubsan_handlers.cpp:750 #2 0x000055556337fba2 in gdb::bcache::insert (this=0x62d0000c8458, addr=0x0, length=0, added=0x0) at /home/smarchi/src/binutils-gdb/gdb/bcache.c:195 #3 0x0000555564b49222 in gdb::bcache::insert<char const*, void> (this=0x62d0000c8458, addr=0x0, length=0, added=0x0) at /home/smarchi/src/binutils-gdb/gdb/bcache.h:158 #4 0x0000555564b481fa in macro_bcache<char const*> (t=0x62100007ae70, addr=0x0, len=0) at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:117 #5 0x0000555564b42b4a in new_macro_definition (t=0x62100007ae70, kind=macro_function_like, special_kind=macro_ordinary, argv=std::__debug::vector of length 0, capacity 0, replacement=0x62a00003af3a "__builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:573 #6 0x0000555564b44674 in macro_define_internal (source=0x6210000ab9e0, line=469, name=0x7fffffffa710 "__va_arg_pack", kind=macro_function_like, special_kind=macro_ordinary, argv=std::__debug::vector of length 0, capacity 0, replacement=0x62a00003af3a "__builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:777 #7 0x0000555564b44ae2 in macro_define_function (source=0x6210000ab9e0, line=469, name=0x7fffffffa710 "__va_arg_pack", argv=std::__debug::vector of length 0, capacity 0, replacement=0x62a00003af3a "__builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:816 #8 0x0000555563f62fc8 in parse_macro_definition (file=0x6210000ab9e0, line=469, body=0x62a00003af2a "__va_arg_pack() __builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/dwarf2/macro.c:203 This can be reproduced by running gdb.base/macscp.exp. Avoid calling macro_bcache if the macro doesn't have any arguments. Change-Id: I33b5a7c3b3a93d5adba98983fcaae9c8522c383d
The commit: commit c6b4867 Date: Thu Mar 30 19:21:22 2023 +0100 gdb: parse pending breakpoint thread/task immediately Introduce a use bug where the value of a temporary variable was being used after it had gone out of scope. This was picked up by the address sanitizer and would result in this error: (gdb) maintenance selftest create_breakpoint_parse_arg_string Running selftest create_breakpoint_parse_arg_string. ================================================================= ==2265825==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fbb08046511 at pc 0x000001632230 bp 0x7fff7c2fb770 sp 0x7fff7c2fb768 READ of size 1 at 0x7fbb08046511 thread T0 #0 0x163222f in create_breakpoint_parse_arg_string(char const*, std::unique_ptr<char, gdb::xfree_deleter<char> >*, int*, int*, int*, std::unique_ptr<char, gdb::xfree_deleter<char> >*, bool*) ../../src/gdb/break-cond-parse.c:496 #1 0x1633026 in test ../../src/gdb/break-cond-parse.c:582 #2 0x163391b in create_breakpoint_parse_arg_string_tests ../../src/gdb/break-cond-parse.c:649 #3 0x12cfebc in void std::__invoke_impl<void, void (*&)()>(std::__invoke_other, void (*&)()) /usr/include/c++/13/bits/invoke.h:61 #4 0x12cc8ee in std::enable_if<is_invocable_r_v<void, void (*&)()>, void>::type std::__invoke_r<void, void (*&)()>(void (*&)()) /usr/include/c++/13/bits/invoke.h:111 #5 0x12c81e5 in std::_Function_handler<void (), void (*)()>::_M_invoke(std::_Any_data const&) /usr/include/c++/13/bits/std_function.h:290 #6 0x18bb51d in std::function<void ()>::operator()() const /usr/include/c++/13/bits/std_function.h:591 #7 0x4193ef9 in selftests::run_tests(gdb::array_view<char const* const>, bool) ../../src/gdbsupport/selftest.cc:100 #8 0x21c2206 in maintenance_selftest ../../src/gdb/maint.c:1172 ... etc ... The problem was caused by three lines like this one: thread_info *thr = parse_thread_id (std::string (t.get_value ()).c_str (), &tmptok); After parsing the thread-id TMPTOK would be left pointing into the temporary string which had been created on this line. When on the next line we did this: gdb_assert (*tmptok == '\0'); The value of *TMPTOK is undefined. Fix this by creating the std::string earlier in the scope. Now the contents of the string will remain valid when we check *TMPTOK. The address sanitizer issue is now resolved.
The binary provided with bug 32165 [1] has 36139 ELF sections. GDB crashes on it with (note that my GDB is build with -D_GLIBCXX_DEBUG=1: $ ./gdb -nx -q --data-directory=data-directory ./vmlinux Reading symbols from ./vmlinux... (No debugging symbols found in ./vmlinux) (gdb) info func /usr/include/c++/14.2.1/debug/vector:508: In function: std::debug::vector<_Tp, _Allocator>::reference std::debug::vector<_Tp, _Allocator>::operator[](size_type) [with _Tp = long unsigned int; _Allocator = std::allocator<long unsigned int>; reference = long unsigned int&; size_type = long unsigned int] Error: attempt to subscript container with out-of-bounds index -29445, but container only holds 36110 elements. Objects involved in the operation: sequence "this" @ 0x514000007340 { type = std::debug::vector<unsigned long, std::allocator<unsigned long> >; } The crash occurs here: #3 0x00007ffff5e334c3 in __GI_abort () at abort.c:79 #4 0x00007ffff689afc4 in __gnu_debug::_Error_formatter::_M_error (this=<optimized out>) at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/debug.cc:1320 #5 0x0000555561119a16 in std::__debug::vector<unsigned long, std::allocator<unsigned long> >::operator[] (this=0x514000007340, __n=18446744073709522171) at /usr/include/c++/14.2.1/debug/vector:508 #6 0x0000555562e288e8 in minimal_symbol::value_address (this=0x5190000bb698, objfile=0x514000007240) at /home/smarchi/src/binutils-gdb/gdb/symtab.c:517 #7 0x0000555562e5a131 in global_symbol_searcher::expand_symtabs (this=0x7ffff0f5c340, objfile=0x514000007240, preg=std::optional [no contained value]) at /home/smarchi/src/binutils-gdb/gdb/symtab.c:4983 #8 0x0000555562e5d2ed in global_symbol_searcher::search (this=0x7ffff0f5c340) at /home/smarchi/src/binutils-gdb/gdb/symtab.c:5189 #9 0x0000555562e5ffa4 in symtab_symbol_info (quiet=false, exclude_minsyms=false, regexp=0x0, kind=FUNCTION_DOMAIN, t_regexp=0x0, from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/symtab.c:5361 #10 0x0000555562e6131b in info_functions_command (args=0x0, from_tty=1) at /home/smarchi/src/binutils-gdb/gdb/symtab.c:5525 That is, at this line of `minimal_symbol::value_address`, where `objfile->section_offsets` is an `std::vector`: return (CORE_ADDR (this->unrelocated_address ()) + objfile->section_offsets[this->section_index ()]); A section index of -29445 is suspicious. The minimal_symbol at play here is: (top-gdb) p m_name $1 = 0x521001de10af "_sinittext" So I restarted debugging, breaking on: (top-gdb) b general_symbol_info::set_section_index if $_streq("_sinittext", m_name) And I see that weird -29445 value: (top-gdb) frame #0 general_symbol_info::set_section_index (this=0x525000082390, idx=-29445) at /home/smarchi/src/binutils-gdb/gdb/symtab.h:611 611 { m_section = idx; } But going up one frame, the section index is 36091: (top-gdb) frame #1 0x0000555562426526 in minimal_symbol_reader::record_full (this=0x7ffff0ead560, name="_sinittext", copy_name=false, address=-2111475712, ms_type=mst_text, section=36091) at /home/smarchi/src/binutils-gdb/gdb/minsyms.c:1228 1228 msymbol->set_section_index (section); It seems like the problem is just that the type used for the section index (short) is not big enough. Change from short to int. If somebody insists, we could even go long long / int64_t, but I doubt it's necessary. With that fixed, I get: (gdb) info func All defined functions: Non-debugging symbols: 0xffffffff81000000 _stext 0xffffffff82257000 _sinittext 0xffffffff822b4ebb _einittext [1] https://sourceware.org/bugzilla/show_bug.cgi?id=32165 Change-Id: Icb1c3de9474ff5adef7e0bbbf5e0b67b279dee04 Reviewed-By: Tom de Vries <tdevries@suse.de> Reviewed-by: Keith Seitz <keiths@redhat.com>
When building gdb with gcc 12 and -fsanitize=threads while renabling background dwarf reading by setting dwarf_synchronous to false, I run into: ... (gdb) file amd64-watchpoint-downgrade Reading symbols from amd64-watchpoint-downgrade... (gdb) watch global_var ================== WARNING: ThreadSanitizer: data race (pid=20124) Read of size 8 at 0x7b80000500d8 by main thread: #0 cooked_index_entry::full_name(obstack*, bool) const cooked-index.c:220 #1 cooked_index::get_main_name(obstack*, language*) const cooked-index.c:735 #2 cooked_index_worker::wait(cooked_state, bool) cooked-index.c:559 #3 cooked_index::wait(cooked_state, bool) cooked-index.c:631 #4 cooked_index_functions::wait(objfile*, bool) cooked-index.h:729 #5 cooked_index_functions::compute_main_name(objfile*) cooked-index.h:806 #6 objfile::compute_main_name() symfile-debug.c:461 #7 find_main_name symtab.c:6503 #8 main_language() symtab.c:6608 #9 set_initial_language_callback symfile.c:1634 #10 get_current_language() language.c:96 ... Previous write of size 8 at 0x7b80000500d8 by thread T1: #0 cooked_index_shard::finalize(parent_map_map const*) \ dwarf2/cooked-index.c:409 #1 operator() cooked-index.c:663 ... ... SUMMARY: ThreadSanitizer: data race cooked-index.c:220 in \ cooked_index_entry::full_name(obstack*, bool) const ================== Hardware watchpoint 1: global_var (gdb) PASS: gdb.arch/amd64-watchpoint-downgrade.exp: watch global_var ... This was also reported in PR31715. This is due do gcc PR110799 [1], generating wrong code with -fhoist-adjacent-loads, and causing a false positive for -fsanitize=threads. Work around the gcc PR by forcing -fno-hoist-adjacent-loads for gcc <= 13 and -fsanitize=threads. Tested in that same configuration on x86_64-linux. Remaining ThreadSanitizer problems are the ones reported in PR31626 (gdb.rust/dwindex.exp) and PR32247 (gdb.trace/basic-libipa.exp). PR gdb/31715 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31715 Tested-By: Bernd Edlinger <bernd.edlinger@hotmail.de> Approved-By: Tom Tromey <tom@tromey.com> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799
When calling a function with double arguments, I get this asan error: ==7920==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x0053131ece38 at pc 0x7ff79697a68f bp 0x0053131ec790 sp 0x0053131ebf40 READ of size 16 at 0x0053131ece38 thread T0 #0 0x7ff79697a68e in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long long), void const*, void const*, unsigned long long) C:/gcc/src/gcc-14.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:814 #1 0x7ff79697aebd in memcmp C:/gcc/src/gcc-14.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:845 #2 0x7ff79697aebd in memcmp C:/gcc/src/gcc-14.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:840 #3 0x7ff7927e237f in regcache::raw_write(int, gdb::array_view<unsigned char const>) C:/gdb/src/gdb.git/gdb/regcache.c:874 #4 0x7ff7927e3c85 in regcache::cooked_write(int, gdb::array_view<unsigned char const>) C:/gdb/src/gdb.git/gdb/regcache.c:914 #5 0x7ff7927e5d89 in regcache::cooked_write(int, unsigned char const*) C:/gdb/src/gdb.git/gdb/regcache.c:933 #6 0x7ff7911d5965 in amd64_windows_store_arg_in_reg C:/gdb/src/gdb.git/gdb/amd64-windows-tdep.c:216 Address 0x0053131ece38 is located in stack of thread T0 at offset 40 in frame #0 0x7ff7911d565f in amd64_windows_store_arg_in_reg C:/gdb/src/gdb.git/gdb/amd64-windows-tdep.c:208 This frame has 4 object(s): [32, 40) 'buf' (line 211) <== Memory access at offset 40 overflows this variable It's because the first 4 double arguments are passed via XMM registers, and they need a buffer of 16 bytes, even if we only use 8 bytes of them. Approved-By: Tom Tromey <tom@tromey.com>
On Windows gcore is not implemented, and if you try it, you get an heap-use-after-free error: (gdb) gcore C:/gdb/build64/gdb-git-python3/gdb/testsuite/outputs/gdb.base/gcore-buffer-overflow/gcore-buffer-overflow.test warning: cannot close "================================================================= ==10108==ERROR: AddressSanitizer: heap-use-after-free on address 0x1259ea503110 at pc 0x7ff6806e3936 bp 0x0062e01ed990 sp 0x0062e01ed140 READ of size 111 at 0x1259ea503110 thread T0 #0 0x7ff6806e3935 in strlen C:/gcc/src/gcc-14.2.0/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:391 #1 0x7ff6807169c4 in __pformat_puts C:/gcc/src/mingw-w64-v12.0.0/mingw-w64-crt/stdio/mingw_pformat.c:558 #2 0x7ff6807186c1 in __mingw_pformat C:/gcc/src/mingw-w64-v12.0.0/mingw-w64-crt/stdio/mingw_pformat.c:2514 #3 0x7ff680713614 in __mingw_vsnprintf C:/gcc/src/mingw-w64-v12.0.0/mingw-w64-crt/stdio/mingw_vsnprintf.c:41 #4 0x7ff67f34419f in vsnprintf(char*, unsigned long long, char const*, char*) C:/msys64/mingw64/x86_64-w64-mingw32/include/stdio.h:484 #5 0x7ff67f34419f in string_vprintf[abi:cxx11](char const*, char*) C:/gdb/src/gdb.git/gdbsupport/common-utils.cc:106 #6 0x7ff67b37b739 in cli_ui_out::do_message(ui_file_style const&, char const*, char*) C:/gdb/src/gdb.git/gdb/cli-out.c:227 #7 0x7ff67ce3d030 in ui_out::call_do_message(ui_file_style const&, char const*, ...) C:/gdb/src/gdb.git/gdb/ui-out.c:571 #8 0x7ff67ce4255a in ui_out::vmessage(ui_file_style const&, char const*, char*) C:/gdb/src/gdb.git/gdb/ui-out.c:740 #9 0x7ff67ce2c873 in ui_file::vprintf(char const*, char*) C:/gdb/src/gdb.git/gdb/ui-file.c:73 #10 0x7ff67ce7f83d in gdb_vprintf(ui_file*, char const*, char*) C:/gdb/src/gdb.git/gdb/utils.c:1881 #11 0x7ff67ce7f83d in vwarning(char const*, char*) C:/gdb/src/gdb.git/gdb/utils.c:181 #12 0x7ff67f3530eb in warning(char const*, ...) C:/gdb/src/gdb.git/gdbsupport/errors.cc:33 #13 0x7ff67baed27f in gdb_bfd_close_warning C:/gdb/src/gdb.git/gdb/gdb_bfd.c:437 #14 0x7ff67baed27f in gdb_bfd_close_or_warn C:/gdb/src/gdb.git/gdb/gdb_bfd.c:646 #15 0x7ff67baed27f in gdb_bfd_unref(bfd*) C:/gdb/src/gdb.git/gdb/gdb_bfd.c:739 #16 0x7ff68094b6f2 in gdb_bfd_ref_policy::decref(bfd*) C:/gdb/src/gdb.git/gdb/gdb_bfd.h:82 #17 0x7ff68094b6f2 in gdb::ref_ptr<bfd, gdb_bfd_ref_policy>::~ref_ptr() C:/gdb/src/gdb.git/gdbsupport/gdb_ref_ptr.h:91 #18 0x7ff67badf4d2 in gcore_command C:/gdb/src/gdb.git/gdb/gcore.c:176 0x1259ea503110 is located 16 bytes inside of 4064-byte region [0x1259ea503100,0x1259ea5040e0) freed by thread T0 here: #0 0x7ff6806b1687 in free C:/gcc/src/gcc-14.2.0/libsanitizer/asan/asan_malloc_win.cpp:90 #1 0x7ff67f2ae807 in objalloc_free C:/gdb/src/gdb.git/libiberty/objalloc.c:187 #2 0x7ff67d7f56e3 in _bfd_free_cached_info C:/gdb/src/gdb.git/bfd/opncls.c:247 #3 0x7ff67d7f2782 in _bfd_delete_bfd C:/gdb/src/gdb.git/bfd/opncls.c:180 #4 0x7ff67d7f5df9 in bfd_close_all_done C:/gdb/src/gdb.git/bfd/opncls.c:960 #5 0x7ff67d7f62ec in bfd_close C:/gdb/src/gdb.git/bfd/opncls.c:925 #6 0x7ff67baecd27 in gdb_bfd_close_or_warn C:/gdb/src/gdb.git/gdb/gdb_bfd.c:643 #7 0x7ff67baecd27 in gdb_bfd_unref(bfd*) C:/gdb/src/gdb.git/gdb/gdb_bfd.c:739 #8 0x7ff68094b6f2 in gdb_bfd_ref_policy::decref(bfd*) C:/gdb/src/gdb.git/gdb/gdb_bfd.h:82 #9 0x7ff68094b6f2 in gdb::ref_ptr<bfd, gdb_bfd_ref_policy>::~ref_ptr() C:/gdb/src/gdb.git/gdbsupport/gdb_ref_ptr.h:91 #10 0x7ff67badf4d2 in gcore_command C:/gdb/src/gdb.git/gdb/gcore.c:176 It happens because gdb_bfd_close_or_warn uses a bfd-internal name for the failing-close warning, after the close is finished, and the name already freed: static int gdb_bfd_close_or_warn (struct bfd *abfd) { int ret; const char *name = bfd_get_filename (abfd); for (asection *sect : gdb_bfd_sections (abfd)) free_one_bfd_section (sect); ret = bfd_close (abfd); if (!ret) gdb_bfd_close_warning (name, bfd_errmsg (bfd_get_error ())); return ret; } Fixed by making a copy of the name for the warning. Approved-By: Andrew Burgess <aburgess@redhat.com>
After the commit: commit b9de07a Date: Thu Oct 10 11:37:34 2024 +0100 gdb: fix handling of DW_AT_entry_pc of inlined subroutines GDB's buildbot CI testing highlighted this assertion failure: (gdb) c Continuing. ../../binutils-gdb/gdb/block.h:203: internal-error: set_entry_pc: Assertion `start >= this->start () && start < this->end ()' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. ----- Backtrace ----- FAIL: gdb.base/break-probes.exp: run til our library loads (GDB internal error) This assertion was in the new function set_entry_pc and is asserting that the default_entry_pc() value is within the blocks start/end range. The default_entry_pc() is the value GDB will use as the entry-pc if the DWARF doesn't specifically override the entry-pc. This value is calculated as: 1. The start address of the first sub-range within the block, if the block has more than 1 range, or 2. The low address (from DW_AT_low_pc) for the block. If the block only has a single range then this means the block was defined with low/high pc attributes (case #2 above). These low/high pc values are what block::start() and block::end() return. This means that by definition, if the block is continuous, the above assert cannot trigger as 'start', the default_entry_pc() would be equivalent to block::start(). This means that, for the assert to trigger, the block must have multiple ranges, and the first address of the first range is not within the blocks low/high address range. This seems wrong. I inspected the state at the time the assert triggered and discovered the block's start() address. Then I removed the assert and restarted GDB. I was now able to inspect the blocks at the offending address: (gdb) maintenance info blocks 0x7ffff7dddaa4 Blocks at 0x7ffff7dddaa4: from objfile: [(objfile *) 0x44a37f0] /lib64/ld-linux-x86-64.so.2 [(block *) 0x46b30c0] 0x7ffff7ddd5a0..0x7ffff7dde8a6 entry pc: 0x7ffff7ddd5a0 is global block symbol count: 4 is contiguous [(block *) 0x46b3020] 0x7ffff7ddd5a0..0x7ffff7dde8a6 entry pc: 0x7ffff7ddd5a0 is static block symbol count: 9 is contiguous [(block *) 0x46b2f70] 0x7ffff7ddda00..0x7ffff7dddac3 entry pc: 0x7ffff7ddda00 function: __GI__dl_find_dso_for_object symbol count: 4 is contiguous [(block *) 0x46b2e10] 0x7ffff7dddaa4..0x7ffff7dddac3 entry pc: 0x7ffff7dddaa4 inline function: __GI__dl_find_dso_for_object symbol count: 5 is contiguous [(block *) 0x46b2a40] 0x7ffff7dddaa4..0x7ffff7dddac3 entry pc: 0x7ffff7dddaa4 symbol count: 1 is contiguous [(block *) 0x46b2970] 0x7ffff7dddaa4..0x7ffff7dddac3 entry pc: 0x7ffff7dddaa4 symbol count: 2 address ranges: 0x7ffff7ddda0e..0x7ffff7ddda77 0x7ffff7ddda90..0x7ffff7ddda96 I've left everything in for context, but the only really interesting bit is the very last block, it's low/high range is: 0x7ffff7dddaa4..0x7ffff7dddac3 but it has separate ranges: 0x7ffff7ddda0e..0x7ffff7ddda77 0x7ffff7ddda90..0x7ffff7ddda96 which are all outside the low/high range. This is what triggers the assert. But why does that block exist at all? What I believe is happening is that we're running into a bug in older versions of GCC. The buildbot failure was with an 8.5 gcc, and Tom de Vries also reported seeing failures when using version 7 and 8 gcc, but not with gcc 9 and onward. Looking at the DWARF I can see that the problematic block is created from this DIE: <4><15efb>: Abbrev Number: 83 (DW_TAG_lexical_block) <15efc> DW_AT_abstract_origin: <0x15e9f> <15efe> DW_AT_low_pc : 0x7ffff7dddaa4 <15f06> DW_AT_high_pc : 31 which links via DW_AT_abstract_origin to: <2><15e9f>: Abbrev Number: 80 (DW_TAG_lexical_block) <15ea0> DW_AT_ranges : 0x38e0 <15ea4> DW_AT_sibling : <0x15eca> And so we can see that <15efb> has got both low/high pc attributes and a ranges attribute. If I widen my checking to parents of DIE <15efb> then I see that they also have DW_AT_abstract_origin, however, there is something interesting going on, the parent DIEs are linking to a different DIE tree than <15efb>. What I believe is happening is this, we have an abstract instance tree, this is rooted at a DW_AT_subprogram, and contains all the blocks, variables, parameters, etc, that you would expect. As this is an abstract instance, then there are no low/high pc attributes, and no ranges attributes in this tree. This makes sense. Now elsewhere we have a DW_TAG_subprogram (not DW_TAG_inlined_subroutine) which links via DW_AT_abstract_origin to the abstract DW_AT_subprogram. This case is documented in the DWARF 5 spec in section 3.3.8.3, and describes an Out-of-Line Instance of an Inlined Subroutine. Within this out of line instance many of the DIE correctly link back, using DW_AT_abstract_origin to the abstract instance tree. This tree also includes the DIE <15e9f>, which is where our problem DIE references. Now, to really confuse things, within this out-of-line instance we have a DW_TAG_inlined_subroutine, which is another instance of the same abstract instance tree! This would seem to indicate a recursive call to the inline function, and the compiler, for some reason, needed to instantiate an out of line instance of this function. And it is within this nested, inlined subroutine, that the problem DIE exists. The problem DIE is referencing the corresponding DIE within the out of line instance tree, but I am convinced this must be a (long fixed) GCC bug, and that the problem DIE should be referencing the DIE within the abstract instance tree. I'm aware that the above is pretty confusing. The actual DWARF would be a around 200 lines long, so I'd like to avoid dumping it in here. But here's my attempt at representing what's going on in a minimal example. The numbers down the side represent the section offset, not the nesting level, and I've removed any attributes that are not relevant: <1> DW_TAG_subprogram <2> DW_TAG_lexical_block <3> DW_TAG_subprogram DW_AT_abstract_origin <1> <4> DW_TAG_lexical_block DW_AT_ranges ... <5> DW_TAG_inlined_subroutine DW_AT_abstract_origin <1> <6> DW_TAG_lexical_block DW_AT_abstract_origin <4> DW_AT_low_pc ... DW_AT_high_pc ... The lexical block at <6> is linking to <4> when it should be linking to <2>. There is one additional thing that we might wonder about, which is, when calculating the low/high pc range for a block, why does GDB not make use of the range information and expand the range beyond the defined low/high values? The answer to this is in dwarf_get_pc_bounds_ranges_or_highlow_pc in dwarf/read.c. This is where the low/high bounds are calculated. What we see is that GDB first checks for a low/high attribute pair, and if that is present, this defines the address range for the block. Only if there is no DW_AT_low_pc do we check for the DW_AT_ranges, and use that to define the extent of the block. And this makes sense, section 3.5 of the DWARF-5 spec says: The lexical block entry may have either a DW_AT_low_pc and DW_AT_high_pc pair of attributes or a DW_AT_ranges attribute whose values encode the contiguous or non-contiguous address ranges, respectively, of the machine instructions generated for the lexical block... Section 3.5 is specifically about lexical blocks, but the same wording, about it being either low/high OR ranges is repeated for other DW_TAG_ types. So this explains why GDB doesn't use the ranges to expand the problem blocks ranges; as the first DIE has low/high addresses, these are used, and the ranges is not consulted. It is only later in dwarf2_record_block_ranges that we create a range based off the low/high pc, and then also process the ranges data, this allows the problem block to exist with ranges that are outside the low/high range. To solve this I considered a number of options: 1. Prevent loading certain attributes from an abstract instance. Section 3.3.8.1 of the DWARF-5 spec talks about which attributes are appropriate to place in an abstract instance. Any attribute that might vary between instances should not appear in an abstract instance. DW_AT_ranges is included as an example in the non-exhaustive list of attributes that should not appear in an abstract instance. Currently in dwarf2_attr (dwarf2/read.c), when we see a DW_AT_abstract_origin attribute, we always follow this to try and find the attribute we are looking for. But we could change this function so that we prevent this following for attributes that we know should not be looked up in an abstract instance. This would solve the problem in this case by preventing us finding the DW_AT_ranges in the incorrect abstract instance. 2. Filter the ranges. Having established a blocks low/high address range in dwarf_get_pc_bounds_ranges_or_highlow_pc, we could allow dwarf2_record_block_ranges to parse the ranges, but we could reject any range that extends outside the blocks defined start and end addresses. For well behaved DWARF where we have either low/high or ranges, then the blocks start/end are defined from the range data, and so, by definition, every range would be acceptable. But in our problem case we would reject all of the invalid ranges. This is my least favourite solution as it feels like rejecting the ranges is tackling the problem too late on. 3. Don't try to parse ranges when we have low/high attributes. This option involves updating dwarf2_record_block_ranges to match the behaviour of dwarf_get_pc_bounds_ranges_or_highlow_pc, and, I believe, to match the DWARF spec: don't try to read range data from DW_AT_ranges if we have low/high pc attributes. In our case this solves the issue because the problematic DIE has the low/high attributes, and it then links to the wrong DIE which happens to have DW_AT_ranges. With this change in place we don't even look for the DW_AT_ranges. If the problem were reversed, and the initial DIE had DW_AT_ranges, but the incorrectly referenced DIE had the low/high pc attributes, we would pick up the wrong addresses, but this wouldn't trigger any asserts. The reason is that dwarf_get_pc_bounds_ranges_or_highlow_pc would also find the low/high addresses from the incorrectly referenced DIE, and so we would just end up with a block which had the wrong address ranges, but the block would be self consistent, which is different to the problem we hit here. In the end, in this commit I went with solution #3, having dwarf_get_pc_bounds_ranges_or_highlow_pc and dwarf2_record_block_ranges be consistent seems sensible. However, I do wonder if in the future we might want to explore solution #1 as an additional safety feature. With this patch in place I'm able to run the gdb.base/break-probes.exp without seeing the assert that CI testing highlighted. I see no regressions when testing on x86-64 GNU/Linux with gcc 9.3.1. Note: the diff in this commit looks big, but it's really just me indenting the code. Approved-By: Tom Tromey <tom@tromey.com>
Make method calls work. There are two problems here:
OP_FUNCALL
in our evaluatorThe text was updated successfully, but these errors were encountered: