intermittent emulator exit on samples/userspace/shared_mem on qemu_x86_64 #22106

Closed
andrewboie opened this issue Jan 22, 2020 · 1 comment
Labels
bug The issue is a bug, or the PR is fixing a bug

andrewboie commented Jan 22, 2020

Describe the bug
Running this test case seems to eventually cause QEMU to exit, or get stuck. When it exits, there's no debug information printed. This doesn't seem to happen with SMP disabled, but I don't have enough data to say where the root cause lies. I'm splitting this into a separate ticket from #21317 as it might be a bug in the x86 port.

Given the maturity of QEMU's x86 emulation, I do not suspect QEMU. However, I am going to see if I can reproduce this on real hardware.

To Reproduce
Build and run samples/userspace/shared_mem on qemu_x86_64.
My zephyr tree is at revision e1052a0
The test may need to run for a while before this can be observed.

Expected behavior
The test should run indefinitely. Instead, it often either gets stuck or the emulator crashes. This is a sample, not a test case, so sanitycheck relies on regular expressions to determine success or failure, and the issues often appear only after enough console output has been printed for the run to count as a pass.
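
To illustrate why a regex-based check can miss this failure, here is a minimal sketch in Python. It is not sanitycheck's actual code: the pattern and the judge() helper are hypothetical, and the console lines are borrowed from the output quoted further down in this issue.

```python
import re

# Hypothetical pass pattern; the real harness configuration is not shown here.
PASS_PATTERN = re.compile(r"CT MSG: messagetoencrypt")


def judge(console_lines, pattern=PASS_PATTERN):
    """Return "passed" on the first matching line, else "failed".

    This mimics a harness that stops watching the console once its
    pattern has matched, so anything printed (or any crash) after the
    match is never inspected.
    """
    for line in console_lines:
        if pattern.search(line):
            return "passed"
    return "failed"


console = [
    "ENC Thread Received Data",
    "CT MSG: messagetoencrypt",
    "<emulator exits here>",  # happens after the match, so it goes unnoticed
]
print(judge(console))
```

Under these assumptions the run is reported as passed even though the emulator died afterwards, which is the situation described above.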

Impact
This could be a bug in the kernel or the core x86 code. A thread running in user mode should never be able to cause the CPU to freak out like this.

Screenshots or console output
I managed to catch this in the act under GDB with a QEMU 4.2.0 that I built myself. Here's a stacktrace:

PT Sending Message 1'
CT Thread Receivedd Message
CT MSG: ofttbhfspgmeqzos
ENC Thread Received Data
ENC PT MSG: ofttbhfspgmeqzos

CT Thread Receivedd Message
CT MSG: messagetoencrypt
[Switching to Thread 0x7fffe8c0c700 (LWP 28948)]

Thread 3 "qemu-system-x86" hit Breakpoint 1, __GI_exit (status=1) at exit.c:138
138	exit.c: No such file or directory.
(gdb) bt
#0  0x00007ffff52a1d40 in __GI_exit (status=1) at exit.c:138
#1  0x0000555555aa5eb3 in debug_exit_write
    (opaque=<optimized out>, addr=<optimized out>, val=<optimized out>, width=<optimized out>)
    at hw/misc/debugexit.c:35
#2  0x000055555589313b in memory_region_write_accessor
    (mr=mr@entry=0x5555565ac490, addr=0, value=value@entry=0x7fffe8c07e98, size=size@entry=4, shift=)
    at /home/apboie/Downloads/qemu-4.2.0/memory.c:483
#3  0x0000555555890e1e in access_with_adjusted_size
    (addr=addr@entry=0, value=value@entry=0x7fffe8c07e98, size=size@entry=4, access_size_min=<optimi=
    0x555555893010 <memory_region_write_accessor>, mr=0x5555565ac490, attrs=...)
    at /home/apboie/Downloads/qemu-4.2.0/memory.c:544
#4  0x00005555558953f3 in memory_region_dispatch_write
    (mr=mr@entry=0x5555565ac490, addr=0, data=<optimized out>, 
    data@entry=0, op=op@entry=MO_32, attrs=attrs@entry=...)
    at /home/apboie/Downloads/qemu-4.2.0/memory.c:1475
#5  0x00005555558464bf in address_space_stl_internal
    (endian=DEVICE_NATIVE_ENDIAN, result=0x0, attrs=..., val=0, addr=<optimized out>, as=<optimized 2
#6  0x00005555558464bf in address_space_stl
    (as=<optimized out>, addr=<optimized out>, val=0, attrs=..., result=0x0)
    at /home/apboie/Downloads/qemu-4.2.0/memory_ldst.inc.c:346
#7  0x00007fffe8cfe461 in code_gen_buffer ()
#8  0x00005555558bf29c in cpu_tb_exec
    (itb=<optimized out>, cpu=0x7fffe8cfe380 <code_gen_buffer+987987>)
    at /home/apboie/Downloads/qemu-4.2.0/accel/tcg/cpu-exec.c:172
#9  0x00005555558bf29c in cpu_loop_exec_tb
    (tb_exit=<synthetic pointer>, last_tb=<synthetic pointer>, tb=<optimized out>, cpu=0x7fffe8cfe388
#10 0x00005555558bf29c in cpu_exec (cpu=cpu@entry=0x55555666a620)
    at /home/apboie/Downloads/qemu-4.2.0/accel/tcg/cpu-exec.c:731
#11 0x0000555555885c10 in tcg_cpu_exec (cpu=0x55555666a620)
    at /home/apboie/Downloads/qemu-4.2.0/cpus.c:1473
#12 0x0000555555888134 in qemu_tcg_cpu_thread_fn (arg=arg@entry=0x55555666a620)
    at /home/apboie/Downloads/qemu-4.2.0/cpus.c:1781
#13 0x0000555555cf2a83 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
#14 0x00007ffff5452669 in start_thread (arg=<optimized out>) at pthread_create.c:479
#15 0x00007ffff537a323 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) 

It looks like this is triggered by a write to a NULL memory address, likely while the CPU is trying to handle an exception, since normally we would get a comprehensible page fault error. That would point to a triple fault; I've seen QEMU give up on a triple fault before.

I've posted to the QEMU mailing list for advice on how to get more data about the emulated target's execution state when this happens. I don't know much about QEMU internals.

andrewboie added the bug label Jan 22, 2020
andrewboie self-assigned this Jan 22, 2020
andrewboie commented

This isn't a triple fault. The emulator is exiting because we are reaching k_sys_fatal_error_handler(). Nothing shows up on the console because this is a sample and CONFIG_LOG is turned off. With logging enabled, I get a fatal error which suggests that the same thread is running on two CPUs at the same time. Closing.
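
For context on why GDB shows __GI_exit (status=1) in the trace above: as far as I know, QEMU's isa-debug-exit device (debug_exit_write in hw/misc/debugexit.c) turns a guest write of value val into a host exit status of (val << 1) | 1. The Python sketch below only models that arithmetic. The function name is hypothetical, and the 8-bit mask is an assumption about a POSIX host truncating exit statuses.

```python
def debug_exit_status(val: int) -> int:
    """Model the host exit status produced by a guest write of `val`."""
    # The device computes (val << 1) | 1, so the status is always odd
    # and can never be 0.
    # The & 0xFF models a POSIX host keeping only the low 8 bits
    # (an assumption about the host, not part of the device model).
    return ((val << 1) | 1) & 0xFF


# The trace shows val=0 reaching the device, matching exit(status=1).
print(debug_exit_status(0))
```

An exit status of 1 is therefore consistent with the guest deliberately writing 0 to the exit device from its fatal error path, rather than with the emulator giving up on a triple fault.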
