fib6_walk_continue: Unable to handle kernel paging request at virtual address 12005456 #11

Open
legoater opened this issue Jul 7, 2017 · 0 comments

legoater commented Jul 7, 2017

I've seen this panic with 6d93f62 a number of times under qemu's palmetto-bmc machine. I'm unsure at this stage whether it's the kernel or something qemu is doing wrong, and I admit my qemu build is a bit of a mashup, but here's the panic output:

Unable to handle kernel paging request at virtual address 12005456
pgd = c0004000
[12005456] *pgd=00000000
Internal error: Oops: 5 [#1] ARM
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.7.1-00053-g6d93f625315a-dirty #579
Hardware name: ASpeed SoC
task: c040c700 ti: c0408000 task.ti: c0408000
PC is at fib6_walk_continue+0x74/0x168
LR is at fib6_clean_node+0xb8/0x160
pc : [<c02991d0>]    lr : [<c029af0c>]    psr: 80000113
sp : c0409d90  ip : 00000001  fp : c8c0f280
r10: c029ae54  r9 : c02992dc  r8 : 00000000
r7 : c0409e34  r6 : 00000633  r5 : 00000002  r4 : c0409dc8
r3 : 12005452  r2 : 00000000  r1 : c8e31b60  r0 : 00000000
Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 00093177  Table: 48da0000  DAC: 00000051
Process swapper (pid: 0, stack limit = 0xc0408190)
Stack: (0xc0409d90 to 0xc040a000)
9d80:                                     c0409dc8 c0422908 c0422d28 c02998b8
9da0: 00000000 c040a028 c0422908 c0299c40 00000000 00000000 c0299b9c 60000193
9dc0: c040ca88 c8c0f28c c0422cd4 c0422cd4 c8c0f2a8 12005452 c8fafaa0 00000000
9de0: c040ca00 00000000 00000004 c029ae54 c0426e50 c0422908 c02992dc 00000000
9e00: c0409e34 dc8ba501 c029785c c0422908 00000bb8 c040a028 c0422ca0 c0422d38
9e20: c0414200 c0422908 00000200 c029b040 00000001 00000bb8 00000000 dc8ba501
9e40: 00000000 c0409e74 ffffe000 c040a028 00000100 00000001 c029b0a8 c004e9c8
9e60: 00000001 00000000 c004e954 60000193 c040ca88 c09b11b4 c0577c10 00000000
9e80: c039fceb dc8ba501 c04139c0 c0414200 c0413dfc c0422ca0 c04139c0 c029b0a8
9ea0: c0422908 c004f150 c040a028 00000000 00000000 dc8ba501 00000000 c0408000
9ec0: 40000001 00200000 00000002 c042d8e0 0000000a c042d904 00000100 c0019b90
9ee0: c040c700 df407100 df40715c ffff9f29 00000002 c0414200 c042d900 00000001
9f00: c041ad24 00000000 c041ad24 00000000 df402200 00000001 dfffce80 c03f9a48
9f20: 00000000 c0019f18 00000000 c0045b10 df405020 00000001 c0409f60 00000000
9f40: 00000020 c00094a8 c000ad14 60000013 ffffffff c0409f94 c040a020 c000de30
9f60: 00000000 00093177 00092177 60000013 c0408000 00000001 ffffffff c040a028
9f80: c040a020 dfffce80 c03f9a48 00000000 600000d3 c0409fb0 c000ad08 c000ad14
9fa0: 60000013 ffffffff 00000053 c005e3e0 c0408000 c003ae58 c042d20c c03dbc4c
9fc0: ffffffff ffffffff 00000000 c03db6a8 c03f9a48 00000000 c042d334 c040a03c
9fe0: c03f9a44 c040dd24 40004000 41069265 403f7c2c 40008048 00000000 00000000
[<c02991d0>] (fib6_walk_continue) from [<c02998b8>] (fib6_walk+0x4c/0x68)
[<c02998b8>] (fib6_walk) from [<c0299c40>] (__fib6_clean_all+0xa4/0xfc)
[<c0299c40>] (__fib6_clean_all) from [<c029b040>] (fib6_run_gc+0x80/0xe8)
[<c029b040>] (fib6_run_gc) from [<c004e9c8>] (call_timer_fn+0x74/0x118)
[<c004e9c8>] (call_timer_fn) from [<c004f150>] (run_timer_softirq+0x1ac/0x1fc)
[<c004f150>] (run_timer_softirq) from [<c0019b90>] (__do_softirq+0xc0/0x1f0)
[<c0019b90>] (__do_softirq) from [<c0019f18>] (irq_exit+0x84/0xe8)
[<c0019f18>] (irq_exit) from [<c0045b10>] (__handle_domain_irq+0x84/0xa0)
[<c0045b10>] (__handle_domain_irq) from [<c00094a8>] (avic_handle_irq+0x68/0x70)
[<c00094a8>] (avic_handle_irq) from [<c000de30>] (__irq_svc+0x50/0x64)
Exception stack(0xc0409f60 to 0xc0409fa8)
9f60: 00000000 00093177 00092177 60000013 c0408000 00000001 ffffffff c040a028
9f80: c040a020 dfffce80 c03f9a48 00000000 600000d3 c0409fb0 c000ad08 c000ad14
9fa0: 60000013 ffffffff
[<c000de30>] (__irq_svc) from [<c000ad14>] (arch_cpu_idle+0x2c/0x34)
[<c000ad14>] (arch_cpu_idle) from [<c003ae58>] (cpu_startup_entry+0x7c/0xd8)
[<c003ae58>] (cpu_startup_entry) from [<c03dbc4c>] (start_kernel+0x354/0x3e0)
Code: c02991d0 c02991e4 c0299208 c0299260 (e5932004) 
---[ end trace 50fcc4bab94a343d ]---
Kernel panic - not syncing: Fatal exception in interrupt
---[ end Kernel panic - not syncing: Fatal exception in interrupt
amboar pushed a commit that referenced this issue Oct 23, 2018
If qio_channel_rdma_readv returns QIO_CHANNEL_ERR_BLOCK, the destination
QEMU crashes.

The backtrace is:
(gdb) bt
    #0  0x0000000000000000 in ?? ()
    #1  0x00000000008db50e in qio_channel_set_aio_fd_handler (ioc=0x38111e0, ctx=0x3726080,
        io_read=0x8db841 <qio_channel_restart_read>, io_write=0x0, opaque=0x38111e0) at io/channel.c:
    #2  0x00000000008db952 in qio_channel_set_aio_fd_handlers (ioc=0x38111e0) at io/channel.c:438
    #3  0x00000000008dbab4 in qio_channel_yield (ioc=0x38111e0, condition=G_IO_IN) at io/channel.c:47
    #4  0x00000000007a870b in channel_get_buffer (opaque=0x38111e0, buf=0x440c038 "", pos=0, size=327
        at migration/qemu-file-channel.c:83
    #5  0x00000000007a70f6 in qemu_fill_buffer (f=0x440c000) at migration/qemu-file.c:299
    #6  0x00000000007a79d0 in qemu_peek_byte (f=0x440c000, offset=0) at migration/qemu-file.c:562
    #7  0x00000000007a7a22 in qemu_get_byte (f=0x440c000) at migration/qemu-file.c:575
    #8  0x00000000007a7c78 in qemu_get_be32 (f=0x440c000) at migration/qemu-file.c:655
    #9  0x00000000007a0508 in qemu_loadvm_state (f=0x440c000) at migration/savevm.c:2126
    #10 0x0000000000794141 in process_incoming_migration_co (opaque=0x0) at migration/migration.c:366
    #11 0x000000000095c598 in coroutine_trampoline (i0=84033984, i1=0) at util/coroutine-ucontext.c:1
    #12 0x00007f9c0db56d40 in ?? () from /lib64/libc.so.6
    #13 0x00007f96fe858760 in ?? ()
    #14 0x0000000000000000 in ?? ()

The RDMA QIOChannel does not implement io_set_aio_fd_handler, so
qio_channel_set_aio_fd_handler ends up calling through a NULL pointer.
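
For illustration, here is a minimal, self-contained C sketch of the failure mode: an ops table whose optional hook was never filled in, plus the defensive NULL check that avoids the crash. All names are hypothetical, not QEMU's; the commit's actual fix is to implement the missing hook for the RDMA channel.

    #include <stdio.h>

    /* Simplified channel "class" with an optional hook, loosely modelled
     * on an ops-table design. Names here are illustrative only. */
    typedef struct Channel Channel;
    struct Channel {
        void (*set_fd_handler)(Channel *ch); /* NULL when not implemented */
    };

    static void channel_set_fd_handler(Channel *ch)
    {
        if (!ch->set_fd_handler) {
            /* Without this guard, a channel type that never filled in the
             * hook makes us call through address 0 -- frame #0 above. */
            fprintf(stderr, "set_fd_handler: not supported by this channel\n");
            return;
        }
        ch->set_fd_handler(ch);
    }

    int main(void)
    {
        Channel rdma_like = { .set_fd_handler = NULL };
        channel_set_fd_handler(&rdma_like); /* refused instead of crashing */
        return 0;
    }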

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
amboar pushed a commit that referenced this issue Oct 23, 2018
When qio_channel_read returns QIO_CHANNEL_ERR_BLOCK, the source QEMU crashes.

The backtrace is:
    (gdb) bt
    #0  0x00007fb20aba91d7 in raise () from /lib64/libc.so.6
    #1  0x00007fb20abaa8c8 in abort () from /lib64/libc.so.6
    #2  0x00007fb20aba2146 in __assert_fail_base () from /lib64/libc.so.6
    #3  0x00007fb20aba21f2 in __assert_fail () from /lib64/libc.so.6
    #4  0x00000000008dba2d in qio_channel_yield (ioc=0x22f9e20, condition=G_IO_IN) at io/channel.c:460
    #5  0x00000000007a870b in channel_get_buffer (opaque=0x22f9e20, buf=0x3d54038 "", pos=0, size=32768)
        at migration/qemu-file-channel.c:83
    #6  0x00000000007a70f6 in qemu_fill_buffer (f=0x3d54000) at migration/qemu-file.c:299
    #7  0x00000000007a79d0 in qemu_peek_byte (f=0x3d54000, offset=0) at migration/qemu-file.c:562
    #8  0x00000000007a7a22 in qemu_get_byte (f=0x3d54000) at migration/qemu-file.c:575
    #9  0x00000000007a7c46 in qemu_get_be16 (f=0x3d54000) at migration/qemu-file.c:647
    #10 0x0000000000796db7 in source_return_path_thread (opaque=0x2242280) at migration/migration.c:1794
    #11 0x00000000009428fa in qemu_thread_start (args=0x3e58420) at util/qemu-thread-posix.c:504
    #12 0x00007fb20af3ddc5 in start_thread () from /lib64/libpthread.so.0
    #13 0x00007fb20ac6b74d in clone () from /lib64/libc.so.6

This patch fixes the crash by invoking qio_channel_yield only when
qemu_in_coroutine() returns true.
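
A runnable sketch of that guard, with hypothetical stand-ins for qemu_in_coroutine() and the yield/blocking paths:

    #include <stdbool.h>
    #include <stdio.h>

    static bool in_coroutine; /* stands in for qemu_in_coroutine() */

    static void channel_yield(void) { puts("yield: resume when fd is ready"); }
    static void channel_block(void) { puts("block: poll the fd directly"); }

    static void wait_for_data(void)
    {
        if (in_coroutine) {
            channel_yield();  /* only legal from coroutine context */
        } else {
            channel_block();  /* e.g. the return-path thread above */
        }
    }

    int main(void)
    {
        in_coroutine = true;  wait_for_data();
        in_coroutine = false; wait_for_data();
        return 0;
    }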

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
amboar pushed a commit that referenced this issue Oct 23, 2018
Because the RDMA QIOChannel does not implement a shutdown function, the
return path thread will wait forever once to_dst_file is set to an error
state, and the migration thread will in turn wait for the return path
thread to exit.

The backtrace of the return path thread is:

(gdb) bt
    #0  0x00007f372a76bb0f in ppoll () from /lib64/libc.so.6
    #1  0x000000000071dc24 in qemu_poll_ns (fds=0x7ef7091d0580, nfds=2, timeout=100000000)
        at qemu-timer.c:325
    #2  0x00000000006b2fba in qemu_rdma_wait_comp_channel (rdma=0xd424000)
        at migration/rdma.c:1501
    #3  0x00000000006b3191 in qemu_rdma_block_for_wrid (rdma=0xd424000, wrid_requested=4000,
        byte_len=0x7ef7091d0640) at migration/rdma.c:1580
    #4  0x00000000006b3638 in qemu_rdma_exchange_get_response (rdma=0xd424000,
        head=0x7ef7091d0720, expecting=3, idx=0) at migration/rdma.c:1726
    #5  0x00000000006b3ad6 in qemu_rdma_exchange_recv (rdma=0xd424000, head=0x7ef7091d0720,
        expecting=3) at migration/rdma.c:1903
    #6  0x00000000006b5d03 in qemu_rdma_get_buffer (opaque=0x6a57dc0, buf=0x5c80030 "", pos=8,
        size=32768) at migration/rdma.c:2714
    #7  0x00000000006a9635 in qemu_fill_buffer (f=0x5c80000) at migration/qemu-file.c:232
    #8  0x00000000006a9ecd in qemu_peek_byte (f=0x5c80000, offset=0)
        at migration/qemu-file.c:502
    #9  0x00000000006a9f1f in qemu_get_byte (f=0x5c80000) at migration/qemu-file.c:515
    #10 0x00000000006aa162 in qemu_get_be16 (f=0x5c80000) at migration/qemu-file.c:591
    #11 0x00000000006a46d3 in source_return_path_thread (
        opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1331
    #12 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
    #13 0x00007f372a77635d in clone () from /lib64/libc.so.6

The backtrace of the migration thread is:

(gdb) bt
    #0  0x00007f372aa4af57 in pthread_join () from /lib64/libpthread.so.0
    #1  0x00000000007d5711 in qemu_thread_join (thread=0xd826f8 <current_migration.37100+88>)
        at util/qemu-thread-posix.c:504
    #2  0x00000000006a4bc5 in await_return_path_close_on_source (
        ms=0xd826a0 <current_migration.37100>) at migration/migration.c:1460
    #3  0x00000000006a53e4 in migration_completion (s=0xd826a0 <current_migration.37100>,
        current_active_state=4, old_vm_running=0x7ef7089cf976, start_time=0x7ef7089cf980)
        at migration/migration.c:1695
    #4  0x00000000006a5c54 in migration_thread (opaque=0xd826a0 <current_migration.37100>)
        at migration/migration.c:1837
    #5  0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0
    #6  0x00007f372a77635d in clone () from /lib64/libc.so.6
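
A self-contained sketch of the shutdown hook this implies, with hypothetical types (the real fix adds an io_shutdown implementation to the RDMA channel): shutdown flags the channel as errored so a blocked reader's wait loop can exit instead of hanging forever.

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        bool shut_down;
    } Chan;

    /* Reader: waits for completions unless the channel was shut down. */
    static int chan_read(Chan *c)
    {
        if (c->shut_down) {
            return -1; /* propagate an error so the thread can exit */
        }
        /* ... wait for completion events (elided) ... */
        return 1;      /* pretend data arrived */
    }

    /* Shutdown: flags any blocked reader instead of leaving it waiting
     * forever, which is the hang shown in the backtraces above. */
    static void chan_shutdown(Chan *c)
    {
        c->shut_down = true;
    }

    int main(void)
    {
        Chan c = { .shut_down = false };
        printf("read -> %d\n", chan_read(&c));
        chan_shutdown(&c);
        printf("read -> %d\n", chan_read(&c));
        return 0;
    }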

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
amboar pushed a commit that referenced this issue Oct 23, 2018
tests/cdrom-test -p /x86_64/cdrom/boot/megasas

This produces the following ASAN leak:

==25700==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x7f06f8faac48 in malloc (/lib64/libasan.so.5+0xeec48)
    #1 0x7f06f87a73c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5)
    #2 0x55a729f17738 in pci_dma_sglist_init /home/elmarco/src/qq/include/hw/pci/pci.h:818
    #3 0x55a729f2a706 in megasas_map_dcmd /home/elmarco/src/qq/hw/scsi/megasas.c:698
    #4 0x55a729f39421 in megasas_handle_dcmd /home/elmarco/src/qq/hw/scsi/megasas.c:1574
    #5 0x55a729f3f70d in megasas_handle_frame /home/elmarco/src/qq/hw/scsi/megasas.c:1955
    #6 0x55a729f40939 in megasas_mmio_write /home/elmarco/src/qq/hw/scsi/megasas.c:2119
    #7 0x55a729f41102 in megasas_port_write /home/elmarco/src/qq/hw/scsi/megasas.c:2170
    #8 0x55a729220e60 in memory_region_write_accessor /home/elmarco/src/qq/memory.c:527
    #9 0x55a7292212b3 in access_with_adjusted_size /home/elmarco/src/qq/memory.c:594
    #10 0x55a72922cf70 in memory_region_dispatch_write /home/elmarco/src/qq/memory.c:1473
    #11 0x55a7290f5907 in flatview_write_continue /home/elmarco/src/qq/exec.c:3255
    #12 0x55a7290f5ceb in flatview_write /home/elmarco/src/qq/exec.c:3294
    #13 0x55a7290f6457 in address_space_write /home/elmarco/src/qq/exec.c:3384
    #14 0x55a7290f64a8 in address_space_rw /home/elmarco/src/qq/exec.c:3395
    #15 0x55a72929ecb0 in kvm_handle_io /home/elmarco/src/qq/accel/kvm/kvm-all.c:1729
    #16 0x55a7292a0db5 in kvm_cpu_exec /home/elmarco/src/qq/accel/kvm/kvm-all.c:1969
    #17 0x55a7291c4212 in qemu_kvm_cpu_thread_fn /home/elmarco/src/qq/cpus.c:1215
    #18 0x55a72a966a6c in qemu_thread_start /home/elmarco/src/qq/util/qemu-thread-posix.c:504
    #19 0x7f06ed486593 in start_thread (/lib64/libpthread.so.0+0x7593)

Move the qemu_sglist_destroy() call from megasas_complete_command() to
megasas_unmap_frame(), so that map/unmap are balanced.
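
The general pattern, as a small hypothetical sketch: release the resource in the function that undoes the mapping, so every map has exactly one matching cleanup regardless of how the command completes.

    #include <stdlib.h>

    typedef struct {
        int *sg; /* stands in for the frame's scatter/gather list */
    } Frame;

    static void map_frame(Frame *f)
    {
        f->sg = calloc(16, sizeof *f->sg); /* allocated at map time */
    }

    static void unmap_frame(Frame *f)
    {
        free(f->sg);   /* destroyed here: map/unmap stay balanced */
        f->sg = NULL;
    }

    static void complete_command(Frame *f)
    {
        /* No sglist cleanup here any more; completion isn't reached on
         * every path, which is how the 16-byte leak above arose. */
        unmap_frame(f);
    }

    int main(void)
    {
        Frame f;
        map_frame(&f);
        complete_command(&f);
        return 0;
    }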

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180814141247.32336-1-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
amboar pushed a commit that referenced this issue Oct 23, 2018
Spotted by ASAN doing some manual testing:

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f5fcdc75e50 in calloc (/lib64/libasan.so.5+0xeee50)
    #1 0x7f5fcd47241d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d)
    #2 0x55f989be92ce in timer_new /home/elmarco/src/qq/include/qemu/timer.h:561
    #3 0x55f989be92ff in timer_new_ms /home/elmarco/src/qq/include/qemu/timer.h:630
    #4 0x55f989c0219d in hmp_migrate /home/elmarco/src/qq/hmp.c:2038
    #5 0x55f98955927b in handle_hmp_command /home/elmarco/src/qq/monitor.c:3498
    #6 0x55f98955fb8c in monitor_command_cb /home/elmarco/src/qq/monitor.c:4371
    #7 0x55f98ad40f11 in readline_handle_byte /home/elmarco/src/qq/util/readline.c:393
    #8 0x55f98955fa4f in monitor_read /home/elmarco/src/qq/monitor.c:4354
    #9 0x55f98aae30d7 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:175
    #10 0x55f98aae317a in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:187
    #11 0x55f98aae940c in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66
    #12 0x55f98ab63018 in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84
    #13 0x7f5fcd46c8ac in g_main_dispatch gmain.c:3177

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180901134652.25884-1-marcandre.lureau@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
amboar pushed a commit that referenced this issue Oct 23, 2018
In qemu_laio_process_completions_and_submit, the AioContext is acquired
before the ioq_submit iteration and after qemu_laio_process_completions,
but the latter is not thread safe either.

This change avoids a number of random crashes when the Main Thread and
an IO Thread collide processing completions for the same AioContext.
This is an example of such crash:

 - The IO Thread is trying to acquire the AioContext at aio_co_enter,
   which evidences that it didn't lock it before:

Thread 3 (Thread 0x7fdfd8bd8700 (LWP 36743)):
 #0  0x00007fdfe0dd542d in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 #1  0x00007fdfe0dd0de6 in _L_lock_870 () at /lib64/libpthread.so.0
 #2  0x00007fdfe0dd0cdf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5631fde0e6c0)
    at ../nptl/pthread_mutex_lock.c:114
 #3  0x00005631fc0603a7 in qemu_mutex_lock_impl (mutex=0x5631fde0e6c0, file=0x5631fc23520f "util/async.c", line=511) at util/qemu-thread-posix.c:66
 #4  0x00005631fc05b558 in aio_co_enter (ctx=0x5631fde0e660, co=0x7fdfcc0c2b40) at util/async.c:493
 #5  0x00005631fc05b5ac in aio_co_wake (co=<optimized out>) at util/async.c:478
 #6  0x00005631fbfc51ad in qemu_laio_process_completion (laiocb=<optimized out>) at block/linux-aio.c:104
 #7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
 #8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
    at block/linux-aio.c:237
 #9  0x00005631fc05d978 in aio_dispatch_handlers (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:406
 #10 0x00005631fc05e3ea in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=true)
    at util/aio-posix.c:693
 #11 0x00005631fbd7ad96 in iothread_run (opaque=0x5631fde0e1c0) at iothread.c:64
 #12 0x00007fdfe0dcee25 in start_thread (arg=0x7fdfd8bd8700) at pthread_create.c:308
 #13 0x00007fdfe0afc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

 - The Main Thread is also processing completions from the same
   AioContext, and crashes due to failed assertion at util/iov.c:78:

Thread 1 (Thread 0x7fdfeb5eac80 (LWP 36740)):
 #0  0x00007fdfe0a391f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
 #1  0x00007fdfe0a3a8e8 in __GI_abort () at abort.c:90
 #2  0x00007fdfe0a32266 in __assert_fail_base (fmt=0x7fdfe0b84e68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset")
    at assert.c:92
 #3  0x00007fdfe0a32312 in __GI___assert_fail (assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:101
 #4  0x00005631fc065287 in iov_memset (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, offset@entry=65536, fillc=fillc@entry=0, bytes=15515191315812405248) at util/iov.c:78
 #5  0x00005631fc065a63 in qemu_iovec_memset (qiov=<optimized out>, offset=offset@entry=65536, fillc=fillc@entry=0, bytes=<optimized out>) at util/iov.c:410
 #6  0x00005631fbfc5178 in qemu_laio_process_completion (laiocb=0x7fdd920df630) at block/linux-aio.c:88
 #7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
 #8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
    at block/linux-aio.c:237
 #9  0x00005631fbfc54ed in qemu_laio_poll_cb (opaque=<optimized out>) at block/linux-aio.c:272
 #10 0x00005631fc05d85e in run_poll_handlers_once (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:497
 #11 0x00005631fc05e2ca in aio_poll (blocking=false, ctx=0x5631fde0e660) at util/aio-posix.c:574
 #12 0x00005631fc05e2ca in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=false)
    at util/aio-posix.c:604
 #13 0x00005631fbfcb8a3 in bdrv_do_drained_begin (ignore_parent=<optimized out>, recursive=<optimized out>, bs=<optimized out>) at block/io.c:273
 #14 0x00005631fbfcb8a3 in bdrv_do_drained_begin (bs=0x5631fe8b6200, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:390
 #15 0x00005631fbfbcd2e in blk_drain (blk=0x5631fe83ac80) at block/block-backend.c:1590
 #16 0x00005631fbfbe138 in blk_remove_bs (blk=blk@entry=0x5631fe83ac80) at block/block-backend.c:774
 #17 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:401
 #18 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:449
 #19 0x00005631fbfc9a69 in commit_complete (job=0x5631fe8b94b0, opaque=0x7fdfcc1bb080)
    at block/commit.c:92
 #20 0x00005631fbf7d662 in job_defer_to_main_loop_bh (opaque=0x7fdfcc1b4560) at job.c:973
 #21 0x00005631fc05ad41 in aio_bh_poll (bh=0x7fdfcc01ad90) at util/async.c:90
 #22 0x00005631fc05ad41 in aio_bh_poll (ctx=ctx@entry=0x5631fddffdb0) at util/async.c:118
 #23 0x00005631fc05e210 in aio_dispatch (ctx=0x5631fddffdb0) at util/aio-posix.c:436
 #24 0x00005631fc05ac1e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
 #25 0x00007fdfeaae44c9 in g_main_context_dispatch (context=0x5631fde00140) at gmain.c:3201
 #26 0x00007fdfeaae44c9 in g_main_context_dispatch (context=context@entry=0x5631fde00140) at gmain.c:3854
 #27 0x00005631fc05d503 in main_loop_wait () at util/main-loop.c:215
 #28 0x00005631fc05d503 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238
 #29 0x00005631fc05d503 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497
 #30 0x00005631fbd81412 in main_loop () at vl.c:1866
 #31 0x00005631fbc18ff3 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4647

 - A closer examination shows that s->io_q.in_flight appears to have
   gone backwards:

(gdb) frame 7
 #7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
    at block/linux-aio.c:222
222	            qemu_laio_process_completion(laiocb);
(gdb) p s
$2 = (LinuxAioState *) 0x7fdfc0297670
(gdb) p *s
$3 = {aio_context = 0x5631fde0e660, ctx = 0x7fdfeb43b000, e = {rfd = 33, wfd = 33}, io_q = {plugged = 0,
    in_queue = 0, in_flight = 4294967280, blocked = false, pending = {sqh_first = 0x0,
      sqh_last = 0x7fdfc0297698}}, completion_bh = 0x7fdfc0280ef0, event_idx = 21, event_max = 241}
(gdb) p/x s->io_q.in_flight
$4 = 0xfffffff0
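
A simplified, runnable sketch of the shape of fix the analysis points to, with a plain mutex standing in for the AioContext lock: completions must be processed under the same lock as submission, so in_flight can't be corrupted by two threads racing.

    #include <pthread.h>
    #include <stdio.h>

    typedef struct {
        pthread_mutex_t lock; /* stands in for the AioContext lock */
        int in_flight;        /* mirrors io_q.in_flight */
    } State;

    static void process_completions(State *s) { s->in_flight--; }
    static void submit(State *s)              { s->in_flight++; }

    static void process_completions_and_submit(State *s)
    {
        /* Acquire before processing completions, not only around the
         * submit step, so Main Thread and IO Thread can't interleave. */
        pthread_mutex_lock(&s->lock);
        process_completions(s);
        submit(s);
        pthread_mutex_unlock(&s->lock);
    }

    int main(void)
    {
        State s = { PTHREAD_MUTEX_INITIALIZER, 1 };
        process_completions_and_submit(&s);
        printf("in_flight = %d\n", s.in_flight);
        return 0;
    }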

Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
amboar pushed a commit that referenced this issue Oct 23, 2018
Spotted by ASAN while running:

$ tests/migration-test -p /x86_64/migration/postcopy/recovery

=================================================================
==18034==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 33864 byte(s) in 1 object(s) allocated from:
    #0 0x7f3da7f31e50 in calloc (/lib64/libasan.so.5+0xeee50)
    #1 0x7f3da644441d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d)
    #2 0x55af9db15440 in qemu_fopen_channel_input /home/elmarco/src/qemu/migration/qemu-file-channel.c:183
    #3 0x55af9db15413 in channel_get_output_return_path /home/elmarco/src/qemu/migration/qemu-file-channel.c:159
    #4 0x55af9db0d4ac in qemu_file_get_return_path /home/elmarco/src/qemu/migration/qemu-file.c:78
    #5 0x55af9dad5e4f in open_return_path_on_source /home/elmarco/src/qemu/migration/migration.c:2295
    #6 0x55af9dadb3bf in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:3111
    #7 0x55af9dae1bf3 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:91
    #8 0x55af9daddeca in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:108
    #9 0x55af9e13d3db in qio_task_complete /home/elmarco/src/qemu/io/task.c:158
    #10 0x55af9e13ca03 in qio_task_thread_result /home/elmarco/src/qemu/io/task.c:89
    #11 0x7f3da643b1ca in g_idle_dispatch gmain.c:5535

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180925092245.29565-1-marcandre.lureau@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
amboar pushed a commit that referenced this issue Oct 23, 2018
Recently, the test case has started failing because some job-related
functions want to drop the AioContext lock even though it hasn't been
taken:

    (gdb) bt
    #0  0x00007f51c067c9fb in raise () from /lib64/libc.so.6
    #1  0x00007f51c067e77d in abort () from /lib64/libc.so.6
    #2  0x0000558c9d5dde7b in error_exit (err=<optimized out>, msg=msg@entry=0x558c9d6fe120 <__func__.18373> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
    #3  0x0000558c9d6b5263 in qemu_mutex_unlock_impl (mutex=mutex@entry=0x558c9f3999a0, file=file@entry=0x558c9d6fd36f "util/async.c", line=line@entry=516) at util/qemu-thread-posix.c:96
    #4  0x0000558c9d6b0565 in aio_context_release (ctx=ctx@entry=0x558c9f399940) at util/async.c:516
    #5  0x0000558c9d5eb3da in job_completed_txn_abort (job=0x558c9f68e640) at job.c:738
    #6  0x0000558c9d5eb227 in job_finish_sync (job=0x558c9f68e640, finish=finish@entry=0x558c9d5eb8d0 <job_cancel_err>, errp=errp@entry=0x0) at job.c:986
    #7  0x0000558c9d5eb8ee in job_cancel_sync (job=<optimized out>) at job.c:941
    #8  0x0000558c9d64d853 in replication_close (bs=<optimized out>) at block/replication.c:148
    #9  0x0000558c9d5e5c9f in bdrv_close (bs=0x558c9f41b020) at block.c:3420
    #10 bdrv_delete (bs=0x558c9f41b020) at block.c:3629
    #11 bdrv_unref (bs=0x558c9f41b020) at block.c:4685
    #12 0x0000558c9d62a3f3 in blk_remove_bs (blk=blk@entry=0x558c9f42a7c0) at block/block-backend.c:783
    #13 0x0000558c9d62a667 in blk_delete (blk=0x558c9f42a7c0) at block/block-backend.c:402
    #14 blk_unref (blk=0x558c9f42a7c0) at block/block-backend.c:457
    #15 0x0000558c9d5dfcea in test_secondary_stop () at tests/test-replication.c:478
    #16 0x00007f51c1f13178 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
    #17 0x00007f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
    #18 0x00007f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
    #19 0x00007f51c1f13552 in g_test_run_suite () from /lib64/libglib-2.0.so.0
    #20 0x00007f51c1f13571 in g_test_run () from /lib64/libglib-2.0.so.0
    #21 0x0000558c9d5de31f in main (argc=<optimized out>, argv=<optimized out>) at tests/test-replication.c:581

It is yet unclear whether this should really be considered a bug in the
test case or whether blk_unref() should work for callers that haven't
taken the AioContext lock, but in order to fix the build tests quickly,
just take the AioContext lock around blk_unref().
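
The workaround described amounts to wrapping the call site; a sketch follows (the real change is in tests/test-replication.c, and the lock APIs shown existed at the time of this commit):

    /* Sketch of the call-site change, not the actual diff: */
    AioContext *ctx = blk_get_aio_context(blk);

    aio_context_acquire(ctx);
    blk_unref(blk);           /* may complete jobs that drop the lock */
    aio_context_release(ctx);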

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
amboar pushed a commit that referenced this issue Oct 23, 2018
Spotted by ASAN:
=================================================================
==11893==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1120 byte(s) in 28 object(s) allocated from:
    #0 0x7fd0515b0c48 in malloc (/lib64/libasan.so.5+0xeec48)
    #1 0x7fd050ffa3c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5)
    #2 0x559e708b56a4 in qstring_from_str /home/elmarco/src/qq/qobject/qstring.c:66
    #3 0x559e708b4fe0 in qstring_new /home/elmarco/src/qq/qobject/qstring.c:23
    #4 0x559e708bda7d in parse_string /home/elmarco/src/qq/qobject/json-parser.c:143
    #5 0x559e708c1009 in parse_literal /home/elmarco/src/qq/qobject/json-parser.c:484
    #6 0x559e708c1627 in parse_value /home/elmarco/src/qq/qobject/json-parser.c:547
    #7 0x559e708c1c67 in json_parser_parse /home/elmarco/src/qq/qobject/json-parser.c:573
    #8 0x559e708bc0ff in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:92
    #9 0x559e708d1655 in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:292
    #10 0x559e708d1fe1 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:339
    #11 0x559e708bc856 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:121
    #12 0x559e708b8b4b in qobject_from_jsonv /home/elmarco/src/qq/qobject/qjson.c:69
    #13 0x559e708b8d02 in qobject_from_json /home/elmarco/src/qq/qobject/qjson.c:83
    #14 0x559e708a74ae in from_json_str /home/elmarco/src/qq/tests/check-qjson.c:30
    #15 0x559e708a9f83 in utf8_string /home/elmarco/src/qq/tests/check-qjson.c:781
    #16 0x7fd05101bc49 in test_case_run gtestutils.c:2255
    #17 0x7fd05101bc49 in g_test_run_suite_internal gtestutils.c:2339

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180901211917.10372-1-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
amboar pushed a commit that referenced this issue Jan 14, 2019
Let's start from the beginning:

Commit b9e413d (in 2.9),
"block: explicitly acquire aiocontext in aio callbacks that need it",
added pairs of aio_context_acquire/release to mirror_write_complete and
mirror_read_complete, back when they were aio callbacks for blk_aio_* calls.

Then commit 2e1990b (in 3.0), "block/mirror: Convert to coroutines",
dropped these blk_aio_* calls, so mirror_write_complete and
mirror_read_complete are no longer callbacks and don't need the extra
aiocontext acquisition. Furthermore, mirror_read_complete calls
blk_co_pwritev inside this pair of aio_context_acquire/release, which
leads to the following deadlock with mirror:

 (gdb) info thr
   Id   Target Id         Frame
   3    Thread (LWP 145412) "qemu-system-x86" syscall ()
   2    Thread (LWP 145416) "qemu-system-x86" __lll_lock_wait ()
 * 1    Thread (LWP 145411) "qemu-system-x86" __lll_lock_wait ()

 (gdb) bt
 #0  __lll_lock_wait ()
 #1  _L_lock_812 ()
 #2  __GI___pthread_mutex_lock
 #3  qemu_mutex_lock_impl (mutex=0x561032dce420 <qemu_global_mutex>,
     file=0x5610327d8654 "util/main-loop.c", line=236) at
     util/qemu-thread-posix.c:66
 #4  qemu_mutex_lock_iothread_impl
 #5  os_host_main_loop_wait (timeout=480116000) at util/main-loop.c:236
 #6  main_loop_wait (nonblocking=0) at util/main-loop.c:497
 #7  main_loop () at vl.c:1892
 #8  main

Printing the contents of qemu_global_mutex, I see "__owner = 145416",
so thr1 is the main loop, and it now wants the BQL, which is owned by thr2.

 (gdb) thr 2
 (gdb) bt
 #0  __lll_lock_wait ()
 #1  _L_lock_870 ()
 #2  __GI___pthread_mutex_lock
 #3  qemu_mutex_lock_impl (mutex=0x561034d25dc0, ...
 #4  aio_context_acquire (ctx=0x561034d25d60)
 #5  dma_blk_cb
 #6  dma_blk_io
 #7  dma_blk_read
 #8  ide_dma_cb
 #9  bmdma_cmd_writeb
 #10 bmdma_write
 #11 memory_region_write_accessor
 #12 access_with_adjusted_size
 #15 flatview_write
 #16 address_space_write
 #17 address_space_rw
 #18 kvm_handle_io
 #19 kvm_cpu_exec
 #20 qemu_kvm_cpu_thread_fn
 #21 qemu_thread_start
 #22 start_thread
 #23 clone ()

Printing the mutex in frame 2, I see "__owner = 145411", so thr2 wants the
aio context mutex, which is owned by thr1. A classic deadlock.

Then let's check that the aio context is held by the mirror coroutine: just
print the coroutine stack of the first tracked request in the mirror job
target:

 (gdb) [...]
 (gdb) qemu coroutine 0x561035dd0860
 #0  qemu_coroutine_switch
 #1  qemu_coroutine_yield
 #2  qemu_co_mutex_lock_slowpath
 #3  qemu_co_mutex_lock
 #4  qcow2_co_pwritev
 #5  bdrv_driver_pwritev
 #6  bdrv_aligned_pwritev
 #7  bdrv_co_pwritev
 #8  blk_co_pwritev
 #9  mirror_read_complete () at block/mirror.c:232
 #10 mirror_co_read () at block/mirror.c:370
 #11 coroutine_trampoline
 #12 __start_context

Yes: it is mirror_read_complete calling blk_co_pwritev after acquiring the
aio context.
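
Schematically, the fix this analysis implies is simply removing the now-unneeded pair. This is a sketch, not the actual diff; 's' stands for the mirror job state in block/mirror.c:

    /* Before (deadlock-prone once these became coroutine functions): */
    aio_context_acquire(blk_get_aio_context(s->common.blk));
    /* ... blk_co_pwritev(...) may yield while the lock is held ... */
    aio_context_release(blk_get_aio_context(s->common.blk));

    /* After: no acquire/release at all -- mirror_read_complete already
     * runs in the job's AioContext as a coroutine, so it just does the
     * write directly. */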

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
amboar pushed a commit that referenced this issue Oct 21, 2024
Allow overlapping requests by removing the assert that made them
impossible. There are only two callers:

1. block_copy_task_create()

It already asserts the very same condition before calling
reqlist_init_req().

2. cbw_snapshot_read_lock()

There is no need to have read requests be non-overlapping in
copy-before-write when used for snapshot-access. In fact, there was no
protection against two callers of cbw_snapshot_read_lock() calling
reqlist_init_req() with overlapping ranges and this could lead to an
assertion failure [1].

In particular, with the reproducer script below [0], two
cbw_co_snapshot_block_status() callers could race, with the second
calling reqlist_init_req() before the first one finishes and removes
its conflicting request.

[0]:

> #!/bin/bash -e
> dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024
> ./qemu-img create /tmp/fleecing.raw -f raw 1G
> (
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \
> --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } }
> {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", "data": { "path": "/tmp/nbd.socket" } } } }
> {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": "snap0", "type": "nbd", "name": "exp0"}}
> EOF
> ) &
> sleep 5
> while true; do
> ./qemu-nbd -d /dev/nbd0
> ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r
> nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket'
> done

[1]:

> #5  0x000071e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101
> #6  0x0000615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23
> #7  0x00006152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237
> #8  0x00006152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304
> #9  0x00006152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726
> #10 0x000061528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48
> #11 0x00006152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474
> #12 0x00006152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652
> #13 0x00006152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732
> #14 0x00006152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473
> #15 0x000061528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374
> #16 0x000061528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481
> #17 0x000061528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978
> #18 0x000061528538f906 in nbd_trip (...) at ../nbd/server.c:3121
> #19 0x00006152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175
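
A runnable toy model of the change (hypothetical, simplified types): request initialization no longer asserts non-overlap, while the block-copy caller keeps its own stricter check, matching the two callers listed above.

    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { long offset, bytes; } Req;

    static Req reqs[8];
    static int nreqs;

    static bool find_conflict(long offset, long bytes)
    {
        for (int i = 0; i < nreqs; i++) {
            if (offset < reqs[i].offset + reqs[i].bytes &&
                reqs[i].offset < offset + bytes) {
                return true;
            }
        }
        return false;
    }

    static void init_req(long offset, long bytes)
    {
        /* assert(!find_conflict(offset, bytes));  -- removed: overlapping
         * read requests (snapshot-access) are legitimate */
        reqs[nreqs++] = (Req){ offset, bytes };
    }

    /* Mirrors block_copy_task_create(), which keeps its own check. */
    static void create_task(long offset, long bytes)
    {
        assert(!find_conflict(offset, bytes));
        init_req(offset, bytes);
    }

    int main(void)
    {
        create_task(0, 100);
        init_req(50, 100); /* overlapping read request: now allowed */
        printf("%d requests tracked\n", nreqs);
        return 0;
    }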

Cc: qemu-stable@nongnu.org
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-Id: <20240712140716.517911-1-f.ebner@proxmox.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
(cherry picked from commit 6475155)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
amboar pushed a commit that referenced this issue Nov 20, 2024
In regime_is_user() we assert if we're passed an ARMMMUIdx_E10_*
mmuidx value. This used to make sense because we only used this
function in ptw.c and would never use it on this kind of stage 1+2
mmuidx, only for an individual stage 1 or stage 2 mmuidx.

However, when we implemented FEAT_E0PD we added a callsite in
aa64_va_parameters(), which means this can now be called for
stage 1+2 mmuidx values if the guest sets the TCR_ELx.{E0PD0,E0PD1}
bits to enable use of the feature. This will then result in
an assertion failure later, for instance on a TLBI operation:

#6  0x00007ffff6d0e70f in g_assertion_message_expr
    (domain=0x0, file=0x55555676eeba "../../target/arm/internals.h", line=978, func=0x555556771d48 <__func__.5> "regime_is_user", expr=<optimised out>)
    at ../../../glib/gtestutils.c:3279
#7  0x0000555555f286d2 in regime_is_user (env=0x555557f2fe00, mmu_idx=ARMMMUIdx_E10_0) at ../../target/arm/internals.h:978
#8  0x0000555555f3e31c in aa64_va_parameters (env=0x555557f2fe00, va=18446744073709551615, mmu_idx=ARMMMUIdx_E10_0, data=true, el1_is_aa32=false)
    at ../../target/arm/helper.c:12048
#9  0x0000555555f3163b in tlbi_aa64_get_range (env=0x555557f2fe00, mmuidx=ARMMMUIdx_E10_0, value=106721347371041) at ../../target/arm/helper.c:5214
#10 0x0000555555f317e8 in do_rvae_write (env=0x555557f2fe00, value=106721347371041, idxmap=21, synced=true) at ../../target/arm/helper.c:5260
#11 0x0000555555f31925 in tlbi_aa64_rvae1is_write (env=0x555557f2fe00, ri=0x555557fbeae0, value=106721347371041) at ../../target/arm/helper.c:5302
#12 0x0000555556036f8f in helper_set_cp_reg64 (env=0x555557f2fe00, rip=0x555557fbeae0, value=106721347371041) at ../../target/arm/tcg/op_helper.c:965

Since we do know whether these mmuidx values are for usermode
or not, we can easily make regime_is_user() handle them:
ARMMMUIdx_E10_0 is user, and the other two are not.
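
Sketched against the real mmuidx enum names, the handling described above looks roughly like this inside regime_is_user() (a sketch of the case mapping, not the full function in target/arm/internals.h):

    switch (mmu_idx) {
    case ARMMMUIdx_E10_0:
        return true;   /* EL1&0 stage 1+2, unprivileged */
    case ARMMMUIdx_E10_1:
    case ARMMMUIdx_E10_1_PAN:
        return false;  /* EL1&0 stage 1+2, privileged */
    /* ... existing cases unchanged ... */
    }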

Cc: qemu-stable@nongnu.org
Fixes: e4c93e4 ("target/arm: Implement FEAT_E0PD")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20241017172331.822587-1-peter.maydell@linaro.org
(cherry picked from commit 1505b65)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
amboar pushed a commit that referenced this issue Nov 20, 2024
qemu-ga on a NetBSD -current VM terminates with a SIGSEGV upon receiving
the 'guest-set-time' command:

Core was generated by `qemu-ga'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000cd37a40 in ga_pipe_read_str (fd=fd@entry=0xffffff922a20, str=str@entry=0xffffff922a18)
    at ../qga/commands-posix.c:88
88	        *str[len] = '\0';
[Current thread is 1 (process 1112)]
(gdb) bt
#0  0x000000000cd37a40 in ga_pipe_read_str (fd=fd@entry=0xffffff922a20, str=str@entry=0xffffff922a18)
    at ../qga/commands-posix.c:88
#1  0x000000000cd37b60 in ga_run_command (argv=argv@entry=0xffffff922a90,
    action=action@entry=0xcda34b8 "set hardware clock to system time", errp=errp@entry=0xffffff922a70, in_str=0x0)
    at ../qga/commands-posix.c:164
#2  0x000000000cd380c4 in qmp_guest_set_time (has_time=<optimized out>, time_ns=<optimized out>,
    errp=errp@entry=0xffffff922ad0) at ../qga/commands-posix.c:304
#3  0x000000000cd253d8 in qmp_marshal_guest_set_time (args=<optimized out>, ret=<optimized out>, errp=0xffffff922b48)
    at qga/qga-qapi-commands.c:193
#4  0x000000000cd4e71c in qmp_dispatch (cmds=cmds@entry=0xcdf5b18 <ga_commands>, request=request@entry=0xf3c711a4b000,
    allow_oob=allow_oob@entry=false, cur_mon=cur_mon@entry=0x0) at ../qapi/qmp-dispatch.c:220
#5  0x000000000cd36524 in process_event (opaque=0xf3c711a79000, obj=0xf3c711a4b000, err=0x0) at ../qga/main.c:677
#6  0x000000000cd526f0 in json_message_process_token (lexer=lexer@entry=0xf3c711a79018, input=0xf3c712072480,
    type=type@entry=JSON_RCURLY, x=28, y=1) at ../qobject/json-streamer.c:99
#7  0x000000000cd93860 in json_lexer_feed_char (lexer=lexer@entry=0xf3c711a79018, ch=125 '}', flush=flush@entry=false)
    at ../qobject/json-lexer.c:313
#8  0x000000000cd93a00 in json_lexer_feed (lexer=lexer@entry=0xf3c711a79018,
    buffer=buffer@entry=0xffffff922d10 "{\"execute\":\"guest-set-time\"}\n", size=<optimized out>)
    at ../qobject/json-lexer.c:350
#9  0x000000000cd5290c in json_message_parser_feed (parser=parser@entry=0xf3c711a79000,
    buffer=buffer@entry=0xffffff922d10 "{\"execute\":\"guest-set-time\"}\n", size=<optimized out>)
    at ../qobject/json-streamer.c:121
#10 0x000000000cd361fc in channel_event_cb (condition=<optimized out>, data=0xf3c711a79000) at ../qga/main.c:703
#11 0x000000000cd3710c in ga_channel_client_event (channel=<optimized out>, condition=<optimized out>, data=0xf3c711b2d300)
    at ../qga/channel-posix.c:94
#12 0x0000f3c7120d9bec in g_main_dispatch () from /usr/pkg/lib/libglib-2.0.so.0
#13 0x0000f3c7120dd25c in g_main_context_iterate_unlocked.constprop () from /usr/pkg/lib/libglib-2.0.so.0
#14 0x0000f3c7120ddbf0 in g_main_loop_run () from /usr/pkg/lib/libglib-2.0.so.0
#15 0x000000000cda00d8 in run_agent_once (s=0xf3c711a79000) at ../qga/main.c:1522
#16 run_agent (s=0xf3c711a79000) at ../qga/main.c:1559
#17 main (argc=<optimized out>, argv=<optimized out>) at ../qga/main.c:1671
(gdb)

The command-line options used on the host machine:
qemu-system-aarch64 \
   -machine type=virt,pflash0=rom \
   -m 8G \
   -cpu host \
   -smp 8 \
   -accel hvf \
   -device virtio-net-pci,netdev=unet \
   -device virtio-blk-pci,drive=hd \
   -drive file=netbsd.qcow2,if=none,id=hd \
   -netdev user,id=unet,hostfwd=tcp::2223-:22 \
   -object rng-random,filename=/dev/urandom,id=viornd0 \
   -device virtio-rng-pci,rng=viornd0 \
   -serial mon:stdio \
   -display none \
   -blockdev node-name=rom,driver=file,filename=/opt/homebrew/Cellar/qemu/9.0.2/share/qemu/edk2-aarch64-code.fd,read-only=true \
   -chardev socket,path=/tmp/qga_netbsd.sock,server=on,wait=off,id=qga0 \
   -device virtio-serial \
   -device virtconsole,chardev=qga0,name=org.qemu.guest_agent.0

This patch rectifies the operator precedence while assigning the NUL
terminator.
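
The underlying C gotcha: *str[len] parses as *(str[len]) because [] binds tighter than unary *, so the code indexed the char ** itself rather than the buffer it points to. A minimal demonstration of the broken and fixed forms:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static void read_str(char **str)
    {
        size_t len = 5;

        *str = malloc(len + 1);
        memcpy(*str, "hello", len);

        /* Buggy form from the report: *str[len] means *(str[len]), which
         * dereferences str[5] -- reading far past the single pointer we
         * were given -- and writes through whatever garbage it finds:
         *     *str[len] = '\0';
         */
        (*str)[len] = '\0'; /* fixed: index the buffer, not the pointer */
    }

    int main(void)
    {
        char *s;

        read_str(&s);
        puts(s);
        free(s);
        return 0;
    }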

Fixes: c3f32c1

Signed-off-by: Sunil Nimmagadda <sunil@nimmagadda.net>
Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Link: https://lore.kernel.org/r/m15xppk9qg.fsf@nimmagadda.net
Signed-off-by: Konstantin Kostiuk <kkostiuk@redhat.com>
(cherry picked from commit 9cfe110)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
amboar pushed a commit that referenced this issue Dec 11, 2024
In multifd_mapped_ram_fdset_end() we call qtest_qmp() but forgot
to unref the response QDict we get back, which means it is leaked:

Indirect leak of 4120 byte(s) in 1 object(s) allocated from:
    #0 0x55c0c095d318 in __interceptor_calloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/tests/qtest/migration-test+0x22f318) (BuildId: 07f667506452d6c467dbc06fd95191966d3e91b4)
    #1 0x7f186f939c50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13
    #2 0x55c0c0ae9b01 in qdict_new qobject/qdict.c:30:13
    #3 0x55c0c0afc16c in parse_object qobject/json-parser.c:317:12
    #4 0x55c0c0afb90f in parse_value qobject/json-parser.c:545:16
    #5 0x55c0c0afb579 in json_parser_parse qobject/json-parser.c:579:14
    #6 0x55c0c0afa21d in json_message_process_token qobject/json-streamer.c:92:12
    #7 0x55c0c0bca2e5 in json_lexer_feed_char qobject/json-lexer.c:313:13
    #8 0x55c0c0bc97ce in json_lexer_feed qobject/json-lexer.c:350:9
    #9 0x55c0c0afabbc in json_message_parser_feed qobject/json-streamer.c:121:5
    #10 0x55c0c09cbd52 in qmp_fd_receive tests/qtest/libqmp.c:86:9
    #11 0x55c0c09be69b in qtest_qmp_receive_dict tests/qtest/libqtest.c:760:12
    #12 0x55c0c09bca77 in qtest_qmp_receive tests/qtest/libqtest.c:741:27
    #13 0x55c0c09bee9d in qtest_vqmp tests/qtest/libqtest.c:812:12
    #14 0x55c0c09bd257 in qtest_qmp tests/qtest/libqtest.c:835:16
    #15 0x55c0c0a87747 in multifd_mapped_ram_fdset_end tests/qtest/migration-test.c:2393:12
    #16 0x55c0c0a85eb3 in test_file_common tests/qtest/migration-test.c:1978:9
    #17 0x55c0c0a746a3 in test_multifd_file_mapped_ram_fdset tests/qtest/migration-test.c:2437:5
    #18 0x55c0c0a93237 in migration_test_wrapper tests/qtest/migration-helpers.c:458:5
    #19 0x7f186f958aed in test_case_run debian/build/deb/../../../glib/gtestutils.c:2930:15
    #20 0x7f186f958aed in g_test_run_suite_internal debian/build/deb/../../../glib/gtestutils.c:3018:16
    #21 0x7f186f95880a in g_test_run_suite_internal debian/build/deb/../../../glib/gtestutils.c:3035:18
    #22 0x7f186f95880a in g_test_run_suite_internal debian/build/deb/../../../glib/gtestutils.c:3035:18
    #23 0x7f186f95880a in g_test_run_suite_internal debian/build/deb/../../../glib/gtestutils.c:3035:18
    #24 0x7f186f95880a in g_test_run_suite_internal debian/build/deb/../../../glib/gtestutils.c:3035:18
    #25 0x7f186f95880a in g_test_run_suite_internal debian/build/deb/../../../glib/gtestutils.c:3035:18
    #26 0x7f186f958faa in g_test_run_suite debian/build/deb/../../../glib/gtestutils.c:3109:18
    #27 0x7f186f959055 in g_test_run debian/build/deb/../../../glib/gtestutils.c:2231:7
    #28 0x7f186f959055 in g_test_run debian/build/deb/../../../glib/gtestutils.c:2218:1
    #29 0x55c0c0a6e427 in main tests/qtest/migration-test.c:4033:11

Unref the object after we've confirmed that it is what we expect.
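
The resulting pattern, sketched with libqtest's real helpers (the actual command string in migration-test.c is elided here and 'who' is a placeholder QTestState):

    /* Sketch: check the response, then drop our reference to it. */
    QDict *resp = qtest_qmp(who, "{ 'execute': 'some-command' }");
    g_assert(qdict_haskey(resp, "return")); /* confirm it's what we expect */
    qobject_unref(resp);                    /* fixes the leak shown above */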

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
amboar pushed a commit that referenced this issue Dec 11, 2024
We create a gnutls_x509_privkey_t in test_tls_init(), but forget
to deinit it in test_tls_cleanup(), resulting in leaks
reported in hte migration test such as:

Indirect leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x55fa6d11c12e in malloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/tests/qtest/migration-test+0x22f12e) (BuildId: 852a267993587f557f50e5715f352f43720077ba)
    #1 0x7f073982685d in __gmp_default_allocate (/lib/x86_64-linux-gnu/libgmp.so.10+0xa85d) (BuildId: f110719303ddbea25a5e89ff730fec520eed67b0)
    #2 0x7f0739836193 in __gmpz_realloc (/lib/x86_64-linux-gnu/libgmp.so.10+0x1a193) (BuildId: f110719303ddbea25a5e89ff730fec520eed67b0)
    #3 0x7f0739836594 in __gmpz_import (/lib/x86_64-linux-gnu/libgmp.so.10+0x1a594) (BuildId: f110719303ddbea25a5e89ff730fec520eed67b0)
    #4 0x7f07398a91ed in nettle_mpz_set_str_256_u (/lib/x86_64-linux-gnu/libhogweed.so.6+0xb1ed) (BuildId: 3cc4a3474de72db89e9dcc93bfb95fe377f48c37)
    #5 0x7f073a146a5a  (/lib/x86_64-linux-gnu/libgnutls.so.30+0x131a5a) (BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b)
    #6 0x7f073a07192c  (/lib/x86_64-linux-gnu/libgnutls.so.30+0x5c92c) (BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b)
    #7 0x7f073a078333  (/lib/x86_64-linux-gnu/libgnutls.so.30+0x63333) (BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b)
    #8 0x7f073a0e8353  (/lib/x86_64-linux-gnu/libgnutls.so.30+0xd3353) (BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b)
    #9 0x7f073a0ef0ac in gnutls_x509_privkey_import (/lib/x86_64-linux-gnu/libgnutls.so.30+0xda0ac) (BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b)
    #10 0x55fa6d2547e3 in test_tls_load_key tests/unit/crypto-tls-x509-helpers.c:99:11
    #11 0x55fa6d25460c in test_tls_init tests/unit/crypto-tls-x509-helpers.c:128:15
    #12 0x55fa6d2495c4 in test_migrate_tls_x509_start_common tests/qtest/migration-test.c:1044:5
    #13 0x55fa6d24c23a in test_migrate_tls_x509_start_reject_anon_client tests/qtest/migration-test.c:1216:12
    #14 0x55fa6d23fb40 in test_precopy_common tests/qtest/migration-test.c:1789:21
    #15 0x55fa6d236b7c in test_precopy_tcp_tls_x509_reject_anon_client tests/qtest/migration-test.c:2614:5

(Oddly, there is no reported leak in the x509 unit tests, even though
those also use test_tls_init() and test_tls_cleanup().)

Deinit the privkey in test_tls_cleanup().
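
The fix, sketched with the real gnutls call ('privkey' here stands for the key object loaded by test_tls_init()):

    /* In test_tls_cleanup(): release the private key created in
     * test_tls_init(), matching the earlier init/import with a deinit. */
    gnutls_x509_privkey_deinit(privkey);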

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
amboar pushed a commit that referenced this issue Dec 11, 2024
(Duplicate of the "Allow overlapping requests…" commit message quoted in the Oct 21, 2024 entry above.)
amboar pushed a commit that referenced this issue Dec 11, 2024
(Duplicate of the regime_is_user() commit message quoted in the Nov 20, 2024 entry above.)
amboar pushed a commit that referenced this issue Dec 11, 2024
(Duplicate of the qemu-ga 'guest-set-time' commit message quoted in the Nov 20, 2024 entry above.)