Guest kernel memory corruption #9

amboar · 2017-07-04T07:00:05Z

Tracking issue for memory corruption seen under qemu but not on hardware.

Qemu: pull-target-arm-20170613-381-g016b07da3ebf, Kernel: v4.12-16-g05a8a98ab0ac

[    2.440000] ftgmac100 1e660000.ethernet eth0: NCSI interface up
[    2.500000] Unable to handle kernel paging request at virtual address 12005452
[    2.500000] pgd = 80004000
[    2.500000] [12005452] *pgd=00000000
[    2.500000] Internal error: Oops: 5 [#1] ARM
[    2.500000] Modules linked in:
[    2.500000] CPU: 0 PID: 341 Comm: ifconfig Not tainted 4.12.0-00016-g05a8a98ab0ac-dirty #1003
[    2.500000] Hardware name: Generic DT based system
[    2.500000] task: 9f527820 task.stack: 88832000
[    2.500000] PC is at unlink_anon_vmas+0x60/0x1e8
[    2.500000] LR is at unlink_anon_vmas+0xa0/0x1e8
[    2.500000] pc : [<800a5114>]    lr : [<800a5154>]    psr: 60000153
[    2.500000] sp : 88833ec0  ip : 00000100  fp : 1200544a
[    2.500000] r10: 9f551820  r9 : 804b2f17  r8 : 8880fb58
[    2.500000] r7 : 804e1b2c  r6 : 8880fb94  r5 : 9f551820  r4 : 1200544a
[    2.500000] r3 : 1200544a  r2 : 9f551820  r1 : 9f78a3c0  r0 : 00000000
[    2.500000] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user
[    2.500000] Control: 00c5387d  Table: 88834008  DAC: 00000055
[    2.500000] Process ifconfig (pid: 341, stack limit = 0x88832188)
[    2.500000] Stack: (0x88833ec0 to 0x88834000)
[    2.500000] 3ec0: 9f5e838c 00002000 00000000 8880fb58 8880ff20 76f56000 88833f18 00002000
[    2.500000] 3ee0: 00000000 00000000 00000000 80099ea8 76f56000 8009b230 00000000 9f790160
[    2.500000] 3f00: 9f54e1c0 88833f40 9f54e1c0 9f54e1f4 00000000 800a05e4 9f54e1c0 00000001
[    2.500000] 3f20: 00000000 00000000 ffffffff 9f035aa0 a0000153 0000009e 00000400 9f7f6000
[    2.500000] 3f40: 9f419570 9f036e20 00000008 800ac844 9f7748c0 800b2620 804b1220 9f527820
[    2.500000] 3f60: 9f54e1c0 00000000 ffffe000 80014acc 9f527820 ffffe000 ffffe000 800180f4
[    2.500000] 3f80: 7ef8ce60 8002d4b0 88832000 000000f8 00000000 8001961c 76fa4258 80019694
[    2.500000] 3fa0: 76fa4258 8000a320 76fa4258 00000000 00000000 00040308 00000000 76f7e430
[    2.500000] 3fc0: 76fa4258 00000000 7ef8ccb0 000000f8 000b5d5c 00000000 76fb5f70 00000000
[    2.500000] 3fe0: 76fa3f00 7ef8cca0 76f93018 76f63ca8 60000150 00000000 e7fddef0 e7fddef0
[    2.500000] [<800a5114>] (unlink_anon_vmas) from [<80099ea8>] (free_pgtables+0x78/0xcc)
[    2.500000] [<80099ea8>] (free_pgtables) from [<800a05e4>] (exit_mmap+0xec/0x1e8)
[    2.500000] [<800a05e4>] (exit_mmap) from [<80014acc>] (mmput+0x30/0xc8)
[    2.500000] [<80014acc>] (mmput) from [<800180f4>] (do_exit+0x264/0x97c)
[    2.500000] [<800180f4>] (do_exit) from [<8001961c>] (do_group_exit+0x3c/0xa4)
[    2.500000] [<8001961c>] (do_group_exit) from [<80019694>] (__wake_up_parent+0x0/0x18)
[    2.500000] Code: e584200c eb001dc3 e1a0300b e1a0400b (e5b3b008)
[    2.500000] ---[ end trace 0c960b21fc9b1257 ]---
[    2.500000] Fixing recursive fault but reboot is needed!

[    2.340000] ftgmac100 1e660000.ethernet eth0: NCSI interface up
[    2.340000] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[    2.340000] pgd = 88808000
[    2.340000] [00000000] *pgd=88818831, *pte=00000000, *ppte=00000000
[    2.340000] Internal error: Oops: 17 [#1] ARM
[    2.340000] Modules linked in:
[    2.340000] CPU: 0 PID: 339 Comm: default.script Not tainted 4.12.0-00016-g05a8a98ab0ac-dirty #1003
[    2.340000] Hardware name: Generic DT based system
[    2.340000] task: 9f577ba0 task.stack: 8882a000
[    2.340000] PC is at anon_vma_clone+0x58/0x1b8
[    2.340000] LR is at anon_vma_clone+0x4c/0x1b8
[    2.340000] pc : [<800a52f4>]    lr : [<800a52e8>]    psr: a0000153
[    2.340000] sp : 8882beb0  ip : 9f764b30  fp : 9f55b220
[    2.340000] r10: 00000000  r9 : 9f78f1f4  r8 : 804e1b2c
[    2.340000] r7 : 9f55b220  r6 : 8881ed10  r5 : 9f78562c  r4 : 888623a0
[    2.340000] r3 : 0000001d  r2 : 9fbdd61c  r1 : 00000021  r0 : 888623a0
[    2.340000] Flags: NzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user
[    2.340000] Control: 00c5387d  Table: 88808008  DAC: 00000055
[    2.340000] Process default.script (pid: 339, stack limit = 0x8882a188)
[    2.340000] Stack: (0x8882beb0 to 0x8882c000)
[    2.340000] bea0:                                     76f04000 9f78f1b8 8881640c 8881ed10
[    2.340000] bec0: 9f78f1b8 8881ed10 9f5581c0 9f78f1b8 00000000 00000000 804d4340 800a5478
[    2.340000] bee0: 00000001 01200011 8881ed10 9f5581c0 9f78f1b8 8001577c 00000000 00000000
[    2.340000] bf00: 9f558020 9f5299f0 9f529840 00000001 8881edd4 8881edc8 8881edc0 8881edd0
[    2.340000] bf20: 9f558054 9f5581f4 8882bf28 8882bf28 000007ff 01200011 00000000 00000000
[    2.340000] bf40: 00000000 00000000 8882a000 00000000 7eb672dc 80016358 00000000 00000000
[    2.340000] bf60: ffffffff 00000000 00000000 00000000 000a617c 00000000 76f17068 76f04630
[    2.340000] bf80: 00000000 00000078 8000a4c4 8882a000 00000000 800166b8 76f17068 00000000
[    2.340000] bfa0: 00000001 8000a320 76f17068 76f04630 01200011 00000000 00000000 00000000
[    2.340000] bfc0: 76f17068 76f04630 00000000 00000078 019ff008 76f17490 76f03ef0 7eb672dc
[    2.340000] bfe0: 76f03f00 7eb672b0 76ef571c 76ef5770 60000150 01200011 e7fddef0 e7fddef0
[    2.340000] [<800a52f4>] (anon_vma_clone) from [<800a5478>] (anon_vma_fork+0x24/0x150)
[    2.340000] [<800a5478>] (anon_vma_fork) from [<8001577c>] (copy_process.part.3+0x980/0x1428)
[    2.340000] [<8001577c>] (copy_process.part.3) from [<80016358>] (_do_fork+0x94/0x320)
[    2.340000] [<80016358>] (_do_fork) from [<800166b8>] (SyS_clone+0x20/0x28)
[    2.340000] [<800166b8>] (SyS_clone) from [<8000a320>] (ret_fast_syscall+0x0/0x34)
[    2.340000] Code: eb001c51 e2504000 0a00002e e595a004 (e59ab000)
[    2.340000] ---[ end trace 09c7b1238c91347e ]---

The text was updated successfully, but these errors were encountered:

legoater · 2017-07-04T07:07:37Z

this looks like openbmc/linux#102. Did it ever occur on a real system ? I

Also, may be we should move a couple of issues to the openbmc/qemu repository. :

openbmc/linux#134
openbmc/linux#123
openbmc/linux#104
openbmc/linux#103
openbmc/linux#102
openbmc/linux#101

amboar · 2017-07-04T07:09:05Z

I haven't seen them occur on a real system, and yeah, we should probably move those issues across, but I'm not sure how to do that.

The following segfault is encountered if the NBD server closes the UNIX domain socket immediately after negotiation: Program terminated with signal SIGSEGV, Segmentation fault. #0 aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441 441 QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines, (gdb) bt #0 0x000000d3c01a50f8 in aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441 openbmc#1 0x000000d3c012fa90 in nbd_coroutine_end (bs=bs@entry=0xd3c0fec650, request=<optimized out>) at block/nbd-client.c:207 openbmc#2 0x000000d3c012fb58 in nbd_client_co_preadv (bs=0xd3c0fec650, offset=0, bytes=<optimized out>, qiov=0x7ffc10a91b20, flags=0) at block/nbd-client.c:237 openbmc#3 0x000000d3c0128e63 in bdrv_driver_preadv (bs=bs@entry=0xd3c0fec650, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=0) at block/io.c:836 openbmc#4 0x000000d3c012c3e0 in bdrv_aligned_preadv (child=child@entry=0xd3c0ff51d0, req=req@entry=0x7f31885d6e90, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=1, qiov=qiov@entry=0x7ffc10a91b20, f +lags=0) at block/io.c:1086 openbmc#5 0x000000d3c012c6b8 in bdrv_co_preadv (child=0xd3c0ff51d0, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=flags@entry=0) at block/io.c:1182 openbmc#6 0x000000d3c011cc17 in blk_co_preadv (blk=0xd3c0ff4f80, offset=0, bytes=512, qiov=0x7ffc10a91b20, flags=0) at block/block-backend.c:1032 openbmc#7 0x000000d3c011ccec in blk_read_entry (opaque=0x7ffc10a91b40) at block/block-backend.c:1079 openbmc#8 0x000000d3c01bbb96 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79 openbmc#9 0x00007f3196cb8600 in __start_context () at /lib64/libc.so.6 The problem is that nbd_client_init() uses nbd_client_attach_aio_context() -> aio_co_schedule(new_context, client->read_reply_co). Execution of read_reply_co is deferred to a BH which doesn't run until later. In the mean time blk_co_preadv() can be called and nbd_coroutine_end() calls aio_wake() on read_reply_co. At this point in time read_reply_co's ctx isn't set because it has never been entered yet. This patch simplifies the nbd_co_send_request() -> nbd_co_receive_reply() -> nbd_coroutine_end() lifecycle to just nbd_co_send_request() -> nbd_co_receive_reply(). The request is "ended" if an error occurs at any point. Callers no longer have to invoke nbd_coroutine_end(). This cleanup also eliminates the segfault because we don't call aio_co_schedule() to wake up s->read_reply_co if sending the request failed. It is only necessary to wake up s->read_reply_co if a reply was received. Note this only happens with UNIX domain sockets on Linux. It doesn't seem possible to reproduce this with TCP sockets. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20170829122745.14309-2-stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>

if qio_channel_rdma_readv return QIO_CHANNEL_ERR_BLOCK, the destination qemu crash. The backtrace is: (gdb) bt #0 0x0000000000000000 in ?? () #1 0x00000000008db50e in qio_channel_set_aio_fd_handler (ioc=0x38111e0, ctx=0x3726080, io_read=0x8db841 <qio_channel_restart_read>, io_write=0x0, opaque=0x38111e0) at io/channel.c: #2 0x00000000008db952 in qio_channel_set_aio_fd_handlers (ioc=0x38111e0) at io/channel.c:438 #3 0x00000000008dbab4 in qio_channel_yield (ioc=0x38111e0, condition=G_IO_IN) at io/channel.c:47 #4 0x00000000007a870b in channel_get_buffer (opaque=0x38111e0, buf=0x440c038 "", pos=0, size=327 at migration/qemu-file-channel.c:83 #5 0x00000000007a70f6 in qemu_fill_buffer (f=0x440c000) at migration/qemu-file.c:299 #6 0x00000000007a79d0 in qemu_peek_byte (f=0x440c000, offset=0) at migration/qemu-file.c:562 #7 0x00000000007a7a22 in qemu_get_byte (f=0x440c000) at migration/qemu-file.c:575 #8 0x00000000007a7c78 in qemu_get_be32 (f=0x440c000) at migration/qemu-file.c:655 #9 0x00000000007a0508 in qemu_loadvm_state (f=0x440c000) at migration/savevm.c:2126 #10 0x0000000000794141 in process_incoming_migration_co (opaque=0x0) at migration/migration.c:366 #11 0x000000000095c598 in coroutine_trampoline (i0=84033984, i1=0) at util/coroutine-ucontext.c:1 #12 0x00007f9c0db56d40 in ?? () from /lib64/libc.so.6 #13 0x00007f96fe858760 in ?? () #14 0x0000000000000000 in ?? () RDMA QIOChannel not implement io_set_aio_fd_handler. so qio_channel_set_aio_fd_handler will access NULL pointer. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>

when qio_channel_read return QIO_CHANNEL_ERR_BLOCK, the source qemu crash. The backtrace is: (gdb) bt #0 0x00007fb20aba91d7 in raise () from /lib64/libc.so.6 #1 0x00007fb20abaa8c8 in abort () from /lib64/libc.so.6 #2 0x00007fb20aba2146 in __assert_fail_base () from /lib64/libc.so.6 #3 0x00007fb20aba21f2 in __assert_fail () from /lib64/libc.so.6 #4 0x00000000008dba2d in qio_channel_yield (ioc=0x22f9e20, condition=G_IO_IN) at io/channel.c:460 #5 0x00000000007a870b in channel_get_buffer (opaque=0x22f9e20, buf=0x3d54038 "", pos=0, size=32768) at migration/qemu-file-channel.c:83 #6 0x00000000007a70f6 in qemu_fill_buffer (f=0x3d54000) at migration/qemu-file.c:299 #7 0x00000000007a79d0 in qemu_peek_byte (f=0x3d54000, offset=0) at migration/qemu-file.c:562 #8 0x00000000007a7a22 in qemu_get_byte (f=0x3d54000) at migration/qemu-file.c:575 #9 0x00000000007a7c46 in qemu_get_be16 (f=0x3d54000) at migration/qemu-file.c:647 #10 0x0000000000796db7 in source_return_path_thread (opaque=0x2242280) at migration/migration.c:1794 #11 0x00000000009428fa in qemu_thread_start (args=0x3e58420) at util/qemu-thread-posix.c:504 #12 0x00007fb20af3ddc5 in start_thread () from /lib64/libpthread.so.0 #13 0x00007fb20ac6b74d in clone () from /lib64/libc.so.6 This patch fixed by invoke qio_channel_yield only when qemu_in_coroutine(). Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>

Because RDMA QIOChannel not implement shutdown function, If the to_dst_file was set error, the return path thread will wait forever. and the migration thread will wait return path thread exit. the backtrace of return path thread is: (gdb) bt #0 0x00007f372a76bb0f in ppoll () from /lib64/libc.so.6 #1 0x000000000071dc24 in qemu_poll_ns (fds=0x7ef7091d0580, nfds=2, timeout=100000000) at qemu-timer.c:325 #2 0x00000000006b2fba in qemu_rdma_wait_comp_channel (rdma=0xd424000) at migration/rdma.c:1501 #3 0x00000000006b3191 in qemu_rdma_block_for_wrid (rdma=0xd424000, wrid_requested=4000, byte_len=0x7ef7091d0640) at migration/rdma.c:1580 #4 0x00000000006b3638 in qemu_rdma_exchange_get_response (rdma=0xd424000, head=0x7ef7091d0720, expecting=3, idx=0) at migration/rdma.c:1726 #5 0x00000000006b3ad6 in qemu_rdma_exchange_recv (rdma=0xd424000, head=0x7ef7091d0720, expecting=3) at migration/rdma.c:1903 #6 0x00000000006b5d03 in qemu_rdma_get_buffer (opaque=0x6a57dc0, buf=0x5c80030 "", pos=8, size=32768) at migration/rdma.c:2714 #7 0x00000000006a9635 in qemu_fill_buffer (f=0x5c80000) at migration/qemu-file.c:232 #8 0x00000000006a9ecd in qemu_peek_byte (f=0x5c80000, offset=0) at migration/qemu-file.c:502 #9 0x00000000006a9f1f in qemu_get_byte (f=0x5c80000) at migration/qemu-file.c:515 #10 0x00000000006aa162 in qemu_get_be16 (f=0x5c80000) at migration/qemu-file.c:591 #11 0x00000000006a46d3 in source_return_path_thread ( opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1331 #12 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0 #13 0x00007f372a77635d in clone () from /lib64/libc.so.6 the backtrace of migration thread is: (gdb) bt #0 0x00007f372aa4af57 in pthread_join () from /lib64/libpthread.so.0 #1 0x00000000007d5711 in qemu_thread_join (thread=0xd826f8 <current_migration.37100+88>) at util/qemu-thread-posix.c:504 #2 0x00000000006a4bc5 in await_return_path_close_on_source ( ms=0xd826a0 <current_migration.37100>) at migration/migration.c:1460 #3 0x00000000006a53e4 in migration_completion (s=0xd826a0 <current_migration.37100>, current_active_state=4, old_vm_running=0x7ef7089cf976, start_time=0x7ef7089cf980) at migration/migration.c:1695 #4 0x00000000006a5c54 in migration_thread (opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1837 #5 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0 #6 0x00007f372a77635d in clone () from /lib64/libc.so.6 Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>

tests/cdrom-test -p /x86_64/cdrom/boot/megasas Produces the following ASAN leak. ==25700==ERROR: LeakSanitizer: detected memory leaks Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x7f06f8faac48 in malloc (/lib64/libasan.so.5+0xeec48) #1 0x7f06f87a73c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5) #2 0x55a729f17738 in pci_dma_sglist_init /home/elmarco/src/qq/include/hw/pci/pci.h:818 #3 0x55a729f2a706 in megasas_map_dcmd /home/elmarco/src/qq/hw/scsi/megasas.c:698 #4 0x55a729f39421 in megasas_handle_dcmd /home/elmarco/src/qq/hw/scsi/megasas.c:1574 #5 0x55a729f3f70d in megasas_handle_frame /home/elmarco/src/qq/hw/scsi/megasas.c:1955 #6 0x55a729f40939 in megasas_mmio_write /home/elmarco/src/qq/hw/scsi/megasas.c:2119 #7 0x55a729f41102 in megasas_port_write /home/elmarco/src/qq/hw/scsi/megasas.c:2170 #8 0x55a729220e60 in memory_region_write_accessor /home/elmarco/src/qq/memory.c:527 #9 0x55a7292212b3 in access_with_adjusted_size /home/elmarco/src/qq/memory.c:594 #10 0x55a72922cf70 in memory_region_dispatch_write /home/elmarco/src/qq/memory.c:1473 #11 0x55a7290f5907 in flatview_write_continue /home/elmarco/src/qq/exec.c:3255 #12 0x55a7290f5ceb in flatview_write /home/elmarco/src/qq/exec.c:3294 #13 0x55a7290f6457 in address_space_write /home/elmarco/src/qq/exec.c:3384 #14 0x55a7290f64a8 in address_space_rw /home/elmarco/src/qq/exec.c:3395 #15 0x55a72929ecb0 in kvm_handle_io /home/elmarco/src/qq/accel/kvm/kvm-all.c:1729 #16 0x55a7292a0db5 in kvm_cpu_exec /home/elmarco/src/qq/accel/kvm/kvm-all.c:1969 #17 0x55a7291c4212 in qemu_kvm_cpu_thread_fn /home/elmarco/src/qq/cpus.c:1215 #18 0x55a72a966a6c in qemu_thread_start /home/elmarco/src/qq/util/qemu-thread-posix.c:504 #19 0x7f06ed486593 in start_thread (/lib64/libpthread.so.0+0x7593) Move the qemu_sglist_destroy() from megasas_complete_command() to megasas_unmap_frame(), so map/unmap are balanced. Signed-off-by: Marc-AndrÃ© Lureau <marcandre.lureau@redhat.com> Message-Id: <20180814141247.32336-1-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>

Spotted by ASAN doing some manual testing: Direct leak of 48 byte(s) in 1 object(s) allocated from: #0 0x7f5fcdc75e50 in calloc (/lib64/libasan.so.5+0xeee50) #1 0x7f5fcd47241d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d) #2 0x55f989be92ce in timer_new /home/elmarco/src/qq/include/qemu/timer.h:561 #3 0x55f989be92ff in timer_new_ms /home/elmarco/src/qq/include/qemu/timer.h:630 #4 0x55f989c0219d in hmp_migrate /home/elmarco/src/qq/hmp.c:2038 #5 0x55f98955927b in handle_hmp_command /home/elmarco/src/qq/monitor.c:3498 #6 0x55f98955fb8c in monitor_command_cb /home/elmarco/src/qq/monitor.c:4371 #7 0x55f98ad40f11 in readline_handle_byte /home/elmarco/src/qq/util/readline.c:393 #8 0x55f98955fa4f in monitor_read /home/elmarco/src/qq/monitor.c:4354 #9 0x55f98aae30d7 in qemu_chr_be_write_impl /home/elmarco/src/qq/chardev/char.c:175 #10 0x55f98aae317a in qemu_chr_be_write /home/elmarco/src/qq/chardev/char.c:187 #11 0x55f98aae940c in fd_chr_read /home/elmarco/src/qq/chardev/char-fd.c:66 #12 0x55f98ab63018 in qio_channel_fd_source_dispatch /home/elmarco/src/qq/io/channel-watch.c:84 #13 0x7f5fcd46c8ac in g_main_dispatch gmain.c:3177 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180901134652.25884-1-marcandre.lureau@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

In qemu_laio_process_completions_and_submit, the AioContext is acquired before the ioq_submit iteration and after qemu_laio_process_completions, but the latter is not thread safe either. This change avoids a number of random crashes when the Main Thread and an IO Thread collide processing completions for the same AioContext. This is an example of such crash: - The IO Thread is trying to acquire the AioContext at aio_co_enter, which evidences that it didn't lock it before: Thread 3 (Thread 0x7fdfd8bd8700 (LWP 36743)): #0 0x00007fdfe0dd542d in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007fdfe0dd0de6 in _L_lock_870 () at /lib64/libpthread.so.0 #2 0x00007fdfe0dd0cdf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5631fde0e6c0) at ../nptl/pthread_mutex_lock.c:114 #3 0x00005631fc0603a7 in qemu_mutex_lock_impl (mutex=0x5631fde0e6c0, file=0x5631fc23520f "util/async.c", line=511) at util/qemu-thread-posix.c:66 #4 0x00005631fc05b558 in aio_co_enter (ctx=0x5631fde0e660, co=0x7fdfcc0c2b40) at util/async.c:493 #5 0x00005631fc05b5ac in aio_co_wake (co=<optimized out>) at util/async.c:478 #6 0x00005631fbfc51ad in qemu_laio_process_completion (laiocb=<optimized out>) at block/linux-aio.c:104 #7 0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222 #8 0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670) at block/linux-aio.c:237 #9 0x00005631fc05d978 in aio_dispatch_handlers (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:406 #10 0x00005631fc05e3ea in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=true) at util/aio-posix.c:693 #11 0x00005631fbd7ad96 in iothread_run (opaque=0x5631fde0e1c0) at iothread.c:64 #12 0x00007fdfe0dcee25 in start_thread (arg=0x7fdfd8bd8700) at pthread_create.c:308 #13 0x00007fdfe0afc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 - The Main Thread is also processing completions from the same AioContext, and crashes due to failed assertion at util/iov.c:78: Thread 1 (Thread 0x7fdfeb5eac80 (LWP 36740)): #0 0x00007fdfe0a391f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56 #1 0x00007fdfe0a3a8e8 in __GI_abort () at abort.c:90 #2 0x00007fdfe0a32266 in __assert_fail_base (fmt=0x7fdfe0b84e68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:92 #3 0x00007fdfe0a32312 in __GI___assert_fail (assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:101 #4 0x00005631fc065287 in iov_memset (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, offset@entry=65536, fillc=fillc@entry=0, bytes=15515191315812405248) at util/iov.c:78 #5 0x00005631fc065a63 in qemu_iovec_memset (qiov=<optimized out>, offset=offset@entry=65536, fillc=fillc@entry=0, bytes=<optimized out>) at util/iov.c:410 #6 0x00005631fbfc5178 in qemu_laio_process_completion (laiocb=0x7fdd920df630) at block/linux-aio.c:88 #7 0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222 #8 0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670) at block/linux-aio.c:237 #9 0x00005631fbfc54ed in qemu_laio_poll_cb (opaque=<optimized out>) at block/linux-aio.c:272 #10 0x00005631fc05d85e in run_poll_handlers_once (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:497 #11 0x00005631fc05e2ca in aio_poll (blocking=false, ctx=0x5631fde0e660) at util/aio-posix.c:574 #12 0x00005631fc05e2ca in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=false) at util/aio-posix.c:604 #13 0x00005631fbfcb8a3 in bdrv_do_drained_begin (ignore_parent=<optimized out>, recursive=<optimized out>, bs=<optimized out>) at block/io.c:273 #14 0x00005631fbfcb8a3 in bdrv_do_drained_begin (bs=0x5631fe8b6200, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:390 #15 0x00005631fbfbcd2e in blk_drain (blk=0x5631fe83ac80) at block/block-backend.c:1590 #16 0x00005631fbfbe138 in blk_remove_bs (blk=blk@entry=0x5631fe83ac80) at block/block-backend.c:774 #17 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:401 #18 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:449 #19 0x00005631fbfc9a69 in commit_complete (job=0x5631fe8b94b0, opaque=0x7fdfcc1bb080) at block/commit.c:92 #20 0x00005631fbf7d662 in job_defer_to_main_loop_bh (opaque=0x7fdfcc1b4560) at job.c:973 #21 0x00005631fc05ad41 in aio_bh_poll (bh=0x7fdfcc01ad90) at util/async.c:90 #22 0x00005631fc05ad41 in aio_bh_poll (ctx=ctx@entry=0x5631fddffdb0) at util/async.c:118 #23 0x00005631fc05e210 in aio_dispatch (ctx=0x5631fddffdb0) at util/aio-posix.c:436 #24 0x00005631fc05ac1e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261 #25 0x00007fdfeaae44c9 in g_main_context_dispatch (context=0x5631fde00140) at gmain.c:3201 #26 0x00007fdfeaae44c9 in g_main_context_dispatch (context=context@entry=0x5631fde00140) at gmain.c:3854 #27 0x00005631fc05d503 in main_loop_wait () at util/main-loop.c:215 #28 0x00005631fc05d503 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238 #29 0x00005631fc05d503 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497 #30 0x00005631fbd81412 in main_loop () at vl.c:1866 #31 0x00005631fbc18ff3 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4647 - A closer examination shows that s->io_q.in_flight appears to have gone backwards: (gdb) frame 7 #7 0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670) at block/linux-aio.c:222 222 qemu_laio_process_completion(laiocb); (gdb) p s $2 = (LinuxAioState *) 0x7fdfc0297670 (gdb) p *s $3 = {aio_context = 0x5631fde0e660, ctx = 0x7fdfeb43b000, e = {rfd = 33, wfd = 33}, io_q = {plugged = 0, in_queue = 0, in_flight = 4294967280, blocked = false, pending = {sqh_first = 0x0, sqh_last = 0x7fdfc0297698}}, completion_bh = 0x7fdfc0280ef0, event_idx = 21, event_max = 241} (gdb) p/x s->io_q.in_flight $4 = 0xfffffff0 Signed-off-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Spotted by ASAN while running: $ tests/migration-test -p /x86_64/migration/postcopy/recovery ================================================================= ==18034==ERROR: LeakSanitizer: detected memory leaks Direct leak of 33864 byte(s) in 1 object(s) allocated from: #0 0x7f3da7f31e50 in calloc (/lib64/libasan.so.5+0xeee50) #1 0x7f3da644441d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d) #2 0x55af9db15440 in qemu_fopen_channel_input /home/elmarco/src/qemu/migration/qemu-file-channel.c:183 #3 0x55af9db15413 in channel_get_output_return_path /home/elmarco/src/qemu/migration/qemu-file-channel.c:159 #4 0x55af9db0d4ac in qemu_file_get_return_path /home/elmarco/src/qemu/migration/qemu-file.c:78 #5 0x55af9dad5e4f in open_return_path_on_source /home/elmarco/src/qemu/migration/migration.c:2295 #6 0x55af9dadb3bf in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:3111 #7 0x55af9dae1bf3 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:91 #8 0x55af9daddeca in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:108 #9 0x55af9e13d3db in qio_task_complete /home/elmarco/src/qemu/io/task.c:158 #10 0x55af9e13ca03 in qio_task_thread_result /home/elmarco/src/qemu/io/task.c:89 #11 0x7f3da643b1ca in g_idle_dispatch gmain.c:5535 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180925092245.29565-1-marcandre.lureau@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Recently, the test case has started failing because some job related functions want to drop the AioContext lock even though it hasn't been taken: (gdb) bt #0 0x00007f51c067c9fb in raise () from /lib64/libc.so.6 #1 0x00007f51c067e77d in abort () from /lib64/libc.so.6 #2 0x0000558c9d5dde7b in error_exit (err=<optimized out>, msg=msg@entry=0x558c9d6fe120 <__func__.18373> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36 #3 0x0000558c9d6b5263 in qemu_mutex_unlock_impl (mutex=mutex@entry=0x558c9f3999a0, file=file@entry=0x558c9d6fd36f "util/async.c", line=line@entry=516) at util/qemu-thread-posix.c:96 #4 0x0000558c9d6b0565 in aio_context_release (ctx=ctx@entry=0x558c9f399940) at util/async.c:516 #5 0x0000558c9d5eb3da in job_completed_txn_abort (job=0x558c9f68e640) at job.c:738 #6 0x0000558c9d5eb227 in job_finish_sync (job=0x558c9f68e640, finish=finish@entry=0x558c9d5eb8d0 <job_cancel_err>, errp=errp@entry=0x0) at job.c:986 #7 0x0000558c9d5eb8ee in job_cancel_sync (job=<optimized out>) at job.c:941 #8 0x0000558c9d64d853 in replication_close (bs=<optimized out>) at block/replication.c:148 #9 0x0000558c9d5e5c9f in bdrv_close (bs=0x558c9f41b020) at block.c:3420 #10 bdrv_delete (bs=0x558c9f41b020) at block.c:3629 #11 bdrv_unref (bs=0x558c9f41b020) at block.c:4685 #12 0x0000558c9d62a3f3 in blk_remove_bs (blk=blk@entry=0x558c9f42a7c0) at block/block-backend.c:783 #13 0x0000558c9d62a667 in blk_delete (blk=0x558c9f42a7c0) at block/block-backend.c:402 #14 blk_unref (blk=0x558c9f42a7c0) at block/block-backend.c:457 #15 0x0000558c9d5dfcea in test_secondary_stop () at tests/test-replication.c:478 #16 0x00007f51c1f13178 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #17 0x00007f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #18 0x00007f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #19 0x00007f51c1f13552 in g_test_run_suite () from /lib64/libglib-2.0.so.0 #20 0x00007f51c1f13571 in g_test_run () from /lib64/libglib-2.0.so.0 #21 0x0000558c9d5de31f in main (argc=<optimized out>, argv=<optimized out>) at tests/test-replication.c:581 It is yet unclear whether this should really be considered a bug in the test case or whether blk_unref() should work for callers that haven't taken the AioContext lock, but in order to fix the build tests quickly, just take the AioContext lock around blk_unref(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Spotted by ASAN: ================================================================= ==11893==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1120 byte(s) in 28 object(s) allocated from: #0 0x7fd0515b0c48 in malloc (/lib64/libasan.so.5+0xeec48) #1 0x7fd050ffa3c5 in g_malloc (/lib64/libglib-2.0.so.0+0x523c5) #2 0x559e708b56a4 in qstring_from_str /home/elmarco/src/qq/qobject/qstring.c:66 #3 0x559e708b4fe0 in qstring_new /home/elmarco/src/qq/qobject/qstring.c:23 #4 0x559e708bda7d in parse_string /home/elmarco/src/qq/qobject/json-parser.c:143 #5 0x559e708c1009 in parse_literal /home/elmarco/src/qq/qobject/json-parser.c:484 #6 0x559e708c1627 in parse_value /home/elmarco/src/qq/qobject/json-parser.c:547 #7 0x559e708c1c67 in json_parser_parse /home/elmarco/src/qq/qobject/json-parser.c:573 #8 0x559e708bc0ff in json_message_process_token /home/elmarco/src/qq/qobject/json-streamer.c:92 #9 0x559e708d1655 in json_lexer_feed_char /home/elmarco/src/qq/qobject/json-lexer.c:292 #10 0x559e708d1fe1 in json_lexer_feed /home/elmarco/src/qq/qobject/json-lexer.c:339 #11 0x559e708bc856 in json_message_parser_feed /home/elmarco/src/qq/qobject/json-streamer.c:121 #12 0x559e708b8b4b in qobject_from_jsonv /home/elmarco/src/qq/qobject/qjson.c:69 #13 0x559e708b8d02 in qobject_from_json /home/elmarco/src/qq/qobject/qjson.c:83 #14 0x559e708a74ae in from_json_str /home/elmarco/src/qq/tests/check-qjson.c:30 #15 0x559e708a9f83 in utf8_string /home/elmarco/src/qq/tests/check-qjson.c:781 #16 0x7fd05101bc49 in test_case_run gtestutils.c:2255 #17 0x7fd05101bc49 in g_test_run_suite_internal gtestutils.c:2339 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180901211917.10372-1-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Let start from the beginning: Commit b9e413d (in 2.9) "block: explicitly acquire aiocontext in aio callbacks that need it" added pairs of aio_context_acquire/release to mirror_write_complete and mirror_read_complete, when they were aio callbacks for blk_aio_* calls. Then, commit 2e1990b (in 3.0) "block/mirror: Convert to coroutines" dropped these blk_aio_* calls, than mirror_write_complete and mirror_read_complete are not callbacks more, and don't need additional aiocontext acquiring. Furthermore, mirror_read_complete calls blk_co_pwritev inside these pair of aio_context_acquire/release, which leads to the following dead-lock with mirror: (gdb) info thr Id Target Id Frame 3 Thread (LWP 145412) "qemu-system-x86" syscall () 2 Thread (LWP 145416) "qemu-system-x86" __lll_lock_wait () * 1 Thread (LWP 145411) "qemu-system-x86" __lll_lock_wait () (gdb) bt #0 __lll_lock_wait () #1 _L_lock_812 () #2 __GI___pthread_mutex_lock #3 qemu_mutex_lock_impl (mutex=0x561032dce420 <qemu_global_mutex>, file=0x5610327d8654 "util/main-loop.c", line=236) at util/qemu-thread-posix.c:66 #4 qemu_mutex_lock_iothread_impl #5 os_host_main_loop_wait (timeout=480116000) at util/main-loop.c:236 #6 main_loop_wait (nonblocking=0) at util/main-loop.c:497 #7 main_loop () at vl.c:1892 #8 main Printing contents of qemu_global_mutex, I see that "__owner = 145416", so, thr1 is main loop, and now it wants BQL, which is owned by thr2. (gdb) thr 2 (gdb) bt #0 __lll_lock_wait () #1 _L_lock_870 () #2 __GI___pthread_mutex_lock #3 qemu_mutex_lock_impl (mutex=0x561034d25dc0, ... #4 aio_context_acquire (ctx=0x561034d25d60) #5 dma_blk_cb #6 dma_blk_io #7 dma_blk_read #8 ide_dma_cb #9 bmdma_cmd_writeb #10 bmdma_write #11 memory_region_write_accessor #12 access_with_adjusted_size #15 flatview_write #16 address_space_write #17 address_space_rw #18 kvm_handle_io #19 kvm_cpu_exec #20 qemu_kvm_cpu_thread_fn #21 qemu_thread_start #22 start_thread #23 clone () Printing mutex in fr 2, I see "__owner = 145411", so thr2 wants aio context mutex, which is owned by thr1. Classic dead-lock. Then, let's check that aio context is hold by mirror coroutine: just print coroutine stack of first tracked request in mirror job target: (gdb) [...] (gdb) qemu coroutine 0x561035dd0860 #0 qemu_coroutine_switch #1 qemu_coroutine_yield #2 qemu_co_mutex_lock_slowpath #3 qemu_co_mutex_lock #4 qcow2_co_pwritev #5 bdrv_driver_pwritev #6 bdrv_aligned_pwritev #7 bdrv_co_pwritev #8 blk_co_pwritev #9 mirror_read_complete () at block/mirror.c:232 #10 mirror_co_read () at block/mirror.c:370 #11 coroutine_trampoline #12 __start_context Yes it is mirror_read_complete calling blk_co_pwritev after acquiring aio context. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Allow overlapping request by removing the assert that made it impossible. There are only two callers: 1. block_copy_task_create() It already asserts the very same condition before calling reqlist_init_req(). 2. cbw_snapshot_read_lock() There is no need to have read requests be non-overlapping in copy-before-write when used for snapshot-access. In fact, there was no protection against two callers of cbw_snapshot_read_lock() calling reqlist_init_req() with overlapping ranges and this could lead to an assertion failure [1]. In particular, with the reproducer script below [0], two cbw_co_snapshot_block_status() callers could race, with the second calling reqlist_init_req() before the first one finishes and removes its conflicting request. [0]: > #!/bin/bash -e > dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024 > ./qemu-img create /tmp/fleecing.raw -f raw 1G > ( > ./qemu-system-x86_64 --qmp stdio \ > --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \ > --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \ > <<EOF > {"execute": "qmp_capabilities"} > {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } } > {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } } > {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", "data": { "path": "/tmp/nbd.socket" } } } } > {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": "snap0", "type": "nbd", "name": "exp0"}} > EOF > ) & > sleep 5 > while true; do > ./qemu-nbd -d /dev/nbd0 > ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r > nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket' > done [1]: > #5 0x000071e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101 > #6 0x0000615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23 > #7 0x00006152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237 > #8 0x00006152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304 > #9 0x00006152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726 > #10 0x000061528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48 > #11 0x00006152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474 > #12 0x00006152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652 > #13 0x00006152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732 > #14 0x00006152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473 > #15 0x000061528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374 > #16 0x000061528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481 > #17 0x000061528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978 > #18 0x000061528538f906 in nbd_trip (...) at ../nbd/server.c:3121 > #19 0x00006152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175 Cc: qemu-stable@nongnu.org Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-Id: <20240712140716.517911-1-f.ebner@proxmox.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> (cherry picked from commit 6475155) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

In regime_is_user() we assert if we're passed an ARMMMUIdx_E10_* mmuidx value. This used to make sense because we only used this function in ptw.c and would never use it on this kind of stage 1+2 mmuidx, only for an individual stage 1 or stage 2 mmuidx. However, when we implemented FEAT_E0PD we added a callsite in aa64_va_parameters(), which means this can now be called for stage 1+2 mmuidx values if the guest sets the TCG_ELX.{E0PD0,E0PD1} bits to enable use of the feature. This will then result in an assertion failure later, for instance on a TLBI operation: #6 0x00007ffff6d0e70f in g_assertion_message_expr (domain=0x0, file=0x55555676eeba "../../target/arm/internals.h", line=978, func=0x555556771d48 <__func__.5> "regime_is_user", expr=<optimised out>) at ../../../glib/gtestutils.c:3279 #7 0x0000555555f286d2 in regime_is_user (env=0x555557f2fe00, mmu_idx=ARMMMUIdx_E10_0) at ../../target/arm/internals.h:978 #8 0x0000555555f3e31c in aa64_va_parameters (env=0x555557f2fe00, va=18446744073709551615, mmu_idx=ARMMMUIdx_E10_0, data=true, el1_is_aa32=false) at ../../target/arm/helper.c:12048 #9 0x0000555555f3163b in tlbi_aa64_get_range (env=0x555557f2fe00, mmuidx=ARMMMUIdx_E10_0, value=106721347371041) at ../../target/arm/helper.c:5214 #10 0x0000555555f317e8 in do_rvae_write (env=0x555557f2fe00, value=106721347371041, idxmap=21, synced=true) at ../../target/arm/helper.c:5260 #11 0x0000555555f31925 in tlbi_aa64_rvae1is_write (env=0x555557f2fe00, ri=0x555557fbeae0, value=106721347371041) at ../../target/arm/helper.c:5302 #12 0x0000555556036f8f in helper_set_cp_reg64 (env=0x555557f2fe00, rip=0x555557fbeae0, value=106721347371041) at ../../target/arm/tcg/op_helper.c:965 Since we do know whether these mmuidx values are for usermode or not, we can easily make regime_is_user() handle them: ARMMMUIdx_E10_0 is user, and the other two are not. Cc: qemu-stable@nongnu.org Fixes: e4c93e4 ("target/arm: Implement FEAT_E0PD") Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20241017172331.822587-1-peter.maydell@linaro.org (cherry picked from commit 1505b65) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

qemu-ga on a NetBSD -current VM terminates with a SIGSEGV upon receiving 'guest-set-time' command... Core was generated by `qemu-ga'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x000000000cd37a40 in ga_pipe_read_str (fd=fd@entry=0xffffff922a20, str=str@entry=0xffffff922a18) at ../qga/commands-posix.c:88 88 *str[len] = '\0'; [Current thread is 1 (process 1112)] (gdb) bt #0 0x000000000cd37a40 in ga_pipe_read_str (fd=fd@entry=0xffffff922a20, str=str@entry=0xffffff922a18) at ../qga/commands-posix.c:88 #1 0x000000000cd37b60 in ga_run_command (argv=argv@entry=0xffffff922a90, action=action@entry=0xcda34b8 "set hardware clock to system time", errp=errp@entry=0xffffff922a70, in_str=0x0) at ../qga/commands-posix.c:164 #2 0x000000000cd380c4 in qmp_guest_set_time (has_time=<optimized out>, time_ns=<optimized out>, errp=errp@entry=0xffffff922ad0) at ../qga/commands-posix.c:304 #3 0x000000000cd253d8 in qmp_marshal_guest_set_time (args=<optimized out>, ret=<optimized out>, errp=0xffffff922b48) at qga/qga-qapi-commands.c:193 #4 0x000000000cd4e71c in qmp_dispatch (cmds=cmds@entry=0xcdf5b18 <ga_commands>, request=request@entry=0xf3c711a4b000, allow_oob=allow_oob@entry=false, cur_mon=cur_mon@entry=0x0) at ../qapi/qmp-dispatch.c:220 #5 0x000000000cd36524 in process_event (opaque=0xf3c711a79000, obj=0xf3c711a4b000, err=0x0) at ../qga/main.c:677 #6 0x000000000cd526f0 in json_message_process_token (lexer=lexer@entry=0xf3c711a79018, input=0xf3c712072480, type=type@entry=JSON_RCURLY, x=28, y=1) at ../qobject/json-streamer.c:99 #7 0x000000000cd93860 in json_lexer_feed_char (lexer=lexer@entry=0xf3c711a79018, ch=125 '}', flush=flush@entry=false) at ../qobject/json-lexer.c:313 #8 0x000000000cd93a00 in json_lexer_feed (lexer=lexer@entry=0xf3c711a79018, buffer=buffer@entry=0xffffff922d10 "{\"execute\":\"guest-set-time\"}\n", size=<optimized out>) at ../qobject/json-lexer.c:350 #9 0x000000000cd5290c in json_message_parser_feed (parser=parser@entry=0xf3c711a79000, buffer=buffer@entry=0xffffff922d10 "{\"execute\":\"guest-set-time\"}\n", size=<optimized out>) at ../qobject/json-streamer.c:121 #10 0x000000000cd361fc in channel_event_cb (condition=<optimized out>, data=0xf3c711a79000) at ../qga/main.c:703 #11 0x000000000cd3710c in ga_channel_client_event (channel=<optimized out>, condition=<optimized out>, data=0xf3c711b2d300) at ../qga/channel-posix.c:94 #12 0x0000f3c7120d9bec in g_main_dispatch () from /usr/pkg/lib/libglib-2.0.so.0 #13 0x0000f3c7120dd25c in g_main_context_iterate_unlocked.constprop () from /usr/pkg/lib/libglib-2.0.so.0 #14 0x0000f3c7120ddbf0 in g_main_loop_run () from /usr/pkg/lib/libglib-2.0.so.0 #15 0x000000000cda00d8 in run_agent_once (s=0xf3c711a79000) at ../qga/main.c:1522 #16 run_agent (s=0xf3c711a79000) at ../qga/main.c:1559 #17 main (argc=<optimized out>, argv=<optimized out>) at ../qga/main.c:1671 (gdb) The commandline options used on the host machine... qemu-system-aarch64 \ -machine type=virt,pflash0=rom \ -m 8G \ -cpu host \ -smp 8 \ -accel hvf \ -device virtio-net-pci,netdev=unet \ -device virtio-blk-pci,drive=hd \ -drive file=netbsd.qcow2,if=none,id=hd \ -netdev user,id=unet,hostfwd=tcp::2223-:22 \ -object rng-random,filename=/dev/urandom,id=viornd0 \ -device virtio-rng-pci,rng=viornd0 \ -serial mon:stdio \ -display none \ -blockdev node-name=rom,driver=file,filename=/opt/homebrew/Cellar/qemu/9.0.2/share/qemu/edk2-aarch64-code.fd,read-only=true \ -chardev socket,path=/tmp/qga_netbsd.sock,server=on,wait=off,id=qga0 \ -device virtio-serial \ -device virtconsole,chardev=qga0,name=org.qemu.guest_agent.0 This patch rectifies the operator precedence while assigning the NUL terminator. Fixes: c3f32c1 Signed-off-by: Sunil Nimmagadda <sunil@nimmagadda.net> Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Link: https://lore.kernel.org/r/m15xppk9qg.fsf@nimmagadda.net Signed-off-by: Konstantin Kostiuk <kkostiuk@redhat.com> (cherry picked from commit 9cfe110) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

…et_end() In multifd_mapped_ram_fdset_end() we call qtest_qmp() but forgot to unref the response QDict we get back, which means it is leaked: Indirect leak of 4120 byte(s) in 1 object(s) allocated from: #0 0x55c0c095d318 in __interceptor_calloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/tests/qtest/migration-test+0x22f318) (BuildI d: 07f667506452d6c467dbc06fd95191966d3e91b4) #1 0x7f186f939c50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13 #2 0x55c0c0ae9b01 in qdict_new qobject/qdict.c:30:13 #3 0x55c0c0afc16c in parse_object qobject/json-parser.c:317:12 #4 0x55c0c0afb90f in parse_value qobject/json-parser.c:545:16 #5 0x55c0c0afb579 in json_parser_parse qobject/json-parser.c:579:14 #6 0x55c0c0afa21d in json_message_process_token qobject/json-streamer.c:92:12 #7 0x55c0c0bca2e5 in json_lexer_feed_char qobject/json-lexer.c:313:13 #8 0x55c0c0bc97ce in json_lexer_feed qobject/json-lexer.c:350:9 #9 0x55c0c0afabbc in json_message_parser_feed qobject/json-streamer.c:121:5 #10 0x55c0c09cbd52 in qmp_fd_receive tests/qtest/libqmp.c:86:9 #11 0x55c0c09be69b in qtest_qmp_receive_dict tests/qtest/libqtest.c:760:12 #12 0x55c0c09bca77 in qtest_qmp_receive tests/qtest/libqtest.c:741:27 #13 0x55c0c09bee9d in qtest_vqmp tests/qtest/libqtest.c:812:12 #14 0x55c0c09bd257 in qtest_qmp tests/qtest/libqtest.c:835:16 #15 0x55c0c0a87747 in multifd_mapped_ram_fdset_end tests/qtest/migration-test.c:2393:12 #16 0x55c0c0a85eb3 in test_file_common tests/qtest/migration-test.c:1978:9 #17 0x55c0c0a746a3 in test_multifd_file_mapped_ram_fdset tests/qtest/migration-test.c:2437:5 #18 0x55c0c0a93237 in migration_test_wrapper tests/qtest/migration-helpers.c:458:5 #19 0x7f186f958aed in test_case_run debian/build/deb/../../../glib/gtestutils.c:2930:15 #20 0x7f186f958aed in g_test_run_suite_internal debian/build/deb/../../../glib/gtestutils.c:3018:16 #21 0x7f186f95880a in g_test_run_suite_internal debian/build/deb/../../../glib/gtestutils.c:3035:18 #22 0x7f186f95880a in g_test_run_suite_internal debian/build/deb/../../../glib/gtestutils.c:3035:18 #23 0x7f186f95880a in g_test_run_suite_internal debian/build/deb/../../../glib/gtestutils.c:3035:18 #24 0x7f186f95880a in g_test_run_suite_internal debian/build/deb/../../../glib/gtestutils.c:3035:18 #25 0x7f186f95880a in g_test_run_suite_internal debian/build/deb/../../../glib/gtestutils.c:3035:18 #26 0x7f186f958faa in g_test_run_suite debian/build/deb/../../../glib/gtestutils.c:3109:18 #27 0x7f186f959055 in g_test_run debian/build/deb/../../../glib/gtestutils.c:2231:7 #28 0x7f186f959055 in g_test_run debian/build/deb/../../../glib/gtestutils.c:2218:1 #29 0x55c0c0a6e427 in main tests/qtest/migration-test.c:4033:11 Unref the object after we've confirmed that it is what we expect. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>

In migrate_get_socket_address() we leak the SocketAddressList: (cd build/asan && \ ASAN_OPTIONS="fast_unwind_on_malloc=0:strip_path_prefix=/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../" QTEST_QEMU_BINARY=./qemu-system-x86_64 \ ./tests/qtest/migration-test --tap -k -p /x86_64/migration/multifd/tcp/tls/psk/match ) [...] Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x563d7f22f318 in __interceptor_calloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/tests/qtest/migration-test+0x22f318) (BuildId: 2ad6282fb5d076c863ab87f41a345d46dc965ded) #1 0x7f9de3b39c50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13 #2 0x563d7f3a119c in qobject_input_start_list qapi/qobject-input-visitor.c:336:17 #3 0x563d7f390fbf in visit_start_list qapi/qapi-visit-core.c:80:10 #4 0x563d7f3882ef in visit_type_SocketAddressList /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qapi/qapi-visit-sockets.c:519:10 #5 0x563d7f3658c9 in migrate_get_socket_address tests/qtest/migration-helpers.c:97:5 #6 0x563d7f362e24 in migrate_get_connect_uri tests/qtest/migration-helpers.c:111:13 #7 0x563d7f362bb2 in migrate_qmp tests/qtest/migration-helpers.c:222:23 #8 0x563d7f3533cd in test_precopy_common tests/qtest/migration-test.c:1817:5 #9 0x563d7f34dc1c in test_multifd_tcp_tls_psk_match tests/qtest/migration-test.c:3185:5 #10 0x563d7f365337 in migration_test_wrapper tests/qtest/migration-helpers.c:458:5 The code fishes out the SocketAddress from the list to return it, and the callers are freeing that, but nothing frees the list. Since this function is called in only two places, the simple fix is to make it return the SocketAddressList rather than just a SocketAddress, and then the callers can easily access the SocketAddress, and free the whole SocketAddressList when they're done. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>

In the migration test we create several TLS certificates with the TLS_* macros from crypto-tls-x509-helpers.h. These macros create both a QCryptoTLSCertReq object which must be deinitialized and also an on-disk certificate file. The migration test currently removes the on-disk file in test_migrate_tls_x509_finish() but never deinitializes the QCryptoTLSCertReq, which means that memory allocated as part of it is leaked: Indirect leak of 2 byte(s) in 1 object(s) allocated from: #0 0x5558ba33712e in malloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/tests/qtest/migration-test+0x22f12e) (BuildId: 4c8618f663e538538cad19d35233124cea161491) #1 0x7f64afc131f4 (/lib/x86_64-linux-gnu/libtasn1.so.6+0x81f4) (BuildId: 2fde6ecb43c586fe4077118f771077aa1298e7ea) #2 0x7f64afc18d58 in asn1_write_value (/lib/x86_64-linux-gnu/libtasn1.so.6+0xdd58) (BuildId: 2fde6ecb43c586fe4077118f771077aa1298e7ea) #3 0x7f64af8fc678 in gnutls_x509_crt_set_version (/lib/x86_64-linux-gnu/libgnutls.so.30+0xe7678) (BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b) #4 0x5558ba470035 in test_tls_generate_cert tests/unit/crypto-tls-x509-helpers.c:234:5 #5 0x5558ba464e4a in test_migrate_tls_x509_start_common tests/qtest/migration-test.c:1058:5 #6 0x5558ba462c8a in test_migrate_tls_x509_start_default_host tests/qtest/migration-test.c:1123:12 #7 0x5558ba45ab40 in test_precopy_common tests/qtest/migration-test.c:1786:21 #8 0x5558ba450015 in test_precopy_unix_tls_x509_default_host tests/qtest/migration-test.c:2077:5 #9 0x5558ba46d3c7 in migration_test_wrapper tests/qtest/migration-helpers.c:456:5 (and similar reports). The only function currently provided to deinit a QCryptoTLSCertReq is test_tls_discard_cert(), which also removes the on-disk certificate file. For the migration tests we need to retain the on-disk files until we've finished running the test, so the simplest fix is to provide a new function test_tls_deinit_cert() which does only the cleanup of the QCryptoTLSCertReq, and call it in the right places. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>

We create a gnutls_x509_privkey_t in test_tls_init(), but forget to deinit it in test_tls_cleanup(), resulting in leaks reported in hte migration test such as: Indirect leak of 8 byte(s) in 1 object(s) allocated from: #0 0x55fa6d11c12e in malloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/tests/qtest/migration-test+0x22f12e) (BuildId: 852a267993587f557f50e5715f352f43720077ba) #1 0x7f073982685d in __gmp_default_allocate (/lib/x86_64-linux-gnu/libgmp.so.10+0xa85d) (BuildId: f110719303ddbea25a5e89ff730fec520eed67b0) #2 0x7f0739836193 in __gmpz_realloc (/lib/x86_64-linux-gnu/libgmp.so.10+0x1a193) (BuildId: f110719303ddbea25a5e89ff730fec520eed67b0) #3 0x7f0739836594 in __gmpz_import (/lib/x86_64-linux-gnu/libgmp.so.10+0x1a594) (BuildId: f110719303ddbea25a5e89ff730fec520eed67b0) #4 0x7f07398a91ed in nettle_mpz_set_str_256_u (/lib/x86_64-linux-gnu/libhogweed.so.6+0xb1ed) (BuildId: 3cc4a3474de72db89e9dcc93bfb95fe377f48c37) #5 0x7f073a146a5a (/lib/x86_64-linux-gnu/libgnutls.so.30+0x131a5a) (BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b) #6 0x7f073a07192c (/lib/x86_64-linux-gnu/libgnutls.so.30+0x5c92c) (BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b) #7 0x7f073a078333 (/lib/x86_64-linux-gnu/libgnutls.so.30+0x63333) (BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b) #8 0x7f073a0e8353 (/lib/x86_64-linux-gnu/libgnutls.so.30+0xd3353) (BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b) #9 0x7f073a0ef0ac in gnutls_x509_privkey_import (/lib/x86_64-linux-gnu/libgnutls.so.30+0xda0ac) (BuildId: 97b8f99f392f1fd37b969a7164bcea884e23649b) #10 0x55fa6d2547e3 in test_tls_load_key tests/unit/crypto-tls-x509-helpers.c:99:11 #11 0x55fa6d25460c in test_tls_init tests/unit/crypto-tls-x509-helpers.c:128:15 #12 0x55fa6d2495c4 in test_migrate_tls_x509_start_common tests/qtest/migration-test.c:1044:5 #13 0x55fa6d24c23a in test_migrate_tls_x509_start_reject_anon_client tests/qtest/migration-test.c:1216:12 #14 0x55fa6d23fb40 in test_precopy_common tests/qtest/migration-test.c:1789:21 #15 0x55fa6d236b7c in test_precopy_tcp_tls_x509_reject_anon_client tests/qtest/migration-test.c:2614:5 (Oddly, there is no reported leak in the x509 unit tests, even though those also use test_tls_init() and test_tls_cleanup().) Deinit the privkey in test_tls_cleanup(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Since the TYPE_XNLX_VERSAL_CFU_FDRO device creates a FIFO in its instance_init method, we must destroy the FIFO in instance_finalize to avoid a memory leak for the QOM introspection "instantiate-examine-finalize" cycle: Direct leak of 8192 byte(s) in 1 object(s) allocated from: #0 0x55ec89eae7ee in malloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-aarch64+0x294d7ee) (BuildId: 6d508874816cc47d17c8dd775e8f809ae520e8cb) #1 0x7f697018f738 in g_malloc debian/build/deb/../../../glib/gmem.c:128:13 #2 0x55ec8d98d98d in fifo8_create util/fifo8.c:27:18 #3 0x55ec8aa2a624 in fifo32_create /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/include/qemu/fifo32.h:35:5 #4 0x55ec8aa2a33c in cfu_fdro_init hw/misc/xlnx-versal-cfu.c:397:5 #5 0x55ec8ce75da1 in object_init_with_type qom/object.c:420:9 #6 0x55ec8ce5d07b in object_initialize_with_type qom/object.c:562:5 #7 0x55ec8ce5e91d in object_new_with_type qom/object.c:782:5 #8 0x55ec8ce5e9f1 in object_new qom/object.c:797:12 #9 0x55ec8d65c81d in qmp_device_list_properties qom/qom-qmp-cmds.c:144:11 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com> Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20240822162127.705879-2-peter.maydell@linaro.org

The TYPE_XLNX_VERSAL_TRNG device creates s->prng with g_rand_new() in its init method, but it frees it in its unrealize method. This results in a leak in the QOM introspection "initialize-inspect-finalize" lifecycle: Direct leak of 2500 byte(s) in 1 object(s) allocated from: #0 0x55ec89eae9d8 in __interceptor_calloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-aarch64+0x294d9d8) (BuildId: 6d5 08874816cc47d17c8dd775e8f809ae520e8cb) #1 0x7f697018fc50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13 #2 0x7f6970197738 in g_rand_new_with_seed_array debian/build/deb/../../../glib/grand.c:202:17 #3 0x7f6970197816 in g_rand_new debian/build/deb/../../../glib/grand.c:286:10 #4 0x55ec8aa3656a in trng_init hw/misc/xlnx-versal-trng.c:624:15 #5 0x55ec8ce75da1 in object_init_with_type qom/object.c:420:9 #6 0x55ec8ce5d07b in object_initialize_with_type qom/object.c:562:5 #7 0x55ec8ce5e91d in object_new_with_type qom/object.c:782:5 #8 0x55ec8ce5e9f1 in object_new qom/object.c:797:12 #9 0x55ec8d65c81d in qmp_device_list_properties qom/qom-qmp-cmds.c:144:11 Move the free to finalize so it matches where we are initing s->prng. Since that's the only thing our unrealize method was doing, this essentially switches the whole function to be a finalize implementation. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com> Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20240822162127.705879-3-peter.maydell@linaro.org

The TYPE_XLNX_BBRAM device creates a register block with register_init_block32() in its instance_init method; we must therefore destroy it in our instance_finalize method to avoid a leak in the QOM introspection "init-inspect-finalize" lifecycle: Direct leak of 304 byte(s) in 1 object(s) allocated from: #0 0x5641518ca9d8 in __interceptor_calloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-aarch64+0x294d9d8) (BuildId: 4a6 18cb63d57d5a19ed45cfc262b08da47eaafe5) #1 0x7ff1aab31c50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13 #2 0x564151cffc5d in register_init_block hw/core/register.c:248:34 #3 0x564151d006be in register_init_block32 hw/core/register.c:299:12 #4 0x56415293df75 in bbram_ctrl_init hw/nvram/xlnx-bbram.c:462:9 #5 0x564154891dc1 in object_init_with_type qom/object.c:420:9 #6 0x56415487909b in object_initialize_with_type qom/object.c:562:5 #7 0x56415487a93d in object_new_with_type qom/object.c:782:5 #8 0x56415487aa11 in object_new qom/object.c:797:12 #9 0x56415507883d in qmp_device_list_properties qom/qom-qmp-cmds.c:144:11 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com> Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20240822162127.705879-4-peter.maydell@linaro.org

The TYPE_XLNX_ZYNQMP_EFUSE device creates a register block with register_init_block32() in its instance_init method; we must therefore destroy it in our instance_finalize method to avoid a leak in the QOM introspection "init-inspect-finalize" lifecycle: Direct leak of 304 byte(s) in 1 object(s) allocated from: #0 0x55f3ff5839d8 in __interceptor_calloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-aarch64+0x294d9d8) (BuildId: 23cf931c66865a71b6cc4da95156d03bc106fa72) #1 0x7f3f31c6bc50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13 #2 0x55f3ff9b8c5d in register_init_block hw/core/register.c:248:34 #3 0x55f3ff9b96be in register_init_block32 hw/core/register.c:299:12 #4 0x55f4005e5b25 in efuse_ctrl_init hw/nvram/xlnx-versal-efuse-ctrl.c:718:9 #5 0x55f40254afb1 in object_init_with_type qom/object.c:420:9 #6 0x55f40253228b in object_initialize_with_type qom/object.c:562:5 #7 0x55f402533b2d in object_new_with_type qom/object.c:782:5 #8 0x55f402533c01 in object_new qom/object.c:797:12 #9 0x55f402d31a2d in qmp_device_list_properties qom/qom-qmp-cmds.c:144:11 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com> Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20240822162127.705879-5-peter.maydell@linaro.org

The TYPE_XLNX_VERSAL_TRNG device creates a register block with register_init_block32() in its instance_init method; we must therefore destroy it in our instance_finalize method to avoid a leak in the QOM introspection "init-inspect-finalize" lifecycle: Direct leak of 304 byte(s) in 1 object(s) allocated from: #0 0x55842ec799d8 in __interceptor_calloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-aarch64+0x294e9d8) (BuildId: 47496e53f3e779f1c7e9b82cbea07407152b498b) #1 0x7fe793c75c50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13 #2 0x55842f0aec5d in register_init_block hw/core/register.c:248:34 #3 0x55842f0af6be in register_init_block32 hw/core/register.c:299:12 #4 0x55842f801588 in trng_init hw/misc/xlnx-versal-trng.c:614:9 #5 0x558431c411a1 in object_init_with_type qom/object.c:420:9 #6 0x558431c2847b in object_initialize_with_type qom/object.c:562:5 #7 0x558431c29d1d in object_new_with_type qom/object.c:782:5 #8 0x558431c29df1 in object_new qom/object.c:797:12 #9 0x558432427c1d in qmp_device_list_properties qom/qom-qmp-cmds.c:144:11 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com> Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20240822162127.705879-6-peter.maydell@linaro.org

The TYPE_XLNX_VERSAL_EFUSE_CTRL device creates a register block with register_init_block32() in its instance_init method; we must therefore destroy it in our instance_finalize method to avoid a leak in the QOM introspection "init-inspect-finalize" lifecycle: Direct leak of 304 byte(s) in 1 object(s) allocated from: #0 0x55f222b5b9d8 in __interceptor_calloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-aarch64+0x294e9d8) (BuildId: 420 43d49e1139e3f3071b1f22fac1e3e7249c9a6) #1 0x7fbb10669c50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13 #2 0x55f222f90c5d in register_init_block hw/core/register.c:248:34 #3 0x55f222f916be in register_init_block32 hw/core/register.c:299:12 #4 0x55f223bbdd15 in efuse_ctrl_init hw/nvram/xlnx-versal-efuse-ctrl.c:718:9 #5 0x55f225b23391 in object_init_with_type qom/object.c:420:9 #6 0x55f225b0a66b in object_initialize_with_type qom/object.c:562:5 #7 0x55f225b0bf0d in object_new_with_type qom/object.c:782:5 #8 0x55f225b0bfe1 in object_new qom/object.c:797:12 #9 0x55f226309e0d in qmp_device_list_properties qom/qom-qmp-cmds.c:144:11 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@amd.com> Reviewed-by: Francisco Iglesias <francisco.iglesias@amd.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20240822162127.705879-7-peter.maydell@linaro.org

Allow overlapping request by removing the assert that made it impossible. There are only two callers: 1. block_copy_task_create() It already asserts the very same condition before calling reqlist_init_req(). 2. cbw_snapshot_read_lock() There is no need to have read requests be non-overlapping in copy-before-write when used for snapshot-access. In fact, there was no protection against two callers of cbw_snapshot_read_lock() calling reqlist_init_req() with overlapping ranges and this could lead to an assertion failure [1]. In particular, with the reproducer script below [0], two cbw_co_snapshot_block_status() callers could race, with the second calling reqlist_init_req() before the first one finishes and removes its conflicting request. [0]: > #!/bin/bash -e > dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024 > ./qemu-img create /tmp/fleecing.raw -f raw 1G > ( > ./qemu-system-x86_64 --qmp stdio \ > --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \ > --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \ > <<EOF > {"execute": "qmp_capabilities"} > {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } } > {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } } > {"execute": "nbd-server-start", "arguments": {"addr": { "type": "unix", "data": { "path": "/tmp/nbd.socket" } } } } > {"execute": "block-export-add", "arguments": {"id": "exp0", "node-name": "snap0", "type": "nbd", "name": "exp0"}} > EOF > ) & > sleep 5 > while true; do > ./qemu-nbd -d /dev/nbd0 > ./qemu-nbd -c /dev/nbd0 nbd:unix:/tmp/nbd.socket:exportname=exp0 -f raw -r > nbdinfo --map 'nbd+unix:///exp0?socket=/tmp/nbd.socket' > done [1]: > #5 0x000071e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101 > #6 0x0000615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23 > #7 0x00006152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237 > #8 0x00006152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304 > #9 0x00006152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726 > #10 0x000061528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48 > #11 0x00006152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474 > #12 0x00006152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652 > #13 0x00006152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732 > #14 0x00006152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473 > #15 0x000061528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374 > #16 0x000061528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481 > #17 0x000061528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978 > #18 0x000061528538f906 in nbd_trip (...) at ../nbd/server.c:3121 > #19 0x00006152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175 Cc: qemu-stable@nongnu.org Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-Id: <20240712140716.517911-1-f.ebner@proxmox.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

In regime_is_user() we assert if we're passed an ARMMMUIdx_E10_* mmuidx value. This used to make sense because we only used this function in ptw.c and would never use it on this kind of stage 1+2 mmuidx, only for an individual stage 1 or stage 2 mmuidx. However, when we implemented FEAT_E0PD we added a callsite in aa64_va_parameters(), which means this can now be called for stage 1+2 mmuidx values if the guest sets the TCG_ELX.{E0PD0,E0PD1} bits to enable use of the feature. This will then result in an assertion failure later, for instance on a TLBI operation: #6 0x00007ffff6d0e70f in g_assertion_message_expr (domain=0x0, file=0x55555676eeba "../../target/arm/internals.h", line=978, func=0x555556771d48 <__func__.5> "regime_is_user", expr=<optimised out>) at ../../../glib/gtestutils.c:3279 #7 0x0000555555f286d2 in regime_is_user (env=0x555557f2fe00, mmu_idx=ARMMMUIdx_E10_0) at ../../target/arm/internals.h:978 #8 0x0000555555f3e31c in aa64_va_parameters (env=0x555557f2fe00, va=18446744073709551615, mmu_idx=ARMMMUIdx_E10_0, data=true, el1_is_aa32=false) at ../../target/arm/helper.c:12048 #9 0x0000555555f3163b in tlbi_aa64_get_range (env=0x555557f2fe00, mmuidx=ARMMMUIdx_E10_0, value=106721347371041) at ../../target/arm/helper.c:5214 #10 0x0000555555f317e8 in do_rvae_write (env=0x555557f2fe00, value=106721347371041, idxmap=21, synced=true) at ../../target/arm/helper.c:5260 #11 0x0000555555f31925 in tlbi_aa64_rvae1is_write (env=0x555557f2fe00, ri=0x555557fbeae0, value=106721347371041) at ../../target/arm/helper.c:5302 #12 0x0000555556036f8f in helper_set_cp_reg64 (env=0x555557f2fe00, rip=0x555557fbeae0, value=106721347371041) at ../../target/arm/tcg/op_helper.c:965 Since we do know whether these mmuidx values are for usermode or not, we can easily make regime_is_user() handle them: ARMMMUIdx_E10_0 is user, and the other two are not. Cc: qemu-stable@nongnu.org Fixes: e4c93e4 ("target/arm: Implement FEAT_E0PD") Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20241017172331.822587-1-peter.maydell@linaro.org

qemu-ga on a NetBSD -current VM terminates with a SIGSEGV upon receiving 'guest-set-time' command... Core was generated by `qemu-ga'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x000000000cd37a40 in ga_pipe_read_str (fd=fd@entry=0xffffff922a20, str=str@entry=0xffffff922a18) at ../qga/commands-posix.c:88 88 *str[len] = '\0'; [Current thread is 1 (process 1112)] (gdb) bt #0 0x000000000cd37a40 in ga_pipe_read_str (fd=fd@entry=0xffffff922a20, str=str@entry=0xffffff922a18) at ../qga/commands-posix.c:88 #1 0x000000000cd37b60 in ga_run_command (argv=argv@entry=0xffffff922a90, action=action@entry=0xcda34b8 "set hardware clock to system time", errp=errp@entry=0xffffff922a70, in_str=0x0) at ../qga/commands-posix.c:164 #2 0x000000000cd380c4 in qmp_guest_set_time (has_time=<optimized out>, time_ns=<optimized out>, errp=errp@entry=0xffffff922ad0) at ../qga/commands-posix.c:304 #3 0x000000000cd253d8 in qmp_marshal_guest_set_time (args=<optimized out>, ret=<optimized out>, errp=0xffffff922b48) at qga/qga-qapi-commands.c:193 #4 0x000000000cd4e71c in qmp_dispatch (cmds=cmds@entry=0xcdf5b18 <ga_commands>, request=request@entry=0xf3c711a4b000, allow_oob=allow_oob@entry=false, cur_mon=cur_mon@entry=0x0) at ../qapi/qmp-dispatch.c:220 #5 0x000000000cd36524 in process_event (opaque=0xf3c711a79000, obj=0xf3c711a4b000, err=0x0) at ../qga/main.c:677 #6 0x000000000cd526f0 in json_message_process_token (lexer=lexer@entry=0xf3c711a79018, input=0xf3c712072480, type=type@entry=JSON_RCURLY, x=28, y=1) at ../qobject/json-streamer.c:99 #7 0x000000000cd93860 in json_lexer_feed_char (lexer=lexer@entry=0xf3c711a79018, ch=125 '}', flush=flush@entry=false) at ../qobject/json-lexer.c:313 #8 0x000000000cd93a00 in json_lexer_feed (lexer=lexer@entry=0xf3c711a79018, buffer=buffer@entry=0xffffff922d10 "{\"execute\":\"guest-set-time\"}\n", size=<optimized out>) at ../qobject/json-lexer.c:350 #9 0x000000000cd5290c in json_message_parser_feed (parser=parser@entry=0xf3c711a79000, buffer=buffer@entry=0xffffff922d10 "{\"execute\":\"guest-set-time\"}\n", size=<optimized out>) at ../qobject/json-streamer.c:121 #10 0x000000000cd361fc in channel_event_cb (condition=<optimized out>, data=0xf3c711a79000) at ../qga/main.c:703 #11 0x000000000cd3710c in ga_channel_client_event (channel=<optimized out>, condition=<optimized out>, data=0xf3c711b2d300) at ../qga/channel-posix.c:94 #12 0x0000f3c7120d9bec in g_main_dispatch () from /usr/pkg/lib/libglib-2.0.so.0 #13 0x0000f3c7120dd25c in g_main_context_iterate_unlocked.constprop () from /usr/pkg/lib/libglib-2.0.so.0 #14 0x0000f3c7120ddbf0 in g_main_loop_run () from /usr/pkg/lib/libglib-2.0.so.0 #15 0x000000000cda00d8 in run_agent_once (s=0xf3c711a79000) at ../qga/main.c:1522 #16 run_agent (s=0xf3c711a79000) at ../qga/main.c:1559 #17 main (argc=<optimized out>, argv=<optimized out>) at ../qga/main.c:1671 (gdb) The commandline options used on the host machine... qemu-system-aarch64 \ -machine type=virt,pflash0=rom \ -m 8G \ -cpu host \ -smp 8 \ -accel hvf \ -device virtio-net-pci,netdev=unet \ -device virtio-blk-pci,drive=hd \ -drive file=netbsd.qcow2,if=none,id=hd \ -netdev user,id=unet,hostfwd=tcp::2223-:22 \ -object rng-random,filename=/dev/urandom,id=viornd0 \ -device virtio-rng-pci,rng=viornd0 \ -serial mon:stdio \ -display none \ -blockdev node-name=rom,driver=file,filename=/opt/homebrew/Cellar/qemu/9.0.2/share/qemu/edk2-aarch64-code.fd,read-only=true \ -chardev socket,path=/tmp/qga_netbsd.sock,server=on,wait=off,id=qga0 \ -device virtio-serial \ -device virtconsole,chardev=qga0,name=org.qemu.guest_agent.0 This patch rectifies the operator precedence while assigning the NUL terminator. Fixes: c3f32c1 Signed-off-by: Sunil Nimmagadda <sunil@nimmagadda.net> Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Link: https://lore.kernel.org/r/m15xppk9qg.fsf@nimmagadda.net Signed-off-by: Konstantin Kostiuk <kkostiuk@redhat.com>

legoater mentioned this issue Jul 7, 2017

unlink_anon_vmas: Unable to handle kernel paging request at virtual address 12005452 openbmc/linux#102

Closed

legoater mentioned this issue Mar 5, 2018

Witherspoon QEMU kernel oops: Unable to handle kernel paging request openbmc/openbmc#2962

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Guest kernel memory corruption #9

Guest kernel memory corruption #9

amboar commented Jul 4, 2017 •

edited

Loading

legoater commented Jul 4, 2017

amboar commented Jul 4, 2017

Guest kernel memory corruption #9

Guest kernel memory corruption #9

Comments

amboar commented Jul 4, 2017 • edited Loading

legoater commented Jul 4, 2017

amboar commented Jul 4, 2017

amboar commented Jul 4, 2017 •

edited

Loading