Bug 19048 - [Anolis 8.10][RC1][loongarch64] KVM guest CPU hot-unplug: qemu process leaks memory and crashes with "Bus error (core dumped)"
Status: CLOSED FIXED
Alias: None
Product: Anolis OS 8
Classification: Anolis OS
Component: BaseOS Modules
Version: 8.10
Hardware: loongarch Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: wenlong
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on: 8346
Blocks:
Reported: 2025-02-25 11:13 UTC by wuzhiguo
Modified: 2025-03-19 17:35 UTC
CC List: 3 users

See Also:


Attachments

Description wuzhiguo loongson_group 2025-02-25 11:13:53 UTC
Description of problem:
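When a CPU is hot-unplugged from a KVM guest, the qemu process leaks memory and crashes with "Bus error (core dumped)".
The avocado-vt KVM virtualization test suite sets the MALLOC_PERTURB_ environment variable to 1 to help catch memory-allocation errors in qemu.
With MALLOC_PERTURB_=1, the guest crashes on CPU hot-unplug.
Without MALLOC_PERTURB_=1, CPU hot-unplug works and the guest keeps running normally.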
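Why MALLOC_PERTURB_=1 turns a silent use-after-free into a crash: per mallopt(3) (M_PERTURB), when perturbation is enabled glibc sets freed bytes to the low byte of the value and initializes new allocations to its complement, although not on every allocator path, and some freed bytes are reused for allocator bookkeeping. A pointer read back from such a freed object then holds the byte 0x01 repeated, which matches obj=0x101010101010101 in the backtrace in Comment 1. This C sketch only models that fill with memset on a live struct (no real free, so it stays well-defined); struct fake_obj, perturb_like_free and stale_parent_bits are hypothetical names, not QEMU or glibc code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a QOM object: only the parent link matters here. */
struct fake_obj {
    struct fake_obj *parent;   /* object_unparent() reads this field */
};

/* Model of glibc's free-time perturbation: overwrite every byte with the
 * low byte of the MALLOC_PERTURB_ value. No free() is called. */
static void perturb_like_free(void *p, size_t n, int perturb)
{
    memset(p, perturb & 0xff, n);
}

/* Raw bits a stale reader would find in obj->parent after the "free". */
static uintptr_t stale_parent_bits(int perturb)
{
    struct fake_obj obj = { .parent = NULL };
    uintptr_t bits;

    perturb_like_free(&obj, sizeof obj, perturb);
    assert(sizeof bits == sizeof obj.parent);
    memcpy(&bits, &obj.parent, sizeof bits);   /* inspect, never dereference */
    return bits;
}
```

On a 64-bit host, stale_parent_bits(1) yields 0x0101010101010101; dereferencing such a value is what faulted in object_property_del_child().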

Version-Release number of selected component (if applicable):
Kernel version: kernel-4.19.190-7.11.an8.loongarch64
QEMU version: qemu-kvm-6.2.0-53.0.3.module+an8.9.0+11292+334bc2d1.2.loongarch64

How reproducible:

Steps to Reproduce:
1. Start the guest:
MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -cpu 'Loongson-3A5000' \
    -machine loongson7a \
    -m 2048 \
    -smp 1,maxcpus=8,cores=1,threads=1,sockets=8  \
    -device Loongson-3A5000-loongarch-cpu,id=vcpu1,core-id=1 \
    -bios loongarch_bios.bin \
    -boot c -d int \
    -drive file=AnolisOS-8.10-loongarch64.qcow2,if=virtio \
    -enable-kvm \
    -nographic \
    -net nic -net tap \
    -monitor telnet:localhost:4444,server,nowait \
    -serial stdio
2. Connect to the QEMU monitor and hot-unplug the CPU:
[root@localhost ~]# telnet localhost 4444
Trying ::1...
Connected to localhost.
Escape character is '^]'.
QEMU 6.2.0 monitor - type 'help' for more information
(qemu) device_del vcpu1
(qemu) Connection closed by foreign host.
[root@localhost ~]# 
3. Check the guest state and CPU count:
[root@anolis-8-10-guest ~]# lscpu 
Architecture:        loongarch64
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           2
NUMA node(s):        1
CPU family:          Loongson-64bit
Model name:          Loongson-3C5000
CPU MHz:             2000.00
BogoMIPS:            4000.00
L1d cache:           64K
L1i cache:           64K
L2 cache:            256K
L3 cache:            16384K
NUMA node0 CPU(s):   0,1
Flags:               cpucfg lam ual fpu lsx lasx crc32 lbt_x86 lbt_arm lbt_mips
[root@anolis-8-10-guest ~]# [   38.487446] cpu1 hot remove!
Bus error (core dumped)
[root@localhost ~]# 
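Step 3 can also be checked from a script inside the guest. The VM boots with -smp 1 plus one hot-pluggable CPU (vcpu1), which matches the "CPU(s): 2" and "On-line CPU(s) list: 0,1" reported by lscpu, so after a successful device_del vcpu1 the online list would presumably shrink to one CPU. A minimal C sketch, assuming the list format used by lscpu's "On-line CPU(s) list" and by /sys/devices/system/cpu/online (comma-separated CPU numbers and lo-hi ranges); count_cpu_list is a hypothetical helper, not part of any tool used in this report:

```c
#include <assert.h>
#include <stdlib.h>

/* Count CPUs in a list such as "0,1" or "0-3,6" (optionally ending in '\n').
 * Returns -1 on malformed input. */
static int count_cpu_list(const char *s)
{
    int total = 0;

    assert(s != NULL);
    while (*s && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        long hi;

        if (end == s || lo < 0)
            return -1;            /* no number where one was expected */
        hi = lo;
        s = end;
        if (*s == '-') {          /* range "lo-hi" */
            const char *start = s + 1;
            hi = strtol(start, &end, 10);
            if (end == start || hi < lo)
                return -1;
            s = end;
        }
        total += (int)(hi - lo + 1);
        if (*s == ',')
            s++;                  /* next entry */
        else if (*s && *s != '\n')
            return -1;            /* unexpected character */
    }
    return total;
}
```

For the pre-unplug state shown above, count_cpu_list("0,1") returns 2.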

Actual results:
When a CPU is hot-unplugged from the KVM guest, the qemu process leaks memory and crashes with "Bus error (core dumped)".

Expected results:
After CPU hot-unplug, the qemu process keeps running, the guest stays healthy, and the guest reports the correct number of CPUs.

Additional info:
Comment 1 lixianglai loongson_group 2025-02-25 11:27:20 UTC
Other architectures probably have this problem too.

Thread 6 "qemu-kvm" received signal SIGBUS, Bus error.
[Switching to Thread 0xffe37fa8b0 (LWP 1245830)]
0x0000000120542d3c in object_property_del_child (obj=0x101010101010101, child=0x125966810, errp=0x0) at qom/object.c:575
575        g_hash_table_iter_init(&iter, obj->properties);
(gdb) bt
#0  0x0000000120542d3c in object_property_del_child (obj=0x101010101010101, child=0x125966810, errp=0x0) at qom/object.c:575
#1  0x0000000120542ec4 in object_unparent (obj=0x125966810) at qom/object.c:599
#2  0x00000001202716ac in cpu_hotplug_wr (opaque=0x12562fe40, addr=4, data=8, size=1) at hw/acpi/cpu.c:130
#3  0x00000001200d516c in memory_region_write_accessor (mr=0x12562fe40, addr=4, value=0xffe37f9d08, size=1, shift=0, mask=255, attrs=...) at /root/qemu/memory.c:483
#4  0x00000001200d544c in access_with_adjusted_size (addr=4, value=0xffe37f9d08, size=1, access_size_min=1, access_size_max=4, access_fn=
    0x1200d507c <memory_region_write_accessor>, mr=0x12562fe40, attrs=...) at /root/qemu/memory.c:544
#5  0x00000001200d8890 in memory_region_dispatch_write (mr=0x12562fe40, addr=4, data=8, op=MO_8, attrs=...) at /root/qemu/memory.c:1475
#6  0x0000000120064988 in flatview_write_continue (fv=0x1276a58a0, addr=503316484, attrs=..., buf=0xfff5608028 "\b", len=1, addr1=4, l=1, mr=0x12562fe40) at /root/qemu/exec.c:3129
#7  0x0000000120064b20 in flatview_write (fv=0x1276a58a0, addr=503316484, attrs=..., buf=0xfff5608028 "\b", len=1) at /root/qemu/exec.c:3169
#8  0x0000000120064f40 in address_space_write (as=0x120c9f4a8 <address_space_memory>, addr=503316484, attrs=..., buf=0xfff5608028 "\b", len=1) at /root/qemu/exec.c:3259
#9  0x0000000120064fd4 in address_space_rw (as=0x120c9f4a8 <address_space_memory>, addr=503316484, attrs=..., buf=0xfff5608028 "\b", len=1, is_write=true) at /root/qemu/exec.c:3269
#10 0x00000001200f5a18 in kvm_cpu_exec (cpu=0x1253f5310) at /root/qemu/accel/kvm/kvm-all.c:2398
#11 0x00000001200c2ddc in qemu_kvm_cpu_thread_fn (arg=0x1253f5310) at /root/qemu/cpus.c:1318
#12 0x00000001206cd480 in qemu_thread_start (args=0x125424810) at util/qemu-thread-posix.c:519
#13 0x000000fff41a9dc8 in start_thread () at /lib64/libpthread.so.0
#14 0x000000fff40f98fc in __thread_start () at /lib64/libc.so.6
(gdb) Quit
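Frames #2 and #6 show the guest access that triggered the crash: a 1-byte write of data=8 to guest-physical address 503316484, landing at offset addr1=4 of the region handled by cpu_hotplug_wr(). Subtracting the offset gives a region base of 503316480 (0x1E000000). In QEMU's ACPI CPU hotplug interface (docs/specs/acpi_cpu_hotplug in the QEMU tree, as we understand it), offset 4 is the 1-byte control register and bit 3 (value 8) asks QEMU to eject the selected CPU, which is consistent with cpu_hotplug_wr() calling object_unparent() in frame #1. The macro and function names in this C sketch are hypothetical; only the numbers come from the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Numbers copied from the backtrace above. */
#define TRACE_GPA    503316484u  /* frame #6: addr  (guest-physical address) */
#define TRACE_OFFSET 4u          /* frame #6: addr1 (offset inside region)   */
#define TRACE_DATA   8u          /* frame #2: data  (1-byte write)           */

/* Assumed layout of QEMU's ACPI CPU hotplug control register. */
#define CPU_CTRL_OFFSET 4u
#define CPU_CTRL_EJECT  (1u << 3)  /* bit 3: OSPM requests CPU eject */

/* Region base = absolute address minus offset inside the region. */
static uint64_t region_base(uint64_t gpa, uint64_t offset)
{
    assert(offset <= gpa);
    return gpa - offset;
}

/* True if a 1-byte write of data at offset is an eject request. */
static bool is_eject_request(uint64_t offset, uint8_t data)
{
    return offset == CPU_CTRL_OFFSET && (data & CPU_CTRL_EJECT) != 0;
}
```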
Comment 2 lixianglai loongson_group 2025-02-25 11:27:42 UTC
This is a generic (architecture-independent) qemu issue.
Comment 3 lixianglai loongson_group 2025-03-06 17:27:50 UTC
(In reply to lixianglai from comment #2)
> This is a generic (architecture-independent) qemu issue.

Correction: it is a loongarch-specific issue. The CPU object is freed twice.
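The double-free diagnosis fits the backtrace: object_unparent() in frame #1 passes child->parent to object_property_del_child(), and that field read back as the perturbation pattern, which suggests the child was already freed when cpu_hotplug_wr() unparented it. The patch in PR 69 is not reproduced here. As a generic, hypothetical illustration of the invariant a fix has to restore (exactly one owner drops the parent's reference), this C sketch models a refcounted parent/child link; model_obj, model_unref and model_unparent are invented names, not QEMU APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of a QOM-style parent/child link.
 * This is neither QEMU code nor the change made in PR 69. */
struct model_obj {
    struct model_obj *parent;  /* non-NULL while the parent holds a reference */
    int refcount;
    int finalized;             /* counts how often the "free" path ran */
};

/* Drop one reference; the last one "frees" the object (modelled by a counter). */
static void model_unref(struct model_obj *o)
{
    assert(o->refcount > 0);   /* a second release of the same reference trips here */
    if (--o->refcount == 0)
        o->finalized++;
}

/* Detach o from its parent and drop the reference the parent held.
 * Clearing o->parent first makes a repeated call a no-op; QEMU's
 * object_unparent() likewise does nothing when obj->parent is NULL.
 * The guard only helps while o is still allocated: once the memory is
 * really freed, even reading o->parent is a use-after-free, so a real
 * fix presumably has to ensure only one code path does the unparent. */
static void model_unparent(struct model_obj *o)
{
    if (o->parent != NULL) {
        o->parent = NULL;
        model_unref(o);
    }
}
```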
Comment 4 lixianglai loongson_group 2025-03-07 10:35:16 UTC
PR submitted:
https://gitee.com/src-anolis-os/qemu-kvm/pulls/69
Comment 5 wangzhe 2025-03-16 18:24:27 UTC
The PR has been merged and an updated build is available:
qemu-kvm-6.2.0-53.0.7.module+an8.9.0+11306+880141fa.2
Comment 6 liqianwen loongson_group 2025-03-19 17:35:55 UTC
Retested. Final result: pass 8 | fail 2

ISO download:
http://build.openanolis.cn/kojifiles/output/anolis-8-20250316.5/compose/BaseOS/loongarch64/iso/anolis-8-loongarch64-dvd1-20250316.5.iso

Kernel version:
4.19.190-7.12.an8.loongarch64

Test results:
# avocado run cpu_device_hotpluggable.hotunplug  --vt-type qemu --vt-guest-os Linux.AnolisOS.8.10.loongarch64
JOB ID     : 7ec485a8d5396038f4480d9022d42e0407cbbc1c
JOB LOG    : /root/avocado/job-results/job-2025-03-19T16.29-7ec485a/job.log
 (01/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.single_vcpu.with_reboot.shell_reboot: STARTED
 (01/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.single_vcpu.with_reboot.shell_reboot: PASS (165.40 s)
 (02/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.single_vcpu.with_reboot.qemu_system_reset: STARTED
 (02/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.single_vcpu.with_reboot.qemu_system_reset: PASS (154.06 s)
 (03/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.single_vcpu.with_shutdown.shell_shutdown: STARTED
 (03/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.single_vcpu.with_shutdown.shell_shutdown: PASS (84.44 s)
 (04/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.single_vcpu.with_shutdown.qemu_system_powerdown: STARTED
 (04/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.single_vcpu.with_shutdown.qemu_system_powerdown: FAIL: Guest refuses to go down after vcpu hotunplug (1144.87 s)
 (05/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.single_vcpu.with_migrate: STARTED
 (05/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.single_vcpu.with_migrate: PASS (225.94 s)
 (06/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.multi_vcpu.with_reboot.shell_reboot: STARTED
 (06/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.multi_vcpu.with_reboot.shell_reboot: PASS (164.14 s)
 (07/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.multi_vcpu.with_reboot.qemu_system_reset: STARTED
 (07/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.multi_vcpu.with_reboot.qemu_system_reset: PASS (180.70 s)
 (08/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.multi_vcpu.with_shutdown.shell_shutdown: STARTED
 (08/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.multi_vcpu.with_shutdown.shell_shutdown: PASS (88.09 s)
 (09/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.multi_vcpu.with_shutdown.qemu_system_powerdown: STARTED
 (09/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.multi_vcpu.with_shutdown.qemu_system_powerdown: FAIL: Guest refuses to go down after vcpu hotunplug (1144.89 s)
 (10/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.multi_vcpu.with_migrate: STARTED
 (10/10) type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.multi_vcpu.with_migrate: PASS (217.47 s)
RESULTS    : PASS 8 | ERROR 0 | FAIL 2 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML   : /root/avocado/job-results/job-2025-03-19T16.29-7ec485a/results.html
JOB TIME   : 3585.73 s

Test summary:
type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.multi_vcpu.with_shutdown.qemu_system_powerdown: FAIL
type_specific.io-github-autotest-qemu.cpu_device_hotpluggable.hotunplug.single_vcpu.with_shutdown.qemu_system_powerdown: FAIL