Bug 8346 - [Anolis8.9][RC1][loongarch64] VM hangs after CPU hotplug
Summary: [Anolis8.9][RC1][loongarch64] VM hangs after CPU hotplug
Status: CLOSED FIXED
Alias: None
Product: Anolis OS 8
Classification: Anolis OS
Component: kernel - anck-4.19
Version: 8.9
Hardware: loongarch Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: qhw13324663979
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 19048
Reported: 2024-02-27 17:58 UTC by wuzhiguo
Modified: 2025-02-25 14:28 UTC (History)
CC List: 2 users

See Also:


Attachments
Cannot boot into the system after selecting the 4.19 kernel at install time (206.48 KB, image/png)
2024-03-15 12:53 UTC, qhw13324663979
Installed kernel-4.19.91-27.7.an8 (139.83 KB, image/png)
2024-03-15 12:55 UTC, qhw13324663979

Description wuzhiguo loongson_group 2024-02-27 17:58:49 UTC
Description of problem:
The VM hangs after a CPU hotplug.


Version-Release number of selected component (if applicable):
Kernel version: 4.19.190-7.7.an8.loongarch64
QEMU version: QEMU emulator version 6.2.0 (qemu-kvm-6.2.0-41.0.1.module+an8.9.0+11168+98c7cfc6.1)


Steps to Reproduce:
1. Start the VM with the following command:
MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine loongson7a,memory-backend=mem-machine_mem \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -m 2048 \
    -object memory-backend-ram,size=2048M,id=mem-machine_mem  \
    -smp 1,maxcpus=2,cores=2,threads=1,sockets=1  \
    -cpu 'Loongson-3A5000' \
    -device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
    -device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,filename=/root/avocado/data/avocado-vt/images/AnolisOS-8.9-loongarch64.qcow2 \
    -blockdev node-name=drive_image1,driver=qcow2,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -vnc :0  \
    -rtc base=utc,clock=host  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -bios loongarch_bios.bin \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5 \
    -serial stdio \
    -monitor telnet:localhost:4444,server,nowait
2. After the VM has booted, hot-plug a CPU from the QEMU monitor with the following commands:
# telnet localhost 4444
Trying ::1...
Connected to localhost.
Escape character is '^]'.
QEMU 6.2.0 monitor - type 'help' for more information
(qemu) 
(qemu) device_add Loongson-3A5000-loongarch-cpu,id=vcpu1,core-id=1
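After step 2, one way to confirm from inside the guest whether the new vCPU actually came online is to read the standard Linux sysfs file /sys/devices/system/cpu/online. The following is a minimal sketch only: the report does not say how the CPU state was checked, and the helper names parse_cpu_list and cpu_is_online are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-1" or "0,2-3" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or trailing comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpu_is_online(cpu, online_file="/sys/devices/system/cpu/online"):
    """Return True if the given CPU id is listed in the guest's online mask."""
    return cpu in parse_cpu_list(Path(online_file).read_text())


if __name__ == "__main__":
    try:
        # CPU 1 is the vCPU added by device_add ... core-id=1 in step 2.
        print("CPU1 online:", cpu_is_online(1))
    except OSError as err:
        print("cannot read sysfs:", err)
```

Run inside the guest after the device_add command. In the failing case described here the guest is hung, so the script may never get a chance to run, which is itself a symptom.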


Actual results:
[root@anolis-8-guest ~]# [   41.221035] CPU1 has been hot-added
[   41.224239] Booting CPU#1...
[ 8697.930254] 64-bit Loongson Processor probed (LA464 Core)
[ 8697.930937] CPU1 revision is: 0014c010 (Loongson-64bit)
[ 8697.931505] FPU1 revision is: 00000000
[ 8697.931912] CPU1 __my_cpu_offset: 942b0000
[ 8697.932460] CPU#1 finished
[ 8697.933244] pv stealtime: cpu 1, st:0x9000000096000680 phys:0x96000680
[ 8697.934185] Will online and init hotplugged CPU: 1
[ 8697.936965] rcu: INFO: rcu_sched self-detected stall on CPU
[ 8697.937770] rcu: 	1-...!: (1 ticks this GP) idle=012/0/0x1 softirq=1/1 fqs=0 
[ 8697.938719] rcu: 	 (t=6597070 jiffies g=12093 q=141)
[ 8697.939389] rcu: rcu_sched kthread starved for 6597070 jiffies! g12093 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[ 8697.940804] rcu: RCU grace-period kthread stack dump:
[ 8697.941507] rcu_sched       I    0    10      2 0x00004000
[ 8697.942266] Stack : 900000009400ec00 0000000000000000 ffffffffffffffff 900000000161d940
[ 8697.943350]         90000000016442f0 0000000000000004 90000000ec537d40 0000000000000000
[ 8697.944444]         ffffffffffffffff 900000000161d940 90000000016442f0 0000000000000000
[ 8697.945540]         90000000ec537d40 9000000094004a00 00000000000000b0 00000000ffff03b1
[ 8697.946619]         900000000163602c 900000000112407c 00000000ffff03b1 90000000011289bc
[ 8697.947695]         90000000ec537e08 00000000000000b4 0000000000000000 9000000094004c00
[ 8697.948780]         00000000ffff03b1 90000000002cad00 000000000c800000 90000000ec4d7700
[ 8697.949849]         90000000016442f0 0000000000000001 0000000000000001 9000000001636028
[ 8697.950936]         0000000000000005 900000000161d940 0000000000000006 90000000016442f0
[ 8697.952032]         900000000165c280 900000000165a280 900000000163602c 90000000002ba674
[ 8697.953134]         ...
[ 8697.953492] Call Trace:
[ 8697.953868] [<9000000001123830>] __schedule+0x4e0/0xd00
[ 8697.954598] [<9000000001124078>] schedule+0x28/0x80
[ 8697.955253] [<90000000011289b8>] schedule_timeout+0x208/0x520
[ 8697.956062] [<90000000002ba670>] rcu_gp_kthread+0x9a0/0xab0
[ 8697.956838] [<900000000024f1fc>] kthread+0x12c/0x140
[ 8697.957530] [<90000000002031c8>] ret_from_kernel_thread+0x8/0x10
[ 8697.958342] Sending NMI from CPU 1 to CPUs 0:
[ 8707.959472] NMI backtrace for cpu 1
[ 8707.960111] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G            E     4.19.190-7.7.an8.loongarch64 #1
[ 8707.961530] Hardware name: Loongson KVM, BIOS 0.0.0 02/06/2015
[ 8707.962339] Stack : 0000000000000000 900000000111c120 90000000ec544000 90000000ec187ab0
[ 8707.963422]         0000000000000000 90000000ec187ab0 0000000000000000 00000000000000ff
[ 8707.964521]         0000000000000000 ffffffffffffffff 0000000000000020 0000000000aaaaaa
[ 8707.965627]         900000000111c120 0000000000000007 0000000000000006 0000000000000007
[ 8707.966713]         9000000096000950 0000000000aaaaaa 0000000000000205 0000000000000001
[ 8707.967798]         ffff80010d27f020 00000000942b0000 0000000000000001 0000000000000000
[ 8707.968892]         9000000001768230 0000000000000000 0000000000000000 0000000000000000
[ 8707.970004]         900000000112da20 0000000000000240 9000000001636028 9000000001635f40
[ 8707.971094]         900000000020a334 0000000000000000 00000000000000b0 0000000000000004
[ 8707.973256]         0000000000000000 000000000007141c 0000000000000800 9000000001768230
[ 8707.975360]         ...
[ 8707.976741] Call Trace:
[ 8707.978123] [<900000000020a334>] show_stack+0x34/0x140
[ 8707.979843] [<900000000111c11c>] dump_stack+0xac/0xe8
[ 8707.981530] [<90000000010ffe10>] nmi_cpu_backtrace+0xb0/0xf0
[ 8707.983269] [<9000000001100008>] nmi_trigger_cpumask_backtrace+0x1b8/0x1c0
[ 8707.985196] [<90000000011123a4>] rcu_dump_cpu_stacks+0x124/0x190
[ 8707.987008] [<90000000002bc690>] rcu_check_callbacks+0x940/0xa00
[ 8707.988780] [<90000000002cc384>] update_process_times+0x34/0x90
[ 8707.990571] [<90000000002df324>] tick_sched_handle+0x84/0xa0
[ 8707.992305] [<90000000002df7f4>] tick_sched_timer+0x44/0xa0
[ 8707.994013] [<90000000002ccfd4>] __hrtimer_run_queues+0x194/0x400
[ 8707.995755] [<90000000002ce2c0>] hrtimer_interrupt+0x140/0x380
[ 8707.997459] [<9000000000209694>] constant_timer_interrupt+0x34/0x50
[ 8707.999197] [<90000000002a4cf8>] __handle_irq_event_percpu+0x88/0x280
[ 8708.000922] [<90000000002a4f14>] handle_irq_event_percpu+0x24/0x90
[ 8708.002630] [<90000000002aadf0>] handle_percpu_irq+0x60/0xa0
[ 8708.004258] [<90000000002a3878>] generic_handle_irq+0x28/0x50
[ 8708.005889] [<900000000112a6ec>] do_IRQ+0x1c/0x30
[ 8708.007386] [<9000000000203430>] except_vec_vi_handler+0xac/0xdc
[ 8708.009075] [<9000000000203380>] __cpu_wait+0x20/0x24
[ 8708.010626] [<9000000000264468>] do_idle+0x258/0x300
[ 8708.012160] [<90000000002646a0>] cpu_startup_entry+0x20/0x30
[ 8708.013804] [<900000000111d148>] smp_bootstrap+0x50/0x58


Expected results:
After the CPU hotplug, the CPU is added successfully and comes online automatically, and the VM keeps working normally.
Comment 1 qhw13324663979 2024-03-15 12:53:54 UTC
Created attachment 1078 [details]
Cannot boot into the system after selecting the 4.19 kernel at install time
Comment 2 qhw13324663979 2024-03-15 12:55:16 UTC
Created attachment 1079 [details]
Installed kernel-4.19.91-27.7.an8
Comment 3 qhw13324663979 2024-03-15 12:57:26 UTC
Image used: anolis-8-x86_64-dvd1-20240202.0.iso
Comment 4 wangzhe 2024-03-16 21:34:14 UTC
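The CPU hotplug hang cannot be reproduced for now. This scenario does not affect the product's core functionality, so to keep the release on schedule, the priority of this issue has been lowered.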
Comment 5 wuzhiguo loongson_group 2024-04-09 10:50:50 UTC
Test passed with kernel-4.19.190-7.9.an8.loongarch64.

Kernel RPM download: https://abs.openanolis.cn/all_project/1?tab=packages&package_id=43928
Comment 6 wuzhiguo loongson_group 2024-09-05 20:22:06 UTC
Test passed with kernel-4.19.190-7.9.an8.loongarch64.

Kernel RPM download: https://abs.openanolis.cn/all_project/1?tab=packages&package_id=43928
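
To check whether a guest is running a kernel at least as new as the build that passed in the comments above, the numeric fields of the release string can be compared. This is a rough sketch, not a full RPM version comparison: it assumes release strings shaped like the two that appear in this report, the helper names are hypothetical, and a newer release number does not by itself prove that the fix is present.

```python
import re

# Build reported as passing in the comments above.
PASSING_RELEASE = "4.19.190-7.9.an8.loongarch64"


def release_key(kernel_release):
    """Extract numeric fields, e.g. '4.19.190-7.9.an8.loongarch64' -> (4, 19, 190, 7, 9)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", kernel_release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {kernel_release!r}")
    return tuple(int(field) for field in match.groups())


def at_least_passing(kernel_release):
    """Return True if kernel_release is not older than PASSING_RELEASE."""
    return release_key(kernel_release) >= release_key(PASSING_RELEASE)


if __name__ == "__main__":
    import platform

    try:
        # platform.release() returns the same string as `uname -r`.
        print(platform.release(), "->", at_least_passing(platform.release()))
    except ValueError as err:
        print(err)
```

For example, the kernel from the original report, 4.19.190-7.7.an8.loongarch64, compares as older than the passing build.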