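Bug 19397 - [Anolis 8.10][RC1][loongarch64][3C6000 single-socket] KVM virtual machine live migration fails; the guest hangs and the guest kernel reports errors.
Summary: [Anolis 8.10][RC1][loongarch64][3C6000 single-socket] KVM virtual machine live migration fails; the guest hangs and the guest kernel reports errors.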
Status: CLOSED FIXED
Alias: None
Product: Anolis OS 8
Classification: Anolis OS
Component: kernel - anck-4.19
Version: 8.10
Hardware: loongarch Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: wenlong
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-03-07 16:09 UTC by wuzhiguo
Modified: 2025-03-19 16:02 UTC (History)
CC List: 3 users

See Also:


Description wuzhiguo loongson_group 2025-03-07 16:09:29 UTC
Description of problem:
Live migration of a KVM virtual machine fails on a single-socket 3C6000 host: the guest hangs and the guest kernel reports errors.

Version-Release number of selected component (if applicable):
Host kernel version: 4.19.190-7.11.an8.loongarch64
Guest kernel version: 4.19.190-7.11.an8.loongarch64
qemu version: qemu-kvm-6.2.0-53.0.3.module+an8.9.0+11292+334bc2d1.2.loongarch64

How reproducible:

Steps to Reproduce:
1. Run the migration test with the avocado-vt tool:
# avocado run migrate.default --vt-type qemu --vt-guest-os Linux.AnolisOS.8.10.loongarch64

Actual results:
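Live migration of the virtual machine fails on the single-socket 3C6000 host; the guest hangs and the guest kernel reports the following errors: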
[   16.163162] CPU 0 Unable to handle kernel paging request at virtual address 0000000000000000, era == 9000000000ef1934, ra == 9000000000ef1a78
[   16.164259] Oops[#1]:
[   16.164469] CPU: 0 PID: 871 Comm: NetworkManager Tainted: G            E     4.19.190-7.11.an8.loongarch64 #1
[   16.165245] Hardware name: Loongson QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[   16.165824] pc 9000000000ef1934 ra 9000000000ef1a78 tp 90000000efc38000 sp 90000000efc3bb20
[   16.166478] a0 90000000f1820000 a1 90000000f18200c8 a2 0000000000000000 a3 0000000000000000
[   16.167138] a4 90000000efc3bc58 a5 90000000efc3bc5c a6 90000000efc3bb8c a7 90000000efc3bbf0
[   16.167788] t0 0000000000000000 t1 90000000fb59dd00 t2 0000000000000000 t3 0000000000008000
[   16.168436] t4 000000aaaf2b4000 t5 000000007fffc000 t6 ffff800000000000 t7 000000007fffc000
[   16.169087] t8 90000000efc38000 u0 0000000f00010003 s9 000000fffbda7250 s0 90000000fb59de00
[   16.169731] s1 0000000000000000 s2 0000000000000004 s3 90000000efc3bc58 s4 0000000000000040
[   16.170377] s5 900000000161c2f0 s6 90000000f1820000 s7 00000000000000b4 s8 90000000efc3bc5c
[   16.171022]    ra: 9000000000ef1a78 __skb_try_recv_datagram+0xe8/0x1d0
[   16.171537]   ERA: 9000000000ef1934 __skb_try_recv_from_queue+0x194/0x1f0
[   16.172082]  CRMD: 00000001 (PLV1 -IE -DA -PG DACF=SUC DACM=SUC -WE)
[   16.172597]  PRMD: 00000000 (PPLV0 -PIE -PWE)
[   16.172965]  EUEN: 00000003 (+FPE +SXE -ASXE -BTE)
[   16.173363]  ECFG: 9000000000f66fa8 (LIE=3,5,7-11 VS=6)
[   16.173798] ESTAT: 90000000fb67ae00
[   16.174098] ExcCode : 27 (SubCode 1ed)
[   16.174417]  PRID: 0014d010 (Loongson-64bit, Loongson-3C6000/S)
[   16.174895] Modules linked in: nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nf_tables_set(E) nft_chain_nat_ipv6(E) nf_nat_ipv6(E) nft_chain_route_ipv6(E) nft_chain_nat_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nft_chain_route_ipv4(E) rfkill(E) ip_set(E) nf_tables(E) nfnetlink(E) vfat(E) fat(E) efi_pstore(E) joydev(E) vmw_vsock_virtio_transport(E) efivars(E) vmw_vsock_virtio_transport_common(E) vsock(E) nvme_fabrics(E) dm_multipath(E) virtio_net(E) net_failover(E) failover(E) virtio_scsi(E) bochs_drm(E) be2iscsi(E) bnx2i(E) cnic(E) uio(E) cxgb4i(E) cxgb4(E) libcxgbi(E) libcxgb(E) qla4xxx(E) iscsi_boot_sysfs(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) fuse(E) scsi_transport_iscsi(E)
[   16.180370] Process NetworkManager (pid: 871, threadinfo=00000000a67ca7fa, task=00000000509c50f7)
[   16.181063] Stack : 0000000000000000 000000000000002a 90000000fb59de00 0000000000000000
[   16.181683]         90000000f18200dc 90000000efc3bbf0 90000000f18200dc 9000000000ef1a78
[   16.182304]         90000000f18200c8 0000000000000000 0000000000000000 90000000efc3bc78
[   16.182930]         0000000000000000 0000000000000000 0000000000000001 000000fffbda75e0
[   16.183550]         fffffffffffffff5 90000000efc3bc5c 90000000efc3bc58 0000000000000000
[   16.184170]         0000000000000040 0000000000000040 90000000f1820000 90000000efc3bc78
[   16.184796]         000000fffbda7250 9000000000ef1c00 90000000fb59fd00 0000000000000000
[   16.185418]         90000000efc3bc00 000000fffbda7600 0000000000008000 0000000000000000
[   16.186042]         90000000f5575b80 90000000efc3be48 0000000000000040 0000000000000000
[   16.186662]         90000000f1820000 9000000000ef1c90 0000000000000000 0000000000000000
[   16.187281]         ...
[   16.187502] Call Trace:
[   16.188509] [<9000000000ef1934>] __skb_try_recv_from_queue+0x194/0x1f0
[   16.189745] [<9000000000ef1a74>] __skb_try_recv_datagram+0xe4/0x1d0
[   16.190947] [<9000000000ef1bfc>] __skb_recv_datagram+0x9c/0xf0
[   16.192114] [<9000000000ef1c8c>] skb_recv_datagram+0x3c/0x50
[   16.193277] [<9000000000f609c0>] netlink_recvmsg+0x60/0x3e0
[   16.194417] [<9000000000edc984>] ___sys_recvmsg+0xc4/0x170
[   16.195526] [<9000000000eddea8>] __sys_recvmsg+0x48/0xa0
[   16.196603] [<9000000000212214>] syscall_common+0x20/0x34
[   16.197687] Code: 29c002e0  29c022e0  29c021ac <29c0018d> 43ff00ff  001502e5  29c02069  4c0000e1  28c02069
[   16.199087]
[   16.199891] CPU 0 Unable to handle kernel paging request at virtual address 0000000000000000, era == 90000000010f2024, ra == 90000000010f82fc
[   16.202204] Oops[#2]:
[   16.203074] CPU: 0 PID: 871 Comm: NetworkManager Tainted: G      D     E     4.19.190-7.11.an8.loongarch64 #1
[   16.204511] Hardware name: Loongson QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[   16.205777] pc 90000000010f2024 ra 90000000010f82fc tp 90000000efc38000 sp 90000000f61a3d80
[   16.207121] a0 9000000006007640 a1 9000000006006f60 a2 0000000000000000 a3 0000000000000000
[   16.208464] a4 9000000001121228 a5 9000000001121200 a6 0000000000000001 a7 0000000000000000
[   16.209807] t0 9000000006007468 t1 9000000006007468 t2 0000000000000000 t3 9000000006007469
[   16.211134] t4 9000000006007468 t5 fffffffffffffffc t6 0000000000000000 t7 00000000042e0000
[   16.212461] t8 0000000000000001 u0 900000000161c2f0 s9 9000000006006f40 s0 9000000006007640
[   16.213809] s1 9000000006006f60 s2 0000000000000000 s3 9000000006006f00 s4 0000000000000004
[   16.215134] s5 900000000161c2f0 s6 900000000174d180 s7 900000000174d1a8 s8 0000000000000000
[   16.216466]    ra: 90000000010f82fc timerqueue_del+0x6c/0xa0
[   16.217612]   ERA: 90000000010f2024 rb_erase+0x264/0x410
[   16.218725]  CRMD: 900000000174d180 (PLV0 -IE -DA -PG DACF=SUC DACM=Reserved(3) -WE)
[   16.220041]  PRMD: 00000000 (PPLV0 -PIE -PWE)
[   16.221099]  EUEN: 00000004 (-FPE -SXE +ASXE -BTE)
[   16.222165]  ECFG: 00000000 (LIE= VS=0)
[   16.223142] ESTAT: 900000000174d1a8
[   16.224069] ExcCode : 34 (SubCode 5)
[   16.224980]  PRID: 0014d010 (Loongson-64bit, Loongson-3C6000/S)
[   16.226049] Modules linked in: nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nf_tables_set(E) nft_chain_nat_ipv6(E) nf_nat_ipv6(E) nft_chain_route_ipv6(E) nft_chain_nat_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nft_chain_route_ipv4(E) rfkill(E) ip_set(E) nf_tables(E) nfnetlink(E) vfat(E) fat(E) efi_pstore(E) joydev(E) vmw_vsock_virtio_transport(E) efivars(E) vmw_vsock_virtio_transport_common(E) vsock(E) nvme_fabrics(E) dm_multipath(E) virtio_net(E) net_failover(E) failover(E) virtio_scsi(E) bochs_drm(E) be2iscsi(E) bnx2i(E) cnic(E) uio(E) cxgb4i(E) cxgb4(E) libcxgbi(E) libcxgb(E) qla4xxx(E) iscsi_boot_sysfs(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) fuse(E) scsi_transport_iscsi(E)
[   16.235831] Process NetworkManager (pid: 871, threadinfo=00000000a67ca7fa, task=00000000509c50f7)
[   16.237198] Stack : 900000000161c2f0 9000000006006f40 9000000006007640 90000000002ce708
[   16.238515]         9000000006006f00 0000000000000000 9000000006007640 9000000006006f00
[   16.239829]         9000000006006f00 90000000002ce954 0000000000000000 0000000000000000
[   16.241142]         00000000000000b0 90000000f61a3eec 0000000000000002 00000003a58c3f1f
[   16.242454]         0000000000200000 00000003a58c3f1f 0000000000000000 9000000006006ff8
[   16.243781]         9000000006006f00 00000003a58c3f1f 0000000000000004 900000000161c2f0
[   16.245096]         0000000000000001 0000000000000003 000000000000003d 00000000000000b0
[   16.246416]         9000000006007038 90000000002cfc74 900000000600fce8 9000000006006f0c
[   16.247718]         9000000006006fb8 00000003a58c3f1f 900000000160dfd0 0000000000000001
[   16.249024]         90000000ebd9dfa0 0000000000000014 90000000f6165a00 900000000174c9b0
[   16.250325]         ...
[   16.251233] Call Trace:
[   16.252134] [<90000000010f2024>] rb_erase+0x264/0x410
[   16.253234] [<90000000010f82f8>] timerqueue_del+0x68/0xa0
[   16.254363] [<90000000002ce704>] __remove_hrtimer+0x64/0xe0
[   16.255491] [<90000000002ce950>] __hrtimer_run_queues+0x160/0x400
[   16.256648] [<90000000002cfc70>] hrtimer_interrupt+0x140/0x380
[   16.257780] [<9000000000209844>] constant_timer_interrupt+0x34/0x50
[   16.258921] [<90000000002a4c68>] __handle_irq_event_percpu+0x88/0x280
[   16.260056] [<90000000002a4e84>] handle_irq_event_percpu+0x24/0x90
[   16.261181] [<90000000002aad60>] handle_percpu_irq+0x60/0xa0
[   16.262254] [<90000000002a37e8>] generic_handle_irq+0x28/0x50
[   16.263318] [<900000000111877c>] do_IRQ+0x1c/0x30
[   16.264294] [<90000000002034b0>] except_vec_vi_handler+0xac/0xdc
[   16.265371] [<900000000020a84c>] die+0x11c/0x190
[   16.266357] [<9000000001118218>] no_context+0x118/0x120
[   16.267387] [<90000000011186d0>] do_page_fault+0x310/0x3a0
[   16.268436] [<9000000000219c28>] tlb_do_page_fault_1+0x110/0x128
[   16.269531] [<9000000000ef1934>] __skb_try_recv_from_queue+0x194/0x1f0
[   16.270651] [<9000000000ef1a74>] __skb_try_recv_datagram+0xe4/0x1d0
[   16.271742] [<9000000000ef1bfc>] __skb_recv_datagram+0x9c/0xf0
[   16.272801] [<9000000000ef1c8c>] skb_recv_datagram+0x3c/0x50
[   16.273839] [<9000000000f609c0>] netlink_recvmsg+0x60/0x3e0
[   16.274863] [<9000000000edc984>] ___sys_recvmsg+0xc4/0x170
[   16.275878] [<9000000000eddea8>] __sys_recvmsg+0x48/0xa0
[   16.276886] [<9000000000212214>] syscall_common+0x20/0x34
[   16.277893] Code: 29c041ae  29c0218d  038005af <29c001cf> 28c001af  29c0018f  29c001ac  0014c5ef  400089e0
[   16.279241]
[   16.280003] ---[ end trace ad59b71fb7017510 ]---
[   16.280995] Kernel panic - not syncing: Fatal exception in interrupt
[   16.282091] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
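For triage, the faulting instruction word (the one in angle brackets on each `Code:` line) can be decoded by hand. The following is a minimal Python sketch, assuming the standard LoongArch 2RI12 load/store encoding (10-bit opcode, 12-bit signed immediate, rj, rd). The helper `decode_ldst` and its tables are hypothetical names introduced for illustration; they are not part of any tool mentioned in this report, and only the eight basic ld/st opcodes are handled.

```python
# Sketch: decode a LoongArch 2RI12-format load/store instruction word.
# Assumption: register names follow the order printed in the oops dump
# (r21 shown as u0, r22 shown as s9).
GPR_NAMES = (
    ["zero", "ra", "tp", "sp"]
    + [f"a{i}" for i in range(8)]   # r4..r11
    + [f"t{i}" for i in range(9)]   # r12..r20
    + ["u0", "s9"]                  # r21, r22
    + [f"s{i}" for i in range(9)]   # r23..r31
)

# Top 10 bits of the instruction word -> mnemonic.
OPCODES = {
    0x0A0: "ld.b", 0x0A1: "ld.h", 0x0A2: "ld.w", 0x0A3: "ld.d",
    0x0A4: "st.b", 0x0A5: "st.h", 0x0A6: "st.w", 0x0A7: "st.d",
}


def decode_ldst(word):
    """Return (mnemonic, rd, rj, offset), or None for an unhandled opcode."""
    name = OPCODES.get((word >> 22) & 0x3FF)
    if name is None:
        return None
    imm = (word >> 10) & 0xFFF
    if imm & 0x800:                 # sign-extend the 12-bit immediate
        imm -= 0x1000
    rj = (word >> 5) & 0x1F         # base address register
    rd = word & 0x1F                # data register
    return name, GPR_NAMES[rd], GPR_NAMES[rj], imm


oops1 = decode_ldst(0x29C0018D)     # faulting word from Oops #1
oops2 = decode_ldst(0x29C001CF)     # faulting word from Oops #2
print(oops1)
print(oops2)
```

Under these assumptions, both words decode to `st.d` with offset 0 through a base register (t0 in Oops #1, t2 in Oops #2) that the corresponding register dump shows as 0000000000000000, which is consistent with the reported fault address 0.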

Expected results:
Live migration of the virtual machine succeeds and the guest works normally.

Additional info:
Live migration of virtual machines on a dual-socket 3C5000 server does not show this problem.
Comment 1 lixianglai loongson_group 2025-03-08 17:24:15 UTC
The issue has been submitted to the internal code repository:
http://rd.loongson.cn:8081/c/kernel/linux-4.19-anolis/+/62817
Comment 2 wangzhe 2025-03-16 18:06:30 UTC
The PR has been merged and the Loongson kernel build has been updated:
kernel-4.19.190-7.12.an8
Comment 3 liqianwen loongson_group 2025-03-19 16:02:55 UTC
Verified; the issue is fixed.

ISO download:
http://build.openanolis.cn/kojifiles/output/anolis-8-20250316.5/compose/BaseOS/loongarch64/iso/anolis-8-loongarch64-dvd1-20250316.5.iso

Kernel version:
4.19.190-7.12.an8.loongarch64

Test result: pass 10 | cancel 1
# avocado run migrate.default --vt-type qemu --vt-guest-os Linux.AnolisOS.8.10.loongarch64
JOB ID     : 6bd0fed292cdee99dfd056edb88d1a81d2fa286c
JOB LOG    : /root/avocado/job-results/job-2025-03-19T15.01-6bd0fed/job.log
 (01/11) type_specific.io-github-autotest-qemu.migrate.default.tcp.default: STARTED
 (01/11) type_specific.io-github-autotest-qemu.migrate.default.tcp.default: PASS (78.29 s)
 (02/11) type_specific.io-github-autotest-qemu.migrate.default.tcp.with_filter_off.with_post_copy: STARTED
 (02/11) type_specific.io-github-autotest-qemu.migrate.default.tcp.with_filter_off.with_post_copy: PASS (67.71 s)
 (03/11) type_specific.io-github-autotest-qemu.migrate.default.tcp.with_filter_off.with_multifd: STARTED
 (03/11) type_specific.io-github-autotest-qemu.migrate.default.tcp.with_filter_off.with_multifd: PASS (76.91 s)
 (04/11) type_specific.io-github-autotest-qemu.migrate.default.unix.default: STARTED
 (04/11) type_specific.io-github-autotest-qemu.migrate.default.unix.default: PASS (76.49 s)
 (05/11) type_specific.io-github-autotest-qemu.migrate.default.unix.with_filter_off.with_post_copy: STARTED
 (05/11) type_specific.io-github-autotest-qemu.migrate.default.unix.with_filter_off.with_post_copy: PASS (67.07 s)
 (06/11) type_specific.io-github-autotest-qemu.migrate.default.unix.with_filter_off.with_multifd: STARTED
 (06/11) type_specific.io-github-autotest-qemu.migrate.default.unix.with_filter_off.with_multifd: PASS (77.25 s)
 (07/11) type_specific.io-github-autotest-qemu.migrate.default.exec.default_exec.default: STARTED
 (07/11) type_specific.io-github-autotest-qemu.migrate.default.exec.default_exec.default: PASS (75.89 s)
 (08/11) type_specific.io-github-autotest-qemu.migrate.default.exec.gzip_exec.default: STARTED
 (08/11) type_specific.io-github-autotest-qemu.migrate.default.exec.gzip_exec.default: PASS (137.31 s)
 (09/11) type_specific.io-github-autotest-qemu.migrate.default.fd.default: STARTED
 (09/11) type_specific.io-github-autotest-qemu.migrate.default.fd.default: PASS (77.11 s)
 (10/11) type_specific.io-github-autotest-qemu.migrate.default.fd.with_filter_off.with_multifd: STARTED
 (10/11) type_specific.io-github-autotest-qemu.migrate.default.fd.with_filter_off.with_multifd: CANCEL: Unable to access capability: set capability failed for multifd (QMP command 'migrate-set-capabilities' failed    (arguments: {'capabilities': [{'state': True, 'capability': 'multifd'}]},    error message: {'class': 'GenericError', 'desc': 'multifd is not ... (57.93 s)
 (11/11) type_specific.io-github-autotest-qemu.migrate.default.mig_cancel.default: STARTED
 (11/11) type_specific.io-github-autotest-qemu.migrate.default.mig_cancel.default: PASS (67.14 s)
RESULTS    : PASS 10 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 1
JOB HTML   : /root/avocado/job-results/job-2025-03-19T15.01-6bd0fed/results.html
JOB TIME   : 871.31 s
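When the verification run is repeated, the `RESULTS` line can be checked mechanically rather than by eye. A minimal Python sketch, assuming the line keeps the `NAME count | NAME count` layout shown above; `parse_results` is a hypothetical helper, not an avocado API.

```python
def parse_results(line):
    """Turn an avocado 'RESULTS : ...' summary line into a dict of counts."""
    _, _, tail = line.partition(":")
    counts = {}
    for field in tail.split("|"):
        name, value = field.split()   # e.g. "PASS 10" -> ("PASS", "10")
        counts[name] = int(value)
    return counts


line = ("RESULTS    : PASS 10 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 "
        "| INTERRUPT 0 | CANCEL 1")
counts = parse_results(line)
total = sum(counts.values())

# A run counts as clean here if nothing errored, failed, or was interrupted.
clean = counts["ERROR"] == 0 and counts["FAIL"] == 0 and counts["INTERRUPT"] == 0
print(counts, total, clean)
```

For the run above this gives 10 passes and 1 cancel out of 11 tests, matching the reported "pass 10 | cancel 1".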