Description of problem:
On Anolis OS 7 (aarch64), manually triggering a crash during the smoke test run on tone fails, and triggering it by hand from inside the instance fails as well. No vmcore is generated, although vmcore-dmesg.txt is.

Excerpt from vmcore-dmesg.txt:
[   69.045064] CPU: 0 PID: 1742 Comm: bash Kdump: loaded Not tainted 4.18.0-193.28.1.an7.aarch64 #1
[   69.046552] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
[   69.047820] pstate: 40400005 (nZcv daif +PAN -UAO)
[   69.048643] pc : sysrq_handle_crash+0x24/0x30
[   69.049390] lr : __handle_sysrq+0x98/0x188
[   69.050081] sp : ffff000013c8fcf0
[   69.050646] x29: ffff000013c8fcf0 x28: ffff8001c85dbb00
[   69.051546] x27: 00000000004fb000 x26: 00000000004bbe70
[   69.052452] x25: ffff0000116ad898 x24: 0000000000000000
[   69.053373] x23: 0000000000000000 x22: 0000000000000004
[   69.054269] x21: 0000000000000063 x20: ffff0000115dc000
[   69.055172] x19: ffff00001160f000 x18: 0000000000000010
[   69.056067] x17: 0000ffff80653c20 x16: 0000000000000000
[   69.056970] x15: 0000000000aaaaaa x14: ffff0000115d3708
[   69.057867] x13: 0000000000000001 x12: 00000000ffffffff
[   69.058770] x11: ffff000008e60090 x10: 0000000000000001
[   69.059669] x9 : 0000000000000001 x8 : ffff0000105da200
[   69.060565] x7 : 00000000000006bb x6 : ffff8001ef2923d0
[   69.061471] x5 : ffff8001ef2923d0 x4 : 0000000000000000
[   69.062366] x3 : ffff8001ef31a408 x2 : f8692d3ed1bb8800
[   69.063268] x1 : 0000000000000000 x0 : 0000000000000001
[   69.064165] Process bash (pid: 1742, stack limit = 0x00000000110189bb)
[   69.065273] Call trace:
[   69.065691]  sysrq_handle_crash+0x24/0x30
[   69.066373]  __handle_sysrq+0x98/0x188
[   69.067007]  write_sysrq_trigger+0x70/0x88
[   69.067707]  proc_reg_write+0x7c/0xb8
[   69.068329]  __vfs_write+0x48/0x90
[   69.069375]  vfs_write+0xac/0x1b8
[   69.070386]  ksys_write+0x6c/0xd0
[   69.071403]  __arm64_sys_write+0x24/0x30
[   69.072515]  el0_svc_handler+0xb4/0x188
[   69.073622]  el0_svc+0x8/0xc
[   69.074543] Code: 52800020 b90cc820 d5033e9f d2800001 (39000020)
[   69.075999] SMP: stopping secondary CPUs
[   69.078021] Starting crashdump kernel...
[   69.079093] Bye!

# cat /etc/image-id
image_name="Anolis OS 7.9 RHCK 64 bit ARM Edition"
image_id="anolisos_7_9_arm64_20G_rhck_alibase_20220518.vhd"
release_date="20220518121027"

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.18.0-193.28.1.an7.aarch64 root=UUID=8c55f940-8fa4-41aa-acf7-686e5acfa571 ro crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M cryptomgr.notests cgroup.memory=nokmem rcupdate.rcu_cpu_stall_timeout=300 vring_force_dma_api rhgb quiet console=tty0 biosdevname=0 net.ifnames=0 console=ttyAMA0,115200n8 noibrs nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295

Version-Release number of selected component (if applicable):
4.18.0-193.28.1.an7.aarch64

How reproducible:
Always

Steps to Reproduce:
1. yum install crash kexec-tools -y
2. yum install kernel-debuginfo -y
3. systemctl restart kdump
4. systemctl enable kdump
5. echo c >/proc/sysrq-trigger

Actual results:
No crash dump file (vmcore) is produced.

Expected results:
A crash dump file is produced and can be analyzed.

Additional info:
Instance type: ecs.g6r.large
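
For readers unfamiliar with the range form of the crashkernel= parameter on the command line above, here is a minimal Python sketch of how such a value maps a memory size to a reservation. It assumes the documented kernel semantics (each range start is inclusive, the end is exclusive, an empty end means unbounded). The function names are hypothetical, and the amount of RAM the kernel actually counts on a given instance may differ from the nominal instance size, so the sketch makes no claim about which entry applied here.

```python
import re

# Binary size suffixes accepted in this sketch (K, M, G).
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a string such as '192M' or '2G' to a byte count."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def crashkernel_reservation(spec, system_ram):
    """Return the reservation in bytes for the first range containing system_ram."""
    for entry in spec.split(","):
        range_part, size_part = entry.split(":")
        start_text, _, end_text = range_part.partition("-")
        start = parse_size(start_text)
        end = parse_size(end_text) if end_text else None  # empty end: no upper bound
        if system_ram >= start and (end is None or system_ram < end):
            return parse_size(size_part)
    return 0  # no range matched, nothing reserved


# Value copied from /proc/cmdline in this report.
SPEC = "0M-2G:0M,2G-8G:192M,8G-:256M"

if __name__ == "__main__":
    for gib in (1, 4, 8, 64):
        reserved = crashkernel_reservation(SPEC, gib << 30)
        print(f"{gib:>3} GiB RAM: {reserved >> 20} MiB reserved")
```

The first matching range wins, so a machine whose counted RAM is exactly 8 GiB falls into the last entry, not the middle one.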
The crashkernel memory reservation is fine and the second (capture) kernel boots normally. The problem is that makedumpfile fails. Its log is as follows:

readpage_elf: Attempt to read non-existent page at 0x0.
readmem: type_addr: 1, addr:0, size:8
vaddr_to_paddr_arm64: Can't read pmd
readmem: Can't convert a virtual address(ffff0000115e556c) to physical address.
readmem: type_addr: 0, addr:ffff0000115e556c, size:390
check_release: Can't get the address of system_utsname.

sadump: unsupported architecture
LOAD (0)
  phys_start : 40080000
  phys_end   : 42340000
  virt_start : ffff000010080000
  virt_end   : ffff000012340000
LOAD (1)
  phys_start : 40000000
  phys_end   : efe00000
  virt_start : ffff800000000000
  virt_end   : ffff8000afe00000
LOAD (2)
  phys_start : ffe00000
  phys_end   : 22b790000
  virt_start : ffff8000bfe00000
  virt_end   : ffff8001eb790000
LOAD (3)
  phys_start : 22b7a0000
  phys_end   : 22ba50000
  virt_start : ffff8001eb7a0000
  virt_end   : ffff8001eba50000
LOAD (4)
  phys_start : 22ba60000
  phys_end   : 22bc80000
  virt_start : ffff8001eba60000
  virt_end   : ffff8001ebc80000
LOAD (5)
  phys_start : 22c000000
  phys_end   : 22c030000
  virt_start : ffff8001ec000000
  virt_end   : ffff8001ec030000
LOAD (6)
  phys_start : 22c0f0000
  phys_end   : 22f3d0000
  virt_start : ffff8001ec0f0000
  virt_end   : ffff8001ef3d0000
LOAD (7)
  phys_start : 22f460000
  phys_end   : 22f470000
  virt_start : ffff8001ef460000
  virt_end   : ffff8001ef470000
LOAD (8)
  phys_start : 22f590000
  phys_end   : 230000000
  virt_start : ffff8001ef590000
  virt_end   : ffff8001f0000000
Linux kdump
page_size    : 65536
phys_base    : 40000000
max_mapnr    : 23000

There is enough free memory to be done in one cycle.

Buffer size for the cyclic mode: 35840
page_offset=ffff800000000000, va_bits=48
kimage_voffset   : fffeffffd0000000
max_physmem_bits : 30
section_size_bits: 1e

makedumpfile Failed.
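
As a cross-check on the numbers in the makedumpfile log, the following Python sketch does plain arithmetic on values copied from it. It is not how makedumpfile translates addresses (the log shows it attempting a page-table read, and that read is what failed), and it does not identify the root cause. It only shows that the virtual address that could not be converted lies inside the LOAD (0) segment, and that the segment offset and the reported kimage_voffset give the same physical address. The helper names are hypothetical, and max_mapnr is assumed to be printed in hexadecimal.

```python
# Values copied from the makedumpfile log in this report (all hexadecimal there).
KIMAGE_VOFFSET = 0xFFFEFFFFD0000000
FAILING_VADDR = 0xFFFF0000115E556C
PAGE_SIZE = 65536
MAX_MAPNR = 0x23000  # assumption: the log prints max_mapnr in hex

# (phys_start, phys_end, virt_start, virt_end) for LOAD (0) .. LOAD (8).
LOADS = [
    (0x40080000, 0x42340000, 0xFFFF000010080000, 0xFFFF000012340000),
    (0x40000000, 0xEFE00000, 0xFFFF800000000000, 0xFFFF8000AFE00000),
    (0xFFE00000, 0x22B790000, 0xFFFF8000BFE00000, 0xFFFF8001EB790000),
    (0x22B7A0000, 0x22BA50000, 0xFFFF8001EB7A0000, 0xFFFF8001EBA50000),
    (0x22BA60000, 0x22BC80000, 0xFFFF8001EBA60000, 0xFFFF8001EBC80000),
    (0x22C000000, 0x22C030000, 0xFFFF8001EC000000, 0xFFFF8001EC030000),
    (0x22C0F0000, 0x22F3D0000, 0xFFFF8001EC0F0000, 0xFFFF8001EF3D0000),
    (0x22F460000, 0x22F470000, 0xFFFF8001EF460000, 0xFFFF8001EF470000),
    (0x22F590000, 0x230000000, 0xFFFF8001EF590000, 0xFFFF8001F0000000),
]


def find_load(vaddr):
    """Return the index of the LOAD segment whose virtual range holds vaddr, or None."""
    for index, (_, _, virt_start, virt_end) in enumerate(LOADS):
        if virt_start <= vaddr < virt_end:
            return index
    return None


def paddr_via_load(vaddr):
    """Translate vaddr using the offset inside its LOAD segment."""
    index = find_load(vaddr)
    if index is None:
        raise ValueError("address is not covered by any LOAD segment")
    phys_start, _, virt_start, _ = LOADS[index]
    return phys_start + (vaddr - virt_start)


if __name__ == "__main__":
    print("segment        : LOAD (%d)" % find_load(FAILING_VADDR))
    print("paddr (segment): %x" % paddr_via_load(FAILING_VADDR))
    print("paddr (voffset): %x" % (FAILING_VADDR - KIMAGE_VOFFSET))
    print("end of memory  : %x" % (MAX_MAPNR * PAGE_SIZE))
```

Both translations agree, and max_mapnr times page_size equals the phys_end of LOAD (8), so the segment table itself looks self-consistent.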
After evaluation, this issue does not block the release.
June monthly image anolisos_7_9_arm64_20G_rhck_alibase_20220704.vhd: the ecs.g6r.large and ecs.g6r.16xlarge instances cannot generate a vmcore, but vmcore-dmesg.txt is present. Taking the ecs.g6r.large instance as an example:

# cat /etc/image-id
image_name="Anolis OS 7.9 RHCK 64 bit ARM Edition"
image_id="anolisos_7_9_arm64_20G_rhck_alibase_20220704.vhd"
release_date="20220704095950"

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.18.0-193.28.1.an7.aarch64 root=UUID=a9f838d3-a6a5-4c9c-8b46-7ed525c4854b ro crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M cryptomgr.notests cgroup.memory=nokmem rcupdate.rcu_cpu_stall_timeout=300 vring_force_dma_api rhgb quiet console=tty0 biosdevname=0 net.ifnames=0 console=ttyAMA0,115200n8 noibrs nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295

Excerpt from vmcore-dmesg.txt:
[529973.723371] CPU: 1 PID: 13028 Comm: bash Kdump: loaded Not tainted 4.18.0-193.28.1.an7.aarch64 #1
[529973.724855] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
[529973.726107] pstate: 40400005 (nZcv daif +PAN -UAO)
[529973.726919] pc : sysrq_handle_crash+0x24/0x30
[529973.727661] lr : __handle_sysrq+0x98/0x188
[529973.728359] sp : ffff000013defcf0
[529973.728922] x29: ffff000013defcf0 x28: ffff8001c9e7aa00
[529973.729815] x27: 00000000004fb000 x26: 00000000004bbe70
[529973.730714] x25: ffff0000116ad898 x24: 0000000000000000
[529973.731615] x23: 0000000000000000 x22: 0000000000000004
[529973.732517] x21: 0000000000000063 x20: ffff0000115dc000
[529973.733412] x19: ffff00001160f000 x18: 0000000000000010
[529973.734313] x17: 0000ffff9bfe3c20 x16: 0000000000000000
[529973.735207] x15: 0000000000aaaaaa x14: ffff0000115d3708
[529973.736116] x13: 0000000000000001 x12: 00000000ffffffff
[529973.737013] x11: ffff000008e60090 x10: 0000000000000001
[529973.737909] x9 : 0000000000000001 x8 : ffff0000105da200
[529973.738807] x7 : 00000000000006b9 x6 : ffff8001ef3323d0
[529973.739718] x5 : ffff8001ef3323d0 x4 : 0000000000000000
[529973.740615] x3 : ffff8001ef3ba408 x2 : aacd587e6a678000
[529973.741508] x1 : 0000000000000000 x0 : 0000000000000001
[529973.742404] Process bash (pid: 13028, stack limit = 0x00000000faa6ec63)
[529973.743519] Call trace:
[529973.743947]  sysrq_handle_crash+0x24/0x30
[529973.744626]  __handle_sysrq+0x98/0x188
[529973.745263]  write_sysrq_trigger+0x70/0x88
[529973.745961]  proc_reg_write+0x7c/0xb8
[529973.746585]  __vfs_write+0x48/0x90
[529973.747646]  vfs_write+0xac/0x1b8
[529973.748670]  ksys_write+0x6c/0xd0
[529973.749688]  __arm64_sys_write+0x24/0x30
[529973.750781]  el0_svc_handler+0xb4/0x188
[529973.751879]  el0_svc+0x8/0xc
[529973.752796] Code: 52800020 b90cc820 d5033e9f d2800001 (39000020)
[529973.754242] SMP: stopping secondary CPUs
[529973.756184] Starting crashdump kernel...
[529973.757273] Bye!
The issue is still present in testing of the July monthly image: manually triggering a crash does not generate a vmcore, but vmcore-dmesg.txt is present. Affected: the ecs.g6r.16xlarge and ecs.g6r.large instances of the anolisos_7_9_arm64_20G_rhck_alibase_20220727.vhd image. Taking the ecs.g6r.large instance as an example:

# cat /etc/image-id
image_name="Anolis OS 7.9 RHCK 64 bit ARM Edition"
image_id="anolisos_7_9_arm64_20G_rhck_alibase_20220727.vhd"
release_date="20220727152319"

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.18.0-193.28.1.an7.aarch64 root=UUID=bbc67681-3eb0-44b9-a28e-bf5f3121f72b ro crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M cryptomgr.notests cgroup.memory=nokmem rcupdate.rcu_cpu_stall_timeout=300 vring_force_dma_api rhgb quiet console=tty0 biosdevname=0 net.ifnames=0 console=ttyAMA0,115200n8 noibrs nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295

# uname -a
Linux iZbp17ambf2tulah3k706aZ 4.18.0-193.28.1.an7.aarch64 #1 SMP Wed Dec 22 16:43:38 CST 2021 aarch64 aarch64 aarch64 GNU/Linux

Excerpt from vmcore-dmesg.txt:
[  405.833608] CPU: 25 PID: 2587 Comm: bash Kdump: loaded Not tainted 4.18.0-193.28.1.an7.aarch64 #1
[  405.835071] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
[  405.836342] pstate: 40400005 (nZcv daif +PAN -UAO)
[  405.837126] pc : sysrq_handle_crash+0x24/0x30
[  405.837904] lr : __handle_sysrq+0x98/0x188
[  405.838635] sp : ffff000025f2fcf0
[  405.839239] x29: ffff000025f2fcf0 x28: ffff803e489f7400
[  405.840215] x27: 00000000004fb000 x26: 00000000004bbe70
[  405.841190] x25: ffff0000116ad898 x24: 0000000000000000
[  405.842164] x23: 0000000000000000 x22: 0000000000000004
[  405.843133] x21: 0000000000000063 x20: ffff0000115dc000
[  405.844095] x19: ffff00001160f000 x18: 0000000000000010
[  405.845065] x17: 0000ffff98a23c20 x16: 0000000000000000
[  405.846001] x15: 0000000000aaaaaa x14: ffff0000115d3708
[  405.846959] x13: 0000000000000001 x12: 00000000ffffffff
[  405.847903] x11: ffff000008e60090 x10: 0000000000000001
[  405.848858] x9 : 0000000000000001 x8 : ffff0000105da200
[  405.849805] x7 : 00000000000007f6 x6 : ffff803eca1723d0
[  405.850758] x5 : ffff803eca1723d0 x4 : 0000000000000000
[  405.851722] x3 : ffff803eca1fa408 x2 : 305163d62ccf6200
[  405.852679] x1 : 0000000000000000 x0 : 0000000000000001
[  405.853630] Process bash (pid: 2587, stack limit = 0x0000000081e09e02)
[  405.854825] Call trace:
[  405.855287]  sysrq_handle_crash+0x24/0x30
[  405.856006]  __handle_sysrq+0x98/0x188
[  405.856691]  write_sysrq_trigger+0x70/0x88
[  405.857440]  proc_reg_write+0x7c/0xb8
[  405.858123]  __vfs_write+0x48/0x90
[  405.859208]  vfs_write+0xac/0x1b8
[  405.860284]  ksys_write+0x6c/0xd0
[  405.861342]  __arm64_sys_write+0x24/0x30
[  405.862508]  el0_svc_handler+0xb4/0x188
[  405.863656]  el0_svc+0x8/0xc
[  405.864608] Code: 52800020 b90cc820 d5033e9f d2800001 (39000020)
[  405.866112] SMP: stopping secondary CPUs
[  405.876781] Starting crashdump kernel...
[  405.878144] Bye!