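Bug 1368 - [anolisos7][aarch64] Manually triggering a crash during smoke testing does not generate a vmcore, but vmcore-dmesg.txt is produced
Summary: [anolisos7][aarch64] Manually triggering a crash during smoke testing does not generate a vmcore, but vmcore-dmesg.txt is produced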
Status: CONFIRMED
Alias: None
Product: Anolis OS 7
Classification: Anolis OS
Component: Images&Installations
Version: 7.9
Hardware: aarch64 Linux
Importance: P2-High S2-major
Target Milestone: ---
Assignee: maqiao_mq
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on: 1365
Blocks:
Reported: 2022-05-30 19:57 UTC by chuyang_94
Modified: 2022-08-03 17:02 UTC (History)
Description chuyang_94 alibaba_cloud_group 2022-05-30 19:57:51 UTC
Description of problem:
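On Anolis OS 7 aarch64, manually triggering a crash during the smoke test run on tone fails to produce a dump, and triggering it manually from inside the instance fails in the same way.
No vmcore is generated, but vmcore-dmesg.txt is.
An excerpt of vmcore-dmesg.txt follows: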
[   69.045064] CPU: 0 PID: 1742 Comm: bash Kdump: loaded Not tainted 4.18.0-193.28.1.an7.aarch64 #1
[   69.046552] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
[   69.047820] pstate: 40400005 (nZcv daif +PAN -UAO)
[   69.048643] pc : sysrq_handle_crash+0x24/0x30
[   69.049390] lr : __handle_sysrq+0x98/0x188
[   69.050081] sp : ffff000013c8fcf0
[   69.050646] x29: ffff000013c8fcf0 x28: ffff8001c85dbb00
[   69.051546] x27: 00000000004fb000 x26: 00000000004bbe70
[   69.052452] x25: ffff0000116ad898 x24: 0000000000000000
[   69.053373] x23: 0000000000000000 x22: 0000000000000004
[   69.054269] x21: 0000000000000063 x20: ffff0000115dc000
[   69.055172] x19: ffff00001160f000 x18: 0000000000000010
[   69.056067] x17: 0000ffff80653c20 x16: 0000000000000000
[   69.056970] x15: 0000000000aaaaaa x14: ffff0000115d3708
[   69.057867] x13: 0000000000000001 x12: 00000000ffffffff
[   69.058770] x11: ffff000008e60090 x10: 0000000000000001
[   69.059669] x9 : 0000000000000001 x8 : ffff0000105da200
[   69.060565] x7 : 00000000000006bb x6 : ffff8001ef2923d0
[   69.061471] x5 : ffff8001ef2923d0 x4 : 0000000000000000
[   69.062366] x3 : ffff8001ef31a408 x2 : f8692d3ed1bb8800
[   69.063268] x1 : 0000000000000000 x0 : 0000000000000001
[   69.064165] Process bash (pid: 1742, stack limit = 0x00000000110189bb)
[   69.065273] Call trace:
[   69.065691]  sysrq_handle_crash+0x24/0x30
[   69.066373]  __handle_sysrq+0x98/0x188
[   69.067007]  write_sysrq_trigger+0x70/0x88
[   69.067707]  proc_reg_write+0x7c/0xb8
[   69.068329]  __vfs_write+0x48/0x90
[   69.069375]  vfs_write+0xac/0x1b8
[   69.070386]  ksys_write+0x6c/0xd0
[   69.071403]  __arm64_sys_write+0x24/0x30
[   69.072515]  el0_svc_handler+0xb4/0x188
[   69.073622]  el0_svc+0x8/0xc
[   69.074543] Code: 52800020 b90cc820 d5033e9f d2800001 (39000020)
[   69.075999] SMP: stopping secondary CPUs
[   69.078021] Starting crashdump kernel...
[   69.079093] Bye!
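
The trace above appears to show the deliberate crash rather than an unrelated fault. By kernel convention the parenthesized word on the `Code:` line is the instruction at `pc`. The sketch below decodes that one word, assuming it is an A64 STRB (immediate, unsigned offset) encoding; the function names are illustrative and not part of any tool mentioned in this report.

```python
def decode_strb_uimm(insn):
    """Decode an A64 STRB (immediate, unsigned offset) instruction word.

    Returns (rt, rn, imm12), or None if the word uses another encoding.
    """
    # Bits [31:22] of this encoding are fixed: size=00, 111001, opc=00.
    if (insn >> 22) != 0b0011100100:
        return None
    imm12 = (insn >> 10) & 0xFFF  # unsigned byte offset
    rn = (insn >> 5) & 0x1F       # base address register
    rt = insn & 0x1F              # register holding the byte to store
    return rt, rn, imm12


def format_strb(insn):
    """Render the decoded instruction in assembler-like syntax."""
    fields = decode_strb_uimm(insn)
    if fields is None:
        return None
    rt, rn, imm12 = fields
    base = "sp" if rn == 31 else f"x{rn}"
    src = "wzr" if rt == 31 else f"w{rt}"
    offset = f", #{imm12}" if imm12 else ""
    return f"strb {src}, [{base}{offset}]"


# Faulting instruction word taken from the "Code:" line of the report.
print(format_strb(0x39000020))
```

With x1 = 0 and x0 = 1 in the register dump, this decodes to a one-byte store of the value 1 to address 0, which is consistent with the intentional NULL write that sysrq `c` performs.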

# cat  /etc/image-id
image_name="Anolis OS 7.9 RHCK 64 bit ARM Edition"
image_id="anolisos_7_9_arm64_20G_rhck_alibase_20220518.vhd"
release_date="20220518121027"

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.18.0-193.28.1.an7.aarch64 root=UUID=8c55f940-8fa4-41aa-acf7-686e5acfa571 ro crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M cryptomgr.notests cgroup.memory=nokmem rcupdate.rcu_cpu_stall_timeout=300 vring_force_dma_api rhgb quiet console=tty0 biosdevname=0 net.ifnames=0 console=ttyAMA0,115200n8 noibrs nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295
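
The `crashkernel=` value above uses the range syntax, where each `start-end:size` entry applies when total RAM falls in `[start, end)`. A minimal sketch of that selection rule follows. The helper names are hypothetical, and the RAM sizes in the loop are illustrative values, not measurements from the affected instances. The kernel applies the rule to the RAM it detects, which can differ slightly from the nominal instance size.

```python
import re

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text):
    """Convert a size such as '192M' or '2G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def crashkernel_reservation(spec, ram_bytes):
    """Return the bytes reserved by a range-style crashkernel= value."""
    spec = spec.split("@", 1)[0]  # drop an optional @offset suffix
    for entry in spec.split(","):
        range_part, size_part = entry.split(":", 1)
        start_text, _, end_text = range_part.partition("-")
        start = parse_size(start_text)
        end = parse_size(end_text) if end_text else None  # open-ended range
        if ram_bytes >= start and (end is None or ram_bytes < end):
            return parse_size(size_part)
    return 0  # no range matched, so nothing is reserved


# Value copied from /proc/cmdline in this report.
SPEC = "0M-2G:0M,2G-8G:192M,8G-:256M"
for gib in (1, 4, 8):
    reserved = crashkernel_reservation(SPEC, gib << 30)
    print(f"{gib} GiB RAM -> {reserved >> 20} MiB reserved")
```

Under this rule a machine below 2 GiB reserves nothing, so kdump cannot load there at all, while 2 GiB to just under 8 GiB reserves 192 MiB and 8 GiB or more reserves 256 MiB.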


Version-Release number of selected component (if applicable):

4.18.0-193.28.1.an7.aarch64
How reproducible:
Always

Steps to Reproduce:
1. yum install crash kexec-tools -y
2. yum install kernel-debuginfo -y
3. systemctl restart kdump
4. systemctl enable kdump
5. echo c >/proc/sysrq-trigger
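
Before the final step it can help to confirm that a crash kernel is actually loaded, since the trigger reboots the machine either way. The following is a minimal sketch that reads the standard sysfs files `kexec_crash_loaded` and `kexec_crash_size`. The function names and the `sysfs_root` parameter are assumptions added here for illustration and testing.

```python
from pathlib import Path


def kdump_status(sysfs_root="/sys/kernel"):
    """Return (loaded, reserved_bytes) read from the kexec sysfs files."""
    root = Path(sysfs_root)
    loaded = (root / "kexec_crash_loaded").read_text().strip() == "1"
    reserved = int((root / "kexec_crash_size").read_text().strip())
    return loaded, reserved


def safe_to_trigger(sysfs_root="/sys/kernel"):
    """True only if a crash kernel is loaded into a non-empty reservation."""
    try:
        loaded, reserved = kdump_status(sysfs_root)
    except (OSError, ValueError):
        # Missing or unreadable files are treated as "not ready".
        return False
    return loaded and reserved > 0


if __name__ == "__main__":
    print("kdump ready:", safe_to_trigger())
```

In this bug both conditions would be expected to hold, because the trace reports `Kdump: loaded` and the capture kernel does start. The check only rules out the simpler failure modes.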


Actual results:
No crash dump file is generated.

Expected results:
The crash dump file is generated normally and can be analyzed.
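
A smoke test could distinguish the expected result from the one reported here by inspecting the dump directories after reboot. The sketch below assumes the dump target is `/var/crash` with one subdirectory per crash, which is the usual kdump default on this family of distributions but is not stated in this report. The function name and labels are illustrative.

```python
from pathlib import Path


def classify_dumps(crash_root="/var/crash"):
    """Map each dump directory name to 'complete', 'dmesg-only', or 'empty'."""
    results = {}
    root = Path(crash_root)
    if not root.is_dir():
        return results
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue
        has_vmcore = (entry / "vmcore").is_file()
        has_dmesg = (entry / "vmcore-dmesg.txt").is_file()
        if has_vmcore:
            results[entry.name] = "complete"
        elif has_dmesg:
            # The symptom in this bug: the log was saved, the dump was not.
            results[entry.name] = "dmesg-only"
        else:
            results[entry.name] = "empty"
    return results


if __name__ == "__main__":
    try:
        for name, state in classify_dumps().items():
            print(f"{name}: {state}")
    except OSError as exc:
        print("cannot read crash directory:", exc)
```

A directory labeled `dmesg-only` matches the actual result described in this report, while `complete` matches the expected result.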

Additional info:
Comment 1 chuyang_94 alibaba_cloud_group 2022-05-30 20:41:38 UTC
Instance type: ecs.g6r.large
Comment 2 maqiao_mq alibaba_cloud_group 2022-06-06 22:56:52 UTC
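The crashkernel memory reservation is normal and the second (capture) kernel boots normally. The problem is that makedumpfile fails to run. The log is as follows: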

readpage_elf: Attempt to read non-existent page at 0x0.
readmem: type_addr: 1, addr:0, size:8
vaddr_to_paddr_arm64: Can't read pmd
readmem: Can't convert a virtual address(ffff0000115e556c) to physical address.
readmem: type_addr: 0, addr:ffff0000115e556c, size:390
check_release: Can't get the address of system_utsname.
sadump: unsupported architecture
LOAD (0)
  phys_start : 40080000
  phys_end   : 42340000
  virt_start : ffff000010080000
  virt_end   : ffff000012340000
LOAD (1)
  phys_start : 40000000
  phys_end   : efe00000
  virt_start : ffff800000000000
  virt_end   : ffff8000afe00000
LOAD (2)
  phys_start : ffe00000
  phys_end   : 22b790000
  virt_start : ffff8000bfe00000
  virt_end   : ffff8001eb790000
LOAD (3)
  phys_start : 22b7a0000
  phys_end   : 22ba50000
  virt_start : ffff8001eb7a0000
  virt_end   : ffff8001eba50000
LOAD (4)
  phys_start : 22ba60000
  phys_end   : 22bc80000
  virt_start : ffff8001eba60000
  virt_end   : ffff8001ebc80000
LOAD (5)
  phys_start : 22c000000
  phys_end   : 22c030000
  virt_start : ffff8001ec000000
  virt_end   : ffff8001ec030000
LOAD (6)
  phys_start : 22c0f0000
  phys_end   : 22f3d0000
  virt_start : ffff8001ec0f0000
  virt_end   : ffff8001ef3d0000
LOAD (7)
  phys_start : 22f460000
  phys_end   : 22f470000
  virt_start : ffff8001ef460000
  virt_end   : ffff8001ef470000
LOAD (8)
  phys_start : 22f590000
  phys_end   : 230000000
  virt_start : ffff8001ef590000
  virt_end   : ffff8001f0000000
Linux kdump
page_size    : 65536
phys_base    : 40000000

max_mapnr    : 23000
There is enough free memory to be done in one cycle.

Buffer size for the cyclic mode: 35840
page_offset=ffff800000000000, va_bits=48
kimage_voffset   : fffeffffd0000000
max_physmem_bits : 30
section_size_bits: 1e

makedumpfile Failed.
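
The numbers in this log can be cross-checked by hand. The sketch below takes the failing virtual address, the LOAD segments, and `kimage_voffset` from the log, and assumes the usual arm64 kernel-image relation `paddr = vaddr - kimage_voffset`. The function names are illustrative and do not come from makedumpfile.

```python
# (phys_start, phys_end, virt_start, virt_end) for LOAD (0)..(8) in the log.
LOAD_SEGMENTS = [
    (0x40080000, 0x42340000, 0xFFFF000010080000, 0xFFFF000012340000),
    (0x40000000, 0xEFE00000, 0xFFFF800000000000, 0xFFFF8000AFE00000),
    (0xFFE00000, 0x22B790000, 0xFFFF8000BFE00000, 0xFFFF8001EB790000),
    (0x22B7A0000, 0x22BA50000, 0xFFFF8001EB7A0000, 0xFFFF8001EBA50000),
    (0x22BA60000, 0x22BC80000, 0xFFFF8001EBA60000, 0xFFFF8001EBC80000),
    (0x22C000000, 0x22C030000, 0xFFFF8001EC000000, 0xFFFF8001EC030000),
    (0x22C0F0000, 0x22F3D0000, 0xFFFF8001EC0F0000, 0xFFFF8001EF3D0000),
    (0x22F460000, 0x22F470000, 0xFFFF8001EF460000, 0xFFFF8001EF470000),
    (0x22F590000, 0x230000000, 0xFFFF8001EF590000, 0xFFFF8001F0000000),
]
KIMAGE_VOFFSET = 0xFFFEFFFFD0000000
FAILING_VADDR = 0xFFFF0000115E556C  # address makedumpfile could not convert


def find_segment(vaddr, segments=LOAD_SEGMENTS):
    """Return (index, paddr) for the segment whose virtual range holds vaddr."""
    for index, (p_start, _p_end, v_start, v_end) in enumerate(segments):
        if v_start <= vaddr < v_end:
            return index, p_start + (vaddr - v_start)
    return None


def kimage_to_phys(vaddr, voffset=KIMAGE_VOFFSET):
    """Translate a kernel-image virtual address using kimage_voffset."""
    return vaddr - voffset


index, paddr = find_segment(FAILING_VADDR)
print(f"LOAD ({index}) paddr via segment: {paddr:#x}")
print(f"paddr via kimage_voffset:  {kimage_to_phys(FAILING_VADDR):#x}")
```

Both calculations place the address at physical 0x415e556c inside LOAD (0). This suggests the data is present in the dump source and that the failure lies in the page-table walk reported as `Can't read pmd`, not in missing memory. That reading is an inference from the log, not a confirmed root cause.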
Comment 3 gaochang alibaba_cloud_group 2022-06-07 15:46:43 UTC
After evaluation, this issue does not block the release.
Comment 4 liuyaqing alibaba_cloud_group 2022-07-14 17:58:22 UTC
June monthly image: anolisos_7_9_arm64_20G_rhck_alibase_20220704.vhd
On ecs.g6r.large and ecs.g6r.16xlarge instances, no vmcore is generated, but vmcore-dmesg.txt is.

Taking an ecs.g6r.large instance as an example:

# cat /etc/image-id
image_name="Anolis OS 7.9 RHCK 64 bit ARM Edition"
image_id="anolisos_7_9_arm64_20G_rhck_alibase_20220704.vhd"
release_date="20220704095950"

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.18.0-193.28.1.an7.aarch64 root=UUID=a9f838d3-a6a5-4c9c-8b46-7ed525c4854b ro crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M cryptomgr.notests cgroup.memory=nokmem rcupdate.rcu_cpu_stall_timeout=300 vring_force_dma_api rhgb quiet console=tty0 biosdevname=0 net.ifnames=0 console=ttyAMA0,115200n8 noibrs nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295

An excerpt of vmcore-dmesg.txt follows:
[529973.723371] CPU: 1 PID: 13028 Comm: bash Kdump: loaded Not tainted 4.18.0-193.28.1.an7.aarch64 #1
[529973.724855] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
[529973.726107] pstate: 40400005 (nZcv daif +PAN -UAO)
[529973.726919] pc : sysrq_handle_crash+0x24/0x30
[529973.727661] lr : __handle_sysrq+0x98/0x188
[529973.728359] sp : ffff000013defcf0
[529973.728922] x29: ffff000013defcf0 x28: ffff8001c9e7aa00
[529973.729815] x27: 00000000004fb000 x26: 00000000004bbe70
[529973.730714] x25: ffff0000116ad898 x24: 0000000000000000
[529973.731615] x23: 0000000000000000 x22: 0000000000000004
[529973.732517] x21: 0000000000000063 x20: ffff0000115dc000
[529973.733412] x19: ffff00001160f000 x18: 0000000000000010
[529973.734313] x17: 0000ffff9bfe3c20 x16: 0000000000000000
[529973.735207] x15: 0000000000aaaaaa x14: ffff0000115d3708
[529973.736116] x13: 0000000000000001 x12: 00000000ffffffff
[529973.737013] x11: ffff000008e60090 x10: 0000000000000001
[529973.737909] x9 : 0000000000000001 x8 : ffff0000105da200
[529973.738807] x7 : 00000000000006b9 x6 : ffff8001ef3323d0
[529973.739718] x5 : ffff8001ef3323d0 x4 : 0000000000000000
[529973.740615] x3 : ffff8001ef3ba408 x2 : aacd587e6a678000
[529973.741508] x1 : 0000000000000000 x0 : 0000000000000001
[529973.742404] Process bash (pid: 13028, stack limit = 0x00000000faa6ec63)
[529973.743519] Call trace:
[529973.743947]  sysrq_handle_crash+0x24/0x30
[529973.744626]  __handle_sysrq+0x98/0x188
[529973.745263]  write_sysrq_trigger+0x70/0x88
[529973.745961]  proc_reg_write+0x7c/0xb8
[529973.746585]  __vfs_write+0x48/0x90
[529973.747646]  vfs_write+0xac/0x1b8
[529973.748670]  ksys_write+0x6c/0xd0
[529973.749688]  __arm64_sys_write+0x24/0x30
[529973.750781]  el0_svc_handler+0xb4/0x188
[529973.751879]  el0_svc+0x8/0xc
[529973.752796] Code: 52800020 b90cc820 d5033e9f d2800001 (39000020)
[529973.754242] SMP: stopping secondary CPUs
[529973.756184] Starting crashdump kernel...
[529973.757273] Bye!
Comment 5 liuyaqing alibaba_cloud_group 2022-08-03 17:02:42 UTC
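The issue persists in the July monthly image test: manually triggering a crash does not generate a vmcore, but vmcore-dmesg.txt is produced.

Affected: ecs.g6r.16xlarge and ecs.g6r.large instances using the anolisos_7_9_arm64_20G_rhck_alibase_20220727.vhd image.

Taking an ecs.g6r.large instance as an example: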

# cat /etc/image-id
image_name="Anolis OS 7.9 RHCK 64 bit ARM Edition"
image_id="anolisos_7_9_arm64_20G_rhck_alibase_20220727.vhd"
release_date="20220727152319"

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.18.0-193.28.1.an7.aarch64 root=UUID=bbc67681-3eb0-44b9-a28e-bf5f3121f72b ro crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M cryptomgr.notests cgroup.memory=nokmem rcupdate.rcu_cpu_stall_timeout=300 vring_force_dma_api rhgb quiet console=tty0 biosdevname=0 net.ifnames=0 console=ttyAMA0,115200n8 noibrs nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295

# uname -a
Linux iZbp17ambf2tulah3k706aZ 4.18.0-193.28.1.an7.aarch64 #1 SMP Wed Dec 22 16:43:38 CST 2021 aarch64 aarch64 aarch64 GNU/Linux

An excerpt of vmcore-dmesg.txt follows:
[  405.833608] CPU: 25 PID: 2587 Comm: bash Kdump: loaded Not tainted 4.18.0-193.28.1.an7.aarch64 #1
[  405.835071] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
[  405.836342] pstate: 40400005 (nZcv daif +PAN -UAO)
[  405.837126] pc : sysrq_handle_crash+0x24/0x30
[  405.837904] lr : __handle_sysrq+0x98/0x188
[  405.838635] sp : ffff000025f2fcf0
[  405.839239] x29: ffff000025f2fcf0 x28: ffff803e489f7400
[  405.840215] x27: 00000000004fb000 x26: 00000000004bbe70
[  405.841190] x25: ffff0000116ad898 x24: 0000000000000000
[  405.842164] x23: 0000000000000000 x22: 0000000000000004
[  405.843133] x21: 0000000000000063 x20: ffff0000115dc000
[  405.844095] x19: ffff00001160f000 x18: 0000000000000010
[  405.845065] x17: 0000ffff98a23c20 x16: 0000000000000000
[  405.846001] x15: 0000000000aaaaaa x14: ffff0000115d3708
[  405.846959] x13: 0000000000000001 x12: 00000000ffffffff
[  405.847903] x11: ffff000008e60090 x10: 0000000000000001
[  405.848858] x9 : 0000000000000001 x8 : ffff0000105da200
[  405.849805] x7 : 00000000000007f6 x6 : ffff803eca1723d0
[  405.850758] x5 : ffff803eca1723d0 x4 : 0000000000000000
[  405.851722] x3 : ffff803eca1fa408 x2 : 305163d62ccf6200
[  405.852679] x1 : 0000000000000000 x0 : 0000000000000001
[  405.853630] Process bash (pid: 2587, stack limit = 0x0000000081e09e02)
[  405.854825] Call trace:
[  405.855287]  sysrq_handle_crash+0x24/0x30
[  405.856006]  __handle_sysrq+0x98/0x188
[  405.856691]  write_sysrq_trigger+0x70/0x88
[  405.857440]  proc_reg_write+0x7c/0xb8
[  405.858123]  __vfs_write+0x48/0x90
[  405.859208]  vfs_write+0xac/0x1b8
[  405.860284]  ksys_write+0x6c/0xd0
[  405.861342]  __arm64_sys_write+0x24/0x30
[  405.862508]  el0_svc_handler+0xb4/0x188
[  405.863656]  el0_svc+0x8/0xc
[  405.864608] Code: 52800020 b90cc820 d5033e9f d2800001 (39000020)
[  405.866112] SMP: stopping secondary CPUs
[  405.876781] Starting crashdump kernel...
[  405.878144] Bye!