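Bug 1360 - After loading the image, manually triggering a crash does not generate a vmcore
Summary: After loading the image, manually triggering a crash does not generate a vmcore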
Status: RESOLVED WONTFIX
Alias: None
Product: Anolis OS 8
Classification: Anolis OS
Component: Images&Installations
Version: 8.4
Hardware: All Linux
Importance: P2-High S2-major
Target Milestone: ---
Assignee: maqiao_mq
QA Contact: shuming
Reported: 2022-05-30 10:56 UTC by chuyang_94
Modified: 2022-09-23 14:30 UTC

Description chuyang_94 alibaba_cloud_group 2022-05-30 10:56:24 UTC
Description of problem:
After loading the Anolis 8.4 ARM image on a g8m.large cloud instance, manually triggering a crash does not generate a vmcore.

Partial system log:
May 30 10:35:09 iZbp1ar06qebal9l3l44dlZ kernel: sysrq: SysRq : Trigger a crash
May 30 10:35:09 iZbp1ar06qebal9l3l44dlZ kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
May 30 10:35:09 iZbp1ar06qebal9l3l44dlZ kernel: Mem abort info:
May 30 10:35:09 iZbp1ar06qebal9l3l44dlZ kernel:   ESR = 0x96000044
-- Reboot --
May 30 10:35:54 localhost.localdomain kernel: Booting Linux on physical CPU 0x0000000000 [0x410fd490]

Version-Release number of selected component (if applicable):
# cat /etc/image-id
image_name="Anolis OS 8.4 ANCK 64 bit ARM Edition"
image_id="anolisos_8_4_arm64_20G_anck_alibase_20220519.vhd"
release_date="20220519172657"

# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/boot/vmlinuz-4.19.91-25.8.an8.aarch64 root=UUID=4acda34d-7b77-4784-9c53-1ae37d9c6961 ro crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M cryptomgr.notests cgroup.memory=nokmem rcupdate.rcu_cpu_stall_timeout=300 vring_force_dma_api rhgb quiet console=tty0 biosdevname=0 net.ifnames=0 console=ttyAMA0,115200n8 noibrs nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295 crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M

# uname -a
Linux iZbp1ar06qebal9l3l44dlZ 4.19.91-25.8.an8.aarch64 #1 SMP Tue Apr 12 16:51:26 CST 2022 aarch64 aarch64 aarch64 GNU/Linux

How reproducible:


Steps to Reproduce:
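1. Load the Anolis 8.4 ARM image.
2. Check that kdump and related services are running normally: systemctl status kdump
3. Manually trigger a crash: echo c >/proc/sysrq-trigger
4. After the system finishes rebooting, check whether a vmcore was generated under /var/crash.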
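For the final check in the steps above, a small helper can make the result unambiguous. This is only a sketch: the function names `find_vmcores` and `has_vmcore` are hypothetical, and it assumes kdump is left at its default local target, where each dump is written as a file named `vmcore` in a per-crash subdirectory of /var/crash.

```shell
#!/bin/sh
# Hypothetical helpers (not part of kdump): look for dump files after the reboot.

# Print every file named "vmcore" under the crash directory (default /var/crash).
find_vmcores() {
    crash_dir="${1:-/var/crash}"
    find "$crash_dir" -type f -name 'vmcore' 2>/dev/null | sort
}

# Return 0 if at least one vmcore exists, 1 otherwise.
has_vmcore() {
    [ -n "$(find_vmcores "$1")" ]
}

# Example use after the instance has come back up:
#   if has_vmcore; then echo "vmcore generated"; else echo "no vmcore found"; fi
```

Passing a directory argument allows the same check against a non-default dump path, if one is configured.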

Actual results:
No vmcore is generated.

Expected results:
The crash-related vmcore file is generated normally.

Additional info:
Comment 1 maqiao_mq alibaba_cloud_group 2022-06-02 14:45:53 UTC
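Cause:
The crashkernel parameter reserves too little memory, so kdump cannot run properly.

The instance has 8G of memory, and crashkernel is configured as follows:
> crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M

Change the parameter to:
> crashkernel=0M-2G:0M,2G-8G:256M,8G-:512M

With this change, 256M is reserved and the vmcore is generated normally.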
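To illustrate how the range syntax selects a reservation, here is a minimal sketch. The function names `to_mb` and `crashkernel_reserved` are hypothetical, the arithmetic is done in MB rather than bytes, and it assumes a range applies when start <= system RAM < end. The 8000 MB figure in the example is also an assumption: it stands in for an 8G instance on which the kernel sees slightly less than 8192 MB, which would be consistent with the 256M reservation reported above.

```shell
#!/bin/sh
# Hypothetical sketch: resolve a range-style crashkernel spec for a given RAM size.

# Convert a size such as "2G" or "192M" to MB (only G and M suffixes are handled).
to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# crashkernel_reserved SPEC RAM_MB
# Prints the size of the first range with start <= RAM_MB < end, or 0M if none matches.
crashkernel_reserved() {
    spec="$1"
    ram_mb="$2"
    for entry in $(printf '%s' "$spec" | tr ',' ' '); do
        range="${entry%%:*}"    # e.g. "2G-8G" or "8G-"
        size="${entry#*:}"      # e.g. "192M"
        size="${size%%@*}"      # drop an optional @offset
        start="${range%%-*}"
        end="${range#*-}"       # empty for an open-ended range
        start_mb=$(to_mb "$start") || return 1
        if [ "$ram_mb" -lt "$start_mb" ]; then
            continue
        fi
        if [ -z "$end" ]; then
            echo "$size"
            return 0
        fi
        end_mb=$(to_mb "$end") || return 1
        if [ "$ram_mb" -lt "$end_mb" ]; then
            echo "$size"
            return 0
        fi
    done
    echo "0M"
}

# Example (8000 MB is an assumed value, see above):
#   crashkernel_reserved '0M-2G:0M,2G-8G:192M,8G-:256M' 8000   # original setting
#   crashkernel_reserved '0M-2G:0M,2G-8G:256M,8G-:512M' 8000   # modified setting
```

Under these assumptions the original setting resolves to 192M and the modified one to 256M, so the instance stays in the 2G-8G range in both cases and only the size of that range changes.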
Comment 2 gaochang alibaba_cloud_group 2022-06-07 15:47:19 UTC
After evaluation, this does not affect the release.
Comment 3 liuyaqing alibaba_cloud_group 2022-08-03 17:20:25 UTC
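July image test, image anolisos_7_9_x64_20G_anck_uefi_alibase_20220727.vhd

On instance ecs.ebmg6a.64xlarge, no vmcore is generated with the default configuration. After changing crashkernel to crashkernel=0M-2G:0M,2G-8G:256M,8G-:512M, a vmcore is still not generated, and /var/log/dmesg is empty.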


# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.19.91-26.an7.x86_64 root=UUID=9180ee76-6ecd-4463-a1f1-e106e4e186ee ro crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M cryptomgr.notests cgroup.memory=nokmem rcupdate.rcu_cpu_stall_timeout=300 vring_force_dma_api spectre_v2=retpoline rhgb quiet biosdevname=0 net.ifnames=0 console=tty0 console=ttyS0,115200n8 noibrs nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295 intel_idle.max_cstate=1

# cat /etc/image-id
image_name="Anolis OS 7.9 ANCK 64 bit UEFI Edition"
image_id="anolisos_7_9_x64_20G_anck_uefi_alibase_20220727.vhd"
release_date="20220727162909"

# uname -a
Linux iZbp13hv74nnq9gvs38q0mZ 4.19.91-26.an7.x86_64 #1 SMP Tue May 24 13:18:57 CST 2022 x86_64 x86_64 x86_64 GNU/Linux
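
When a larger crashkernel value still produces no vmcore, one thing that may be worth confirming before the crash is triggered is whether the reservation actually took effect and whether the capture kernel is loaded. The sketch below reads the standard sysfs files /sys/kernel/kexec_crash_loaded and /sys/kernel/kexec_crash_size. The function name `kdump_state` and its optional directory argument are hypothetical and exist only so the check can be exercised against a test directory.

```shell
#!/bin/sh
# Hypothetical helper: report whether a capture kernel is loaded and how many
# bytes are reserved for it. Prints "unknown" for any file that cannot be read.
kdump_state() {
    sys_dir="${1:-/sys/kernel}"
    loaded=$(cat "$sys_dir/kexec_crash_loaded" 2>/dev/null || echo unknown)
    size=$(cat "$sys_dir/kexec_crash_size" 2>/dev/null || echo unknown)
    echo "crash_kernel_loaded=$loaded reserved_bytes=$size"
}

# Example use on the instance under test:
#   kdump_state
```

A loaded value of 0 or a reserved size of 0 would suggest the problem occurs before the crash is triggered, rather than inside the capture kernel.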
Comment 4 liuyaqing alibaba_cloud_group 2022-08-25 17:53:06 UTC
The August monthly image for Anolis 7.9 ARM still has this problem.
Affected instances: ecs.g6r.large0, ecs.g6r.16xlarge1
Taking ecs.g6r.large0 as an example:
# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.18.0-193.28.1.an7.aarch64 root=UUID=6d95fc29-8c36-4e3b-a232-ea912f85f638 ro crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M cryptomgr.notests cgroup.memory=nokmem rcupdate.rcu_cpu_stall_timeout=300 vring_force_dma_api rhgb quiet console=tty0 biosdevname=0 net.ifnames=0 console=ttyAMA0,115200n8 noibrs nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295

# cat /etc/os-release
NAME="Anolis OS"
VERSION="7.9"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="7.9"
PRETTY_NAME="Anolis OS 7.9"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
BUG_REPORT_URL="https://bugs.openanolis.cn/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

# cat /etc/image-id
image_name="Anolis OS 7.9 RHCK 64 bit ARM Edition"
image_id="anolisos_7_9_arm64_20G_rhck_alibase_20220815.vhd"
release_date="20220815152133"

# uname -a
Linux iZbp1fmukjrpety3ot2ur4Z 4.18.0-193.28.1.an7.aarch64 #1 SMP Wed Dec 22 16:43:38 CST 2021 aarch64 aarch64 aarch64 GNU/Linux
Comment 5 maqiao_mq alibaba_cloud_group 2022-09-23 14:28:30 UTC
(In reply to liuyaqing from comment #4)
> The August monthly image for Anolis 7.9 ARM still has this problem.
> Affected instances: ecs.g6r.large0, ecs.g6r.16xlarge1
> Taking ecs.g6r.large0 as an example:
> # cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinuz-4.18.0-193.28.1.an7.aarch64
> root=UUID=6d95fc29-8c36-4e3b-a232-ea912f85f638 ro
> crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M cryptomgr.notests
> cgroup.memory=nokmem rcupdate.rcu_cpu_stall_timeout=300 vring_force_dma_api
> rhgb quiet console=tty0 biosdevname=0 net.ifnames=0 console=ttyAMA0,115200n8
> noibrs nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295
> 
> # cat /etc/os-release
> NAME="Anolis OS"
> VERSION="7.9"
> ID="anolis"
> ID_LIKE="rhel fedora centos"
> VERSION_ID="7.9"
> PRETTY_NAME="Anolis OS 7.9"
> ANSI_COLOR="0;31"
> HOME_URL="https://openanolis.cn/"
> BUG_REPORT_URL="https://bugs.openanolis.cn/"
> 
> CENTOS_MANTISBT_PROJECT="CentOS-7"
> CENTOS_MANTISBT_PROJECT_VERSION="7"
> REDHAT_SUPPORT_PRODUCT="centos"
> REDHAT_SUPPORT_PRODUCT_VERSION="7"
> 
> # cat /etc/image-id
> image_name="Anolis OS 7.9 RHCK 64 bit ARM Edition"
> image_id="anolisos_7_9_arm64_20G_rhck_alibase_20220815.vhd"
> release_date="20220815152133"
> 
> # uname -a
> Linux iZbp1fmukjrpety3ot2ur4Z 4.18.0-193.28.1.an7.aarch64 #1 SMP Wed Dec 22
> 16:43:38 CST 2021 aarch64 aarch64 aarch64 GNU/Linux

This is the RHCK kernel, so it will not be handled here.
Comment 6 maqiao_mq alibaba_cloud_group 2022-09-23 14:30:35 UTC
Crash issues will be handled collectively by @xiangzao later. The crashkernel parameters have already been refreshed once, so this issue is set to won't fix.