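Bug 5374 - [Anolis23 GA][x86/arm] ISO images have no default crashkernel value, so the kdump service cannot start and a crash cannot be triggered
Summary: [Anolis23 GA][x86/arm] ISO images have no default crashkernel value, so the kdump service cannot start and a crash cannot be triggered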
Status: CONFIRMED
Alias: None
Product: Anolis OS 23
Classification: Anolis OS
Component: Images&Installations
Version: 23.0
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ga
Assignee: happy_orange
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-02 11:52 UTC by liuyaqing
Modified: 2024-04-26 16:38 UTC
CC List: 3 users

See Also:


Attachments
No crashkernel value by default (8.05 MB, image/bmp)
2023-06-02 11:52 UTC, liuyaqing
kernel-core package post script execution (341.38 KB, image/png)
2023-06-16 10:04 UTC, happy_orange

Description liuyaqing alibaba_cloud_group 2023-06-02 11:52:12 UTC
Created attachment 741 [details]
No crashkernel value by default

Description of problem:
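The arm/x86 images have no crashkernel value by default, so the kdump service cannot start. The error is: anolis systemd[1]: kdump.service - Crash recovery kernel arming was skipped because of an unmet condition check (ConditionKernelCommandLine=crashkernel). As a result, a crash cannot be triggered.

Taking the arm dvd-iso as an example: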
# uname -r
5.10.134-14.an23.aarch64
# cat /etc/anolis-release 
Anolis OS release 23

# cat /etc/os-release 
NAME="Anolis OS"
VERSION="23"
ID="anolis"
VERSION_ID="23"
PLATFORM_ID="platform:an23"
PRETTY_NAME="Anolis OS 23"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
BUG_REPORT_URL="https://bugzilla.openanolis.cn/"
Version-Release number of selected component (if applicable):


Steps to Reproduce:
1. yum install crash kexec-tools -y
2. systemctl start kdump && systemctl status kdump
3. cat /proc/cmdline

Actual results:
The kdump service cannot start; there is no crashkernel parameter.

Expected results:
/proc/cmdline contains a crashkernel value, the kdump service starts normally, and running echo c >/proc/sysrq-trigger triggers a crash as expected.
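
For illustration, the check behind this expectation can be sketched in shell. systemd's ConditionKernelCommandLine=crashkernel is met when the kernel command line contains the word crashkernel, either on its own or as the left-hand side of an assignment. The function name has_crashkernel below is hypothetical; it is not part of systemd or kexec-tools.

```shell
#!/bin/sh
# has_crashkernel CMDLINE   (hypothetical helper, for illustration only)
# Returns 0 if CMDLINE contains the word "crashkernel" on its own or as the
# left-hand side of an assignment (crashkernel=...), and 1 otherwise.
# Assumes parameters are separated by single spaces, as in /proc/cmdline.
has_crashkernel() {
    # Pad with spaces so the first and last parameters match like the others.
    case " $1 " in
        *" crashkernel "*|*" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a running system:
#   if has_crashkernel "$(cat /proc/cmdline)"; then
#       echo "crashkernel is set"
#   else
#       echo "crashkernel is missing; kdump.service will be skipped"
#   fi
```

Matching on whole space-delimited words keeps a parameter that merely ends in "crashkernel" from being counted.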

Additional info:
Comment 1 liuyaqing alibaba_cloud_group 2023-06-05 12:19:23 UTC
The affected images are the anolis23-dvd-iso and anolis23-boot-iso images, on both x86 and arm64.
Comment 2 happy_orange alibaba_cloud_group 2023-06-10 15:05:41 UTC
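Could the system be configured with too little memory? When system memory is below 8G, kdump can fail to start.

I could not reproduce this locally; the service starts fine.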
[root@iZbp1gmbng4di4cw992erkZ ~]# systemctl start kdump
[root@iZbp1gmbng4di4cw992erkZ ~]# systemctl status kdump
● kdump.service - Crash recovery kernel arming
     Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; preset: enabled)
     Active: active (exited) since Sat 2023-06-10 14:59:36 CST; 3min 32s ago
   Main PID: 873 (code=exited, status=0/SUCCESS)
        CPU: 17.807s

Jun 10 22:59:26 iZbp1gmbng4di4cw992erkZ dracut[1225]: *** Install squash loader ***
Jun 10 22:59:26 iZbp1gmbng4di4cw992erkZ dracut[1225]: *** Stripping files ***
Jun 10 14:59:28 iZbp1gmbng4di4cw992erkZ dracut[1225]: *** Stripping files done ***
Jun 10 14:59:28 iZbp1gmbng4di4cw992erkZ dracut[1225]: *** Squashing the files inside the initramfs >
Jun 10 14:59:35 iZbp1gmbng4di4cw992erkZ dracut[1225]: *** Squashing the files inside the initramfs >
Jun 10 14:59:35 iZbp1gmbng4di4cw992erkZ dracut[1225]: *** Creating image file '/boot/initramfs-5.10>
Jun 10 14:59:35 iZbp1gmbng4di4cw992erkZ dracut[1225]: *** Creating initramfs image file '/boot/init>
Jun 10 14:59:36 iZbp1gmbng4di4cw992erkZ kdumpctl[889]: kdump: kexec: loaded kdump kernel
Jun 10 14:59:36 iZbp1gmbng4di4cw992erkZ kdumpctl[889]: kdump: Starting kdump: [OK]
Jun 10 14:59:36 iZbp1gmbng4di4cw992erkZ systemd[1]: Finished kdump.service - Crash recovery kernel >
[root@iZbp1gmbng4di4cw992erkZ ~]# cat /etc/os-release
NAME="Anolis OS"
VERSION="23"
ID="anolis"
VERSION_ID="23"
PLATFORM_ID="platform:an23"
PRETTY_NAME="Anolis OS 23"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
BUG_REPORT_URL="https://bugzilla.openanolis.cn/"
Comment 3 liuyaqing alibaba_cloud_group 2023-06-13 09:34:41 UTC
(In reply to happy_orange from comment #2)
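> Could the system be configured with too little memory? When system memory is below 8G, kdump can fail to start.
> 
> I could not reproduce this locally; the service starts fine.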

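What I meant is that the image has no crashkernel parameter at all by default; the problem is not that the value is too small. crashkernel="xxx" has to be added manually in /etc/default/grub, and without a crashkernel value the kdump service cannot start.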
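
A minimal sketch of that manual edit, assuming the file contains a GRUB_CMDLINE_LINUX="..." line. The function name add_crashkernel is hypothetical, and the value passed in is whatever crashkernel setting the administrator chooses. The function leaves the file alone if a crashkernel= entry is already present, so it can be run more than once.

```shell
#!/bin/sh
# add_crashkernel FILE VALUE   (hypothetical helper, for illustration only)
# Prepends "crashkernel=VALUE " to the GRUB_CMDLINE_LINUX="..." line in FILE.
# Returns 1 if FILE has no such line, 0 if the entry was added or was
# already present. Assumes VALUE contains no "/" or "&" characters, since
# it is substituted into a sed expression.
add_crashkernel() {
    file=$1
    value=$2

    # Bail out if the variable is not defined in the file at all.
    grep -q '^GRUB_CMDLINE_LINUX="' "$file" || return 1

    # Already configured: do not add a second crashkernel= entry.
    if grep -q '^GRUB_CMDLINE_LINUX=".*crashkernel=' "$file"; then
        return 0
    fi

    # Write to a temporary file first, then copy back so the original
    # file keeps its ownership and permissions.
    tmp=$(mktemp) || return 1
    if sed "s/^GRUB_CMDLINE_LINUX=\"/GRUB_CMDLINE_LINUX=\"crashkernel=${value} /" \
            "$file" > "$tmp"; then
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage (as root):
#   add_crashkernel /etc/default/grub "xxx"   # replace xxx with a real value
```

Editing /etc/default/grub alone does not change the running system; the change only takes effect after the GRUB configuration is regenerated and the machine is rebooted.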
Comment 4 happy_orange alibaba_cloud_group 2023-06-16 10:04:30 UTC
Created attachment 782 [details]
kernel-core package post script execution

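This problem exists only in the ISO boot scenario; booting from a VHD is fine.
Initial investigation suggests that the kernel-core install script did not run successfully; after running the script manually, the crashkernel value is present.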
Comment 5 happy_orange alibaba_cloud_group 2023-06-19 10:08:47 UTC
I tried adding a grubby dependency to kernel-core and verified it with an ISO installation; the verification showed that the problem was not fixed.
Comment 6 liuyaqing alibaba_cloud_group 2023-06-25 15:56:06 UTC
The problem still exists when testing the anolis23 GA RC2 release.
Comment 7 happy_orange alibaba_cloud_group 2023-06-29 10:31:07 UTC
I tried the beta version and the variable is missing there as well; most likely no memory is being reserved.
Comment 8 happy_orange alibaba_cloud_group 2023-06-29 10:39:52 UTC
For now, it can be set manually:
(1) grep "GRUB_CMDLINE_LINUX" /etc/default/grub && sed -i 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="crashkernel=0M-2G:0M,2G-8G:192M,8G-128G:256M,128G-:384M /g' /etc/default/grub;
(2) /usr/sbin/grub2-mkconfig -o /boot/grub2/grub.cfg
(3) reboot
(4) Start the kdump service again.
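
The crashkernel value in step (1) uses the kernel's range syntax, range1:size1,range2:size2,..., where each range is start-[end] with an inclusive start and an exclusive end, and the size of the first range that contains the system RAM is reserved. The sketch below shows how such a spec resolves for a given amount of RAM. The function names to_mb and crashkernel_size are hypothetical, and only the M and G suffixes used in this bug are handled.

```shell
#!/bin/sh
# to_mb SIZE   (hypothetical helper)
# Converts a size such as 192M or 2G to MiB. Only M and G are handled.
to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  return 1 ;;
    esac
}

# crashkernel_size SPEC RAM_MB   (hypothetical helper)
# Prints the size from the first range in SPEC that contains RAM_MB.
# A range is start-[end]; start is inclusive, end is exclusive, and an
# empty end means "no upper bound". Returns 1 if no range matches.
crashkernel_size() {
    spec=$1
    ram_mb=$2

    # Split the spec on commas into the positional parameters.
    old_ifs=$IFS
    IFS=,
    set -- $spec
    IFS=$old_ifs

    for entry in "$@"; do
        range=${entry%%:*}      # e.g. 2G-8G
        size=${entry#*:}        # e.g. 192M
        start=${range%%-*}      # e.g. 2G
        end=${range#*-}         # e.g. 8G, or empty for an open range
        start_mb=$(to_mb "$start") || continue
        if [ "$ram_mb" -ge "$start_mb" ]; then
            if [ -z "$end" ] || [ "$ram_mb" -lt "$(to_mb "$end")" ]; then
                echo "$size"
                return 0
            fi
        fi
    done
    return 1
}

# Usage:
#   crashkernel_size "0M-2G:0M,2G-8G:192M,8G-128G:256M,128G-:384M" 4096
```

With the spec from step (1), 4096 MiB of RAM resolves to 192M, while anything below 2G resolves to 0M, meaning no memory is reserved on such machines even when the parameter is present.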
Comment 9 happy_orange alibaba_cloud_group 2023-06-29 21:19:12 UTC
A workaround is available, so the severity is being lowered for now.
Comment 10 Banana alibaba_cloud_group 2024-01-30 18:42:33 UTC
The An23.1 RC1 physical-machine image has the same problem.

The workaround works, so the physical-machine checklist is not blocked.

[root@anolis ~]# cat /proc/cmdline
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-5.10.134-16.2_rc1.an23.x86_64 root=/dev/mapper/ao_anolis-root ro resume=/dev/mapper/ao_anolis-swap rd.lvm.lv=ao_anolis/root rd.lvm.lv=ao_anolis/swap rhgb quiet


kdump service:
[root@anolis ~]# systemctl status kdump.service
○ kdump.service - Crash recovery kernel arming
     Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; preset: enabled)
     Active: inactive (dead)
  Condition: start condition failed at Tue 2024-01-30 18:52:08 CST; 939ms ago
             └─ ConditionKernelCommandLine=crashkernel was not met

Jan 05 17:17:32 anolis systemd[1]: kdump.service - Crash recovery kernel arming was skipped because of an unmet condition check (ConditionKernelCommandLine=crashkernel).
Jan 30 18:52:08 anolis systemd[1]: kdump.service - Crash recovery kernel arming was skipped because of an unmet condition check (ConditionKernelCommandLine=crashkernel).

[root@anolis ~]# cat /etc/os-release
NAME="Anolis OS"
VERSION="23"
ID="anolis"
VERSION_ID="23"
PLATFORM_ID="platform:an23"
PRETTY_NAME="Anolis OS 23"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
BUG_REPORT_URL="https://bugzilla.openanolis.cn/"

[root@anolis ~]# uname -a
Linux anolis 5.10.134-16.2_rc1.an23.x86_64 #1 SMP Tue Jan 2 10:09:28 CST 2024 x86_64 GNU/Linux
Comment 12 Banana alibaba_cloud_group 2024-04-26 16:37:25 UTC
The problem still exists in the arm environment of the an23.1 GA release.

[root@anolis ~]# cat /etc/os-release
NAME="Anolis OS"
VERSION="23"
ID="anolis"
VERSION_ID="23"
PLATFORM_ID="platform:an23"
PRETTY_NAME="Anolis OS 23"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
BUG_REPORT_URL="https://bugzilla.openanolis.cn/"

[root@anolis ~]# uname -r
6.6.25-2_rc1.an23.aarch64

[root@anolis ~]# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.6.25-2_rc1.an23.aarch64 root=UUID=cb8e7cfe-0e15-478c-aad7-5903e90e574e ro selinux=0

[root@anolis ~]# systemctl start kdump.service
[root@anolis ~]# systemctl status kdump.service
○ kdump.service - Crash recovery kernel arming
     Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; preset: enabled)
     Active: inactive (dead)
  Condition: start condition unmet at Fri 2024-04-26 16:23:31 CST; 4s ago
             └─ ConditionKernelCommandLine=crashkernel was not met
        CPU: 0

4月 25 17:40:04 anolis systemd[1]: kdump.service - Crash recovery kernel arming was skipped because of an unmet condition check (ConditionKernelCom>
4月 26 16:23:31 anolis systemd[1]: kdump.service - Crash recovery kernel arming was skipped because of an unmet condition check (ConditionKernelCom>