Bug 3060 - Unable to generate vmcore on Ampere CPU
Summary: Unable to generate vmcore on Ampere CPU
Status: RESOLVED FIXED
Alias: None
Product: ANCK 4.19 Dev
Classification: ANCK
Component: ARM
Version: 4.19-026.x
Hardware: aarch64 Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: xiangzao
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-11-14 16:56 UTC by wangkaiyuan
Modified: 2023-01-04 16:06 UTC

See Also:


Attachments
Serial console log showing OOM in the crash kernel (140.71 KB, image/png)
2022-11-14 16:56 UTC, wangkaiyuan

Description wangkaiyuan inspur_group 2022-11-14 16:56:42 UTC
Created attachment 462
Serial console log showing OOM in the crash kernel

Description of problem:
When testing kdump on a server with an Ampere CPU, a complete vmcore file cannot be generated.

Version-Release number of selected component (if applicable):
kernel 4.19.91-26.4

How reproducible:
Boot with crashkernel=768M configured.

Steps to Reproduce:
1. echo 1 > /proc/sys/kernel/sysrq
2. echo c > /proc/sysrq-trigger
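
Before triggering the crash it can help to confirm that the crashkernel reservation was picked up and that a crash kernel is loaded. The following is a minimal sketch, not part of the original report: the helper names crashkernel_arg and crash_kernel_loaded are hypothetical, and it assumes the running kernel exposes /sys/kernel/kexec_crash_loaded.

```shell
#!/bin/sh
# Hypothetical helpers for checking kdump preconditions (illustration only).

# Print the crashkernel= value found in a kernel command line string.
# Prints nothing and returns 1 if the parameter is absent.
crashkernel_arg() {
    # $1 is left unquoted on purpose so the shell splits it into words.
    for word in $1; do
        case "$word" in
            crashkernel=*)
                printf '%s\n' "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Print 1 if a crash kernel is loaded, 0 otherwise.
# The path argument is optional and defaults to the sysfs file
# (assumed to exist on kernels built with kexec/kdump support).
crash_kernel_loaded() {
    cat "${1:-/sys/kernel/kexec_crash_loaded}"
}
```

On the test machine this could be used as `crashkernel_arg "$(cat /proc/cmdline)"`, which should print 768M for the configuration above, followed by `crash_kernel_loaded`.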

Actual results:
An incomplete vmcore is present under /var/crash/

Expected results:
A complete vmcore is present under /var/crash/

Additional info:
Comment 1 xiangzao alibaba_cloud_group 2022-11-14 20:02:49 UTC
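Analysis so far shows two reasons why the vmcore cannot be generated.
First, the program headers passed by kexec are incorrect; upgrading kexec-tools resolves this.
Second, because of the Ampere architecture and the drivers that are loaded, a larger crashkernel reservation is needed; increasing the crashkernel size resolves this.

The kexec problem has occurred on both Ampere machines so far; a fix will be discussed as a follow-up.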
Comment 2 xiangzao alibaba_cloud_group 2022-11-23 10:48:32 UTC
Patches that need to be merged to fix the kexec problem:

1. arm64: support more than one crash kernel regions
   b5a34a20984c4ad27cc5054d9957af8130b42a50

2. arm64: make phys_offset signed
   67ea2d99e1356352034dc9d9c7b5ec6dd6b722eb
Comment 3 xiangzao alibaba_cloud_group 2023-01-04 16:06:39 UTC
The problem is resolved after upgrading kexec-tools to version 2.0.24; marking as FIXED.
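
To check whether a given machine already has a new enough kexec-tools, a version comparison along these lines may be useful. This is a minimal sketch: version_ge and kexec_version_from are hypothetical helper names, `sort -V` requires GNU coreutils (or a compatible sort), and the assumption that `kexec --version` prints a line of the form "kexec-tools 2.0.24" should be verified on the target system.

```shell
#!/bin/sh
# Hypothetical helpers for checking the installed kexec-tools version.

# Return 0 if version $1 is greater than or equal to version $2.
# Relies on version-aware sorting (sort -V, GNU coreutils).
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Extract the version number from a string such as "kexec-tools 2.0.24".
# ASSUMPTION: the version is the second whitespace-separated field.
kexec_version_from() {
    printf '%s\n' "$1" | awk '{ print $2 }'
}
```

For example, `version_ge "$(kexec_version_from "$(kexec --version)")" 2.0.24 && echo ok` would print ok only when the installed version is at least 2.0.24.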