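Bug 770 - Smoke automation test: instance becomes abnormal after a manually triggered crash
Summary: Smoke automation test: instance becomes abnormal after a manually triggered crash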
Status: CONFIRMED
Alias: None
Product: Anolis OS 7
Classification: Anolis OS
Component: kernel - rhck
Version: 7.9
Hardware: x86_64 Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: yunqi-zwt
QA Contact: shuming
 
Reported: 2022-04-01 10:45 UTC by chuyang_94
Modified: 2022-04-01 16:29 UTC

Description chuyang_94 alibaba_cloud_group 2022-04-01 10:45:08 UTC
Description of problem:
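While running the image smoke automation test, the kdump test item manually triggers a crash. After the crash the instance failed to reboot. Judging from the log, it appears to have entered dracut mode, and the instance stayed abnormal from then on.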

Version-Release number of selected component (if applicable):
After the crash was triggered manually, the log shows that the trigger succeeded:
[   27.522564] CPU: 46 PID: 3282 Comm: bash Kdump: loaded Not tainted 3.10.0-1160.59.1.0.1.an7.x86_64 #1
[   27.523548] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
[   27.524365] task: ffff8ad78007e300 ti: ffff8ad76f124000 task.ti: ffff8ad76f124000
[   27.525161] RIP: 0010:[<ffffffff85e755a6>]  [<ffffffff85e755a6>] sysrq_handle_crash+0x16/0x20
[   27.526093] RSP: 0018:ffff8ad76f127e58  EFLAGS: 00010246
[   27.526666] RAX: ffffffff85e75590 RBX: ffffffff866e74a0 RCX: 0000000000000000
[   27.527429] RDX: 0000000000000000 RSI: ffff8af7819938d8 RDI: 0000000000000063
[   27.528190] RBP: ffff8ad76f127e58 R08: ffffffff86a0487c R09: 0000000000000002
[   27.528960] R10: 00000000000005c2 R11: 00000000000005c1 R12: 0000000000000063
[   27.529727] R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000000
[   27.530495] FS:  00007f9508e6a740(0000) GS:ffff8af781980000(0000) knlGS:0000000000000000
[   27.531357] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   27.531977] CR2: 0000000000000000 CR3: 0000005f40272000 CR4: 00000000003407e0
[   27.532746] Call Trace:
[   27.533021]  [<ffffffff85e75dcd>] __handle_sysrq+0x10d/0x170
[   27.533638]  [<ffffffff85e76238>] write_sysrq_trigger+0x28/0x40
[   27.534288]  [<ffffffff85cc7770>] proc_reg_write+0x40/0x80
[   27.534884]  [<ffffffff85c4e590>] vfs_write+0xc0/0x1f0
[   27.535452]  [<ffffffff85c4f36f>] SyS_write+0x7f/0xf0
[   27.536003]  [<ffffffff86199f92>] system_call_fastpath+0x25/0x2a
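
The reading that the trigger succeeded rests on two details of the trace above: RIP points into sysrq_handle_crash, and the call chain passes through write_sysrq_trigger, which handles writes to /proc/sysrq-trigger. A minimal sketch of that check follows, assuming oops text in the format shown in this report. The helper names summarize_oops and is_sysrq_crash are hypothetical and not part of any existing tool.

```python
import re

# Matches "symbol+0xOFF/0xLEN" as printed in RIP and call-trace lines.
SYMBOL_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(log_text):
    """Return (rip_symbol, [call-trace symbols]) found in the oops text."""
    rip_symbol = None
    trace = []
    for line in log_text.splitlines():
        match = SYMBOL_RE.search(line)
        if match is None:
            continue
        if "RIP:" in line:
            rip_symbol = match.group(1)
        else:
            trace.append(match.group(1))
    return rip_symbol, trace


def is_sysrq_crash(log_text):
    """True when the oops looks like a crash requested via /proc/sysrq-trigger."""
    rip_symbol, trace = summarize_oops(log_text)
    return rip_symbol == "sysrq_handle_crash" and "write_sysrq_trigger" in trace


# Lines copied from the trace in this report.
SAMPLE = "\n".join([
    "[   27.525161] RIP: 0010:[<ffffffff85e755a6>]  [<ffffffff85e755a6>] sysrq_handle_crash+0x16/0x20",
    "[   27.532746] Call Trace:",
    "[   27.533021]  [<ffffffff85e75dcd>] __handle_sysrq+0x10d/0x170",
    "[   27.533638]  [<ffffffff85e76238>] write_sysrq_trigger+0x28/0x40",
    "[   27.534288]  [<ffffffff85cc7770>] proc_reg_write+0x40/0x80",
])

print(summarize_oops(SAMPLE)[0])
print(is_sysrq_crash(SAMPLE))
```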

After the crash is triggered, the instance reboots:
[  OK  ] Started dracut initqueue hook.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
         Mounting /sysroot...
[   11.134870] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null)
[  OK  ] Mounted /sysroot.
[  OK  ] Reached target Initrd Root File System.
         Starting Reload Configuration from the Real Root...
[  OK  ] Started Reload Configuration from the Real Root.
[  OK  ] Reached target Initrd File Systems.
[  OK  ] Reached target Initrd Default Target.
         Starting dracut pre-pivot and cleanup hook...
[  OK  ] Started dracut
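According to the log, nothing further is printed after the "Started dracut" line. The instance cannot be pinged, and rebooting it does not recover it.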
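
To locate the point where the console output stops in a captured serial log, the last non-empty line can be extracted after removing color codes. This is a minimal sketch that assumes the log is available as plain text. It strips ANSI color sequences with or without the leading ESC byte, since console captures sometimes lose that byte. The function names are hypothetical.

```python
import re

# ANSI color sequence such as ESC[32m; the ESC byte is optional because
# some console captures drop it and leave only "[32m".
ANSI_RE = re.compile(r"\x1b?\[[0-9;]*m")


def clean_console_lines(raw):
    """Return the non-empty console lines with color codes removed."""
    lines = []
    for line in raw.splitlines():
        cleaned = ANSI_RE.sub("", line).strip()
        if cleaned:
            lines.append(cleaned)
    return lines


def last_progress(raw):
    """Return the last non-empty console line, or None for an empty log."""
    lines = clean_console_lines(raw)
    return lines[-1] if lines else None


# Tail of the console output from this report, in raw captured form.
SAMPLE = "\n".join([
    "[\x1b[32m  OK  \x1b[0m] Reached target Initrd Default Target.",
    "",
    "         Starting dracut pre-pivot and cleanup hook...",
    "",
    "[[32m  OK  [0m] Started dracut",
])

print(last_progress(SAMPLE))
```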

How reproducible:
Always reproducible, but only on ecs.g6a.32xlarge instances

Steps to Reproduce:
1. Install the anolisos_7_9_x64_20G_rhck_alibase_20220316.vhd image and start the instance
2. Run the image smoke automation test
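
The kdump test item only makes sense if a crash kernel is loaded before the crash is triggered. The oops header in this report does show "Kdump: loaded". A minimal sketch of such a pre-check follows, assuming the standard sysfs files kexec_crash_loaded and kexec_crash_size under /sys/kernel. The function name kdump_ready is hypothetical, and the smoke test's actual check is not described in this report.

```python
from pathlib import Path


def kdump_ready(sys_kernel="/sys/kernel"):
    """Return True if a crash kernel is loaded and memory is reserved for it.

    sys_kernel is a parameter so the check can be pointed at a test directory.
    Missing or unreadable files are treated as "not ready".
    """
    base = Path(sys_kernel)
    try:
        loaded = (base / "kexec_crash_loaded").read_text().strip() == "1"
        reserved = int((base / "kexec_crash_size").read_text().strip())
    except (OSError, ValueError):
        return False
    return loaded and reserved > 0


print("kdump ready:", kdump_ready())
```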

Actual results:
After the crash is triggered manually, the instance does not reboot normally and stays abnormal

Expected results:
After the crash is triggered manually, the instance reboots normally and runs normally

Additional info:
# cat /etc/image-id
image_name="Anolis OS 7.9 RHCK 64 bit"
image_id="anolisos_7_9_x64_20G_rhck_alibase_20220316.vhd"
release_date="20220316123009"
# uname -r
3.10.0-1160.59.1.0.1.an7.x86_64

crashkernel configuration:
# cat /boot/grub2/grub.cfg | grep crashkernel
	linux16 /boot/vmlinuz-3.10.0-1160.59.1.0.1.an7.x86_64 root=UUID=a9878241-301b-4cdb-885c-5a809423caff ro crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M cryptomgr.notests cgroup.memory=nokmem rcupdate.rcu_cpu_stall_timeout=300 vring_force_dma_api spectre_v2=retpoline   biosdevname=0 net.ifnames=0 console=tty0 console=ttyS0,115200n8 noibrs nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295
	linux16 /boot/vmlinuz-0-rescue-20220316121405035951031993234224 root=UUID=a9878241-301b-4cdb-885c-5a809423caff ro crashkernel=0M-2G:0M,2G-8G:192M,8G-:256M cryptomgr.notests cgroup.memory=nokmem rcupdate.rcu_cpu_stall_timeout=300 vring_force_dma_api spectre_v2=retpoline   biosdevname=0 net.ifnames=0 console=tty0 console=ttyS0,115200n8 noibrs nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295
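
The crashkernel value above uses the ranged syntax, in which each start-end:size entry selects a reservation size by the amount of system RAM. The sketch below shows how that string resolves, assuming the documented rule that an entry applies when start <= RAM < end and that an open end means no upper bound. The RAM sizes are illustrative only, because the report does not state the instance's memory. The kernel also compares against the RAM it detects, which can be slightly below the nominal size.

```python
import re

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a size such as '192M' or '2G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def crashkernel_reservation(spec, system_ram):
    """Return the bytes a ranged crashkernel= value reserves for system_ram bytes."""
    spec = spec.split("@", 1)[0]  # drop an optional @offset suffix
    for entry in spec.split(","):
        range_part, size_part = entry.split(":")
        start_text, end_text = range_part.split("-")
        start = parse_size(start_text)
        end = parse_size(end_text) if end_text else None  # "8G-" has no upper bound
        if system_ram >= start and (end is None or system_ram < end):
            return parse_size(size_part)
    return 0  # no range matched: nothing is reserved


# Value taken from the grub.cfg lines in this report.
SPEC = "0M-2G:0M,2G-8G:192M,8G-:256M"

for ram_gib in (1, 4, 64):  # illustrative RAM sizes, not taken from the report
    reserved = crashkernel_reservation(SPEC, ram_gib << 30)
    print(f"{ram_gib} GiB RAM -> {reserved >> 20} MiB reserved")
```

With this configuration, any machine with 8 GiB of RAM or more falls into the last range and gets 256M reserved for the crash kernel.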
Comment 1 Jacob admin 2022-04-01 16:29:06 UTC
Does not block the release