Bug 1351 - [anolis8.4] Running LTP tests on the image causes a crash: BUG: unable to handle page fault for address: ffff9521adaf9796, Oops: 0000 [#1] SMP NOPTI
Summary: [anolis8.4] Running LTP tests on the image causes a crash: BUG: unable to handle page fault for address: fff...
Status: CONFIRMED
Alias: None
Product: Anolis OS 8
Classification: Anolis OS
Component: kernel - anck-5.10
Version: 8.4
Hardware: x86_64 Linux
Importance: P2-High S2-major
Target Milestone: ---
Assignee: maqiao_mq
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-05-27 18:37 UTC by liuyaqing
Modified: 2024-03-12 10:31 UTC
CC List: 2 users

See Also:


Attachments

Description liuyaqing alibaba_cloud_group 2022-05-27 18:37:59 UTC
Description of problem:
Running the LTP smoke test for the image on an ecs.g7.32xlarge instance produced an Oops: 0000 [#1] SMP NOPTI crash.

Version-Release number of selected component (if applicable):
5.10.84-10.3.an8.x86_64

How reproducible:


Steps to Reproduce:
1. git clone https://github.com/linux-test-project/ltp
yum install gcc-c++ gcc git libaio* kernel-debuginfo -y --skip-broken
cd ltp
make autotools && ./configure && make && make install
mkdir /disk1
wipefs -a --force /dev/vdb 
mkfs -t ext4 -q -F /dev/vdb
mount -t ext4 /dev/vdb /disk1
mkdir -p /disk1/tmpdir/ltp
lsblk
mount | grep vdb
cd /opt/ltp
vim load.sh
#!/bin/bash
echo 1  > /proc/sys/kernel/panic
echo 1  > /proc/sys/kernel/hardlockup_panic
echo 1  > /proc/sys/kernel/softlockup_panic
echo 50 > /proc/sys/kernel/watchdog_thresh
echo 1200 > /proc/sys/kernel/hung_task_timeout_secs
echo 0   > /proc/sys/kernel/hung_task_panic
nr_cpu=$(nproc)
mem_kb=$(grep ^MemTotal /proc/meminfo | awk '{print $2}')
./runltp \
 -c $((nr_cpu / 2)) \
 -m $((nr_cpu / 4)),4,$((mem_kb / nr_cpu / 2 * 1024)),1 \
 -D $((nr_cpu / 10)),1,0,1 \
 -i 2 \
 -B ext4 \
 -R -p -q \
 -t 24h \
 -d /disk1/tmpdir/ltp
chmod +x load.sh
nohup ./load.sh > t1.log &
2. wget -c http://yum.tbsite.net/taobao/7/x86_64/current/vmcore-collect/vmcore-collect-1.0.30-20220517145059.alios7.x86_64.rpm
rpm -ivh vmcore-collect-1.0.30-20220517145059.alios7.x86_64.rpm
systemctl start vmcore-collect
systemctl enable vmcore-collect
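To make the load in step 1 concrete, the sketch below reproduces the integer arithmetic that load.sh uses to derive the runltp `-c`, `-m` and `-D` stress values from the vCPU count and MemTotal (in kB). The helper name `ltp_stress_args` is hypothetical and not part of LTP; it only prints the values and does not run anything.

```bash
#!/bin/bash
# Hypothetical helper: print the stress options load.sh passes to runltp,
# using the same bash integer arithmetic, for a given vCPU count and
# MemTotal value in kB (as read from /proc/meminfo).
ltp_stress_args() {
    local nr_cpu=$1 mem_kb=$2
    local args="-c $((nr_cpu / 2))"
    args+=" -m $((nr_cpu / 4)),4,$((mem_kb / nr_cpu / 2 * 1024)),1"
    args+=" -D $((nr_cpu / 10)),1,0,1"
    printf '%s\n' "$args"
}
```

For example, `ltp_stress_args 128 536870912` (a hypothetical 128-vCPU guest with 512 GiB of MemTotal) prints `-c 64 -m 32,4,2147483648,1 -D 12,1,0,1`. Because the division is integer division, a guest with fewer than 10 vCPUs gets 0 for the `-D` process count.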


Actual results:
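After about 2.5 hours of running, the system crashed. Because the installed kernel-debuginfo was not the 5.10 version, the vmcore could not be analyzed locally, so it was uploaded for analysis.
Crash: BUG: unable to handle page fault for address: ffff9521adaf9796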
smpboot: CPU 1 is now offline

http://vmcore.alibaba-inc.com/vmcore_detail/20220527161910_192.108.10.4/
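Step 1 installs an unversioned `kernel-debuginfo`, and the vmcore could not be analyzed because that package did not match the 5.10 kernel. A minimal pre-flight check, run before starting the load, could compare the running release against the installed debuginfo packages. The function name `debuginfo_matches` is hypothetical, and the check assumes rpm's default `NAME-VERSION-RELEASE.ARCH` output, in which the matching package name is `kernel-debuginfo-` followed by the `uname -r` string.

```bash
#!/bin/bash
# Hypothetical helper: succeed if one of the given package names is the
# kernel-debuginfo build for the given kernel release (e.g. "$(uname -r)").
# Assumes rpm's default NAME-VERSION-RELEASE.ARCH package naming.
debuginfo_matches() {
    local running=$1; shift
    local pkg
    for pkg in "$@"; do
        # Quoted right-hand side: literal string comparison, no globbing.
        [[ $pkg == "kernel-debuginfo-$running" ]] && return 0
    done
    return 1
}
```

Possible usage on the test machine: `debuginfo_matches "$(uname -r)" $(rpm -qa 'kernel-debuginfo*') || echo "no kernel-debuginfo for $(uname -r)"`.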

Expected results:
No crash or other known issues; the full 24-hour run completes.

Additional info:
Comment 1 liuyaqing alibaba_cloud_group 2022-05-27 19:42:39 UTC
LTP run log output:
thp01.c:100: TPASS: system didn't crash.
thp01.c:100: TPASS: system didn't crash.
..............
Comment 2 liuyaqing alibaba_cloud_group 2022-05-30 09:32:50 UTC
image_name="Anolis OS 8.4 ANCK 64 bit"
image_id="anolisos_8_4_x64_20G_anck_alibase_20220518.vhd"
release_date="20220518111246"
Comment 3 liuyaqing alibaba_cloud_group 2022-05-30 10:28:33 UTC
Excerpt from vmcore-dmesg.txt:

[ 8993.045330] Call Trace:
[ 8993.045406]  __netif_set_xps_queue+0x6d6/0x930
[ 8993.045513]  virtnet_set_affinity+0x141/0x1b0 [virtio_net]
[ 8993.045629]  ? virtnet_set_affinity+0x1b0/0x1b0 [virtio_net]
[ 8993.045746]  virtnet_cpu_dead+0x1b/0x20 [virtio_net]
[ 8993.045855]  cpuhp_invoke_callback+0x1b6/0x3f0
[ 8993.045953]  _cpu_down+0xbf/0x1e0
[ 8993.046026]  cpu_down+0x2c/0x50
[ 8993.046097]  device_offline+0x81/0xb0
[ 8993.046176]  online_store+0x3a/0x70
[ 8993.046253]  kernfs_fop_write_iter+0x130/0x1c0
[ 8993.046349]  new_sync_write+0x10b/0x190
[ 8993.046428]  vfs_write+0x182/0x250
[ 8993.046505]  ksys_write+0x45/0xb0
[ 8993.046577]  do_syscall_64+0x33/0x40
[ 8993.046656]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 8993.046766] RIP: 0033:0x7f4f560ed648
[ 8993.046842] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 55 6f 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[ 8993.048247] RSP: 002b:00007ffd870c23b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 8993.048890] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f4f560ed648
[ 8993.049522] RDX: 0000000000000002 RSI: 00005586328f75b0 RDI: 0000000000000001
[ 8993.050149] RBP: 00005586328f75b0 R08: 00007f4f563c1860 R09: 00007f4f566c1680
[ 8993.050771] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4f563c06e0
[ 8993.051388] R13: 0000000000000002 R14: 00007f4f563bb880 R15: 0000000000000002
[ 8993.052003] Modules linked in: dummy(E) vfat(E) fat(E) fuse(E) xfs(E) libcrc32c(E) loop(E) tun(E) veth(E) tcp_diag(E) inet_diag(E) rfkill(E) sunrpc(E) mousedev(E) intel_rapl_msr(E) intel_rapl_common(E) nfit(E) intel_powerclamp(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E) virtio_balloon(E) psmouse(E) pcspkr(E) i2c_piix4(E) cirrus(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) virtio_net(E) crc32c_intel(E) drm(E) net_failover(E) serio_raw(E) failover(E) virtio_console(E) i2c_core(E)
[ 8993.055478] CR2: ffff9521adaf9796
[ 8993.056067] ---[ end trace db2303a00eddf8b9 ]---
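The call trace shows the fault was hit while a CPU was being offlined through sysfs (online_store -> device_offline -> cpu_down -> virtnet_cpu_dead -> virtnet_set_affinity -> __netif_set_xps_queue), which matches the "smpboot: CPU 1 is now offline" line in the results. A narrower reproducer could cycle CPUs offline and online without the rest of the LTP load. This is a sketch under the assumption that CPU hotplug alone is enough to reach this path; the function name `hotplug_cycle`, the `CPU_SYSFS` override, and any iteration count are hypothetical.

```bash
#!/bin/bash
# Sketch: repeatedly offline/online the given CPUs via sysfs, the same
# online_store -> device_offline -> cpu_down path seen in the call trace.
# CPU_SYSFS is a hypothetical override for dry runs against a scratch
# directory; on a real guest it is /sys/devices/system/cpu and needs root.
CPU_SYSFS=${CPU_SYSFS:-/sys/devices/system/cpu}

hotplug_cycle() {
    local iterations=$1; shift
    local i cpu f
    for ((i = 0; i < iterations; i++)); do
        for cpu in "$@"; do
            f="$CPU_SYSFS/cpu$cpu/online"
            # Skip CPUs without a writable 'online' file (cpu0 often has none on x86).
            [[ -w $f ]] || continue
            echo 0 > "$f" || return 1   # offline: runs the cpuhp teardown callbacks
            echo 1 > "$f" || return 1   # bring the CPU back online
        done
    done
    return 0
}
```

On the guest, as root, something like `hotplug_cycle 1000 1 2 3` would exercise the virtio_net CPU-dead callback repeatedly; whether this reproduces the Oops without the concurrent LTP load is unverified.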