Bug 4052 - [Alinux3-debug-kernel][nightly][x86-64/aarch64] Running ltp:ftrace-stress-test triggers a kernel soft lockup
Status: NEW
Alias: None
Product: Antest
Classification: Infrastructures
Component: Test cases
Version: unspecified
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: Jacob
Reported: 2023-02-14 10:39 UTC by wangpingping
Modified: 2023-02-14 10:39 UTC

Description wangpingping alibaba_cloud_group 2023-02-14 10:39:27 UTC
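[Defect description]:
ltp:ftrace-stress-test triggers a kernel soft lockup; the system hangs and error messages are printed on the serial console.

The serial console log is as follows: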
[71643.477048] watchdog: BUG: soft lockup - CPU#0 stuck for 112s! [sh:2296076]
[71647.861429] Modules linked in: isofs tun fuse n_gsm pps_ldisc ppp_synctty ppp_async ppp_generic slcan slip slhc n_hdlc pcrypt crypto_user authenc vmac poly1305_generic libpoly1305 poly1305_x86_64 chacha_generic chacha_x86_64 libchacha chacha20poly1305 salsa20_generic sha3_generic dummy veth msdos vfat fat xfs libcrc32c loop sr_mod cdrom sd_mod t10_pi sg tcp_diag udp_diag inet_diag binfmt_misc rfkill mousedev intel_rapl_msr intel_rapl_common isst_if_common nfit crct10dif_pclmul crc32_pclmul ghash_clmulni_intel psmouse pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm virtio_net crc32c_intel serio_raw ata_generic net_failover i2c_core failover ata_piix libata [last unloaded: ltp_insmod01]
[71647.897322] irq event stamp: 0
[71647.901215] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[71647.905731] hardirqs last disabled at (0): [<ffffffff861df063>] copy_process+0x1373/0x4a40
[71647.910640] softirqs last  enabled at (0): [<ffffffff861df09f>] copy_process+0x13af/0x4a40
[71647.915612] softirqs last disabled at (0): [<0000000000000000>] 0x0
[71647.920291] CPU: 0 PID: 2296076 Comm: sh Kdump: loaded Tainted: G        W  OE     5.10.134-832.git.058854d6c99d.al8.x86_64+debug #1
[71647.929422] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
[71647.951406] Modules linked in:
[71647.970042] RIP: 0010:smp_call_function_many_cond+0x60e/0x920
[71647.978965] Code: 38 d0 7c 08 84 d2 0f 85 29 01 00 00 8b 43 08 a8 01 74 2a 48 89 ca 49 89 ce 48 c1 ea 03 41 83 e6 07 4c 01 fa 41 83 c6 03 f3 90 <0f> b6 02 41 38 c6 7c 04 84 c0 75 59 8b 43 08 a8 01 75 eb e9 3e ff
[71647.990279] RSP: 0018:ffff888126f87b68 EFLAGS: 00000202
[71647.995382] RAX: 0000000000000011 RBX: ffffe8ffff21ab80 RCX: ffffe8ffff21ab88
[71648.034439] RDX: fffff91fffe43571 RSI: 0000000000000000 RDI: ffffffff8906aae8
[71648.034450] RBP: ffffed107b8c0fb1 R08: 0000000000000000 R09: 0000000000000001
[71648.034473] R10: fffffbfff153895c R11: 0000000000000001 R12: ffff8883dc607d80
[71648.034483] R13: ffffed107b8c0fb0 R14: 0000000000000003 R15: dffffc0000000000
[71648.034499] FS:  00007f1b294a2740(0000) GS:ffff8883dc400000(0000) knlGS:0000000000000000
[71648.034513] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[71648.034523] CR2: 00007f3df5891420 CR3: 00000001814a4005 CR4: 00000000003706f0
[71648.034532] DR0: 0000000000000001 DR1: 0000000000000000 DR2: 0000000000000000
[71648.034542] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[71648.034551] Call Trace:
[71648.034593]  ? __text_poke+0xba0/0xba0
[71648.034641]  ? __text_poke+0xba0/0xba0
[71648.034684]  on_each_cpu+0x39/0x80
[71648.034714]  text_poke_bp_batch+0x1a6/0x4d0
[71648.034748]  ? jump_label_transform+0x130/0x130
[71648.034770]  ? alternatives_enable_smp+0x70/0x70
[71648.034820]  ? ftrace_graph_caller+0x6b/0xa0
[71648.096260]  text_poke_flush+0x8c/0xc0
[71648.096282]  ? rcu_report_qs_rnp+0x310/0x310
[71648.096302]  text_poke_queue+0x55/0xd0
[71648.096345]  ftrace_replace_code+0x1c1/0x2e0
[71648.096401]  ftrace_modify_all_code+0xc1/0x160
[71648.096431]  ftrace_run_update_code+0x13/0x70
[71648.096448]  ftrace_shutdown.part.0+0x35b/0x6f0
[71648.096474]  ? remove_ftrace_ops.constprop.0+0x139/0x230
[71648.096485]  ? ftrace_epilogue+0x10/0x10
[71648.096500]  ? update_ftrace_function+0xb0/0x210
[71648.096566]  unregister_ftrace_graph+0x53/0x80
[71648.096578]  ftrace_profile_write+0x120/0x140
[71648.096601]  ? ftrace_profile_init+0x340/0x340
[71648.096660]  vfs_write+0x1cd/0x860
[71648.096718]  ksys_write+0xe9/0x1b0
[71648.096742]  ? __ia32_sys_read+0xb0/0xb0
[71648.096774]  ? rcu_read_lock_sched_held+0x12/0x80
[71648.096826]  do_syscall_64+0x30/0x40
[71648.096846]  entry_SYSCALL_64_after_hwframe+0x61/0xc6
[71648.096861] RIP: 0033:0x7f1b295974a7
[71648.096876] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[71648.096887] RSP: 002b:00007ffd02268458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[71648.096909] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1b295974a7
[71648.096919] RDX: 0000000000000002 RSI: 00005560a7503f60 RDI: 0000000000000001
[71648.096928] RBP: 00005560a7503f60 R08: 000000000000000a R09: 00007f1b2962e0c0
[71648.096937] R10: 00007f1b2962dfc0 R11: 0000000000000246 R12: 0000000000000002
[71648.096946] R13: 00007f1b2966b520 R14: 0000000000000002 R15: 00007f1b2966b720
[71665.439688] watchdog: BUG: soft lockup - CPU#2 stuck for 111s! [nscd:660]
[71670.491721]  isofs
[71693.509225] Modules linked in:
[71716.887592]  tun
[71739.733711]  isofs
[71761.631413]  fuse
[71783.242653]  tun
[71783.471485] watchdog: BUG: soft lockup - CPU#0 stuck for 112s! [sh:2296076]
[71783.471496] Modules linked in: isofs tun fuse n_gsm pps_ldisc ppp_synctty ppp_async ppp_generic slcan slip slhc n_hdlc pcrypt crypto_user authenc vmac poly1305_generic libpoly1305 poly1305_x86_64 chacha_generic chacha_x86_64 libchacha chacha20poly1305 salsa20_generic sha3_generic dummy veth msdos vfat fat xfs libcrc32c loop sr_mod cdrom sd_mod t10_pi sg tcp_diag udp_diag inet_diag binfmt_misc rfkill mousedev intel_rapl_msr intel_rapl_common isst_if_common nfit crct10dif_pclmul crc32_pclmul ghash_clmulni_intel psmouse pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm virtio_net crc32c_intel serio_raw ata_generic net_failover i2c_core failover ata_piix libata [last unloaded: ltp_insmod01]
[71783.472155] irq event stamp: 0
[71783.472169] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[71783.472183] hardirqs last disabled at (0): [<ffffffff861df063>] copy_process+0x1373/0x4a40
[71783.472194] softirqs last  enabled at (0): [<ffffffff861df09f>] copy_process+0x13af/0x4a40
[71783.472205] softirqs last disabled at (0): [<0000000000000000>] 0x0
[71783.472218] CPU: 0 PID: 2296076 Comm: sh Kdump: loaded Tainted: G        W  OEL    5.10.134-832.git.058854d6c99d.al8.x86_64+debug #1
[71783.472227] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014
[71783.472239] RIP: 0010:smp_call_function_many_cond+0x60e/0x920
[71783.472253] Code: 38 d0 7c 08 84 d2 0f 85 29 01 00 00 8b 43 08 a8 01 74 2a 48 89 ca 49 89 ce 48 c1 ea 03 41 83 e6 07 4c 01 fa 41 83 c6 03 f3 90 <0f> b6 02 41 38 c6 7c 04 84 c0 75 59 8b 43 08 a8 01 75 eb e9 3e ff
[71783.472263] RSP: 0018:ffff888126f87b68 EFLAGS: 00000202
[71783.472283] RAX: 0000000000000011 RBX: ffffe8ffff21ab80 RCX: ffffe8ffff21ab88
[71783.472293] RDX: fffff91fffe43571 RSI: 0000000000000000 RDI: ffffffff8906aae8
[71783.472301] RBP: ffffed107b8c0fb1 R08: 0000000000000000 R09: 0000000000000001
[71783.472310] R10: fffffbfff153895c R11: 0000000000000001 R12: ffff8883dc607d80
[71783.472318] R13: ffffed107b8c0fb0 R14: 0000000000000003 R15: dffffc0000000000
[71783.472330] FS:  00007f1b294a2740(0000) GS:ffff8883dc400000(0000) knlGS:0000000000000000
[71783.472342] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[71783.472350] CR2: 00007f3df5891420 CR3: 00000001814a4005 CR4: 00000000003706f0
[71783.472358] DR0: 0000000000000001 DR1: 0000000000000000 DR2: 0000000000000000
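
The log above interleaves reports from several CPUs, so a quick per-CPU summary can help when triaging a longer capture. The following is a minimal sketch, not part of the original report: the function name summarize_softlockups is hypothetical, and it assumes the console log was saved to a file and that each report uses the "watchdog: BUG: soft lockup - CPU#N stuck for Ns! [task:pid]" format seen here.

```shell
#!/bin/sh
# Summarize soft-lockup reports from a saved console log read on stdin.
# Prints one line per (CPU, task) pair with the number of reports seen.
# Assumption: report lines end with "[task:pid]" as in the excerpt above.
summarize_softlockups() {
    sed -n 's/.*soft lockup - CPU#\([0-9][0-9]*\) stuck for [0-9][0-9]*s! \[\(.*\)]$/cpu=\1 task=\2/p' |
        sort | uniq -c |
        # Move the count from the front of the line to a trailing field.
        awk '{ n = $1; $1 = ""; sub(/^ /, ""); print $0, "reports=" n }'
}
```

Usage, with console.log as a placeholder file name: `summarize_softlockups < console.log`. For the three reports in the excerpt above, this should list CPU#0 (sh:2296076) with two reports and CPU#2 (nscd:660) with one.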

[Reproduction rate]
Always reproducible
 
[Test steps]
git clone http://gitlab.alibaba-inc.com/alikernel/ltp.git --branch LTP-20200930
cd ltp
make autotools
./configure
make
make install
export PATH=$PATH:/opt/ltp/testcases/bin

./runltp -f tracing -s ftrace-stress-test
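
The call trace in the serial log enters the kernel through ftrace_profile_write(), the write handler for the tracefs file function_profile_enabled, and then reaches unregister_ftrace_graph(), which corresponds to writing 0 to that file. A narrower experiment could toggle only that file. The sketch below is an assumption-laden illustration, not part of LTP: the function name toggle_function_profile, the default tracing path /sys/kernel/debug/tracing, and the default round count are all hypothetical choices. It is not confirmed that this loop alone reproduces the lockup, since the stress test runs other ftrace operations concurrently.

```shell
#!/bin/sh
# Toggle the function profiler repeatedly. Writing 0 is the step that goes
# through ftrace_profile_write() -> unregister_ftrace_graph() in the trace.
# $1: tracing directory (assumed default: /sys/kernel/debug/tracing)
# $2: number of enable/disable rounds (assumed default: 100)
toggle_function_profile() {
    tracing_dir=${1:-/sys/kernel/debug/tracing}
    rounds=${2:-100}
    knob="$tracing_dir/function_profile_enabled"

    # Bail out early if the file is missing or not writable.
    if [ ! -w "$knob" ]; then
        echo "cannot write $knob (need root and a mounted tracefs?)" >&2
        return 1
    fi

    i=0
    while [ "$i" -lt "$rounds" ]; do
        echo 1 > "$knob" || return 1   # enable the profiler
        echo 0 > "$knob" || return 1   # disable it again
        i=$((i + 1))
    done
    echo "completed $rounds rounds"
}
```

Usage, as root on a disposable test machine because the system may hang: `toggle_function_profile /sys/kernel/debug/tracing 1000`.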

[Test environment]
Kernel:
5.10.134-833.git.058854d6c99d.al8.x86_64+debug

# cat /etc/os-release
NAME="Alibaba Cloud Linux"
VERSION="3 (Soaring Falcon)"
ID="alinux"
ID_LIKE="rhel fedora centos anolis"
VERSION_ID="3"
PLATFORM_ID="platform:al8"
PRETTY_NAME="Alibaba Cloud Linux 3 (Soaring Falcon)"
ANSI_COLOR="0;31"
HOME_URL="https://www.aliyun.com/"

CPU information:
# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           2
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Alibaba Cloud
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
BIOS Model name:     pc-i440fx-2.1
Stepping:            7
CPU MHz:             2499.998
BogoMIPS:            4999.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl cpuid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_vnni

Memory:
# free -h
              total        used        free      shared  buff/cache   available
Mem:           12Gi       1.1Gi        10Gi       2.0Mi       987Mi        11Gi
Swap:            0B          0B          0B

[Expected result]:
The test case runs successfully

[Actual result]:
A soft lockup occurs while the test case is running