Bug 3584 - [anolis8.6][ck-4.19 nightly][aarch64] kernel-selftests netfilter/nft_trans_stress.sh triggers kernel:watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [ping:77824]
Summary: [anolis8.6][ck-4.19 nightly][aarch64] kernel-selftests netfilter/nft_trans_...
Status: NEW
Alias: None
Product: Anolis OS 8
Classification: Anolis OS
Component: kernel - anck-4.19
Version: 8.6
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: shuancue
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-12-29 17:13 UTC by anolislw
Modified: 2023-02-28 10:43 UTC (History)
CC List: 0 users

See Also:


Attachments

Description anolislw alibaba_cloud_group 2022-12-29 17:13:11 UTC
[Problem summary]
During the anolis8.6 ck-4.19 nightly test on aarch64, running netfilter/nft_trans_stress.sh from the kernel-selftests suite makes the console repeatedly print kernel:watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [ping:77824]
The messages log contains call traces, but no vmcore appears in /var/crash.
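The missing vmcore is probably expected rather than a kdump failure. The oops header in the messages log reports "Kdump: loaded" and the taint flag L (a soft lockup has occurred), but the watchdog only panics on a soft lockup, and so only lets kdump write a vmcore, when the kernel.softlockup_panic sysctl is 1; otherwise it just logs the report. A minimal shell sketch for checking that setting, assuming the standard procfs path (softlockup_panic_state is a hypothetical helper name, not part of any tool):

```shell
# Hypothetical helper: report whether a soft lockup would panic the kernel
# (and therefore give kdump a chance to write a vmcore to /var/crash).
# The path is a parameter only so the check can be tried against a test file.
softlockup_panic_state() {
    f="${1:-/proc/sys/kernel/softlockup_panic}"
    if [ ! -r "$f" ]; then
        echo "unknown"
        return 1
    fi
    if [ "$(cat "$f")" = "1" ]; then
        echo "panic"        # soft lockup -> panic -> kdump can capture a vmcore
    else
        echo "warn-only"    # soft lockup is only logged; matches the symptoms here
    fi
}

# To capture a vmcore on the next reproduction (as root, before starting the test):
#   sysctl -w kernel.softlockup_panic=1
```

With the panic enabled, the host goes into the kdump kernel and reboots at the first lockup, so this is only suitable for a dedicated test machine.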

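[Additional details]
1) The console keeps printing kernel:watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [ping:77824]
   pstree shows that nft_trans_stress.sh from the kernel-selftests suite is running (its name is truncated to nft_trans_stres):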
   -----------------------------------
  ├─toneagent─┬─sh───python───run_test.sh───run_kselftest.s───run_kselftest.s───nft_trans_stres───11*[ip]
        │           └─32*[{toneagent}]
        └─tuned───4*[{tuned}]
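2) top shows the ping processes continuously using a large share of the CPU; this very likely leaves the machine unreachable (ssh logins fail and it stops answering ping).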
  -------------------------------------
  Tasks: 24334 total, 256 running, 24075 sleeping,   2 stopped,   1 zombie
%Cpu(s):  0.1 us, 34.9 sy,  0.0 ni, 33.1 id,  0.0 wa,  0.0 hi, 31.9 si,  0.0 st
top - 16:31:17 up 33 min,  1 user,  load average: 268.83, 268.58, 286.49
Tasks: 24334 total, 250 running, 24081 sleeping,   2 stopped,   1 zombie
%Cpu(s):  0.1 us, 34.8 sy,  0.0 ni, 33.1 id,  0.0 wa,  0.0 hi, 32.0 si,  0.0 st
MiB Mem : 772983.2 total, 758052.9 free,   8076.3 used,   6854.0 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used. 761269.2 avail Mem

  77890 root      20   0    8740   2040   1708 R 100.0   0.0  15:36.11 ping
  77770 root      20   0    8740   2040   1708 R 100.0   0.0  15:35.75 ping
  77773 root      20   0    8740   2036   1704 R 100.0   0.0  15:35.74 ping
  77803 root      20   0    8740   2040   1708 R 100.0   0.0  15:40.00 ping
  77809 root      20   0    8740   2040   1708 R 100.0   0.0  15:40.00 ping
  77815 root      20   0    8740   2040   1708 R 100.0   0.0  15:40.00 ping
  77821 root      20   0    8740   2040   1708 R 100.0   0.0  15:28.67 ping
  77836 root      20   0    8740   2040   1708 R 100.0   0.0  15:39.99 ping
  77854 root      20   0    8740   2012   1680 R 100.0   0.0  15:28.67 ping
  77863 root      20   0    8740   2040   1708 R 100.0   0.0  15:39.99 ping
  77872 root      20   0    8740   1868   1536 R 100.0   0.0  15:40.00 ping
  77881 root      20   0    8740   2040   1708 R 100.0   0.0  15:35.74 ping
  77764 root      20   0    8740   2036   1704 R  99.7   0.0  15:35.74 ping
  77767 root      20   0    8740   2032   1700 R  99.7   0.0  15:35.74 ping
  77779 root      20   0    8740   2044   1708 R  99.7   0.0  15:28.65 ping
  77794 root      20   0    8740   2044   1708 R  99.7   0.0  15:28.66 ping
  77797 root      20   0    8740   2040   1708 R  99.7   0.0  15:39.99 ping
  77800 root      20   0    8740   2040   1708 R  99.7   0.0  15:40.00 ping
  77806 root      20   0    8740   2040   1708 R  99.7   0.0  15:35.74 ping
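To follow how many ping processes are spinning without keeping top open, a minimal sketch that counts runnable (R state) processes by command name. It reads ps -eo stat=,comm= output on stdin; count_running is a hypothetical helper name:

```shell
# Hypothetical helper: count processes named $1 whose state starts with R
# (running or runnable). Input is "STAT COMM" per line, e.g. ps -eo stat=,comm=
count_running() {
    awk -v name="$1" '$2 == name && $1 ~ /^R/ { n++ } END { print n + 0 }'
}

# Example on the affected host:
#   ps -eo stat=,comm= | count_running ping
```

Running it periodically while nft_trans_stress.sh is active shows whether the spinning ping processes ever drain or keep piling up.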
3) The messages log contains call traces:
   -------------------------
Dec 29 16:45:23 l57f12084 kernel: watchdog: BUG: soft lockup - CPU#8 stuck for 23s! [ping:77821]
Dec 29 16:45:23 l57f12084 kernel: Modules linked in: ip_vti(E) esp6(E) xfrm6_mode_tunnel(E) ip6_vti(E) esp4_offload(E) xfrm4_mode_transport(E) macsec(E) vrf(E) 8021q(E) garp(E) mrp(E) cls_u32(E) sch_htb(E) dummy(E) tls(E) authenc(E) echainiv(E) esp4(E) xfrm4_mode_tunnel(E) ipip(E) tunnel4(E) geneve(E) xt_mark(E)vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) ip6_gre(E) ip6_tunnel(E) tunnel6(E) ip_gre(E) ip_tunnel(E) gre(E) cls_bpf(E) sch_ingress(E) veth(E) xt_CHECKSUM(E) ipt_REJECT(E) nf_reject_ipv4(E) nft_chain_route_ipv6(E) nft_chain_nat_ipv6(E) nf_nat_ipv6(E) xt_conntrack(E) ipt_MASQUERADE(E) nf_conntrack_netlink(E) nft_counter(E) nft_chain_route_ipv4(E) xt_addrtype(E) nft_compat(E) br_netfilter(E) nft_chain_nat_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) nf_tables(E)
Dec 29 16:45:23 l57f12084 kernel: nfnetlink(E) bridge(E) stp(E) llc(E) binfmt_misc(E) bonding(E) overlay(E) vfat(E) fat(E) ipmi_ssif(E) aes_ce_blk(E) crypto_simd(E) mousedev(E) cryptd(E) aes_ce_cipher(E) crc32_ce(E) crct10dif_ce(E) ghash_ce(E) sha2_ce(E) sha256_arm64(E) sha1_ce(E) hns_roce_hw_v2(E) sbsa_gwdt(E) hns_roce(E) ib_core(E) spi_dw_mmio(E) ipmi_si(E) sch_fq_codel(E) hibmc_drm(E) drm_kms_helper(E) realtek(E) syscopyarea(E) sysfillrect(E) hns3(E) sysimgblt(E)fb_sys_fops(E) nvme(E) ttm(E) hisi_sas_v3_hw(E) hclge(E) mlx5_core(E) nvme_core(E) hisi_sas_main(E) hnae3(E) drm(E) mlxfw(E) libsas(E) devlink(E) scsi_transport_sas(E) i2c_designware_platform(E) i2c_designware_core(E) i2c_core(E) nfit(E) libnvdimm(E) sd_mod(E) sg(E) ahci(E) libahci(E) libata(E) ipmi_devintf(E) ipmi_msghandler(E) fuse(E) [last unloaded: netdevsim]
Dec 29 16:45:23 l57f12084 kernel: CPU: 8 PID: 77821 Comm: ping Kdump: loaded Tainted: G        W   EL    4.19.91-583.git.e680634f5.an8.aarch64 #1
Dec 29 16:45:23 l57f12084 kernel: Hardware name: H3C R4960 G3/BC82AMDDA, BIOS 1.70 01/07/2021
Dec 29 16:45:23 l57f12084 kernel: pstate: 00400009 (nzcv daif +PAN -UAO)
Dec 29 16:45:23 l57f12084 kernel: pc : queued_spin_lock_slowpath+0x238/0x2b0
Dec 29 16:45:23 l57f12084 kernel: lr : queued_write_lock_slowpath+0xe4/0xe8
Dec 29 16:45:23 l57f12084 kernel: sp : ffff00006967bd00
Dec 29 16:45:23 l57f12084 kernel: x29: ffff00006967bd00 x28: ffffa09fb6440040
Dec 29 16:45:23 l57f12084 kernel: x27: ffff000008b00ab0 x26: ffffa0a802367300
Dec 29 16:45:23 l57f12084 kernel: x25: ffff00006967be60 x24: ffff0000095ea4f8
Dec 29 16:45:23 l57f12084 kernel: x23: ffff000008930560 x22: 0000000000000001
Dec 29 16:45:23 l57f12084 kernel: x21: 0000000000000002 x20: ffff0000095ea4e8
Dec 29 16:45:23 l57f12084 kernel: x19: ffff0000095ea4e8 x18: 0000000000000000
Dec 29 16:45:23 l57f12084 kernel: x17: 0000000000000000 x16: 0000000000000000
Dec 29 16:45:23 l57f12084 kernel: x15: 0000000000000000 x14: 0000000000000000
Dec 29 16:45:23 l57f12084 kernel: x13: 0000000000000000 x12: 0000000000000000
Dec 29 16:45:23 l57f12084 kernel: x11: 0000000000240000 x10: 00000000ffffffff
Dec 29 16:45:23 l57f12084 kernel: x9 : 0000000000000000 x8 : ffffa0acfd377200
Dec 29 16:45:23 l57f12084 kernel: x7 : ffff000009239800 x6 : ffffa0acfd377200
Dec 29 16:45:23 l57f12084 kernel: x5 : ffff000009239748 x4 : ffff000008f12200
Dec 29 16:45:23 l57f12084 kernel: x3 : ffff0000095ea4ec x2 : 0000000000000000
Dec 29 16:45:23 l57f12084 kernel: x1 : 0000000000000000 x0 : ffffa0acfd377208
Dec 29 16:45:23 l57f12084 kernel: Call trace:
Dec 29 16:45:23 l57f12084 kernel: queued_spin_lock_slowpath+0x238/0x2b0
Dec 29 16:45:23 l57f12084 kernel: queued_write_lock_slowpath+0xe4/0xe8
Dec 29 16:45:23 l57f12084 kernel: raw_hash_sk+0x70/0x110
Dec 29 16:45:23 l57f12084 kernel: inet_create+0x198/0x370
Dec 29 16:45:23 l57f12084 kernel: __sock_create+0x11c/0x210
Dec 29 16:45:23 l57f12084 kernel: __sys_socket+0x64/0xf8
Dec 29 16:45:23 l57f12084 kernel: __arm64_sys_socket+0x24/0x30
Dec 29 16:45:23 l57f12084 kernel: el0_svc_common.constprop.0+0xa8/0x200
Dec 29 16:45:23 l57f12084 kernel: el0_svc_handler+0x30/0x80
Dec 29 16:45:23 l57f12084 kernel: el0_svc+0x10/0x580
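The trace shows ping blocked in socket(): __sys_socket -> inet_create -> raw_hash_sk, spinning in queued_write_lock_slowpath, i.e. waiting for a write lock while adding a new raw socket to the raw socket hash. The report names several different CPUs and ping PIDs (CPU#9/77824, CPU#8/77821, CPU#4/77809), so a minimal sketch that lists each distinct CPU and task found in the kernel log may help when comparing runs; summarize_soft_lockups is a hypothetical helper name, and the pattern assumes the message format quoted above:

```shell
# Hypothetical helper: print one "CPU comm:pid" line per distinct soft lockup
# report found on stdin (e.g. /var/log/messages or dmesg output),
# sorted by CPU number.
summarize_soft_lockups() {
    sed -n 's/.*soft lockup - CPU#\([0-9]*\) stuck for [0-9]*s! \[\([^]]*\)].*/\1 \2/p' |
        sort -u -k1,1n -k2,2
}

# Example:
#   summarize_soft_lockups < /var/log/messages
```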

[Machine environment]
[root@l57f12084 ~]#uname -r
4.19.91-583.git.e680634f5.an8.aarch64

[root@l57f12084 ~]#cat /etc/redhat-release
Anolis OS release 8.6
[root@l57f12084 ~]#
[root@l57f12084 ~]# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):           2
NUMA node(s):        1
Vendor ID:           HiSilicon
BIOS Vendor ID:      HiSilicon
Model:               0
Model name:          Kunpeng-920
BIOS Model name:     HUAWEI Kunpeng 920 5250
Stepping:            0x1
CPU max MHz:         2600.0000
CPU min MHz:         200.0000
BogoMIPS:            200.00
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            24576K
NUMA node0 CPU(s):   0-95
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
[root@l57f12084 ~]#
[root@l57f12084 ~]#df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        378G     0  378G   0% /dev
tmpfs           378G     0  378G   0% /dev/shm
tmpfs           378G   51M  378G   1% /run
tmpfs           378G     0  378G   0% /sys/fs/cgroup
/dev/sda2        49G   18G   30G  37% /
/dev/sda1      1022M  6.7M 1016M   1% /boot/efi
tmpfs            76G     0   76G   0% /run/user/0
[root@l57f12084 ~]#
Message from syslogd@l57f12084 at Dec 29 17:06:48 ...
 kernel:watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [ping:77809]
free -g
              total        used        free      shared  buff/cache   available
Mem:            754           8         739           0           6         742
Swap:             1           0           1
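The environment details above were collected by hand. A minimal sketch that runs the same commands in one pass, so the output can be attached to a report as a single file; collect_env is a hypothetical helper name, and a missing command is noted rather than aborting the run:

```shell
# Hypothetical helper: run the commands used in the environment section above
# and label each block of output. A failing or missing command is reported inline.
collect_env() {
    for cmd in "uname -r" "cat /etc/redhat-release" "lscpu" "df -h" "free -g"; do
        echo "### $cmd"
        # $cmd is deliberately unquoted so "uname -r" splits into command + argument.
        $cmd 2>&1 || echo "(command failed)"
    done
}

# Example:
#   collect_env > "env-$(hostname).txt"
```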
Comment 2 anolislw alibaba_cloud_group 2023-02-28 10:43:33 UTC
A PR for this test case exists for 5.10; developers, please check whether anck 4.19 on arm has the same problem: https://gitee.com/anolis/cloud-kernel/pulls/1291