[Problem summary]
During the Anolis 8.6 ck-4.19 nightly test on aarch64, while the kernel-selftest suite is running netfilter/nft_trans_stress.sh, the console keeps printing:
kernel:watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [ping:77824]
The messages log contains a call trace, but no vmcore appears under /var/crash.

[Additional details]
1) The console keeps printing "kernel:watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [ping:77824]". pstree shows that nft_trans_stress.sh from the kernel-selftest suite is running.
-----------------------------------
├─toneagent─┬─sh───python───run_test.sh───run_kselftest.s───run_kselftest.s───nft_trans_stres───11*[ip]
│           └─32*[{toneagent}]
└─tuned───4*[{tuned}]

2) top shows the ping processes continuously consuming a large share of CPU. This will very likely leave the machine unreachable over ssh and unresponsive to ping.
-------------------------------------
Tasks: 24334 total, 256 running, 24075 sleeping, 2 stopped, 1 zombie
%Cpu(s): 0.1 us, 34.9 sy, 0.0 ni, 33.1 id, 0.0 wa, 0.0 hi, 31.9 si, 0.0 st

top - 16:31:17 up 33 min, 1 user, load average: 268.83, 268.58, 286.49
Tasks: 24334 total, 250 running, 24081 sleeping, 2 stopped, 1 zombie
%Cpu(s): 0.1 us, 34.8 sy, 0.0 ni, 33.1 id, 0.0 wa, 0.0 hi, 32.0 si, 0.0 st
MiB Mem : 772983.2 total, 758052.9 free, 8076.3 used, 6854.0 buff/cache
MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 761269.2 avail Mem

77890 root 20 0 8740 2040 1708 R 100.0 0.0 15:36.11 ping
77770 root 20 0 8740 2040 1708 R 100.0 0.0 15:35.75 ping
77773 root 20 0 8740 2036 1704 R 100.0 0.0 15:35.74 ping
77803 root 20 0 8740 2040 1708 R 100.0 0.0 15:40.00 ping
77809 root 20 0 8740 2040 1708 R 100.0 0.0 15:40.00 ping
77815 root 20 0 8740 2040 1708 R 100.0 0.0 15:40.00 ping
77821 root 20 0 8740 2040 1708 R 100.0 0.0 15:28.67 ping
77836 root 20 0 8740 2040 1708 R 100.0 0.0 15:39.99 ping
77854 root 20 0 8740 2012 1680 R 100.0 0.0 15:28.67 ping
77863 root 20 0 8740 2040 1708 R 100.0 0.0 15:39.99 ping
77872 root 20 0 8740 1868 1536 R 100.0 0.0 15:40.00 ping
77881 root 20 0 8740 2040 1708 R 100.0 0.0 15:35.74 ping
77764 root 20 0 8740 2036 1704 R 99.7 0.0 15:35.74 ping
77767 root 20 0 8740 2032 1700 R 99.7 0.0 15:35.74 ping
77779 root 20 0 8740 2044 1708 R 99.7 0.0 15:28.65 ping
77794 root 20 0 8740 2044 1708 R 99.7 0.0 15:28.66 ping
77797 root 20 0 8740 2040 1708 R 99.7 0.0 15:39.99 ping
77800 root 20 0 8740 2040 1708 R 99.7 0.0 15:40.00 ping
77806 root 20 0 8740 2040 1708 R 99.7 0.0 15:35.74 ping

3) messages contains call trace information
-------------------------
Dec 29 16:45:23 l57f12084 kernel: watchdog: BUG: soft lockup - CPU#8 stuck for 23s! [ping:77821]
Dec 29 16:45:23 l57f12084 kernel: Modules linked in: ip_vti(E) esp6(E) xfrm6_mode_tunnel(E) ip6_vti(E) esp4_offload(E) xfrm4_mode_transport(E) macsec(E) vrf(E) 8021q(E) garp(E) mrp(E) cls_u32(E) sch_htb(E) dummy(E) tls(E) authenc(E) echainiv(E) esp4(E) xfrm4_mode_tunnel(E) ipip(E) tunnel4(E) geneve(E) xt_mark(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) ip6_gre(E) ip6_tunnel(E) tunnel6(E) ip_gre(E) ip_tunnel(E) gre(E) cls_bpf(E) sch_ingress(E) veth(E) xt_CHECKSUM(E) ipt_REJECT(E) nf_reject_ipv4(E) nft_chain_route_ipv6(E) nft_chain_nat_ipv6(E) nf_nat_ipv6(E) xt_conntrack(E) ipt_MASQUERADE(E) nf_conntrack_netlink(E) nft_counter(E) nft_chain_route_ipv4(E) xt_addrtype(E) nft_compat(E) br_netfilter(E) nft_chain_nat_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) nf_tables(E)
Dec 29 16:45:23 l57f12084 kernel: nfnetlink(E) bridge(E) stp(E) llc(E) binfmt_misc(E) bonding(E) overlay(E) vfat(E) fat(E) ipmi_ssif(E) aes_ce_blk(E) crypto_simd(E) mousedev(E) cryptd(E) aes_ce_cipher(E) crc32_ce(E) crct10dif_ce(E) ghash_ce(E) sha2_ce(E) sha256_arm64(E) sha1_ce(E) hns_roce_hw_v2(E) sbsa_gwdt(E) hns_roce(E) ib_core(E) spi_dw_mmio(E) ipmi_si(E) sch_fq_codel(E) hibmc_drm(E) drm_kms_helper(E) realtek(E) syscopyarea(E) sysfillrect(E) hns3(E) sysimgblt(E) fb_sys_fops(E) nvme(E) ttm(E) hisi_sas_v3_hw(E) hclge(E) mlx5_core(E) nvme_core(E) hisi_sas_main(E) hnae3(E) drm(E) mlxfw(E) libsas(E) devlink(E) scsi_transport_sas(E) i2c_designware_platform(E) i2c_designware_core(E) i2c_core(E) nfit(E) libnvdimm(E) sd_mod(E) sg(E) ahci(E) libahci(E) libata(E) ipmi_devintf(E) ipmi_msghandler(E) fuse(E) [last unloaded: netdevsim]
Dec 29 16:45:23 l57f12084 kernel: CPU: 8 PID: 77821 Comm: ping Kdump: loaded Tainted: G W EL 4.19.91-583.git.e680634f5.an8.aarch64 #1
Dec 29 16:45:23 l57f12084 kernel: Hardware name: H3C R4960 G3/BC82AMDDA, BIOS 1.70 01/07/2021
Dec 29 16:45:23 l57f12084 kernel: pstate: 00400009 (nzcv daif +PAN -UAO)
Dec 29 16:45:23 l57f12084 kernel: pc : queued_spin_lock_slowpath+0x238/0x2b0
Dec 29 16:45:23 l57f12084 kernel: lr : queued_write_lock_slowpath+0xe4/0xe8
Dec 29 16:45:23 l57f12084 kernel: sp : ffff00006967bd00
Dec 29 16:45:23 l57f12084 kernel: x29: ffff00006967bd00 x28: ffffa09fb6440040
Dec 29 16:45:23 l57f12084 kernel: x27: ffff000008b00ab0 x26: ffffa0a802367300
Dec 29 16:45:23 l57f12084 kernel: x25: ffff00006967be60 x24: ffff0000095ea4f8
Dec 29 16:45:23 l57f12084 kernel: x23: ffff000008930560 x22: 0000000000000001
Dec 29 16:45:23 l57f12084 kernel: x21: 0000000000000002 x20: ffff0000095ea4e8
Dec 29 16:45:23 l57f12084 kernel: x19: ffff0000095ea4e8 x18: 0000000000000000
Dec 29 16:45:23 l57f12084 kernel: x17: 0000000000000000 x16: 0000000000000000
Dec 29 16:45:23 l57f12084 kernel: x15: 0000000000000000 x14: 0000000000000000
Dec 29 16:45:23 l57f12084 kernel: x13: 0000000000000000 x12: 0000000000000000
Dec 29 16:45:23 l57f12084 kernel: x11: 0000000000240000 x10: 00000000ffffffff
Dec 29 16:45:23 l57f12084 kernel: x9 : 0000000000000000 x8 : ffffa0acfd377200
Dec 29 16:45:23 l57f12084 kernel: x7 : ffff000009239800 x6 : ffffa0acfd377200
Dec 29 16:45:23 l57f12084 kernel: x5 : ffff000009239748 x4 : ffff000008f12200
Dec 29 16:45:23 l57f12084 kernel: x3 : ffff0000095ea4ec x2 : 0000000000000000
Dec 29 16:45:23 l57f12084 kernel: x1 : 0000000000000000 x0 : ffffa0acfd377208
Dec 29 16:45:23 l57f12084 kernel: Call trace:
Dec 29 16:45:23 l57f12084 kernel: queued_spin_lock_slowpath+0x238/0x2b0
Dec 29 16:45:23 l57f12084 kernel: queued_write_lock_slowpath+0xe4/0xe8
Dec 29 16:45:23 l57f12084 kernel: raw_hash_sk+0x70/0x110
Dec 29 16:45:23 l57f12084 kernel: inet_create+0x198/0x370
Dec 29 16:45:23 l57f12084 kernel: __sock_create+0x11c/0x210
Dec 29 16:45:23 l57f12084 kernel: __sys_socket+0x64/0xf8
Dec 29 16:45:23 l57f12084 kernel: __arm64_sys_socket+0x24/0x30
Dec 29 16:45:23 l57f12084 kernel: el0_svc_common.constprop.0+0xa8/0x200
Dec 29 16:45:23 l57f12084 kernel: el0_svc_handler+0x30/0x80
Dec 29 16:45:23 l57f12084 kernel: el0_svc+0x10/0x580

[Machine environment]
[root@l57f12084 ~]# uname -r
4.19.91-583.git.e680634f5.an8.aarch64
[root@l57f12084 ~]# cat /etc/redhat-release
Anolis OS release 8.6
[root@l57f12084 ~]#
[root@l57f12084 ~]# lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 1
Vendor ID: HiSilicon
BIOS Vendor ID: HiSilicon
Model: 0
Model name: Kunpeng-920
BIOS Model name: HUAWEI Kunpeng 920 5250
Stepping: 0x1
CPU max MHz: 2600.0000
CPU min MHz: 200.0000
BogoMIPS: 200.00
L1d cache: 64K
L1i cache: 64K
L2 cache: 512K
L3 cache: 24576K
NUMA node0 CPU(s): 0-95
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
[root@l57f12084 ~]#
[root@l57f12084 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 378G 0 378G 0% /dev
tmpfs 378G 0 378G 0% /dev/shm
tmpfs 378G 51M 378G 1% /run
tmpfs 378G 0 378G 0% /sys/fs/cgroup
/dev/sda2 49G 18G 30G 37% /
/dev/sda1 1022M 6.7M 1016M 1% /boot/efi
tmpfs 76G 0 76G 0% /run/user/0
[root@l57f12084 ~]#
Message from syslogd@l57f12084 at Dec 29 17:06:48 ...
kernel:watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [ping:77809]
free -g
total used free shared buff/cache available
Mem: 754 8 739 0 6 742
Swap: 1 0 1
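To make it easier to see how widespread the lockups are, the watchdog lines quoted in this report can be tallied with a small script. This is only a triage sketch: the regular expression is written against the message format shown above, and the function name `summarize_lockups` and the `sample` text are illustrative, not part of the test suite or any Anolis tooling.

```python
import re
from collections import Counter

# Pattern for the watchdog line as quoted in this report, e.g.
# "watchdog: BUG: soft lockup - CPU#8 stuck for 23s! [ping:77821]"
LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def summarize_lockups(text):
    """Count soft-lockup reports per command and collect the affected CPUs."""
    by_comm = Counter()
    cpus = set()
    longest = 0
    for m in LOCKUP_RE.finditer(text):
        by_comm[m.group("comm")] += 1
        cpus.add(int(m.group("cpu")))
        longest = max(longest, int(m.group("secs")))
    return {"by_comm": dict(by_comm), "cpus": sorted(cpus), "longest_s": longest}


# Hypothetical input: the three watchdog lines that appear in this report.
sample = """\
Dec 29 16:45:23 l57f12084 kernel: watchdog: BUG: soft lockup - CPU#8 stuck for 23s! [ping:77821]
kernel:watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [ping:77824]
kernel:watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [ping:77809]
"""

if __name__ == "__main__":
    print(summarize_lockups(sample))
```

In practice the text would come from the messages log rather than the inline sample, for example `summarize_lockups(open("/var/log/messages").read())`.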
There is a PR for this test case on 5.10. Could the developers check whether anck 4.19 on arm hits the same problem? https://gitee.com/anolis/cloud-kernel/pulls/1291
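On the missing vmcore mentioned in the summary: a soft lockup by itself only logs a warning, and the kernel panics (so that kdump can write a vmcore) only when the `kernel.softlockup_panic` sysctl is 1. The report does not say how this machine was configured, so the following is just a sketch for checking it; the helper name `softlockup_panics` is illustrative.

```python
from pathlib import Path


def softlockup_panics(path="/proc/sys/kernel/softlockup_panic"):
    """Return True if soft lockups trigger a panic, False if not,
    or None when the sysctl file cannot be read."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        return None


if __name__ == "__main__":
    state = softlockup_panics()
    if state is None:
        print("softlockup_panic is not readable on this system")
    elif state:
        print("soft lockups panic the kernel; a vmcore would be expected")
    else:
        print("soft lockups only log a warning; no vmcore is expected")
```

If the value is 0, the absence of a vmcore under /var/crash would be expected behavior rather than a kdump failure.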