Bug 155 - stress-ng test triggers kernel panic "not syncing: softlockup: hung tasks"
Summary: stress-ng test triggers kernel panic "not syncing: softlockup: hung tasks"
Status: RESOLVED FIXED
Alias: None
Product: ANCK 4.19 Dev
Classification: ANCK
Component: general/others
Version: unspecified
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: beta
Assignee: Shiloong
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-12-21 14:55 UTC by fghui_kernel
Modified: 2021-12-21 15:57 UTC (History)
Description fghui_kernel 2021-12-21 14:55:32 UTC
crash 7.2.9-1.alios7.alnx
Copyright (C) 2002-2020  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/4.19.91-013.alpha.ali4000.alios7.aarch64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2021-12-11-05:26:57/vmcore  [PARTIAL DUMP]
        CPUS: 128
        DATE: Sat Dec 11 11:59:32 CST 2021
      UPTIME: 4 days, 20:47:41
LOAD AVERAGE: 112125.87, 112128.56, 112104.60
       TASKS: 131088
    NODENAME: RAMOS-2102312PRNP0LA000005
     RELEASE: 4.19.91-013.alpha.ali4000.alios7.aarch64
     VERSION: #1 SMP Thu Dec 2 20:57:53 CST 2021
     MACHINE: aarch64  (unknown Mhz)
      MEMORY: 512 GB
       PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
         PID: 635
     COMMAND: "ksoftirqd/124"
        TASK: ffffa05ffbe78000  [THREAD_INFO: ffffa05ffbe78000]
         CPU: 124
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 635    TASK: ffffa05ffbe78000  CPU: 124  COMMAND: "ksoftirqd/124"
 #0 [ffff0000099f3b20] __crash_kexec at ffff0000081b2db4
 #1 [ffff0000099f3cb0] panic at ffff0000080e4e68
 #2 [ffff0000099f3d90] watchdog_timer_fn at ffff0000081eb33c
 #3 [ffff0000099f3e00] __run_hrtimer at ffff00000818c9e4
 #4 [ffff0000099f3e50] __hrtimer_run_queues at ffff00000818cd84
 #5 [ffff0000099f3eb0] hrtimer_interrupt at ffff00000818dc2c
 #6 [ffff0000099f3f20] arch_timer_handler_phys at ffff00000884fe0c
 #7 [ffff0000099f3f30] handle_percpu_devid_irq at ffff00000816b974
 #8 [ffff0000099f3f70] generic_handle_irq at ffff000008163a80
 #9 [ffff0000099f3f80] __handle_domain_irq at ffff00000816451c
#10 [ffff0000099f3fc0] gic_handle_irq at ffff00000808175c
--- <IRQ stack> ---
#11 [ffff00000d0bbc20] el1_irq at ffff0000080834c8
     PC: ffff0000083476fc  [file_free_rcu+36]
     LR: ffff00000817a9d0  [rcu_do_batch+208]
     SP: ffff00000d0bbc30  PSTATE: a0c00009
    X29: ffff00000d0bbc30  X28: ffff000008f5b000  X27: ffff0000092960c0
    X26: 0000000000000100  X25: ffffa05ffbe78000  X24: ffff0000092d7e88
    X23: ffffa05fffee5338  X22: ffffa05ffbe78000  X21: 7fffffffffffffff
    X20: ffffa05fffee5300  X19: ffff804d0b829f00  X18: 0000000000000000
    X17: 0000000000000000  X16: 0000000000000000  X15: 0000000000000000
    X14: 0000000000000000  X13: 0000000000000000  X12: 000000000000000d
    X11: 0000000000000cab  X10: 00000000000007a4   X9: ffff00000817a9d0
     X8: 0000000000210d00   X7: 0000000000000018   X6: ffffa05ffbe787f8
     X5: ffff0000092ab1b0   X4: ffff7e01172b5a20   X3: 00000000802d0025
     X2: ffff8045cad69860   X1: ffffa041f1a7ca80   X0: 00000000ffffffff
#12 [ffff00000d0bbc30] file_free_rcu at ffff0000083476f8
#13 [ffff00000d0bbc50] rcu_do_batch at ffff00000817a9cc
#14 [ffff00000d0bbcd0] __rcu_process_callbacks at ffff00000817ac94
#15 [ffff00000d0bbd10] rcu_process_callbacks at ffff00000817ad40
#16 [ffff00000d0bbd50] __softirqentry_text_start at ffff000008081c3c
#17 [ffff00000d0bbdf0] run_ksoftirqd at ffff0000080ebf04
#18 [ffff00000d0bbe00] smpboot_thread_fn at ffff0000081125a0
#19 [ffff00000d0bbe60] kthread at ffff00000810d71c

# uname -r
4.19.91-013.alpha.ali4000.alios7.aarch64

# rpm -qa | grep 4.19.91-013.alpha.ali4000.alios7.aarch64
kernel-debuginfo-common-aarch64-4.19.91-013.alpha.ali4000.alios7.aarch64
kernel-4.19.91-013.alpha.ali4000.alios7.aarch64
kernel-headers-4.19.91-013.alpha.ali4000.alios7.aarch64
kernel-debuginfo-4.19.91-013.alpha.ali4000.alios7.aarch64
kernel-devel-4.19.91-013.alpha.ali4000.alios7.aarch64

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.19.91-013.alpha.ali4000.alios7.aarch64 root=UUID=7aef8744-595d-4d3e-ab92-fdb4e12c9e91 ro crashkernel=0M-2G:0M,2G-256G:256M,256G-1024G:320M,1024G-:384M vconsole.font=latarcyrheb-sun16 vconsole.keymap=us biosdevname=0 console=tty0 scsi_mod.scan=sync pci=pcie_bus_perf nohz=off console=ttyS0,115200 virtio_ring.vring_force_dma_api=1 modprobe.blacklist=hns3 nokaslr iommu.passthrough=1

[Steps to reproduce]:
1. Mount the data disk

[ -d /disk1 ] || mkdir /disk1

wipefs -a --force /dev/nvme0n1p1                 # on virtual machines the device is more often /dev/vdb1

mkfs -t ext4 -q -F /dev/nvme0n1p1

mount -t ext4  /dev/nvme0n1p1 /disk1      

mkdir -p /disk1/tmpdir/stress-ng
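
Since step 1 wipes and reformats the partition, it may help to confirm which device is present before running it. The following is only a sketch: pick_data_dev is a hypothetical helper that is not part of the original report, and the two candidate paths are the ones named in step 1. It only prints what would be done and does not touch the disk.

```shell
#!/bin/sh
# Sketch: choose the data-disk partition before running the destructive steps.
# pick_data_dev is a hypothetical helper, not part of the original report.

pick_data_dev() {
    # Print the first argument that exists as a block device; fail if none does.
    for dev in "$@"; do
        if [ -b "$dev" ]; then
            echo "$dev"
            return 0
        fi
    done
    return 1
}

# Bare metal uses the NVMe partition from step 1; virtual machines more often expose /dev/vdb1.
if data_dev=$(pick_data_dev /dev/nvme0n1p1 /dev/vdb1); then
    echo "would run: wipefs, mkfs and mount on $data_dev"
else
    echo "no candidate data disk found; nothing to do"
fi
```

The dry-run form is deliberate: wipefs and mkfs are destructive, so the device should be checked by hand before substituting it into the commands above.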

2. Configure kernel parameters

echo 1 > /sys/kernel/mm/transparent_hugepage/hugetext_enabled

echo 1  > /proc/sys/kernel/panic

echo 1  > /proc/sys/kernel/hardlockup_panic

echo 1  > /proc/sys/kernel/softlockup_panic

echo 50 > /proc/sys/kernel/watchdog_thresh

echo 1200 > /proc/sys/kernel/hung_task_timeout_secs

echo 0   > /proc/sys/kernel/hung_task_panic
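
For reference, the kernel's soft-lockup threshold is 2 * watchdog_thresh, so with watchdog_thresh set to 50 a CPU has to stay in kernel mode without scheduling for about 100 seconds before the panic above is raised. A minimal sketch of that arithmetic follows; softlockup_window_secs is a hypothetical helper name, and the fallback value of 50 is the one from step 2.

```shell
#!/bin/sh
# Sketch: print the soft-lockup detection window implied by watchdog_thresh.
# The kernel uses 2 * watchdog_thresh seconds as the soft-lockup threshold.

softlockup_window_secs() {
    # $1: value of kernel.watchdog_thresh in seconds
    echo $(( $1 * 2 ))
}

# Use the running kernel's value when readable, else the value set in step 2.
thresh_file=/proc/sys/kernel/watchdog_thresh
if [ -r "$thresh_file" ]; then
    thresh=$(cat "$thresh_file")
else
    thresh=50
fi

echo "watchdog_thresh=${thresh}s -> soft lockup reported after $(softlockup_window_secs "$thresh")s"
```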
3. Download, build, and run stress-ng

git clone https://github.com/ColinIanKing/stress-ng.git
cd stress-ng
make && make install

nohup stress-ng -a 1 -x softlockup,resources,fifo,set,zlib,wcs,tree,splice,sockfd,sctp,radixsort,pipe,mergesort,key,inotify,heapsort,epoll,dccp,cap,aiol,vforkmany,switch,sock,cyclic -t 168h --metrics --times --verify -v -Y /disk1/tmpdir/stress-ng/stress-statistic-1209.yaml --log-file /disk1/tmpdir/stress-ng/stress-logfile-1209.txt --temp-path /disk1/tmpdir/stress-ng/ &
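
The long -x exclusion list is easy to mistype, so one option is to build it from one stressor name per word. This is only a sketch: join_by_comma is a hypothetical helper, the stressor names are copied from the command above, and the script prints the command as a dry run rather than starting the 168-hour test. The output-path options (-Y, --log-file, --temp-path) are left out for brevity.

```shell
#!/bin/sh
# Sketch: rebuild the -x exclusion list from one stressor name per word.
# join_by_comma is a hypothetical helper, not part of stress-ng.

join_by_comma() {
    # Join all arguments with commas: a b c -> a,b,c
    old_ifs=$IFS
    IFS=,
    joined="$*"
    IFS=$old_ifs
    echo "$joined"
}

excluded=$(join_by_comma softlockup resources fifo set zlib wcs tree splice \
    sockfd sctp radixsort pipe mergesort key inotify heapsort epoll dccp cap \
    aiol vforkmany switch sock cyclic)

# Dry run: print the command instead of starting the test.
echo stress-ng -a 1 -x "$excluded" -t 168h --metrics --times --verify -v
```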
Comment 1 fghui_kernel 2021-12-21 15:57:53 UTC
commit d5a9a8c3bc8068f2e5dfba30150ac09b596b461a upstream