Created attachment 626 [details] vmcore-dmesg

Description of problem:
After running the LTP stress test for about 46 hours, the kernel crashed and produced a vmcore: unable to handle kernel NULL pointer dereference at 000000000000004e

The vmcore was analyzed with crash as follows:

# crash /usr/lib/debug/lib/modules/4.19.91-262.git.4a7c05f4b31f.an8.x86_64/vmlinux vmcore

crash 7.3.1-5.an8
Copyright (C) 2002-2021  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2021  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: kernel relocated [832MB]: patching 99595 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/lib/modules/4.19.91-262.git.4a7c05f4b31f.an8.x86_64/vmlinux  [TAINTED]
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Sat Feb  4 15:51:01 CST 2023
      UPTIME: 21 days, 21:36:24
LOAD AVERAGE: 17.15, 31.35, 43.30
       TASKS: 649
    NODENAME: qibo-zx-an86-1
     RELEASE: 4.19.91-262.git.4a7c05f4b31f.an8.x86_64
     VERSION: #1 SMP Tue Jan 10 21:09:58 CST 2023
     MACHINE: x86_64  (2699 Mhz)
      MEMORY: 31.5 GB
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at 000000000000004e"
         PID: 18049
     COMMAND: "netstress"
        TASK: ffff940dbc818000  [THREAD_INFO: ffff940dbc818000]
         CPU: 3
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 18049  TASK: ffff940dbc818000  CPU: 3  COMMAND: "netstress"
 #0 [ffffada407727400] machine_kexec at ffffffffb5064a7a
 #1 [ffffada407727450] __crash_kexec at ffffffffb5149f0a
 #2 [ffffada407727510] panic at ffffffffb50a1325
 #3 [ffffada407727588] oops_end.cold.2 at ffffffffb502ac4f
 #4 [ffffada4077275a8] no_context at ffffffffb5072c7f
 #5 [ffffada4077275f8] __do_page_fault at ffffffffb50734bd
 #6 [ffffada407727660] do_page_fault at ffffffffb50738e2
 #7 [ffffada407727690] async_page_fault at ffffffffb5a011ee
    [exception RIP: ipv6_local_error+48]
    RIP: ffffffffb58982a0  RSP: ffffada407727748  RFLAGS: 00010202
    RAX: 0000000000000002  RBX: ffff940a69e67500  RCX: 0000000000000000
    RDX: ffffada407727780  RSI: 000000000000005a  RDI: ffff940bc02ead00
    RBP: ffffada407727770   R8: 00000000ffffffd8   R9: ffff940a69e67500
    R10: 0000000000000562  R11: 4343434343434343  R12: 0000000000000592
    R13: 0000000000000592  R14: ffff940e35fbc000  R15: ffff940c42437096
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffada407727778] xfrm6_local_error at ffffffffb58a1958
 #9 [ffffada4077277e8] xfrm_local_error at ffffffffb584f501
#10 [ffffada407727800] xfrm6_extract_output at ffffffffb58a1a19
#11 [ffffada407727820] xfrm6_prepare_output at ffffffffb58a18b2
#12 [ffffada407727838] xfrm_output_resume at ffffffffb584f7a1
#13 [ffffada4077278a0] xfrm_output at ffffffffb584fc80
#14 [ffffada4077278c8] xfrm6_output at ffffffffb58a1b8f
#15 [ffffada407727918] udp_tunnel6_xmit_skb at ffffffffc0f6a372 [ip6_udp_tunnel]
#16 [ffffada407727960] geneve_xmit at ffffffffc0e2ac3d [geneve]
#17 [ffffada407727a78] dev_hard_start_xmit at ffffffffb577e516
#18 [ffffada407727ad8] __dev_queue_xmit at ffffffffb577eedc
#19 [ffffada407727b58] ip_finish_output2 at ffffffffb57e289b
#20 [ffffada407727ba0] ip_output at ffffffffb57e4fc1
#21 [ffffada407727bf0] __ip_queue_xmit at ffffffffb57e4a6c
#22 [ffffada407727c48] __tcp_transmit_skb at ffffffffb57ff975
#23 [ffffada407727cb0] tcp_write_xmit at ffffffffb5800d16
#24 [ffffada407727d18] __tcp_push_pending_frames at ffffffffb58019e1
#25 [ffffada407727d28] tcp_sendmsg_locked at ffffffffb57f2027
#26 [ffffada407727dc8] tcp_sendmsg at ffffffffb57f2177
#27 [ffffada407727de8] sock_sendmsg at ffffffffb575d983
#28 [ffffada407727e00] __sys_sendto at ffffffffb575ed3e
#29 [ffffada407727f28] __x64_sys_sendto at ffffffffb575edd4
#30 [ffffada407727f30] do_syscall_64 at ffffffffb50040ff
#31 [ffffada407727f50] entry_SYSCALL_64_after_hwframe at ffffffffb5a0009c
    RIP: 00007f07d9011eb6  RSP: 00007f07d93e4d80  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000003  RCX: 00007f07d9011eb6
    RDX: 0000000000000632  RSI: 00007f07d93e4e20  RDI: 0000000000000003
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000004000  R11: 0000000000000246  R12: 00007f07d93e4e20
    R13: 0000000000000632  R14: 0000000000004000  R15: 0000000000000001
    ORIG_RAX: 000000000000002c  CS: 0033  SS: 002b
crash>

Steps to Reproduce:
1. git clone --branch anck-4.19 https://gitee.com/anolis/ltp.git
   cd ltp
   make autotools
   ./configure
   make
   make install

2. Create ltp.blacklist
# cat ltp.blacklist
min_free_kbytes
oom01
oom02
oom03
oom04
oom05
memcg_stress
#toneagent would be killed due to out of memory
memcg_limit_in_bytes
cpuset_memory_pressure
#controllers
memcg_subgroup_charge
#https://bugs-old.openanolis.cn/view.php?id=19
memcg_max_usage_in_bytes
#https://bugs-old.openanolis.cn/view.php?id=19
memcg_usage_in_bytes
#passed in manual test
cpuset_memory_spread
#cpuhotplug
cpuhotplug04
#syscalls
add_key05
creat09
finit_module02
ioctl_sg01
fanotify09
madvise06
leapsec01
clock_settime03
set_mempolicy03
move_pages12
# trigger a crash on 4.19
# https://bugzilla.openanolis.cn/show_bug.cgi?id=2109
tc01
tpci

3. Run the LTP test script to start the stress test
mkdir -p /tmp/ltp_tmpdir
cd /opt/ltp
grep 'SCENARIO_LISTS="$LTP.*network' runltp && sed -i 's|$LTP.*network|$SCENARIO_LISTS &|g' runltp
nr_cpu=$(nproc)
mem_kb=$(grep ^MemTotal /proc/meminfo | awk '{print $2}')
start_time=$(cat /proc/uptime |awk -F'.' '{print $1}')
nr_cpu_c=$((nr_cpu / 2))
[ $nr_cpu_c -eq 0 ] && nr_cpu_c=1
nr_cpu_m=$((nr_cpu / 4))
[ $nr_cpu_m -eq 0 ] && nr_cpu_m=1
logger ./runltp \
    -c $nr_cpu_c \
    -m $nr_cpu_m,1,$(((mem_kb / 2) / nr_cpu_m * 1024)) \
    -D 1,1,0,1 \
    -B ${LTP_DEV_FS:-ext4} \
    -R -p -q \
    -N \
    -t $runtime \
    -d ${LTP_TMPDIR:-/tmp/ltp_tmpdir} \
    -S $ltp_blacklist

Actual results:
A vmcore is produced after about 46 hours of the LTP stress test.

Expected results:
No vmcore is produced.

Additional info:
See the attachments for vmcore-dmesg.txt and kexec-dmesg.log.

Test environment
# uname -r
4.19.91-262.git.4a7c05f4b31f.an8.x86_64

# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.6"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.6"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.6"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

# free -h
              total        used        free      shared  buff/cache   available
Mem:           30Gi        12Gi        16Gi       0.0Ki       1.4Gi        17Gi
Swap:            0B          0B          0B

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Alibaba Cloud
CPU family:          6
Model:               106
Model name:          Intel(R) Xeon(R) Platinum 8369B CPU @ 2.70GHz
BIOS Model name:     pc-i440fx-2.1
Stepping:            6
CPU MHz:             2699.998
BogoMIPS:            5399.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           48K
L1i cache:           32K
L2 cache:            1280K
L3 cache:            49152K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm arch_capabilities

# dmidecode -t 0
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
    Vendor: SeaBIOS
    Version: 9e9f1cc
    Release Date: 04/01/2014
    Address: 0xE8000
    Runtime Size: 96 kB
    ROM Size: 64 kB
    Characteristics:
        BIOS characteristics not supported
        Targeted content distribution is supported
    BIOS Revision: 0.0
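
For reference, the -c and -m values passed to runltp in step 3 depend only on the CPU count and MemTotal. The following is a minimal sketch of that arithmetic, wrapped in a hypothetical helper named ltp_stress_args (not part of LTP) so it can be evaluated for a given machine. The MemTotal figure in the example call is an assumed round number, not the value read on the test host.

```shell
# Hypothetical helper (not part of LTP): repeats the arithmetic from step 3
# for a given CPU count ($1) and MemTotal in kB ($2), then prints the
# resulting -c and -m arguments.
ltp_stress_args() {
    nr_cpu=$1
    mem_kb=$2
    # Half of the CPUs for the -c value, at least 1.
    nr_cpu_c=$((nr_cpu / 2))
    [ "$nr_cpu_c" -eq 0 ] && nr_cpu_c=1
    # A quarter of the CPUs for the first -m field, at least 1.
    nr_cpu_m=$((nr_cpu / 4))
    [ "$nr_cpu_m" -eq 0 ] && nr_cpu_m=1
    # Half of MemTotal, divided by nr_cpu_m, converted from kB to bytes.
    echo "-c $nr_cpu_c -m $nr_cpu_m,1,$(((mem_kb / 2) / nr_cpu_m * 1024))"
}

# 8 CPUs as on the test host; 32000000 kB is an assumed MemTotal.
ltp_stress_args 8 32000000
```

With 8 CPUs this gives -c 4 and a first -m field of 2, so the byte count in the third -m field is half of MemTotal split in two.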
Created attachment 627 [details] kexec-dmesg
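
To pull the oops summary out of an attached dmesg file without reading the whole log, something like the following could be used. This is only a sketch: the helper name oops_summary is hypothetical, and the patterns assume the log contains a "BUG: unable to handle" line and "RIP:" lines, as the PANIC string and backtrace in the crash output suggest.

```shell
# Hypothetical helper: print the oops headline and any RIP lines
# from a dmesg-style log file given as $1.
oops_summary() {
    grep -E 'BUG: unable to handle|RIP: ' "$1"
}

# Assumes the attachment was saved as vmcore-dmesg.txt in the current directory.
if [ -f vmcore-dmesg.txt ]; then
    oops_summary vmcore-dmesg.txt
fi
```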