Bug 4010 - [anolis8 4.19 262][x86_64][ecs] vmcore generated after about 46 hours of LTP stress testing: unable to handle kernel NULL pointer dereference at 000000000000004e
Summary: [anolis8 4.19 262][x86_64][ecs] vmcore generated after about 46 hours of LTP stress testing: unable to handle kernel...
Status: NEW
Alias: None
Product: Antest
Classification: Infrastructures
Component: Test cases
Version: unspecified
Hardware: x86_64 Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: Jacob
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-02-09 11:20 UTC by zhixin01
Modified: 2023-02-09 11:22 UTC

See Also:


Attachments
vmcore-dmesg (22.05 MB, text/plain)
2023-02-09 11:20 UTC, zhixin01
kexec-dmesg (96.28 KB, text/plain)
2023-02-09 11:21 UTC, zhixin01

Description zhixin01 alibaba_cloud_group 2023-02-09 11:20:48 UTC
Created attachment 626 [details]
vmcore-dmesg

Description of problem:
A vmcore was generated after about 46 hours of LTP stress testing: unable to handle kernel NULL pointer dereference at 000000000000004e

The vmcore analysis is as follows:
# crash /usr/lib/debug/lib/modules/4.19.91-262.git.4a7c05f4b31f.an8.x86_64/vmlinux vmcore

crash 7.3.1-5.an8
Copyright (C) 2002-2021  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2021  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [832MB]: patching 99595 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/lib/modules/4.19.91-262.git.4a7c05f4b31f.an8.x86_64/vmlinux  [TAINTED]
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Sat Feb  4 15:51:01 CST 2023
      UPTIME: 21 days, 21:36:24
LOAD AVERAGE: 17.15, 31.35, 43.30
       TASKS: 649
    NODENAME: qibo-zx-an86-1
     RELEASE: 4.19.91-262.git.4a7c05f4b31f.an8.x86_64
     VERSION: #1 SMP Tue Jan 10 21:09:58 CST 2023
     MACHINE: x86_64  (2699 Mhz)
      MEMORY: 31.5 GB
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at 000000000000004e"
         PID: 18049
     COMMAND: "netstress"
        TASK: ffff940dbc818000  [THREAD_INFO: ffff940dbc818000]
         CPU: 3
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 18049  TASK: ffff940dbc818000  CPU: 3   COMMAND: "netstress"
 #0 [ffffada407727400] machine_kexec at ffffffffb5064a7a
 #1 [ffffada407727450] __crash_kexec at ffffffffb5149f0a
 #2 [ffffada407727510] panic at ffffffffb50a1325
 #3 [ffffada407727588] oops_end.cold.2 at ffffffffb502ac4f
 #4 [ffffada4077275a8] no_context at ffffffffb5072c7f
 #5 [ffffada4077275f8] __do_page_fault at ffffffffb50734bd
 #6 [ffffada407727660] do_page_fault at ffffffffb50738e2
 #7 [ffffada407727690] async_page_fault at ffffffffb5a011ee
    [exception RIP: ipv6_local_error+48]
    RIP: ffffffffb58982a0  RSP: ffffada407727748  RFLAGS: 00010202
    RAX: 0000000000000002  RBX: ffff940a69e67500  RCX: 0000000000000000
    RDX: ffffada407727780  RSI: 000000000000005a  RDI: ffff940bc02ead00
    RBP: ffffada407727770   R8: 00000000ffffffd8   R9: ffff940a69e67500
    R10: 0000000000000562  R11: 4343434343434343  R12: 0000000000000592
    R13: 0000000000000592  R14: ffff940e35fbc000  R15: ffff940c42437096
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffada407727778] xfrm6_local_error at ffffffffb58a1958
 #9 [ffffada4077277e8] xfrm_local_error at ffffffffb584f501
#10 [ffffada407727800] xfrm6_extract_output at ffffffffb58a1a19
#11 [ffffada407727820] xfrm6_prepare_output at ffffffffb58a18b2
#12 [ffffada407727838] xfrm_output_resume at ffffffffb584f7a1
#13 [ffffada4077278a0] xfrm_output at ffffffffb584fc80
#14 [ffffada4077278c8] xfrm6_output at ffffffffb58a1b8f
#15 [ffffada407727918] udp_tunnel6_xmit_skb at ffffffffc0f6a372 [ip6_udp_tunnel]
#16 [ffffada407727960] geneve_xmit at ffffffffc0e2ac3d [geneve]
#17 [ffffada407727a78] dev_hard_start_xmit at ffffffffb577e516
#18 [ffffada407727ad8] __dev_queue_xmit at ffffffffb577eedc
#19 [ffffada407727b58] ip_finish_output2 at ffffffffb57e289b
#20 [ffffada407727ba0] ip_output at ffffffffb57e4fc1
#21 [ffffada407727bf0] __ip_queue_xmit at ffffffffb57e4a6c
#22 [ffffada407727c48] __tcp_transmit_skb at ffffffffb57ff975
#23 [ffffada407727cb0] tcp_write_xmit at ffffffffb5800d16
#24 [ffffada407727d18] __tcp_push_pending_frames at ffffffffb58019e1
#25 [ffffada407727d28] tcp_sendmsg_locked at ffffffffb57f2027
#26 [ffffada407727dc8] tcp_sendmsg at ffffffffb57f2177
#27 [ffffada407727de8] sock_sendmsg at ffffffffb575d983
#28 [ffffada407727e00] __sys_sendto at ffffffffb575ed3e
#29 [ffffada407727f28] __x64_sys_sendto at ffffffffb575edd4
#30 [ffffada407727f30] do_syscall_64 at ffffffffb50040ff
#31 [ffffada407727f50] entry_SYSCALL_64_after_hwframe at ffffffffb5a0009c
    RIP: 00007f07d9011eb6  RSP: 00007f07d93e4d80  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000003  RCX: 00007f07d9011eb6
    RDX: 0000000000000632  RSI: 00007f07d93e4e20  RDI: 0000000000000003
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000004000  R11: 0000000000000246  R12: 00007f07d93e4e20
    R13: 0000000000000632  R14: 0000000000004000  R15: 0000000000000001
    ORIG_RAX: 000000000000002c  CS: 0033  SS: 002b
crash>
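
For triage, the key facts in the backtrace above are the faulting instruction (the `[exception RIP: ...]` line) and the chain of kernel functions leading to it. The following is a minimal sketch, assuming the `bt` output has been saved as plain text and is piped in on stdin. `bt_exception_rip` and `bt_frames` are hypothetical helper names and not part of crash.

```shell
#!/bin/sh
# Hypothetical helpers (not part of crash) that summarize saved "bt" output
# read from stdin.

# Print the symbol+offset from the "[exception RIP: ...]" line.
bt_exception_rip() {
    sed -n 's/.*\[exception RIP: \([^]]*\)].*/\1/p'
}

# Print the function name of every "#N [stack] func at addr" frame, in order.
bt_frames() {
    awk '$1 ~ /^#[0-9]+$/ && $4 == "at" { print $3 }'
}

# Three lines copied from the backtrace above serve as sample input.
sample='
 #7 [ffffada407727690] async_page_fault at ffffffffb5a011ee
    [exception RIP: ipv6_local_error+48]
 #8 [ffffada407727778] xfrm6_local_error at ffffffffb58a1958
'
printf '%s\n' "$sample" | bt_exception_rip
printf '%s\n' "$sample" | bt_frames
```

Run against the full trace, the frame list shows the path from `__x64_sys_sendto` through `geneve_xmit` and `xfrm6_output` down to `xfrm6_local_error`, which is the caller of the faulting `ipv6_local_error`.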


Steps to Reproduce:
1. git clone --branch anck-4.19 https://gitee.com/anolis/ltp.git
  cd ltp
  make autotools
  ./configure
  make
  make install

2. Create ltp.blacklist
# cat ltp.blacklist
min_free_kbytes
oom01
oom02
oom03
oom04
oom05
memcg_stress
#toneagent would be killed due to out of memory
memcg_limit_in_bytes
cpuset_memory_pressure

#controllers
memcg_subgroup_charge
#https://bugs-old.openanolis.cn/view.php?id=19
memcg_max_usage_in_bytes
#https://bugs-old.openanolis.cn/view.php?id=19
memcg_usage_in_bytes
#passed in manual test
cpuset_memory_spread
#cpuhotplug
cpuhotplug04

#syscalls
add_key05
creat09
finit_module02
ioctl_sg01
fanotify09
madvise06
leapsec01
clock_settime03
set_mempolicy03
move_pages12
# trigger a crash on 4.19
# https://bugzilla.openanolis.cn/show_bug.cgi?id=2109
tc01
tpci

3. Run the LTP test script to start the stress test
mkdir -p /tmp/ltp_tmpdir
cd /opt/ltp
grep 'SCENARIO_LISTS="$LTP.*network' runltp && sed -i 's|$LTP.*network|$SCENARIO_LISTS &|g' runltp
        nr_cpu=$(nproc)
        mem_kb=$(grep ^MemTotal /proc/meminfo | awk '{print $2}')
        start_time=$(cat /proc/uptime |awk -F'.' '{print $1}')
        nr_cpu_c=$((nr_cpu / 2))
        [ $nr_cpu_c -eq 0 ] && nr_cpu_c=1
        nr_cpu_m=$((nr_cpu / 4))
        [ $nr_cpu_m -eq 0 ] && nr_cpu_m=1
        logger ./runltp \
                -c $nr_cpu_c \
                -m $nr_cpu_m,1,$(((mem_kb / 2) / nr_cpu_m * 1024)) \
                -D 1,1,0,1 \
                -B ${LTP_DEV_FS:-ext4} \
                -R -p -q \
                -N \
                -t $runtime \
                -d ${LTP_TMPDIR:-/tmp/ltp_tmpdir} \
                -S $ltp_blacklist
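
The `-c` and `-m` values in step 3 are derived from the CPU count and `MemTotal`, so it can help to see what they work out to on a given machine. The following is a minimal sketch of the same arithmetic. `ltp_stress_args` is a hypothetical helper name, and the 32000000 kB figure is an illustrative assumption, not the value measured on the test machine. In the runltp usage text, `-c` is the number of background CPU load processes and `-m` takes NUM_PROCS,CHUNKS,BYTES for background memory load.

```shell
#!/bin/sh
# Sketch: reproduce the -c and -m arguments that the step 3 script derives.
# ltp_stress_args is a hypothetical helper name, not part of LTP.
ltp_stress_args() {
    nr_cpu=$1    # CPU count, as reported by nproc
    mem_kb=$2    # MemTotal from /proc/meminfo, in kB

    # Half of the CPUs for -c, a quarter for -m, never less than one.
    nr_cpu_c=$((nr_cpu / 2))
    [ "$nr_cpu_c" -eq 0 ] && nr_cpu_c=1
    nr_cpu_m=$((nr_cpu / 4))
    [ "$nr_cpu_m" -eq 0 ] && nr_cpu_m=1

    # Half of memory split across the -m workers, converted from kB to bytes.
    bytes=$(( (mem_kb / 2) / nr_cpu_m * 1024 ))

    echo "-c $nr_cpu_c -m $nr_cpu_m,1,$bytes"
}

# 8 CPUs as on the test machine; 32000000 kB is an assumed MemTotal.
ltp_stress_args 8 32000000
```

With these inputs the helper prints `-c 4 -m 2,1,8192000000`, that is, four CPU load processes and two memory load processes of roughly 8 GB each. The integer division happens before the multiplication by 1024, so the byte count is rounded down to a whole number of kB per worker.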

Actual results:
A vmcore is generated after about 46 hours of LTP stress testing.

Expected results:
No vmcore is generated.

Additional info:
See the attachments for vmcore-dmesg.txt and kexec-dmesg.log.

Test environment
# uname -r
4.19.91-262.git.4a7c05f4b31f.an8.x86_64

# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.6"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.6"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.6"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

# free -h
              total        used        free      shared  buff/cache   available
Mem:           30Gi        12Gi        16Gi       0.0Ki       1.4Gi        17Gi
Swap:            0B          0B          0B

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Alibaba Cloud
CPU family:          6
Model:               106
Model name:          Intel(R) Xeon(R) Platinum 8369B CPU @ 2.70GHz
BIOS Model name:     pc-i440fx-2.1
Stepping:            6
CPU MHz:             2699.998
BogoMIPS:            5399.99
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           48K
L1i cache:           32K
L2 cache:            1280K
L3 cache:            49152K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm arch_capabilities

# dmidecode -t 0
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: SeaBIOS
        Version: 9e9f1cc
        Release Date: 04/01/2014
        Address: 0xE8000
        Runtime Size: 96 kB
        ROM Size: 64 kB
        Characteristics:
                BIOS characteristics not supported
                Targeted content distribution is supported
        BIOS Revision: 0.0
Comment 1 zhixin01 alibaba_cloud_group 2023-02-09 11:21:39 UTC
Created attachment 627 [details]
kexec-dmesg