Bug 6267 - [ANCK-5.10-16][Anolis8][x86_64][nightly] xfstests on ext4: running test case generic/650 triggers a crash, PANIC: "Kernel panic - not syncing: Fatal hardware error!"
Summary: [ANCK-5.10-16][Anolis8][x86_64][nightly] xfstests on ext4: while running test case generic/650, the test...
Status: RESOLVED BYDESIGN
Alias: None
Product: Antest
Classification: Infrastructures
Component: Test cases
Version: unspecified
Hardware: x86_64 Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: yunmeng365524
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-29 11:34 UTC by shanxifanshi
Modified: 2023-10-17 16:26 UTC
8 users

See Also:


Attachments
vmcore-dmesg (997.23 KB, text/plain)
2023-08-29 11:36 UTC, shanxifanshi

Description shanxifanshi alibaba_cloud_group 2023-08-29 11:34:57 UTC
[Defect description]:
xfstests on ext4: running test case generic/650 triggers a crash, PANIC: "Kernel panic - not syncing: Fatal hardware error!"

See the attachment for the vmcore-dmesg log.

Analysis of the vmcore with crash:
# crash /usr/lib/debug/usr/lib/modules/5.10.134-92.git.0c8b383f70.an8.x86_64/vmlinux /var/crash/127.0.0.1-2023-08-29-10\:59\:07/vmcore

crash 8.0.1-2.0.2.an8
Copyright (C) 2002-2022  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
Copyright (C) 2015, 2021  VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...

      KERNEL: /usr/lib/debug/usr/lib/modules/5.10.134-92.git.0c8b383f70.an8.x86_64/vmlinux  [TAINTED]
    DUMPFILE: /var/crash/127.0.0.1-2023-08-29-10:59:07/vmcore  [PARTIAL DUMP]
        CPUS: 32 [OFFLINE: 16]
        DATE: Tue Aug 29 10:58:46 CST 2023
      UPTIME: 10:08:57
LOAD AVERAGE: 28.86, 11.27, 4.16
       TASKS: 504
    NODENAME: e18k04633.et15sqa
     RELEASE: 5.10.134-92.git.0c8b383f70.an8.x86_64
     VERSION: #1 SMP Mon Aug 28 11:56:35 UTC 2023
     MACHINE: x86_64  (2494 Mhz)
      MEMORY: 127.9 GB
       PANIC: "Kernel panic - not syncing: Fatal hardware error!"
         PID: 0
     COMMAND: "swapper/0"
        TASK: ffffffffb1a17940  (1 of 32)  [THREAD_INFO: ffffffffb1a17940]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 0      TASK: ffffffffb1a17940  CPU: 0   COMMAND: "swapper/0"
 #0 [fffffe4784621c58] machine_kexec at ffffffffb005e14c
 #1 [fffffe4784621ca0] __crash_kexec at ffffffffb01a25ba
 #2 [fffffe4784621d60] panic at ffffffffb0a43f64
 #3 [fffffe4784621e00] ghes_notify_nmi at ffffffffb068d115
 #4 [fffffe4784621e60] nmi_handle at ffffffffb0026245
 #5 [fffffe4784621ea8] default_do_nmi at ffffffffb0a83679
 #6 [fffffe4784621ec8] exc_nmi at ffffffffb0a83854
 #7 [fffffe4784621ef0] end_repeat_nmi at ffffffffb0c01508
    [exception RIP: acpi_idle_do_entry+111]
    RIP: ffffffffb0a949bf  RSP: ffffffffb1a03e70  RFLAGS: 00000246
    RAX: 0000000000004000  RBX: ffffa0f102364064  RCX: ffffa10ffee00000
    RDX: 0000000000000001  RSI: ffffffffb2203f40  RDI: 0000000000000001
    RBP: 0000000000000001   R8: ffffa0f102364000   R9: 0000000000034800
    R10: 000310c79458088f  R11: ffffa10ffee33de4  R12: ffffffffb2203fe0
    R13: ffffffffb2203f40  R14: 0000000000000001  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #8 [ffffffffb1a03e70] acpi_idle_do_entry at ffffffffb0a949bf
 #9 [ffffffffb1a03e78] acpi_idle_enter at ffffffffb0680250
#10 [ffffffffb1a03e88] cpuidle_enter_state at ffffffffb084095e
#11 [ffffffffb1a03ec8] cpuidle_enter at ffffffffb0840c99
#12 [ffffffffb1a03ee8] do_idle at ffffffffb0121b15
#13 [ffffffffb1a03f30] cpu_startup_entry at ffffffffb0121cd9
#14 [ffffffffb1a03f40] start_secondary at ffffffffb005350f
#15 [ffffffffb1a03f50] secondary_startup_64_no_verify at ffffffffb0000104
crash> q
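
The backtrace above shows panic() reached from ghes_notify_nmi, the kernel's NMI handler for ACPI APEI generic hardware error sources, while CPU 0 was idle in acpi_idle_do_entry. A minimal Python sketch for checking a saved `bt` transcript for this pattern follows. The function name summarize_bt and the trimmed SAMPLE_BT excerpt are illustrative assumptions and are not part of the crash utility.

```python
import re

# Frame lines in `bt` output look like:
#   #3 [fffffe4784621e00] ghes_notify_nmi at ffffffffb068d115
FRAME_RE = re.compile(r"^\s*#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")
# The interrupted context is reported as: [exception RIP: acpi_idle_do_entry+111]
EXC_RE = re.compile(r"\[exception RIP:\s*([^\]\s]+)\]")


def summarize_bt(bt_text):
    """Hypothetical helper: list frame names, the interrupted symbol, and
    whether the frame right after panic is ghes_notify_nmi."""
    frames = []
    exception_rip = None
    for line in bt_text.splitlines():
        frame = FRAME_RE.match(line)
        if frame:
            frames.append(frame.group(2))
            continue
        exc = EXC_RE.search(line)
        if exc and exception_rip is None:
            exception_rip = exc.group(1)
    # bt prints the innermost frame first, so the caller of panic is the
    # frame that immediately follows it in the list.
    via_ghes = False
    if "panic" in frames:
        i = frames.index("panic")
        via_ghes = i + 1 < len(frames) and frames[i + 1] == "ghes_notify_nmi"
    return {
        "frames": frames,
        "exception_rip": exception_rip,
        "panic_via_ghes_nmi": via_ghes,
    }


# Excerpt of the backtrace from this report, trimmed for brevity.
SAMPLE_BT = """\
 #0 [fffffe4784621c58] machine_kexec at ffffffffb005e14c
 #1 [fffffe4784621ca0] __crash_kexec at ffffffffb01a25ba
 #2 [fffffe4784621d60] panic at ffffffffb0a43f64
 #3 [fffffe4784621e00] ghes_notify_nmi at ffffffffb068d115
 #4 [fffffe4784621e60] nmi_handle at ffffffffb0026245
    [exception RIP: acpi_idle_do_entry+111]
 #8 [ffffffffb1a03e70] acpi_idle_do_entry at ffffffffb0a949bf
"""

if __name__ == "__main__":
    print(summarize_bt(SAMPLE_BT))
```

If panic_via_ghes_nmi is true, the panic was raised on a firmware-reported hardware error rather than by the code that happened to be running, so the hardware error records in the attached vmcore-dmesg log are likely the more useful place to look.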


[Reproduction rate]
Always reproduces

[Reproduction environment]

Kernel:
# uname -r
5.10.134-92.git.0c8b383f70.an8.x86_64

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Intel
CPU family:          6
Model:               79
Model name:          Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
BIOS Model name:     Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
Stepping:            1
CPU MHz:             2495.013
CPU max MHz:         2500.0000
CPU min MHz:         1200.0000
BogoMIPS:            4987.95
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            40960K
NUMA node0 CPU(s):   0-31
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts

Operating system information:
# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.8"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.8"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.8"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

Memory information:
# free -h
              total        used        free      shared  buff/cache   available
Mem:          125Gi       997Mi       123Gi       9.0Mi       857Mi       123Gi
Swap:         2.0Gi          0B       2.0Gi


[Steps to reproduce]:
1. Configure the environment
export FSTYP=ext4
export TEST_DEV=/dev/sdb1
export SCRATCH_DEV=/dev/sdb2
export TEST_DIR=/fs/sdb1
export SCRATCH_MNT=/fs/sdb2

2. Download the xfstests test code
git clone --branch anck-5.10 https://gitee.com/anolis/xfstests.git

3. Build the test source
cd xfstests
export CFLAGS="-fcommon"
make
make install

4. Run the test case
./check generic/650


[Expected result]:
No new failures


[Actual result]:
A crash is triggered

[Root cause analysis]:
Comment 1 shanxifanshi alibaba_cloud_group 2023-08-29 11:36:49 UTC
Created attachment 878
vmcore-dmesg
Comment 2 shanxifanshi alibaba_cloud_group 2023-10-17 14:20:09 UTC
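The problem still exists on the an8 016 build. So far, however, only this N49 machine appears to be affected. The tests currently use the sdb disk, and switching to the sdi disk also triggers the crash, so the disk is probably not the cause. On a different machine, an F51, running generic/650 many times never triggered the crash. Please check whether this needs attention.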

# crash /usr/lib/debug/usr/lib/modules/5.10.134-16_rc1.an8.x86_64/vmlinux /var/crash/127.0.0.1-2023-10-17-14\:03\:57/vmcore

      KERNEL: /usr/lib/debug/usr/lib/modules/5.10.134-16_rc1.an8.x86_64/vmlinux  [TAINTED]
    DUMPFILE: /var/crash/127.0.0.1-2023-10-17-14:03:57/vmcore  [PARTIAL DUMP]
        CPUS: 32 [OFFLINE: 17]
        DATE: Tue Oct 17 14:03:34 CST 2023
      UPTIME: 00:14:17
LOAD AVERAGE: 18.32, 23.19, 14.34
       TASKS: 525
    NODENAME: e18k04633.et15sqa
     RELEASE: 5.10.134-16_rc1.an8.x86_64
     VERSION: #1 SMP Wed Oct 11 03:20:21 CST 2023
     MACHINE: x86_64  (2493 Mhz)
      MEMORY: 127.9 GB
       PANIC: "Kernel panic - not syncing: Fatal hardware error!"
         PID: 9215
     COMMAND: "fsstress"
        TASK: ffffa0d342422980  [THREAD_INFO: ffffa0d342422980]
         CPU: 16
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 9215   TASK: ffffa0d342422980  CPU: 16  COMMAND: "fsstress"
 #0 [fffffe73e002bc58] machine_kexec at ffffffff9605e68c
 #1 [fffffe73e002bca0] __crash_kexec at ffffffff961a3cfa
 #2 [fffffe73e002bd60] panic at ffffffff96a4906d
 #3 [fffffe73e002be00] ghes_notify_nmi at ffffffff96692025
 #4 [fffffe73e002be60] nmi_handle at ffffffff96026785
 #5 [fffffe73e002bea8] default_do_nmi at ffffffff96a88b39
 #6 [fffffe73e002bec8] exc_nmi at ffffffff96a88d14
 #7 [fffffe73e002bef0] end_repeat_nmi at ffffffff96c01508
    [exception RIP: __wake_up_bit+21]
    RIP: ffffffff9613f6f5  RSP: ffffaef3c3077c88  RFLAGS: 00000296
    RAX: ffffffff97a06560  RBX: ffffa0d8b5e4ae08  RCX: ffffa0d8b5e4aea0
    RDX: ffffffff97a06560  RSI: ffffa0d8b5e4aea0  RDI: ffffffff97a06558
    RBP: ffffa0d33a6eb110   R8: ffffa0d3293670d0   R9: ffffa0d2c5ac8800
    R10: ffffaef3c3077c00  R11: fffff50ec6119180  R12: 0000000000000000
    R13: ffffa0d33a6eb110  R14: ffffaef3c3077d08  R15: ffffa0da06d41a28
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #8 [ffffaef3c3077c88] __wake_up_bit at ffffffff9613f6f5
 #9 [ffffaef3c3077ca0] d_instantiate_new at ffffffff963a2097
#10 [ffffaef3c3077cb8] ext4_add_nondir at ffffffff9646b44b
#11 [ffffaef3c3077cf0] ext4_create at ffffffff9646b739
#12 [ffffaef3c3077d40] path_openat at ffffffff963989e1
#13 [ffffaef3c3077dd8] do_filp_open at ffffffff9639a0c1
#14 [ffffaef3c3077ee8] do_sys_openat2 at ffffffff9638405d
#15 [ffffaef3c3077f20] do_sys_open at ffffffff9638548b
#16 [ffffaef3c3077f40] do_syscall_64 at ffffffff96a87453
#17 [ffffaef3c3077f50] entry_SYSCALL_64_after_hwframe at ffffffff96c00099
    RIP: 00007f8566d20388  RSP: 00007ffcb2f83768  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00000000000006bb  RCX: 00007f8566d20388
    RDX: 0000000000000000  RSI: 00000000000001b6  RDI: 00000000010fccd0
    RBP: 00007ffcb2f838c0   R8: 00007f8566fbfbc0   R9: 0000000000000000
    R10: 00007ffcb2f833f5  R11: 0000000000000246  R12: 00000000000001b6
    R13: 0000000000404d10  R14: 0000000000000000  R15: 00007f8567b976b8
    ORIG_RAX: 0000000000000055  CS: 0033  SS: 002b
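
The two crash sessions in this report name the same host and the same panic string, but the interrupted task differs ("swapper/0" idle on CPU 0 in the first, "fsstress" on CPU 16 in the second). The Python sketch below parses the `KEY: value` summary that crash prints at startup and compares two reports field by field. The helper names and the trimmed sample headers are illustrative assumptions. Reading the result as a sign that the interrupted code is incidental is only an interpretation, not a confirmed root cause.

```python
import re

# Summary lines printed by crash look like "    NODENAME: e18k04633.et15sqa".
HEADER_RE = re.compile(r"^\s*([A-Z][A-Z ]*[A-Z]):\s+(.*\S)\s*$")
# The CPUS field may carry an offline count, e.g. "32 [OFFLINE: 16]".
CPUS_RE = re.compile(r"^(\d+)(?:\s+\[OFFLINE:\s*(\d+)\])?")


def parse_crash_header(text):
    """Hypothetical helper: map each summary key to its raw value string."""
    fields = {}
    for line in text.splitlines():
        m = HEADER_RE.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def offline_cpus(fields):
    """Return the offline CPU count from the CPUS field, or None if absent."""
    m = CPUS_RE.match(fields.get("CPUS", ""))
    if not m:
        return None
    return int(m.group(2) or 0)


def compare_headers(a, b, keys=("NODENAME", "PANIC", "COMMAND", "CPU")):
    """Report, per key, whether the two parsed headers agree."""
    return {k: a.get(k) == b.get(k) for k in keys}


# Header excerpts copied from the two sessions in this report.
REPORT_AUG_29 = """\
        CPUS: 32 [OFFLINE: 16]
    NODENAME: e18k04633.et15sqa
       PANIC: "Kernel panic - not syncing: Fatal hardware error!"
     COMMAND: "swapper/0"
         CPU: 0
"""

REPORT_OCT_17 = """\
        CPUS: 32 [OFFLINE: 17]
    NODENAME: e18k04633.et15sqa
       PANIC: "Kernel panic - not syncing: Fatal hardware error!"
     COMMAND: "fsstress"
         CPU: 16
"""

if __name__ == "__main__":
    first = parse_crash_header(REPORT_AUG_29)
    second = parse_crash_header(REPORT_OCT_17)
    print(compare_headers(first, second))
    print(offline_cpus(first), offline_cpus(second))
```

The nonzero OFFLINE counts contrast with the lscpu output in the description, which lists CPUs 0-31 as online. To my understanding, generic/650 takes CPUs offline and online while fsstress runs, which would account for the difference.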
Comment 3 yunmeng365524 2023-10-17 16:26:45 UTC
Tracked internally.