[Defect description]:
Running xfstests case generic/650 on an ext4 file system triggers a kernel crash.
PANIC: "Kernel panic - not syncing: Fatal hardware error!"
See the attached vmcore-dmesg log for details. The crash analysis follows:

# crash /usr/lib/debug/usr/lib/modules/5.10.134-92.git.0c8b383f70.an8.x86_64/vmlinux /var/crash/127.0.0.1-2023-08-29-10\:59\:07/vmcore

crash 8.0.1-2.0.2.an8
Copyright (C) 2002-2022  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
Copyright (C) 2015, 2021  VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...

      KERNEL: /usr/lib/debug/usr/lib/modules/5.10.134-92.git.0c8b383f70.an8.x86_64/vmlinux  [TAINTED]
    DUMPFILE: /var/crash/127.0.0.1-2023-08-29-10:59:07/vmcore  [PARTIAL DUMP]
        CPUS: 32 [OFFLINE: 16]
        DATE: Tue Aug 29 10:58:46 CST 2023
      UPTIME: 10:08:57
LOAD AVERAGE: 28.86, 11.27, 4.16
       TASKS: 504
    NODENAME: e18k04633.et15sqa
     RELEASE: 5.10.134-92.git.0c8b383f70.an8.x86_64
     VERSION: #1 SMP Mon Aug 28 11:56:35 UTC 2023
     MACHINE: x86_64  (2494 Mhz)
      MEMORY: 127.9 GB
       PANIC: "Kernel panic - not syncing: Fatal hardware error!"
         PID: 0
     COMMAND: "swapper/0"
        TASK: ffffffffb1a17940  (1 of 32)  [THREAD_INFO: ffffffffb1a17940]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 0  TASK: ffffffffb1a17940  CPU: 0  COMMAND: "swapper/0"
 #0 [fffffe4784621c58] machine_kexec at ffffffffb005e14c
 #1 [fffffe4784621ca0] __crash_kexec at ffffffffb01a25ba
 #2 [fffffe4784621d60] panic at ffffffffb0a43f64
 #3 [fffffe4784621e00] ghes_notify_nmi at ffffffffb068d115
 #4 [fffffe4784621e60] nmi_handle at ffffffffb0026245
 #5 [fffffe4784621ea8] default_do_nmi at ffffffffb0a83679
 #6 [fffffe4784621ec8] exc_nmi at ffffffffb0a83854
 #7 [fffffe4784621ef0] end_repeat_nmi at ffffffffb0c01508
    [exception RIP: acpi_idle_do_entry+111]
    RIP: ffffffffb0a949bf  RSP: ffffffffb1a03e70  RFLAGS: 00000246
    RAX: 0000000000004000  RBX: ffffa0f102364064  RCX: ffffa10ffee00000
    RDX: 0000000000000001  RSI: ffffffffb2203f40  RDI: 0000000000000001
    RBP: 0000000000000001   R8: ffffa0f102364000   R9: 0000000000034800
    R10: 000310c79458088f  R11: ffffa10ffee33de4  R12: ffffffffb2203fe0
    R13: ffffffffb2203f40  R14: 0000000000000001  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #8 [ffffffffb1a03e70] acpi_idle_do_entry at ffffffffb0a949bf
 #9 [ffffffffb1a03e78] acpi_idle_enter at ffffffffb0680250
#10 [ffffffffb1a03e88] cpuidle_enter_state at ffffffffb084095e
#11 [ffffffffb1a03ec8] cpuidle_enter at ffffffffb0840c99
#12 [ffffffffb1a03ee8] do_idle at ffffffffb0121b15
#13 [ffffffffb1a03f30] cpu_startup_entry at ffffffffb0121cd9
#14 [ffffffffb1a03f40] start_secondary at ffffffffb005350f
#15 [ffffffffb1a03f50] secondary_startup_64_no_verify at ffffffffb0000104
crash> q

[Reproduction rate]
Always reproducible

[Reproduction environment]
Kernel:
# uname -r
5.10.134-92.git.0c8b383f70.an8.x86_64

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
BIOS Vendor ID:      Intel
CPU family:          6
Model:               79
Model name:          Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
BIOS Model name:     Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
Stepping:            1
CPU MHz:             2495.013
CPU max MHz:         2500.0000
CPU min MHz:         1200.0000
BogoMIPS:            4987.95
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            40960K
NUMA node0 CPU(s):   0-31
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts

OS information:
# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.8"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.8"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.8"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

Memory information:
# free -h
              total        used        free      shared  buff/cache   available
Mem:          125Gi       997Mi       123Gi       9.0Mi       857Mi       123Gi
Swap:         2.0Gi          0B       2.0Gi

[Reproduction steps]:
1. Configure the environment
export FSTYP=ext4
export TEST_DEV=/dev/sdb1
export SCRATCH_DEV=/dev/sdb2
export TEST_DIR=/fs/sdb1
export SCRATCH_MNT=/fs/sdb2

2. Download the xfstests source code
git clone --branch anck-5.10 https://gitee.com/anolis/xfstests.git

3. Build the test source
cd xfstests
export CFLAGS="-fcommon"
make
make install

4. Run the test case
./check generic/650

[Expected result]:
No new failures

[Actual result]:
A crash is triggered

[Root cause analysis]:
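As a possible starting point for the analysis: the backtrace shows the panic being raised from ghes_notify_nmi, and the kernel normally logs the firmware-reported error records with a "[Hardware Error]" prefix before panicking. The following is only a minimal sketch for pulling those lines out of a saved dmesg; the function names are illustrative, and the file name used in the usage note (vmcore-dmesg.txt) is a hypothetical local copy of the attachment.

```shell
#!/bin/sh
# Sketch: extract firmware-reported hardware error records from a saved dmesg.
# Assumption: the caller passes the path of the log file as the first argument.

# Print every line that carries the kernel's "[Hardware Error]" prefix.
hw_error_lines() {
    grep -F '[Hardware Error]' "$1"
}

# Print how many such lines the log contains (prints 0 when there are none).
hw_error_count() {
    grep -cF '[Hardware Error]' "$1" || true
}
```

For example, `hw_error_lines vmcore-dmesg.txt` would list the records, if any were captured; the error source and severity fields in them may help tell which component the firmware reported.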
Created attachment 878: vmcore-dmesg
The problem still exists on the an8 016 release. So far, however, only this N49 machine appears to be affected. The tests currently use the sdb disk; switching to the sdi disk also triggers the crash, so the disk is probably not a factor. In addition, on another machine (F51), repeated runs of generic/650 never triggered the crash. Please check whether this needs attention.

# crash /usr/lib/debug/usr/lib/modules/5.10.134-16_rc1.an8.x86_64/vmlinux /var/crash/127.0.0.1-2023-10-17-14\:03\:57/vmcore

crash 8.0.1-2.0.2.an8

      KERNEL: /usr/lib/debug/usr/lib/modules/5.10.134-16_rc1.an8.x86_64/vmlinux  [TAINTED]
    DUMPFILE: /var/crash/127.0.0.1-2023-10-17-14:03:57/vmcore  [PARTIAL DUMP]
        CPUS: 32 [OFFLINE: 17]
        DATE: Tue Oct 17 14:03:34 CST 2023
      UPTIME: 00:14:17
LOAD AVERAGE: 18.32, 23.19, 14.34
       TASKS: 525
    NODENAME: e18k04633.et15sqa
     RELEASE: 5.10.134-16_rc1.an8.x86_64
     VERSION: #1 SMP Wed Oct 11 03:20:21 CST 2023
     MACHINE: x86_64  (2493 Mhz)
      MEMORY: 127.9 GB
       PANIC: "Kernel panic - not syncing: Fatal hardware error!"
         PID: 9215
     COMMAND: "fsstress"
        TASK: ffffa0d342422980  [THREAD_INFO: ffffa0d342422980]
         CPU: 16
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 9215  TASK: ffffa0d342422980  CPU: 16  COMMAND: "fsstress"
 #0 [fffffe73e002bc58] machine_kexec at ffffffff9605e68c
 #1 [fffffe73e002bca0] __crash_kexec at ffffffff961a3cfa
 #2 [fffffe73e002bd60] panic at ffffffff96a4906d
 #3 [fffffe73e002be00] ghes_notify_nmi at ffffffff96692025
 #4 [fffffe73e002be60] nmi_handle at ffffffff96026785
 #5 [fffffe73e002bea8] default_do_nmi at ffffffff96a88b39
 #6 [fffffe73e002bec8] exc_nmi at ffffffff96a88d14
 #7 [fffffe73e002bef0] end_repeat_nmi at ffffffff96c01508
    [exception RIP: __wake_up_bit+21]
    RIP: ffffffff9613f6f5  RSP: ffffaef3c3077c88  RFLAGS: 00000296
    RAX: ffffffff97a06560  RBX: ffffa0d8b5e4ae08  RCX: ffffa0d8b5e4aea0
    RDX: ffffffff97a06560  RSI: ffffa0d8b5e4aea0  RDI: ffffffff97a06558
    RBP: ffffa0d33a6eb110   R8: ffffa0d3293670d0   R9: ffffa0d2c5ac8800
    R10: ffffaef3c3077c00  R11: fffff50ec6119180  R12: 0000000000000000
    R13: ffffa0d33a6eb110  R14: ffffaef3c3077d08  R15: ffffa0da06d41a28
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #8 [ffffaef3c3077c88] __wake_up_bit at ffffffff9613f6f5
 #9 [ffffaef3c3077ca0] d_instantiate_new at ffffffff963a2097
#10 [ffffaef3c3077cb8] ext4_add_nondir at ffffffff9646b44b
#11 [ffffaef3c3077cf0] ext4_create at ffffffff9646b739
#12 [ffffaef3c3077d40] path_openat at ffffffff963989e1
#13 [ffffaef3c3077dd8] do_filp_open at ffffffff9639a0c1
#14 [ffffaef3c3077ee8] do_sys_openat2 at ffffffff9638405d
#15 [ffffaef3c3077f20] do_sys_open at ffffffff9638548b
#16 [ffffaef3c3077f40] do_syscall_64 at ffffffff96a87453
#17 [ffffaef3c3077f50] entry_SYSCALL_64_after_hwframe at ffffffff96c00099
    RIP: 00007f8566d20388  RSP: 00007ffcb2f83768  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00000000000006bb  RCX: 00007f8566d20388
    RDX: 0000000000000000  RSI: 00000000000001b6  RDI: 00000000010fccd0
    RBP: 00007ffcb2f838c0   R8: 00007f8566fbfbc0   R9: 0000000000000000
    R10: 00007ffcb2f833f5  R11: 0000000000000246  R12: 00000000000001b6
    R13: 0000000000404d10  R14: 0000000000000000  R15: 00007f8567b976b8
    ORIG_RAX: 0000000000000055  CS: 0033  SS: 002b
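Since the crash follows the machine rather than the disk, one way to narrow it down might be to exercise CPU hotplug on its own, without any file system load. To my understanding generic/650 runs fsstress while CPUs are taken offline and brought back online, which would be consistent with the OFFLINE counts in both crash summaries. The sketch below is an assumption-laden illustration, not the test's actual code: the function names and the SYSFS_CPU override are hypothetical, and only the standard sysfs path /sys/devices/system/cpu/cpuN/online is relied on.

```shell
#!/bin/sh
# Sketch: toggle every hotpluggable CPU offline and back online once.
# Assumption: run as root on the affected machine. SYSFS_CPU is a hypothetical
# override so the functions can be dry-run against a scratch directory.
SYSFS_CPU=${SYSFS_CPU:-/sys/devices/system/cpu}

# List the "online" control files; CPUs without one are skipped.
hotpluggable_cpus() {
    for f in "$SYSFS_CPU"/cpu[0-9]*/online; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Write 0 (offline) or 1 (online) to a single control file.
set_cpu_state() {
    printf '%s\n' "$2" > "$1"
}

# Take each hotpluggable CPU offline, then bring it back online.
toggle_all_once() {
    hotpluggable_cpus | while IFS= read -r f; do
        set_cpu_state "$f" 0
        set_cpu_state "$f" 1
    done
}
```

Calling `toggle_all_once` in a loop on the N49 machine, with no xfstests running, would show whether hotplug alone is enough to provoke the hardware error there; if it is not, the combination with I/O load is the more likely trigger.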
Tracked internally.