Bug 4836 - [Anck-5.10-16][x86-64/aarch64][internal nightly] xfstests->ext4-2-bigalloc: generic/075 test case fails, output mismatch
Status: CLOSED WONTFIX
Alias: None
Product: Antest
Classification: Infrastructures
Component: Test cases
Version: unspecified
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: yunmeng365524
Reported: 2023-04-26 14:12 UTC by yunhe123
Modified: 2023-11-09 15:14 UTC
Description yunhe123 alibaba_cloud_group 2023-04-26 14:12:51 UTC
Description of problem:
[Anck 5.10][aarch64][internal nightly] xfstests->ext4-2-bigalloc: generic/075 test case fails, output mismatch
Test case execution log:
generic/075       [failed, exit status 1]- output mismatch (see /tmp/tone/run/xfstests/results//generic/075.out.bad)
    --- tests/generic/075.out   2023-04-26 10:23:01.689749430 +0800
    +++ /tmp/tone/run/xfstests/results//generic/075.out.bad     2023-04-26 14:01:45.677749430 +0800
    @@ -4,15 +4,5 @@
     -----------------------------------------------
     fsx.0 : -d -N numops -S 0
     -----------------------------------------------
    -
    ------------------------------------------------
    -fsx.1 : -d -N numops -S 0 -x
    ------------------------------------------------
    ...
    (Run 'diff -u /tmp/tone/run/xfstests/tests/generic/075.out /tmp/tone/run/xfstests/results//generic/075.out.bad'  to see the entire diff)


# diff -u /tmp/tone/run/xfstests/tests/generic/075.out /tmp/tone/run/xfstests/results//generic/075.out.bad
--- /tmp/tone/run/xfstests/tests/generic/075.out        2023-04-26 10:23:01.689749430 +0800
+++ /tmp/tone/run/xfstests/results//generic/075.out.bad 2023-04-26 14:01:45.677749430 +0800
@@ -4,15 +4,5 @@
 -----------------------------------------------
 fsx.0 : -d -N numops -S 0
 -----------------------------------------------
-
------------------------------------------------
-fsx.1 : -d -N numops -S 0 -x
------------------------------------------------
-
------------------------------------------------
-fsx.2 : -d -N numops -l filelen -S 0
------------------------------------------------
-
------------------------------------------------
-fsx.3 : -d -N numops -l filelen -S 0 -x
------------------------------------------------
+    fsx (-d -N 1000 -S 0) failed, 0 - compare /tmp/tone/run/xfstests/results//generic/075.0.{good,bad,fsxlog}
+mv: '/tmp/tone/run/xfstests/results//generic/075.0.fsxlog' and '/tmp/tone/run/xfstests/results//generic/075.0.fsxlog' are the same file



Version-Release number of selected component (if applicable):


How reproducible:
Always reproducible

Steps to Reproduce:
disk1=nvme0n1p1
disk2=nvme0n1p2
mkdir -p /fs/$disk1 /fs/$disk2
export TEST_DIR=/fs/$disk1
export TEST_DEV=/dev/$disk1
export SCRATCH_MNT=/fs/$disk2
export SCRATCH_DEV=/dev/$disk2

git clone --branch anck-4.19 https://gitee.com/anolis/xfstests.git
export CFLAGS="-fcommon"
make configure
./configure
make && make install
./check tests/generic/075


Actual results:
The test case fails

Expected results:
The test case passes


Additional info:
# uname -r
5.10.134-631.git.df0033244.an8.aarch64
[root@nu4f13165 xfstests]#
[root@nu4f13165 xfstests]# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.8"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.8"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.8"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

[root@nu4f13165 xfstests]# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):           2
NUMA node(s):        1
Vendor ID:           HiSilicon
BIOS Vendor ID:      HiSilicon
Model:               0
Model name:          Kunpeng-920
BIOS Model name:     HUAWEI Kunpeng 920 5250
Stepping:            0x1
CPU MHz:             2600.000
CPU max MHz:         2600.0000
CPU min MHz:         200.0000
BogoMIPS:            200.00
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            24576K
NUMA node0 CPU(s):   0-95
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
[root@nu4f13165 xfstests]# free -h
              total        used        free      shared  buff/cache   available
Mem:          753Gi       3.4Gi       746Gi        10Mi       3.7Gi       746Gi
Swap:         2.0Gi          0B       2.0Gi
Comment 1 zhixin01 alibaba_cloud_group 2023-05-16 16:15:08 UTC
anolis8-4.19-x86_64 shows the same failure:
# uname -r
4.19.91-710.git.30c6cdce0a.an8.x86_64

The test log is as follows:
FSTYP         -- ext4
PLATFORM      -- Linux/x86_64 i22e11409 4.19.91-710.git.30c6cdce0a.an8.x86_64 #1 SMP Mon May 15 13:58:52 UTC 2023
MKFS_OPTIONS  -- -F /dev/nvme0n1p2
MOUNT_OPTIONS -- -o acl,user_xattr /dev/nvme0n1p2 /fs/nvme0n1p2

generic/075       [failed, exit status 1]- output mismatch (see /tmp/tone/run/xfstests/results//generic/075.out.bad)
    --- tests/generic/075.out   2023-05-16 16:00:09.499949657 +0800
    +++ /tmp/tone/run/xfstests/results//generic/075.out.bad     2023-05-16 16:08:50.843943104 +0800
    @@ -4,15 +4,5 @@
     -----------------------------------------------
     fsx.0 : -d -N numops -S 0
     -----------------------------------------------
    -
    ------------------------------------------------
    -fsx.1 : -d -N numops -S 0 -x
    ------------------------------------------------
    ...
    (Run 'diff -u /tmp/tone/run/xfstests/tests/generic/075.out /tmp/tone/run/xfstests/results//generic/075.out.bad'  to see the entire diff)
Ran: generic/075
Failures: generic/075
Failed 1 of 1 tests

[tone]Error: The return code of run() in run.sh is not 0
generic/075: Failed
Test running: Done

# diff -u /tmp/tone/run/xfstests/tests/generic/075.out /tmp/tone/run/xfstests/results//generic/075.out.bad
--- /tmp/tone/run/xfstests/tests/generic/075.out        2023-05-16 16:00:09.499949657 +0800
+++ /tmp/tone/run/xfstests/results//generic/075.out.bad 2023-05-16 16:08:50.843943104 +0800
@@ -4,15 +4,5 @@
 -----------------------------------------------
 fsx.0 : -d -N numops -S 0
 -----------------------------------------------
-
------------------------------------------------
-fsx.1 : -d -N numops -S 0 -x
------------------------------------------------
-
------------------------------------------------
-fsx.2 : -d -N numops -l filelen -S 0
------------------------------------------------
-
------------------------------------------------
-fsx.3 : -d -N numops -l filelen -S 0 -x
------------------------------------------------
+    fsx (-d -N 1000 -S 0) failed, 0 - compare /tmp/tone/run/xfstests/results//generic/075.0.{good,bad,fsxlog}
+mv: '/tmp/tone/run/xfstests/results//generic/075.0.fsxlog' and '/tmp/tone/run/xfstests/results//generic/075.0.fsxlog' are the same file
Comment 2 wangpingping alibaba_cloud_group 2023-10-11 15:34:47 UTC
The 5.10.134-16_rc1.al8 kernel still has this problem.
The failure log is as follows:
generic/075 3s ... [failed, exit status 1]- output mismatch (see /tmp/tone/run/xfstests/results//generic/075.out.bad)
    --- tests/generic/075.out   2023-10-11 10:34:12.087844522 +0800
    +++ /tmp/tone/run/xfstests/results//generic/075.out.bad     2023-10-11 15:13:17.428501965 +0800
    @@ -4,15 +4,5 @@
     -----------------------------------------------
     fsx.0 : -d -N numops -S 0
     -----------------------------------------------
    -
    ------------------------------------------------
    -fsx.1 : -d -N numops -S 0 -x
    ------------------------------------------------
    ...
Comment 3 yunmeng365524 2023-10-17 22:05:25 UTC
First, looking at the failure log, we can see that the failure occurred while running fsx (-d -N 1000 -S 0).
cat /var/tmp/tone/run/xfstests/results//generic/075.out.bad
QA output created by 075
brevity is wit...

-----------------------------------------------
fsx.0 : -d -N numops -S 0
-----------------------------------------------
    fsx (-d -N 1000 -S 0) failed, 0 - compare /var/tmp/tone/run/xfstests/results//generic/075.0.{good,bad,fsxlog}
mv: '/var/tmp/tone/run/xfstests/results//generic/075.0.fsxlog' and '/var/tmp/tone/run/xfstests/results//generic/075.0.fsxlog' are the same file

Following the hint in the log, check /var/tmp/tone/run/xfstests/results//generic/075.0.fsxlog:
# cat /var/tmp/tone/run/xfstests/results//generic/075.0.fsxlog
2 write	0x696b thru	0x9301	(0x2997 bytes)
copying to largest ever: 0x3ad33
5 copy	from 0x659 to 0x460d, (0x3fb4 bytes) at 0x36d7f
6 trunc	from 0x3ad33 to 0xda5a
7 collapse	from 0x5000 to 0xa000, (0x5000 bytes)
collapse range: 0x5000 to 0xa000
do_collapse_range: fallocate: Invalid argument
LOG DUMP (7 total operations):
1(  1 mod 256): SKIPPED (no operation)
2(  2 mod 256): WRITE    0x696b thru 0x9301	(0x2997 bytes) HOLE
3(  3 mod 256): SKIPPED (no operation)
4(  4 mod 256): SKIPPED (no operation)
5(  5 mod 256): COPY 0x659 thru 0x460c	(0x3fb4 bytes) to 0x36d7f thru 0x3ad32
6(  6 mod 256): TRUNCATE DOWN	from 0x3ad33 to 0xda5a
7(  7 mod 256): COLLAPSE 0x5000 thru 0x9fff	(0x5000 bytes)
Log of operations saved to "/var/tmp/tone/run/xfstests/results//generic/075.0.fsxops"; replay with --replay-ops
Correct content saved for comparison
(maybe hexdump "075.0" vs "/var/tmp/tone/run/xfstests/results//generic/075.0.fsxgood")

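The log shows that an error occurred while running test case generic/075. Specifically, the error occurred in operation 7 (COLLAPSE), and the error message is "fallocate: Invalid argument".

The key information in the log is:

Operation 2 is a write (WRITE) that wrote data from 0x696b through 0x9301, 0x2997 bytes in total.
Operation 5 is a copy (COPY) that copied 0x3fb4 bytes from 0x659 through 0x460c to the range 0x36d7f through 0x3ad32.
Operation 6 is a truncate (TRUNCATE DOWN) that shrank the file from 0x3ad33 bytes to 0xda5a bytes.
Operation 7 is a collapse-range operation (COLLAPSE) that tried to collapse the range 0x5000 through 0x9fff, 0x5000 bytes in total.
Operation 7 failed with "do_collapse_range: fallocate: Invalid argument", which means an invalid argument was passed to the fallocate call.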
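
The walk-through above can also be done mechanically. The following is a minimal Python sketch, not part of fsx or xfstests, that parses the LOG DUMP section of an fsxlog and reports the last recorded operation. The names ENTRY and parse_log_dump are hypothetical, and the regular expression assumes only the entry layout visible in the log quoted in this comment.

```python
import re

# Matches LOG DUMP entries such as
# "7(  7 mod 256): COLLAPSE 0x5000 thru 0x9fff\t(0x5000 bytes)".
# Layout is assumed from the log quoted in this bug, not from fsx documentation.
ENTRY = re.compile(r"^\s*(\d+)\(\s*\d+ mod \d+\):\s+([A-Z]+(?: [A-Z]+)?)(.*)$")


def parse_log_dump(text):
    """Return (op_number, op_name, [hex values]) for each non-skipped entry."""
    ops = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if not m or m.group(2) == "SKIPPED":
            continue
        values = [int(v, 16) for v in re.findall(r"0x[0-9a-fA-F]+", m.group(3))]
        ops.append((int(m.group(1)), m.group(2), values))
    return ops


SAMPLE = """\
1(  1 mod 256): SKIPPED (no operation)
2(  2 mod 256): WRITE    0x696b thru 0x9301\t(0x2997 bytes) HOLE
3(  3 mod 256): SKIPPED (no operation)
4(  4 mod 256): SKIPPED (no operation)
5(  5 mod 256): COPY 0x659 thru 0x460c\t(0x3fb4 bytes) to 0x36d7f thru 0x3ad32
6(  6 mod 256): TRUNCATE DOWN\tfrom 0x3ad33 to 0xda5a
7(  7 mod 256): COLLAPSE 0x5000 thru 0x9fff\t(0x5000 bytes)
"""

ops = parse_log_dump(SAMPLE)
num, name, values = ops[-1]  # the last logged operation is the one that failed
print(f"last op #{num}: {name} values={[hex(v) for v in values]}")
```

Because fsx stops at the first failing operation, the last entry in the dump (here the COLLAPSE at 0x5000 with length 0x5000) is the one to inspect.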
Comment 4 yunmeng365524 2023-10-18 14:42:29 UTC
Being tracked internally for the 016 release.
Comment 5 yunhe123 alibaba_cloud_group 2023-11-09 15:14:51 UTC
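The internal tracking has reached a conclusion: with bigalloc enabled, the fallocate operation requires offset|len to be aligned to cluster_size (16K), but the xfstests code itself does not read the disk's cluster size and adjust the collapse range accordingly. In addition, the test by design operates on small blocks, so this condition cannot be met under ext4-2-bigalloc. This case is therefore not applicable to ext4-2-bigalloc file systems. The test case will be adapted and this bug is closed.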
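
The alignment requirement in this conclusion can be checked with simple arithmetic. The sketch below is illustrative Python, not xfstests or kernel code. is_cluster_aligned and shrink_to_clusters are hypothetical helper names, the 16 KiB cluster size is the value stated in this comment, and the 4 KiB value is an assumed block size used only for comparison.

```python
CLUSTER_SIZE = 16 * 1024  # bigalloc cluster size stated in this bug (16K)
BLOCK_SIZE = 4 * 1024     # assumed block size, for comparison only


def is_cluster_aligned(offset, length, unit):
    """True if both offset and length are multiples of unit (a power of two)."""
    return (offset | length) & (unit - 1) == 0


def shrink_to_clusters(offset, length, unit):
    """Shrink [offset, offset + length) inward to unit boundaries.

    Returns (offset, length) of the aligned sub-range, or None if no whole
    unit fits inside the original range.
    """
    start = (offset + unit - 1) // unit * unit  # round the start up
    end = (offset + length) // unit * unit      # round the end down
    if end <= start:
        return None
    return start, end - start


# Values from operation 7 in the fsxlog: COLLAPSE 0x5000 thru 0x9fff.
offset, length = 0x5000, 0x5000
print(is_cluster_aligned(offset, length, BLOCK_SIZE))
print(is_cluster_aligned(offset, length, CLUSTER_SIZE))
print(shrink_to_clusters(offset, length, CLUSTER_SIZE))
```

With the values from operation 7, the range is aligned to the assumed 4 KiB block size but not to the 16 KiB cluster, and no whole cluster fits inside it. That is consistent with the "Invalid argument" error in the log and with the conclusion that the small ranges used by this test cannot satisfy the bigalloc requirement.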