Description of problem:
[Anck 5.10][aarch64][internal nightly] xfstests->ext4-2-bigalloc: generic/075 fails with an output mismatch

Test log:
generic/075 [failed, exit status 1]- output mismatch (see /tmp/tone/run/xfstests/results//generic/075.out.bad)
    --- tests/generic/075.out 2023-04-26 10:23:01.689749430 +0800
    +++ /tmp/tone/run/xfstests/results//generic/075.out.bad 2023-04-26 14:01:45.677749430 +0800
    @@ -4,15 +4,5 @@
     -----------------------------------------------
     fsx.0 : -d -N numops -S 0
     -----------------------------------------------
    -
    ------------------------------------------------
    -fsx.1 : -d -N numops -S 0 -x
    ------------------------------------------------
    ...
    (Run 'diff -u /tmp/tone/run/xfstests/tests/generic/075.out /tmp/tone/run/xfstests/results//generic/075.out.bad' to see the entire diff)

# diff -u /tmp/tone/run/xfstests/tests/generic/075.out /tmp/tone/run/xfstests/results//generic/075.out.bad
--- /tmp/tone/run/xfstests/tests/generic/075.out 2023-04-26 10:23:01.689749430 +0800
+++ /tmp/tone/run/xfstests/results//generic/075.out.bad 2023-04-26 14:01:45.677749430 +0800
@@ -4,15 +4,5 @@
 -----------------------------------------------
 fsx.0 : -d -N numops -S 0
 -----------------------------------------------
-
------------------------------------------------
-fsx.1 : -d -N numops -S 0 -x
------------------------------------------------
-
------------------------------------------------
-fsx.2 : -d -N numops -l filelen -S 0
------------------------------------------------
-
------------------------------------------------
-fsx.3 : -d -N numops -l filelen -S 0 -x
------------------------------------------------
+ fsx (-d -N 1000 -S 0) failed, 0 - compare /tmp/tone/run/xfstests/results//generic/075.0.{good,bad,fsxlog}
+mv: '/tmp/tone/run/xfstests/results//generic/075.0.fsxlog' and '/tmp/tone/run/xfstests/results//generic/075.0.fsxlog' are the same file

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
disk1=nvme0n1p1
disk2=nvme0n1p2
mkdir -p /fs/$disk1 /fs/$disk2
export TEST_DIR=/fs/$disk1
export TEST_DEV=/dev/$disk1
export SCRATCH_MNT=/fs/$disk2
export SCRATCH_DEV=/dev/$disk2
git clone --branch anck-4.19 https://gitee.com/anolis/xfstests.git
cd xfstests
export CFLAGS="-fcommon"
make configure
./configure
make && make install
./check tests/generic/075

Actual results:
The test case fails.

Expected results:
The test case passes.

Additional info:
# uname -r
5.10.134-631.git.df0033244.an8.aarch64

[root@nu4f13165 xfstests]# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.8"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.8"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.8"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

[root@nu4f13165 xfstests]# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):           2
NUMA node(s):        1
Vendor ID:           HiSilicon
BIOS Vendor ID:      HiSilicon
Model:               0
Model name:          Kunpeng-920
BIOS Model name:     HUAWEI Kunpeng 920 5250
Stepping:            0x1
CPU MHz:             2600.000
CPU max MHz:         2600.0000
CPU min MHz:         200.0000
BogoMIPS:            200.00
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            24576K
NUMA node0 CPU(s):   0-95
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm

[root@nu4f13165 xfstests]# free -h
              total        used        free      shared  buff/cache   available
Mem:          753Gi       3.4Gi       746Gi        10Mi       3.7Gi       746Gi
Swap:         2.0Gi          0B       2.0Gi
The same failure also occurs on anolis8-4.19-x86_64:
# uname -r
4.19.91-710.git.30c6cdce0a.an8.x86_64

Test log:
FSTYP         -- ext4
PLATFORM      -- Linux/x86_64 i22e11409 4.19.91-710.git.30c6cdce0a.an8.x86_64 #1 SMP Mon May 15 13:58:52 UTC 2023
MKFS_OPTIONS  -- -F /dev/nvme0n1p2
MOUNT_OPTIONS -- -o acl,user_xattr /dev/nvme0n1p2 /fs/nvme0n1p2

generic/075 [failed, exit status 1]- output mismatch (see /tmp/tone/run/xfstests/results//generic/075.out.bad)
    --- tests/generic/075.out 2023-05-16 16:00:09.499949657 +0800
    +++ /tmp/tone/run/xfstests/results//generic/075.out.bad 2023-05-16 16:08:50.843943104 +0800
    @@ -4,15 +4,5 @@
     -----------------------------------------------
     fsx.0 : -d -N numops -S 0
     -----------------------------------------------
    -
    ------------------------------------------------
    -fsx.1 : -d -N numops -S 0 -x
    ------------------------------------------------
    ...
    (Run 'diff -u /tmp/tone/run/xfstests/tests/generic/075.out /tmp/tone/run/xfstests/results//generic/075.out.bad' to see the entire diff)
Ran: generic/075
Failures: generic/075
Failed 1 of 1 tests

[tone]Error: The return code of run() in run.sh is not 0
generic/075: Failed
Test running: Done

# diff -u /tmp/tone/run/xfstests/tests/generic/075.out /tmp/tone/run/xfstests/results//generic/075.out.bad
--- /tmp/tone/run/xfstests/tests/generic/075.out 2023-05-16 16:00:09.499949657 +0800
+++ /tmp/tone/run/xfstests/results//generic/075.out.bad 2023-05-16 16:08:50.843943104 +0800
@@ -4,15 +4,5 @@
 -----------------------------------------------
 fsx.0 : -d -N numops -S 0
 -----------------------------------------------
-
------------------------------------------------
-fsx.1 : -d -N numops -S 0 -x
------------------------------------------------
-
------------------------------------------------
-fsx.2 : -d -N numops -l filelen -S 0
------------------------------------------------
-
------------------------------------------------
-fsx.3 : -d -N numops -l filelen -S 0 -x
------------------------------------------------
+ fsx (-d -N 1000 -S 0) failed, 0 - compare /tmp/tone/run/xfstests/results//generic/075.0.{good,bad,fsxlog}
+mv: '/tmp/tone/run/xfstests/results//generic/075.0.fsxlog' and '/tmp/tone/run/xfstests/results//generic/075.0.fsxlog' are the same file
The problem still exists on the 5.10.134-16_rc1.al8 kernel. Failure log:
generic/075 3s ... [failed, exit status 1]- output mismatch (see /tmp/tone/run/xfstests/results//generic/075.out.bad)
    --- tests/generic/075.out 2023-10-11 10:34:12.087844522 +0800
    +++ /tmp/tone/run/xfstests/results//generic/075.out.bad 2023-10-11 15:13:17.428501965 +0800
    @@ -4,15 +4,5 @@
     -----------------------------------------------
     fsx.0 : -d -N numops -S 0
     -----------------------------------------------
    -
    ------------------------------------------------
    -fsx.1 : -d -N numops -S 0 -x
    ------------------------------------------------
    ...
First, look at the failure output. It shows that the run of fsx (-d -N 1000 -S 0) failed.

# cat /var/tmp/tone/run/xfstests/results//generic/075.out.bad
QA output created by 075
brevity is wit...
-----------------------------------------------
fsx.0 : -d -N numops -S 0
-----------------------------------------------
 fsx (-d -N 1000 -S 0) failed, 0 - compare /var/tmp/tone/run/xfstests/results//generic/075.0.{good,bad,fsxlog}
mv: '/var/tmp/tone/run/xfstests/results//generic/075.0.fsxlog' and '/var/tmp/tone/run/xfstests/results//generic/075.0.fsxlog' are the same file

Following the hint in that output, look at /var/tmp/tone/run/xfstests/results//generic/075.0.fsxlog:

# cat /var/tmp/tone/run/xfstests/results//generic/075.0.fsxlog
2 write 0x696b thru 0x9301 (0x2997 bytes)
copying to largest ever: 0x3ad33
5 copy from 0x659 to 0x460d, (0x3fb4 bytes) at 0x36d7f
6 trunc from 0x3ad33 to 0xda5a
7 collapse from 0x5000 to 0xa000, (0x5000 bytes)
collapse range: 0x5000 to 0xa000
do_collapse_range: fallocate: Invalid argument
LOG DUMP (7 total operations):
1( 1 mod 256): SKIPPED (no operation)
2( 2 mod 256): WRITE 0x696b thru 0x9301 (0x2997 bytes) HOLE
3( 3 mod 256): SKIPPED (no operation)
4( 4 mod 256): SKIPPED (no operation)
5( 5 mod 256): COPY 0x659 thru 0x460c (0x3fb4 bytes) to 0x36d7f thru 0x3ad32
6( 6 mod 256): TRUNCATE DOWN from 0x3ad33 to 0xda5a
7( 7 mod 256): COLLAPSE 0x5000 thru 0x9fff (0x5000 bytes)
Log of operations saved to "/var/tmp/tone/run/xfstests/results//generic/075.0.fsxops"; replay with --replay-ops
Correct content saved for comparison
(maybe hexdump "075.0" vs "/var/tmp/tone/run/xfstests/results//generic/075.0.fsxgood")

The log shows that generic/075 hit an error in operation 7 (COLLAPSE), with the message "fallocate: Invalid argument".

Key information from the log:
- Operation 2 is a write (WRITE) covering 0x696b through 0x9301, 0x2997 bytes in total.
- Operation 5 is a copy (COPY) of 0x3fb4 bytes from 0x659 through 0x460c to the range 0x36d7f through 0x3ad32.
- Operation 6 is a truncate (TRUNCATE DOWN) that shrinks the file from 0x3ad33 bytes to 0xda5a bytes.
- Operation 7 is a collapse-range operation (COLLAPSE) that tries to collapse the range 0x5000 through 0x9fff, 0x5000 bytes in total.

Operation 7 fails with "do_collapse_range: fallocate: Invalid argument", which means the fallocate call was given an argument the filesystem rejected as invalid.
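
For longer fsxlog files it can help to pull out the failing operation mechanically instead of reading the whole log. The following is only a sketch: the function name last_op_before_error is made up for illustration, and it assumes the log layout shown above, where numbered operation lines precede the "fallocate:" error line.

```shell
# Hypothetical helper (not part of xfstests): print the last numbered fsx
# operation that was logged before the first line containing "fallocate:".
last_op_before_error() {
    awk '
        /fallocate:/    { print last; exit }  # error line reached: report the op seen just before it
        /^[0-9]+[ \t]/  { last = $0 }         # remember the most recent numbered operation line
    ' "$1"
}
```

Usage would look like `last_op_before_error /var/tmp/tone/run/xfstests/results//generic/075.0.fsxlog`, which for the log above should report the "7 collapse" line.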
The 016 release is being tracked internally.
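The internal tracking has reached a conclusion. With bigalloc enabled, the fallocate operation requires offset|len to be aligned to cluster_size (16K), but the xfstests code does not read the cluster size from the disk and adjust the collapse range accordingly. The test also operates on small blocks by design, so this condition cannot be met under ext4-2-bigalloc. The case is therefore not applicable to ext4-2-bigalloc filesystems. The test case will be adapted and this bug closed.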
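
The alignment condition can be checked by hand against the failing COLLAPSE operation from the fsxlog (offset 0x5000, length 0x5000). The snippet below is a minimal sketch, not xfstests code: is_cluster_aligned is a hypothetical helper name, and the 16384-byte cluster size is taken from the conclusion above rather than read from the device.

```shell
# Hypothetical helper: succeed only if both offset and length are multiples
# of the cluster size. Arguments may be decimal or 0x-prefixed hex.
is_cluster_aligned() {
    _off=$(( $1 ))
    _len=$(( $2 ))
    _cluster=$(( $3 ))
    [ $(( _off % _cluster )) -eq 0 ] && [ $(( _len % _cluster )) -eq 0 ]
}

# Values from operation 7 in the fsxlog; 16384 is the assumed 16K cluster size.
if is_cluster_aligned 0x5000 0x5000 16384; then
    echo "aligned"
else
    echo "not aligned"    # 0x5000 = 20480, and 20480 % 16384 = 4096
fi
```

The range is a multiple of 4K but not of 16K, which matches the "Invalid argument" result. On a real device the cluster size could presumably be read from the superblock (for example from the dumpe2fs -h output) instead of being hard-coded; that step is left out so the snippet runs without a block device.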