[Problem summary]
On Anolis 8.6 aarch64, in the libfuse-test suite, the nightly run started on the evening of 2022/10/26 shows a new failure: test_ctests.py::test_notify_file_size[True].

[Steps to reproduce]
git clone https://github.com/libfuse/libfuse.git
mkdir build
cd build
export PATH=/usr/local/bin/:$PATH
meson ..    # yum install -y meson ninja-build && ninja-build install
modprobe fuse && modprobe cuse    # teardown: umount fusectl;modprobe -r cuse;modprobe -r fuse
python3 -m pytest test/

[Expected result]
The test case passes.

[Actual result]
[root@nu4f13168 build]# python3 -m pytest test/
===================================================== test session starts =====================================================
platform linux -- Python 3.6.8, pytest-3.4.2, py-1.5.3, pluggy-0.6.0 -- /usr/local/bin/python3
cachedir: test/.pytest_cache
rootdir: /root/libfuse/build/test, inifile: pytest.ini
collected 47 items

test/test_ctests.py::test_write_cache[False] PASSED [ 2%]
test/test_ctests.py::test_write_cache[True] PASSED [ 4%]
test/test_ctests.py::test_notify1[True-notify_inval_inode] PASSED [ 6%]
test/test_ctests.py::test_notify1[True-invalidate_path] PASSED [ 8%]
test/test_ctests.py::test_notify1[True-notify_store_retrieve] PASSED [ 10%]
test/test_ctests.py::test_notify1[False-notify_inval_inode] PASSED [ 12%]
test/test_ctests.py::test_notify1[False-invalidate_path] PASSED [ 14%]
test/test_ctests.py::test_notify1[False-notify_store_retrieve] PASSED [ 17%]
test/test_ctests.py::test_notify_file_size[True] FAILED [ 19%]
=================================================== short test summary info ===================================================
FAIL test/test_ctests.py::test_notify_file_size[True]
========================================================== FAILURES ===========================================================
_________________________________________________ test_notify_file_size[True] _________________________________________________
Traceback (most recent call last):
  File "/root/libfuse/build/test/test_ctests.py", line 97, in test_notify_file_size
    assert new_size > size
AssertionError: assert 1 > 1
============================================= 1 failed, 8 passed in 16.48 seconds =============================================
[root@nu4f13168 build]# cat /etc/redhat-release
Anolis OS release 8.6
[root@nu4f13168 build]#

[Test environment]
[root@nu4f13168 build]# uname -r
4.19.91-518.git.5b6c906de.an8.aarch64
[root@nu4f13168 build]# cat /etc/redhat-release
Anolis OS release 8.6
[root@nu4f13168 build]# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.6"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.6"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.6"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
[root@nu4f13168 build]# free -g
              total        used        free      shared  buff/cache   available
Mem:            755          13         739           0           2         737
Swap:             1           0           1
[root@nu4f13168 build]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        378G     0  378G   0% /dev
tmpfs           378G  128K  378G   1% /dev/shm
tmpfs           378G  435M  378G   1% /run
tmpfs           378G     0  378G   0% /sys/fs/cgroup
/dev/sda2        49G   38G  8.6G  82% /
/dev/sda1      1022M  6.7M 1016M   1% /boot/efi
tmpfs            76G     0   76G   0% /run/user/0
[root@nu4f13168 build]# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):           2
NUMA node(s):        1
Vendor ID:           HiSilicon
BIOS Vendor ID:      HiSilicon
Model:               0
Model name:          Kunpeng-920
BIOS Model name:     HUAWEI Kunpeng 920 5250
Stepping:            0x1
CPU max MHz:         2600.0000
CPU min MHz:         200.0000
BogoMIPS:            200.00
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            24576K
NUMA node0 CPU(s):   0-95
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
[root@nu4f13168 build]# python3 -V
Python 3.6.8
[root@nu4f13168 build]#

[Frequency]
Tried on two machines and verified several times on each; the test fails every time.
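The anolis8 4.19 x86 nightly has shown the same failure since the evening of October 25.

Analysis suggests it may have been introduced by the PR below. The test_notify_file_size[True] case needs the fuse module loaded before it runs, and it fails because the file size read before and after the notification is the same (assert new_size > size sees 1 > 1). Only one merged PR is related to this:

https://gitee.com/anolis/cloud-kernel/pulls/786
[4.19] [Feature] fuse: don't need GETATTR after every READ

Could the developers please check whether the problem was introduced by https://gitee.com/anolis/cloud-kernel/pulls/786?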
fuse calls the fuse_invalidate_attr() function:

static void fuse_invalidate_attr_mask(struct inode *inode, u32 mask)
{
	set_mask_bits(&get_fuse_inode(inode)->inval_mask, 0, mask);
}

/*
 * Mark the attributes as stale, so that at the next call to
 * ->getattr() they will be fetched from userspace
 */
void fuse_invalidate_attr(struct inode *inode)
{
	fuse_invalidate_attr_mask(inode, STATX_BASIC_STATS);
}

Printing inside set_mask_bits() shows that the mask argument passed in is 0.

Suspecting a compiler problem, I disassembled fuse_invalidate_attr():

Dump of assembler code for function fuse_invalidate_attr:
fs/fuse/dir.c:
110	{
   0xffffffff813eafe0 <+0>:	0f 1f 44 00 00	nopl 0x0(%rax,%rax,1)

111		fuse_invalidate_attr_mask(inode, STATX_BASIC_STATS);
   0xffffffff813eafe5 <+5>:	48 8d 97 88 02 00 00	lea 0x288(%rdi),%rdx

./include/linux/compiler.h:
198		__READ_ONCE_SIZE;
   0xffffffff813eafec <+12>:	8b 8f 88 02 00 00	mov 0x288(%rdi),%ecx

fs/fuse/dir.c:
102		set_mask_bits(&get_fuse_inode(inode)->inval_mask, 0, mask);
   0xffffffff813eaff2 <+18>:	89 c8	mov %ecx,%eax
   0xffffffff813eaff4 <+20>:	f0 0f b1 0a	lock cmpxchg %ecx,(%rdx)
   0xffffffff813eaff8 <+24>:	39 c1	cmp %eax,%ecx
   0xffffffff813eaffa <+26>:	75 f0	jne 0xffffffff813eafec <fuse_invalidate_attr+12>
   0xffffffff813eaffc <+28>:	c3	retq

The disassembly does show a problem: no bits are ORed in, so the cmpxchg simply stores back the value it just read.

$gcc --version
gcc (GCC) 9.2.1 20200522 (Alibaba 9.2.1-3 2.17)
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#define STATX_BASIC_STATS 0x000007ffU
The kernel version used for testing is:

4.19.91-515.git.d746817431.an8.x86_64 (root@i22e09246.eu95sqa) (gcc version 8.4.1 20200928 (Anolis 8.4.1-1.0.1) (GCC)) #1 SMP Sun Nov 6 13:57:16 UTC 2022

Disassembling fuse_invalidate_attr() gives:

(gdb) disassemble fuse_invalidate_attr
Dump of assembler code for function fuse_invalidate_attr:
   0x0000000000004a40 <+0>:	nopl 0x0(%rax,%rax,1)
   0x0000000000004a45 <+5>:	lea 0x288(%rdi),%rdx
   0x0000000000004a4c <+12>:	mov 0x288(%rdi),%ecx
   0x0000000000004a52 <+18>:	mov %ecx,%eax
   0x0000000000004a54 <+20>:	lock cmpxchg %ecx,(%rdx)
   0x0000000000004a58 <+24>:	cmp %eax,%ecx
   0x0000000000004a5a <+26>:	jne 0x4a4c <fuse_invalidate_attr+12>
   0x0000000000004a5c <+28>:	retq
End of assembler dump.
The mainline kernel does not have this problem. The corresponding disassembly is:

$gdb -batch -ex "disassemble/rs fuse_invalidate_attr" -directory=. vmlinux
Dump of assembler code for function fuse_invalidate_attr:
fs/fuse/fuse_i.h:
884		return container_of(inode, struct fuse_inode, inode);
   0xffffffff8144dd50 <+0>:	8b 87 78 02 00 00	mov 0x278(%rdi),%eax
   0xffffffff8144dd56 <+6>:	48 81 c7 78 02 00 00	add $0x278,%rdi

fs/fuse/dir.c:
130		set_mask_bits(&get_fuse_inode(inode)->inval_mask, 0, mask);
   0xffffffff8144dd5d <+13>:	89 c2	mov %eax,%edx
   0xffffffff8144dd5f <+15>:	81 ca ff 07 00 00	or $0x7ff,%edx
   0xffffffff8144dd65 <+21>:	f0 0f b1 17	lock cmpxchg %edx,(%rdi)
   0xffffffff8144dd69 <+25>:	75 f2	jne 0xffffffff8144dd5d <fuse_invalidate_attr+13>
   0xffffffff8144dd6b <+27>:	e9 d0 71 db 00	jmpq 0xffffffff82204f40 <__x86_return_thunk>
End of assembler dump.

For reference.
The relevant fix is in Linux 4.19.115: 442d7668a54d bitops: protect variables in set_mask_bits() macro
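The commit title points at a variable-scoping hazard in the set_mask_bits() macro rather than a compiler bug. Below is a minimal userspace sketch of that hazard, not the kernel code itself. It assumes the pre-fix macro declared locals named mask and bits, so a caller that passes its own variable called mask as the bits argument ends up initializing bits from the macro's zero-valued mask. The names set_mask_bits_buggy, set_mask_bits_fixed, invalidate_buggy and invalidate_fixed are hypothetical, and __sync_val_compare_and_swap stands in for the kernel's cmpxchg().

```c
/* Userspace sketch; needs GCC or Clang (statement expressions, __typeof__).
 * All macro and function names here are illustrative, not kernel symbols. */
#include <assert.h>
#include <stdint.h>

/* Assumed pre-fix shape: the macro's locals are called mask/bits/old/new. */
#define set_mask_bits_buggy(ptr, _mask, _bits)                          \
({                                                                      \
    const __typeof__(*(ptr)) mask = (_mask), bits = (_bits);            \
    __typeof__(*(ptr)) old, new;                                        \
    do {                                                                \
        old = *(volatile __typeof__(*(ptr)) *)(ptr);                    \
        new = (old & ~mask) | bits;                                     \
    } while (__sync_val_compare_and_swap((ptr), old, new) != old);      \
    new;                                                                \
})

/* Assumed post-fix shape: suffixed locals cannot collide with caller names. */
#define set_mask_bits_fixed(ptr, mask, bits)                            \
({                                                                      \
    const __typeof__(*(ptr)) mask__ = (mask), bits__ = (bits);          \
    __typeof__(*(ptr)) old__, new__;                                    \
    do {                                                                \
        old__ = *(volatile __typeof__(*(ptr)) *)(ptr);                  \
        new__ = (old__ & ~mask__) | bits__;                             \
    } while (__sync_val_compare_and_swap((ptr), old__, new__) != old__);\
    new__;                                                              \
})

/* Mirrors fuse_invalidate_attr_mask(): the parameter is named "mask".
 * Inside the macro, "bits = (mask)" binds to the macro's own mask (0),
 * so the parameter is never used and nothing is ORed in. */
static void invalidate_buggy(uint32_t *word, uint32_t mask)
{
    (void)set_mask_bits_buggy(word, 0, mask);
}

/* Same call with the renamed locals: "bits__ = (mask)" sees the parameter. */
static void invalidate_fixed(uint32_t *word, uint32_t mask)
{
    (void)set_mask_bits_fixed(word, 0, mask);
}
```

Starting from a word of 0, invalidate_buggy(&word, 0x7ffU) leaves it at 0, which matches the disassembly where cmpxchg stores back the value it just read, while invalidate_fixed(&word, 0x7ffU) sets it to 0x7ff, matching the or $0x7ff seen in the mainline build.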
fixed in https://gitee.com/anolis/cloud-kernel/pulls/854