Bug 2594 - [Anolis8.6][ck-4.19][aarch64] libfuse-test suite: test_ctests.py::test_notify_file_size[True] fails
Summary: [Anolis8.6][ck-4.19][aarch64] libfuse-test suite: test_ctests.pytest_notify_file_si...
Status: RESOLVED FIXED
Alias: None
Product: Anolis OS 8
Classification: Anolis OS
Component: kernel - anck-4.19
Version: 8.6
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: Jingbo Xu
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-10-26 17:32 UTC by anolislw
Modified: 2022-11-08 15:36 UTC (History)

Description anolislw alibaba_cloud_group 2022-10-26 17:32:42 UTC
[Problem summary]
On Anolis 8.6 aarch64, the libfuse-test nightly run started on the evening of 2022/10/26 shows a new failure: test_ctests.py::test_notify_file_size[True].

[Steps to reproduce]
git clone https://github.com/libfuse/libfuse.git
cd libfuse
mkdir build
cd build
export PATH=/usr/local/bin/:$PATH
meson .. # requires meson: yum install -y meson
ninja-build && ninja-build install
modprobe fuse && modprobe cuse # teardown: umount fusectl; modprobe -r cuse; modprobe -r fuse
python3 -m pytest test/

[Expected result]
The test case passes.

[Actual result]
[root@nu4f13168 build]# python3 -m pytest test/
===================================================== test session starts =====================================================
platform linux -- Python 3.6.8, pytest-3.4.2, py-1.5.3, pluggy-0.6.0 -- /usr/local/bin/python3
cachedir: test/.pytest_cache
rootdir: /root/libfuse/build/test, inifile: pytest.ini
collected 47 items

test/test_ctests.py::test_write_cache[False] PASSED                                                                     [  2%]
test/test_ctests.py::test_write_cache[True] PASSED                                                                      [  4%]
test/test_ctests.py::test_notify1[True-notify_inval_inode] PASSED                                                       [  6%]
test/test_ctests.py::test_notify1[True-invalidate_path] PASSED                                                          [  8%]
test/test_ctests.py::test_notify1[True-notify_store_retrieve] PASSED                                                    [ 10%]
test/test_ctests.py::test_notify1[False-notify_inval_inode] PASSED                                                      [ 12%]
test/test_ctests.py::test_notify1[False-invalidate_path] PASSED                                                         [ 14%]
test/test_ctests.py::test_notify1[False-notify_store_retrieve] PASSED                                                   [ 17%]
test/test_ctests.py::test_notify_file_size[True] FAILED                                                                 [ 19%]
=================================================== short test summary info ===================================================
FAIL test/test_ctests.py::test_notify_file_size[True]

========================================================== FAILURES ===========================================================
_________________________________________________ test_notify_file_size[True] _________________________________________________
Traceback (most recent call last):
  File "/root/libfuse/build/test/test_ctests.py", line 97, in test_notify_file_size
    assert new_size > size
AssertionError: assert 1 > 1
============================================= 1 failed, 8 passed in 16.48 seconds =============================================
[root@nu4f13168 build]# cat /etc/redhat-release
Anolis OS release 8.6
[root@nu4f13168 build]#

[Test environment]
[root@nu4f13168 build]# uname -r
4.19.91-518.git.5b6c906de.an8.aarch64
[root@nu4f13168 build]# cat /etc/redhat-release
Anolis OS release 8.6
[root@nu4f13168 build]# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.6"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.6"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.6"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

[root@nu4f13168 build]# free -g
              total        used        free      shared  buff/cache   available
Mem:            755          13         739           0           2         737
Swap:             1           0           1
[root@nu4f13168 build]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        378G     0  378G   0% /dev
tmpfs           378G  128K  378G   1% /dev/shm
tmpfs           378G  435M  378G   1% /run
tmpfs           378G     0  378G   0% /sys/fs/cgroup
/dev/sda2        49G   38G  8.6G  82% /
/dev/sda1      1022M  6.7M 1016M   1% /boot/efi
tmpfs            76G     0   76G   0% /run/user/0
[root@nu4f13168 build]# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):           2
NUMA node(s):        1
Vendor ID:           HiSilicon
BIOS Vendor ID:      HiSilicon
Model:               0
Model name:          Kunpeng-920
BIOS Model name:     HUAWEI Kunpeng 920 5250
Stepping:            0x1
CPU max MHz:         2600.0000
CPU min MHz:         200.0000
BogoMIPS:            200.00
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            24576K
NUMA node0 CPU(s):   0-95
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
[root@nu4f13168 build]# python3 -V
Python 3.6.8
[root@nu4f13168 build]#

[Frequency]
Reproduced on two machines, with multiple runs on each; the test fails every time.
Comment 1 zhixin01 alibaba_cloud_group 2022-10-27 14:39:38 UTC
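The Anolis 8 4.19 x86 nightly has shown the same failure since the evening of October 25.
Analysis suggests it may have been introduced by the merged PR below:
the test_notify_file_size[True] case requires the fuse module to be loaded before it runs, and it fails because the file size read before and after the notification is identical.

Only one merged PR is related to this:
https://gitee.com/anolis/cloud-kernel/pulls/786
[4.19] [Feature] fuse: don't need GETATTR after every READ

Could the developers check whether the problem was introduced by https://gitee.com/anolis/cloud-kernel/pulls/786?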
Comment 2 Jingbo Xu alibaba_cloud_group 2022-11-07 13:31:29 UTC
fuse calls the fuse_invalidate_attr() function:

static void fuse_invalidate_attr_mask(struct inode *inode, u32 mask)
{
        set_mask_bits(&get_fuse_inode(inode)->inval_mask, 0, mask);
}

/*
 * Mark the attributes as stale, so that at the next call to
 * ->getattr() they will be fetched from userspace
 */
void fuse_invalidate_attr(struct inode *inode)
{
        fuse_invalidate_attr_mask(inode, STATX_BASIC_STATS);
}

Printing from inside set_mask_bits() shows that the mask argument passed in is 0.


Suspecting a compiler problem, I disassembled fuse_invalidate_attr():
Dump of assembler code for function fuse_invalidate_attr:
fs/fuse/dir.c:
110     {
   0xffffffff813eafe0 <+0>:     0f 1f 44 00 00  nopl   0x0(%rax,%rax,1)

111             fuse_invalidate_attr_mask(inode, STATX_BASIC_STATS);
   0xffffffff813eafe5 <+5>:     48 8d 97 88 02 00 00    lea    0x288(%rdi),%rdx

./include/linux/compiler.h:
198             __READ_ONCE_SIZE;
   0xffffffff813eafec <+12>:    8b 8f 88 02 00 00       mov    0x288(%rdi),%ecx

fs/fuse/dir.c:
102             set_mask_bits(&get_fuse_inode(inode)->inval_mask, 0, mask);
   0xffffffff813eaff2 <+18>:    89 c8   mov    %ecx,%eax
   0xffffffff813eaff4 <+20>:    f0 0f b1 0a     lock cmpxchg %ecx,(%rdx)
   0xffffffff813eaff8 <+24>:    39 c1   cmp    %eax,%ecx
   0xffffffff813eaffa <+26>:    75 f0   jne    0xffffffff813eafec <fuse_invalidate_attr+12>
   0xffffffff813eaffc <+28>:    c3      retq


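The disassembly here is indeed wrong: the value loaded from inval_mask is written back unchanged by cmpxchg, and the mask 0x7ff is never ORed in.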

$gcc --version
gcc (GCC) 9.2.1 20200522 (Alibaba 9.2.1-3 2.17)
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Comment 3 Jingbo Xu alibaba_cloud_group 2022-11-07 13:32:45 UTC
#define STATX_BASIC_STATS       0x000007ffU
Comment 4 Jingbo Xu alibaba_cloud_group 2022-11-07 13:59:11 UTC
The kernel version used for testing is
4.19.91-515.git.d746817431.an8.x86_64 (root@i22e09246.eu95sqa) (gcc version 8.4.1 20200928 (Anolis 8.4.1-1.0.1) (GCC)) #1 SMP Sun Nov 6 13:57:16 UTC 2022

Disassembling fuse_invalidate_attr() gives:

(gdb) disassemble fuse_invalidate_attr
Dump of assembler code for function fuse_invalidate_attr:
   0x0000000000004a40 <+0>:	nopl   0x0(%rax,%rax,1)
   0x0000000000004a45 <+5>:	lea    0x288(%rdi),%rdx
   0x0000000000004a4c <+12>:	mov    0x288(%rdi),%ecx
   0x0000000000004a52 <+18>:	mov    %ecx,%eax
   0x0000000000004a54 <+20>:	lock cmpxchg %ecx,(%rdx)
   0x0000000000004a58 <+24>:	cmp    %eax,%ecx
   0x0000000000004a5a <+26>:	jne    0x4a4c <fuse_invalidate_attr+12>
   0x0000000000004a5c <+28>:	retq
End of assembler dump.
Comment 5 Jingbo Xu alibaba_cloud_group 2022-11-07 14:10:49 UTC
The mainline kernel does not have this problem; the corresponding disassembly is:

$gdb -batch -ex "disassemble/rs fuse_invalidate_attr" -directory=. vmlinux
Dump of assembler code for function fuse_invalidate_attr:
fs/fuse/fuse_i.h:
884		return container_of(inode, struct fuse_inode, inode);
   0xffffffff8144dd50 <+0>:	8b 87 78 02 00 00	mov    0x278(%rdi),%eax
   0xffffffff8144dd56 <+6>:	48 81 c7 78 02 00 00	add    $0x278,%rdi

fs/fuse/dir.c:
130		set_mask_bits(&get_fuse_inode(inode)->inval_mask, 0, mask);
   0xffffffff8144dd5d <+13>:	89 c2	mov    %eax,%edx
   0xffffffff8144dd5f <+15>:	81 ca ff 07 00 00	or     $0x7ff,%edx
   0xffffffff8144dd65 <+21>:	f0 0f b1 17	lock cmpxchg %edx,(%rdi)
   0xffffffff8144dd69 <+25>:	75 f2	jne    0xffffffff8144dd5d <fuse_invalidate_attr+13>
   0xffffffff8144dd6b <+27>:	e9 d0 71 db 00	jmpq   0xffffffff82204f40 <__x86_return_thunk>
End of assembler dump.

For reference.
Comment 6 Joseph Qi alibaba_cloud_group 2022-11-07 17:09:00 UTC
Linux 4.19.115
442d7668a54d bitops: protect variables in set_mask_bits() macro
Comment 7 Jingbo Xu alibaba_cloud_group 2022-11-08 15:36:56 UTC
Fixed in https://gitee.com/anolis/cloud-kernel/pulls/854