Bug 6375 - [5.10] erofs: failover and domain tests fail on arm64 with 64K pages
Summary: [5.10] erofs: failover and domain tests fail on arm64 with 64K pages
Status: NEW
Alias: None
Product: ANCK 5.10 Dev
Classification: ANCK
Component: fs
Version: unspecified
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: Jingbo Xu
QA Contact: shuming
Reported: 2023-09-06 16:12 UTC by 苟浩
Modified: 2023-09-19 14:50 UTC
Description 苟浩 uniontech_group 2023-09-06 16:12:25 UTC
Description of problem:

On arm64 with a 64K page size, the failover and domain tests fail.

Version-Release number of selected component (if applicable):

openanolis kernel release/release-5.10.134-15.y

How reproducible:
domain-id test:

Use the https://github.com/lostjeffle/demand-read-cachefilesd/tree/main project
and add the domain_id mount option in the run.sh script:
mount -t erofs none -o fsid=${_bootstrap} -o device=${_datablob} -o domain_id="test1" ${mntdir2}

Then run run.sh:

./run.sh ../src ../mntdir/ ../fscachedir/ ../mntdir2

Output:
mkfs.erofs 1.6
Build completed.
[HEADER] id 0, opcode 0 [OPEN] volume key erofs,test (volume_key_size 11), cookie key test.img (cookie_key_size 8), object id 1, fd 4, flags 0
Writing cmd: copen 0,4096
[HEADER] id 0, opcode 2 [READ] object_id 1, fd 4, src_path test.img, off 0, len 10000
read src image failed, ret 4096, 0 (Success)
^C

Kernel output:
$ dmesg|tail
[   34.675418] FS-Cache: Loaded
[   34.944592] FS-Cache: Netfs 'erofs' registered for caching
[   34.972669] erofs: (device erofs): mounted with root inode @ nid 128.
[   72.415045] erofs: (device erofs): erofs_fscache_meta_readpage: erofs_fscache_meta_readpage: -105
[   72.415808] erofs: (device erofs): erofs_read_superblock: cannot read erofs superblock
[   72.437722] erofs: (device erofs): erofs_fscache_meta_readpage: erofs_fscache_meta_readpage: -105
[   72.438407] erofs: (device erofs): erofs_read_superblock: cannot read erofs superblock
[   96.285052] CacheFiles: Loaded
[   98.723166] FS-Cache: Cache "test" added (type cachefiles)
[   98.723170] CacheFiles: File cache on dm-0 registered


failover test:
Test program: https://github.com/userzj/demand-read-cachefilesd/commits/failover-test

$ cd demand-read-cachefilesd-failover-test/test
$ ./test-private-01.sh

mount: /mnt: can't read superblock on none.
cachefilesd mount failed

$ dmesg|tail
[    6.058068] Adding 4222912k swap on /dev/mapper/uos-swap.  Priority:-2 extents:1 across:4222912k FS
[    6.063951] xfs filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
[    6.077982] systemd-journald[627]: Received client request to flush runtime journal.
[    6.114386] XFS (sda2): Mounting V5 Filesystem
[    6.175611] XFS (sda2): Ending clean mount
[    6.176231] xfs filesystem being mounted at /boot supports timestamps until 2038 (0x7fffffff)
[   30.900020] FS-Cache: Loaded
[   31.171706] FS-Cache: Netfs 'erofs' registered for caching
[   31.172114] erofs: (device erofs): erofs_fscache_meta_readpage: erofs_fscache_meta_readpage: -105
[   31.172847] erofs: (device erofs): erofs_read_superblock: cannot read erofs superblock
Comment 1 苟浩 uniontech_group 2023-09-18 17:33:47 UTC
Both tests also fail on Loongson with a 16K page size.
Comment 2 Jingbo Xu alibaba_cloud_group 2023-09-19 14:50:10 UTC
When the erofs image is smaller than PAGE_SIZE, the current fscache implementation does indeed have problems.

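1. First, the demand-read-cachefilesd daemon used by the test has a bug. When the kernel triggers an on-demand read, it asks the daemon for PAGE_SIZE (for example 64K) of data. On receiving the request, the daemon first reads the data from the erofs image and then writes it into cachefiles. If, while reading the erofs image, the daemon finds that the image (for example only 4KB) is smaller than the size the kernel requested (for example a PAGE_SIZE of 64K), it does not handle that case: it reports an error and hangs there.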

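2. Even if the daemon is fixed, the current fscache implementation is still broken whenever the erofs image is smaller than PAGE_SIZE. This has to be fixed in the kernel (the Linux mainline kernel has the same problem).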
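
To illustrate item 1, here is a minimal sketch of a read path that tolerates a short read, as in the log above where the read returned 4096 bytes for a larger request. It is not code from demand-read-cachefilesd. The helper name read_padded and the decision to zero-fill the range past the end of the image are assumptions made for illustration only. As item 2 notes, a daemon-side change like this would not by itself make the tests pass.

```python
import os
import tempfile


def read_padded(fd, off, length):
    """Read `length` bytes starting at `off`, zero-filling anything past EOF.

    Hypothetical helper: zero-filling is an assumed policy, not the
    daemon's documented behaviour.
    """
    chunks = []
    got = 0
    while got < length:
        chunk = os.pread(fd, length - got, off + got)
        if not chunk:  # reached end of file: stop instead of failing
            break
        chunks.append(chunk)
        got += len(chunk)
    # Pad the tail so the caller always gets exactly `length` bytes.
    return b"".join(chunks) + b"\x00" * (length - got)


if __name__ == "__main__":
    # A 4 KiB stand-in for a small image, read with a 64 KiB request.
    with tempfile.NamedTemporaryFile() as img:
        img.write(b"\xab" * 4096)
        img.flush()
        fd = os.open(img.name, os.O_RDONLY)
        try:
            data = read_padded(fd, 0, 64 * 1024)
        finally:
            os.close(fd)
        print(len(data), data[4096:] == b"\x00" * (64 * 1024 - 4096))
```

The loop treats an empty result from os.pread as end of file rather than as an error, which is the case the daemon currently does not handle.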

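Since fscache is not yet used in real-world scenarios, the problem only occurs in this specific case, and staffing is currently limited, I suggest marking this as a known issue for now and tracking it later.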