Description of problem:
When pwrite is used to write 4 MB of data with O_DIRECT on the NFS client, and the client's rsize and wsize parameters are 1 MB, the 4 MB of data is split into four RPC write requests. If the NFS client keeps calling pwrite with 4 MB of data while the NFS server deletes its IP address and runs the systemctl stop nfs-server command, pwrite returns a short count (1 MB, 2 MB, or 3 MB) instead of the requested 4 MB, yet errno is 0 and no error is reported.

This issue occurs in the 5.10 kernel and persists in kernel version 5.10.214. Analysis shows that upstream resolved it in kernel version 6.6.2, but that version is too far ahead for the fix to have been incorporated into the 5.10 series.

How reproducible:
First, the NFS client runs the test program in step 1, and then the NFS server runs the script in step 2.

Steps to Reproduce:
1. The NFS client executes the test program.
1.1 The following is a pwrite test program. Compile the source file into an executable pwrite program using gcc.
    #define _GNU_SOURCE /* required for O_DIRECT */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <string.h>
    #include <errno.h>

    #define DATA_SIZE (4 * 1024 * 1024) // 4M

    int main() {
        // Open or create the file (the original used the raw value 0x004000 for O_DIRECT)
        int fd = open("/mnt/nfs/output.txt", O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
        if (fd == -1) {
            perror("open");
            return 1;
        }

        // Prepare the data buffer
        char *data = (char *)malloc(DATA_SIZE);
        if (data == NULL) {
            perror("malloc");
            close(fd);
            return 1;
        }
        char fill_chars[] = {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'}; // characters used for filling
        int num_fill_chars = sizeof(fill_chars) / sizeof(fill_chars[0]);

        for (int i = 0; i < 100; i++) {
            for (int j = 0; j < 4; j++) {
                // Fill each 1 MB chunk; the fill character changes with every 4 MB write
                memset(data + j * 1024 * 1024, fill_chars[i % num_fill_chars], 1024 * 1024);
            }
            off_t offset = (off_t)i * DATA_SIZE; // offset of the current 4 MB block
            ssize_t num_bytes_written = pwrite(fd, data, DATA_SIZE, offset);
            if (num_bytes_written != DATA_SIZE) {
                fprintf(stderr, "Error: errno=%d %s, Data size written=%zd does not match expected=%d size.\n",
                        errno, strerror(errno), num_bytes_written, DATA_SIZE);
                close(fd);
                free(data);
                return 1;
            }
        }

        // Close the file
        close(fd);
        free(data);
        printf("Data written successfully using pwrite.\n");
        return 0;
    }

1.2 The following script keeps calling the executable pwrite program from step 1.1.

    while true
    do
        echo `date` >> pwrite_test.log
        ./pwrite_test >> pwrite_test.log 2>&1
        result=$?
        if [ "$result" == "0" ];then
            continue;
        else
            break;
        fi
    done

2. The NFS server executes the following script:

    while true
    do
        ip addr delete 192.168.122.170/24 dev ens3 && systemctl stop nfs-server;
        sleep 3;
        ip addr add 192.168.122.170/24 dev ens3 && systemctl start nfs-server;
        sleep 3;
    done

Actual results:
The size returned by pwrite (1 MB, 2 MB, or 3 MB) is not equal to the requested 4 MB of data. The test program on the NFS client prints the following:

Error: errno=0 Success, Data size written=3145728 does not match expected=4194304 size.
Error: errno=0 Success, Data size written=1048576 does not match expected=4194304 size.
Error: errno=0 Success, Data size written=2097152 does not match expected=4194304 size.

Expected results:
The size returned by pwrite is equal to the requested 4 MB of data.

Additional info:
1. Patch that introduced the problem:
   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=031d73ed768a40684f3ca21992265ffdb6a270bf
2. Patch that resolved the problem:
   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8982f7aff39fb526aba4441fff2525fcedd5e1a3
   The associated patch that was submitted together with it:
   https://patchwork.kernel.org/project/linux-nfs/patch/20230904163441.11950-4-trondmy@kernel.org/
In the test environment, the following modification resolves the problem, as confirmed by both analysis and practical testing:

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index c220810c6..67d01dde7 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -550,11 +550,8 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
 	nfs_direct_write_scan_commit_list(dreq->inode, &reqs, &cinfo);
 	nfs_direct_join_group(&reqs, dreq->inode);
+	pr_warn_ratelimited("fs/nfs/direct.c: nfs_direct_write_reschedule dreq->max_count=%d\n", dreq->max_count);
-	dreq->count = 0;
-	dreq->max_count = 0;
-	list_for_each_entry(req, &reqs, wb_list)
-		dreq->max_count += req->wb_bytes;
 	nfs_clear_pnfs_ds_commit_verifiers(&dreq->ds_cinfo);
 	get_dreq(dreq);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index a9d31f9bf..224568f65 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1043,6 +1043,7 @@ nfs_scan_commit_list(struct list_head *src, struct list_head *dst,
 			break;
 		cond_resched();
 	}
+	pr_warn_ratelimited("fs/nfs/write.c: nfs_scan_commit_list ret=%d \n", ret);
 	return ret;
 }

The message log is as follows:

[root@nfs-client-122-175 ~]# grep -nr nfs_scan_commit_list /var/log/messages | grep "Apr 8 17:17:50"
110511:Apr 8 17:17:50 nfs-client-122-175 kernel: nfs_scan_commit_list: 25 callbacks suppressed
110512:Apr 8 17:17:50 nfs-client-122-175 kernel: fs/nfs/write.c: nfs_scan_commit_list ret=1028
110514:Apr 8 17:17:50 nfs-client-122-175 kernel: fs/nfs/write.c: nfs_scan_commit_list ret=0
110515:Apr 8 17:17:50 nfs-client-122-175 kernel: fs/nfs/write.c: nfs_scan_commit_list ret=1028
110516:Apr 8 17:17:50 nfs-client-122-175 kernel: fs/nfs/write.c: nfs_scan_commit_list ret=0
110517:Apr 8 17:17:50 nfs-client-122-175 kernel: fs/nfs/write.c: nfs_scan_commit_list ret=1028
[root@nfs-client-122-175 ~]#
[root@nfs-client-122-175 ~]# grep -nr nfs_direct_write_reschedule /var/log/messages | grep "Apr 8 17:17:50"
110513:Apr 8 17:17:50 nfs-client-122-175 kernel: fs/nfs/direct.c: nfs_direct_write_reschedule dreq->max_count=4194304
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3037
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3038
(In reply to 小龙 from comment #2)
> The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3037

obsolete
(In reply to 小龙 from comment #3)
> The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3038

merge
(In reply to josephqi from comment #5)
> (In reply to 小龙 from comment #3)
> > The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3038
>
> merge

It causes some xfstests cases to fail, so revert it first.
Reopen this bug.