Description of problem:
On an NFS client, a directory on the NFS file system is mapped into a container. When dd is run inside the container to create a large file, the container readily hits OOM. The problem was found in testing after upgrading from the 3.10 kernel to the 5.10 kernel; it no longer exists in the latest upstream kernel.

Version-Release number of selected component (if applicable):

How reproducible:
Always.

Steps to Reproduce:
1. On the NFS client, mount the NFS file system (192.168.122.38 is the NFS server IP):
    mount -o lookupcache=none -t nfs 192.168.122.38:/NFS_Dir /NFS_Dir/
2. Start a container with the following command, mapping the NFS directory into the container:
    docker run -tid -v /NFS_Dir/:/NFS_Dir --memory-swappiness 0 --memory 500M --name test centos:8
3. Enter the container:
    docker exec -it test sh
4. In the container, create the test script create_file.sh under /NFS_Dir with the following content:
    mkdir -p /NFS_Dir/file_test_dir
    for i in {0..10}
    do
    dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
    done
5. Run the test script to reproduce the problem:
    sh NFS_Dir/create_file.sh

Actual results:
dd is OOM-killed inside the container:
sh-4.4# sh NFS_Dir/create_file.sh
NFS_Dir/create_file.sh: line 3: 24 Killed dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3: 25 Killed dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3: 26 Killed dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3: 27 Killed dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3: 28 Killed dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3: 29 Killed dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3: 30 Killed dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3: 31 Killed dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3: 32 Killed dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3: 33 Killed dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3: 34 Killed dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
sh-4.4#

Expected results:
No OOM occurs in the container when the test script create_file.sh is run.

Additional info:
When the OOM occurs in the container, the system log on the host shows the following:
Apr 7 15:19:46 NFS-Client kernel: dd invoked oom-killer: gfp_mask=0x101cca(GFP_HIGHUSER_MOVABLE|__GFP_WRITE), order=0, oom_score_adj=0
Apr 7 15:19:46 NFS-Client kernel: CPU: 1 PID: 2031 Comm: dd Tainted: G E 5.10.134 #1
Apr 7 15:19:46 NFS-Client kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Apr 7 15:19:46 NFS-Client kernel: Call Trace:
Apr 7 15:19:46 NFS-Client kernel: dump_stack+0x5c/0x90
Apr 7 15:19:46 NFS-Client kernel: dump_memcg_header+0x12/0x50
Apr 7 15:19:46 NFS-Client kernel: oom_kill_process+0x141/0x180
Apr 7 15:19:46 NFS-Client kernel: out_of_memory+0x121/0x600
Apr 7 15:19:46 NFS-Client kernel: mem_cgroup_out_of_memory+0xda/0xe0
Apr 7 15:19:46 NFS-Client kernel: try_charge+0x779/0x7f0
Apr 7 15:19:46 NFS-Client kernel: mem_cgroup_charge+0x9c/0x400
Apr 7 15:19:46 NFS-Client kernel: __add_to_page_cache_locked+0x3d1/0x4b0
Apr 7 15:19:46 NFS-Client kernel: ? scan_shadow_nodes+0x30/0x30
Apr 7 15:19:46 NFS-Client kernel: add_to_page_cache_lru+0x42/0x150
Apr 7 15:19:46 NFS-Client kernel: pagecache_get_page.part.66+0x123/0x410
Apr 7 15:19:46 NFS-Client kernel: grab_cache_page_write_begin+0x1c/0x40
Apr 7 15:19:46 NFS-Client kernel: nfs_write_begin+0x5a/0x4d0 [nfs]
Apr 7 15:19:46 NFS-Client kernel: generic_perform_write+0xcd/0x190
Apr 7 15:19:46 NFS-Client kernel: nfs_file_write+0x129/0x2d0 [nfs]
Apr 7 15:19:46 NFS-Client kernel: new_sync_write+0x10b/0x190
Apr 7 15:19:46 NFS-Client kernel: vfs_write+0x18a/0x260
Apr 7 15:19:46 NFS-Client kernel: ksys_write+0x49/0xc0
Apr 7 15:19:46 NFS-Client kernel: do_syscall_64+0x33/0x40
Apr 7 15:19:46 NFS-Client kernel: entry_SYSCALL_64_after_hwframe+0x61/0xc6
Apr 7 15:19:46 NFS-Client kernel: RIP: 0033:0x7f6c16814815
Apr 7 15:19:46 NFS-Client kernel: Code: Unable to access opcode bytes at RIP 0x7f6c168147eb.
Apr 7 15:19:46 NFS-Client kernel: RSP: 002b:00007ffcfd7a6948 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Apr 7 15:19:46 NFS-Client kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6c16814815
Apr 7 15:19:46 NFS-Client kernel: RDX: 000000000c800000 RSI: 00007f6c08f75000 RDI: 0000000000000001
Apr 7 15:19:46 NFS-Client kernel: RBP: 000000000c800000 R08: 00000000ffffffff R09: 0000000000000000
Apr 7 15:19:46 NFS-Client kernel: R10: 0000000000000022 R11: 0000000000000246 R12: 00007f6c08f75000
Apr 7 15:19:46 NFS-Client kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 00007f6c08f75000
Apr 7 15:19:46 NFS-Client kernel: memory: usage 512000kB, limit 512000kB, failcnt 389427
Apr 7 15:19:46 NFS-Client kernel: memory+swap: usage 512000kB, limit 1024000kB, failcnt 0
Apr 7 15:19:46 NFS-Client kernel: kmem: usage 9580kB, limit 9007199254740988kB, failcnt 0
Apr 7 15:19:46 NFS-Client kernel: Memory cgroup stats for /docker/6ec42ac6a3b78b534d4a0376ecd0020c36a2a13e36c7d931aab941cfbd58b138:
Apr 7 15:19:46 NFS-Client kernel: anon 210817024#012file 303316992#012kernel_stack 49152#012pagetables 540672#012percpu 0#012sock 0#012shmem 0#012file_mapped 0#012file_dirty 62582784#012file_writeback 135168#012anon_thp 207618048#012file_thp 0#012shmem_thp 0#012inactive_anon 210825216#012active_anon 0#012inactive_file 151748608#012active_file 152059904#012unevictable 0#012slab_reclaimable 8804440#012slab_unreclaimable 131208#012slab 8935648#012workingset_refault_anon 0#012workingset_refault_file 6897#012workingset_activate_anon 0#012workingset_activate_file 165#012workingset_restore_anon 0#012workingset_restore_file 66#012workingset_nodereclaim 0#012pgfault 3036#012pgmajfault 0#012pgrefill 15569388#012pgscan 16919804#012pgsteal 881980#012pgactivate 16030377#012pgdeactivate 15568607#012pglazyfree 0#012pglazyfreed 0#012thp_fault_alloc 99#012thp_collapse_alloc 0
Apr 7 15:19:46 NFS-Client kernel: Tasks state (memory values in pages):
Apr 7 15:19:46 NFS-Client kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Apr 7 15:19:46 NFS-Client kernel: [ 1929] 0 1929 3013 104 73728 0 0 bash
Apr 7 15:19:46 NFS-Client kernel: [ 2024] 0 2024 3013 108 73728 0 0 sh
Apr 7 15:19:46 NFS-Client kernel: [ 2031] 0 2031 56965 51254 503808 0 0 dd
Apr 7 15:19:46 NFS-Client kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=6ec42ac6a3b78b534d4a0376ecd0020c36a2a13e36c7d931aab941cfbd58b138,mems_allowed=0,oom_memcg=/docker/6ec42ac6a3b78b534d4a0376ecd0020c36a2a13e36c7d931aab941cfbd58b138,task_memcg=/docker/6ec42ac6a3b78b534d4a0376ecd0020c36a2a13e36c7d931aab941cfbd58b138,task=dd,pid=2031,uid=0
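
The memory cgroup stats line in the log is hard to read because the syslog daemon has escaped each embedded newline as "#012" (the octal code for LF). The following is a minimal sketch for pulling one counter out of such a line and printing it in MiB. The function name memcg_stat_mib is hypothetical and not part of any tool, and the sketch assumes the counters are byte values, as the "anon" and "file" fields appear to be.

```shell
#!/bin/sh
# memcg_stat_mib (hypothetical helper): read a raw memcg stats log line on
# stdin and print the counter named in $1, converted from bytes to whole MiB.
memcg_stat_mib() {
    awk -v key="$1" '{
        # Drop the syslog prefix ("Apr 7 ... kernel: ") if it is present.
        sub(/^.*kernel: /, "")
        # "#012" is the escaped newline that separates the counters.
        n = split($0, kv, "#012")
        for (i = 1; i <= n; i++) {
            split(kv[i], f, " ")
            if (f[1] == key)
                printf "%s %d MiB\n", f[1], f[2] / 1048576
        }
    }'
}

# Sample input: a shortened copy of the stats line from the log above.
line='Apr 7 15:19:46 NFS-Client kernel: anon 210817024#012file 303316992#012file_dirty 62582784#012file_writeback 135168'

for k in anon file file_dirty; do
    printf '%s\n' "$line" | memcg_stat_mib "$k"
done
```

For the values in this report the sketch gives about 201 MiB of anon memory (which is in line with the 200M block buffer that dd allocates), about 289 MiB of page cache, and about 59 MiB of dirty file pages, against the 512000kB limit shown in the log.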
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3015
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3023
(In reply to 小龙 from comment #2)
> The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3023

The code submitted in this PR was wrong; the PR has been closed.
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3029
(In reply to 小龙 from comment #4)
> The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3029

This PR was opened by mistake.
(In reply to 小龙 from comment #1)
> The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3015

merged