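Bug 8708 - [ANCK 5.10] When an NFS file system is mapped into a container, writing a large file to that directory from inside the container easily triggers OOM
Summary: [ANCK 5.10] When an NFS file system is mapped into a container, writing a large file to that directory from inside the container easily triggers OOM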
Status: RESOLVED FIXED
Alias: None
Product: ANCK 5.10 Dev
Classification: ANCK
Component: fs
Version: 5.10.y-13
Hardware: All Linux
Importance: P3-Medium S2-major
Target Milestone: ---
Assignee: Jie_1005
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-04-07 15:24 UTC by Jie_1005
Modified: 2024-04-08 17:59 UTC

See Also:


Description Jie_1005 2024-04-07 15:24:45 UTC
Description of problem:
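    On an NFS client, a directory on the NFS file system is mapped into a container. When dd is run inside the container to create a large file, the container very easily hits OOM. The problem was found in testing after upgrading from the 3.10 kernel to the 5.10 kernel; it no longer exists in the latest upstream kernel.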

Version-Release number of selected component (if applicable):


How reproducible:
The problem is always reproducible.

Steps to Reproduce:
1. On the NFS client, mount the NFS file system (192.168.122.38 is the NFS server IP):
   mount -o lookupcache=none -t nfs 192.168.122.38:/NFS_Dir /NFS_Dir/
2. Start a container with the following command, mapping the NFS directory into the container:
   docker run -tid -v /NFS_Dir/:/NFS_Dir --memory-swappiness 0 --memory 500M --name test centos:8

3. Enter the container:
  docker exec -it test sh

4. In the container's /NFS_Dir directory, create the test script create_file.sh with the following content:
mkdir -p /NFS_Dir/file_test_dir
for i in {0..10}
do 
        dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
done

5. Run the test script to reproduce the problem:
sh NFS_Dir/create_file.sh 
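
To put the steps above in proportion, the sizes can be worked out with plain shell arithmetic. This is only a sketch using the values from the dd and docker commands above (bs=200M, count=55, --memory 500M); the variable names are made up for illustration.

```shell
#!/bin/bash
# Values copied from the reproduction steps; variable names are illustrative only.
bs_mib=200        # dd bs=200M (size of one read/write buffer)
count=55          # dd count=55
limit_mib=500     # docker run --memory 500M

file_mib=$((bs_mib * count))             # data written by one dd run
ratio=$((file_mib / limit_mib))          # file size relative to the container limit
buf_pct=$((bs_mib * 100 / limit_mib))    # dd's buffer as a share of the limit

echo "one dd run writes ${file_mib} MiB, ${ratio}x the ${limit_mib} MiB limit"
echo "the dd buffer alone is ${buf_pct}% of the limit"
```

Each dd run therefore writes far more data than the container may hold in memory, so the page cache for the file has to be reclaimed continuously while dd is still writing.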

Actual results:
dd is OOM-killed inside the container:
sh-4.4# sh NFS_Dir/create_file.sh 
NFS_Dir/create_file.sh: line 3:    24 Killed                  dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3:    25 Killed                  dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3:    26 Killed                  dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3:    27 Killed                  dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3:    28 Killed                  dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3:    29 Killed                  dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3:    30 Killed                  dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3:    31 Killed                  dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3:    32 Killed                  dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3:    33 Killed                  dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
NFS_Dir/create_file.sh: line 3:    34 Killed                  dd if=/dev/zero bs=200M count=55 of=/NFS_Dir/file_test_dir/file
sh-4.4# 
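
The "Killed" lines above are the shell reporting that dd died from SIGKILL. A test loop could detect this from the exit status, which is 128 + 9 = 137 for SIGKILL. The sketch below is an assumption about how one might check it, not part of the original script; a child shell that kills itself stands in for dd, and killed_by_sigkill is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if an exit status means "terminated by SIGKILL".
killed_by_sigkill() {
    [ "$1" -eq 137 ]    # 128 + signal number 9
}

# Stand-in for the dd command: a child shell that sends SIGKILL to itself.
status=0
bash -c 'kill -KILL $$' || status=$?

if killed_by_sigkill "$status"; then
    echo "child died from SIGKILL (status $status); check the kernel log for oom-kill"
fi
```

SIGKILL alone does not prove an OOM kill; the kernel log on the host is what confirms it.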

Expected results:
No OOM occurs inside the container when the test script create_file.sh is run.

Additional info:
When the OOM occurs inside the container, the system log on the host shows the following:
Apr  7 15:19:46 NFS-Client kernel: dd invoked oom-killer: gfp_mask=0x101cca(GFP_HIGHUSER_MOVABLE|__GFP_WRITE), order=0, oom_score_adj=0
Apr  7 15:19:46 NFS-Client kernel: CPU: 1 PID: 2031 Comm: dd Tainted: G            E     5.10.134 #1
Apr  7 15:19:46 NFS-Client kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Apr  7 15:19:46 NFS-Client kernel: Call Trace:
Apr  7 15:19:46 NFS-Client kernel: dump_stack+0x5c/0x90
Apr  7 15:19:46 NFS-Client kernel: dump_memcg_header+0x12/0x50
Apr  7 15:19:46 NFS-Client kernel: oom_kill_process+0x141/0x180
Apr  7 15:19:46 NFS-Client kernel: out_of_memory+0x121/0x600
Apr  7 15:19:46 NFS-Client kernel: mem_cgroup_out_of_memory+0xda/0xe0
Apr  7 15:19:46 NFS-Client kernel: try_charge+0x779/0x7f0
Apr  7 15:19:46 NFS-Client kernel: mem_cgroup_charge+0x9c/0x400
Apr  7 15:19:46 NFS-Client kernel: __add_to_page_cache_locked+0x3d1/0x4b0
Apr  7 15:19:46 NFS-Client kernel: ? scan_shadow_nodes+0x30/0x30
Apr  7 15:19:46 NFS-Client kernel: add_to_page_cache_lru+0x42/0x150
Apr  7 15:19:46 NFS-Client kernel: pagecache_get_page.part.66+0x123/0x410
Apr  7 15:19:46 NFS-Client kernel: grab_cache_page_write_begin+0x1c/0x40
Apr  7 15:19:46 NFS-Client kernel: nfs_write_begin+0x5a/0x4d0 [nfs]
Apr  7 15:19:46 NFS-Client kernel: generic_perform_write+0xcd/0x190
Apr  7 15:19:46 NFS-Client kernel: nfs_file_write+0x129/0x2d0 [nfs]
Apr  7 15:19:46 NFS-Client kernel: new_sync_write+0x10b/0x190
Apr  7 15:19:46 NFS-Client kernel: vfs_write+0x18a/0x260
Apr  7 15:19:46 NFS-Client kernel: ksys_write+0x49/0xc0
Apr  7 15:19:46 NFS-Client kernel: do_syscall_64+0x33/0x40
Apr  7 15:19:46 NFS-Client kernel: entry_SYSCALL_64_after_hwframe+0x61/0xc6
Apr  7 15:19:46 NFS-Client kernel: RIP: 0033:0x7f6c16814815
Apr  7 15:19:46 NFS-Client kernel: Code: Unable to access opcode bytes at RIP 0x7f6c168147eb.
Apr  7 15:19:46 NFS-Client kernel: RSP: 002b:00007ffcfd7a6948 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Apr  7 15:19:46 NFS-Client kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6c16814815
Apr  7 15:19:46 NFS-Client kernel: RDX: 000000000c800000 RSI: 00007f6c08f75000 RDI: 0000000000000001
Apr  7 15:19:46 NFS-Client kernel: RBP: 000000000c800000 R08: 00000000ffffffff R09: 0000000000000000
Apr  7 15:19:46 NFS-Client kernel: R10: 0000000000000022 R11: 0000000000000246 R12: 00007f6c08f75000
Apr  7 15:19:46 NFS-Client kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 00007f6c08f75000
Apr  7 15:19:46 NFS-Client kernel: memory: usage 512000kB, limit 512000kB, failcnt 389427
Apr  7 15:19:46 NFS-Client kernel: memory+swap: usage 512000kB, limit 1024000kB, failcnt 0
Apr  7 15:19:46 NFS-Client kernel: kmem: usage 9580kB, limit 9007199254740988kB, failcnt 0
Apr  7 15:19:46 NFS-Client kernel: Memory cgroup stats for /docker/6ec42ac6a3b78b534d4a0376ecd0020c36a2a13e36c7d931aab941cfbd58b138:
Apr  7 15:19:46 NFS-Client kernel: anon 210817024#012file 303316992#012kernel_stack 49152#012pagetables 540672#012percpu 0#012sock 0#012shmem 0#012file_mapped 0#012file_dirty 62582784#012file_writeback 135168#012anon_thp 207618048#012file_thp 0#012shmem_thp 0#012inactive_anon 210825216#012active_anon 0#012inactive_file 151748608#012active_file 152059904#012unevictable 0#012slab_reclaimable 8804440#012slab_unreclaimable 131208#012slab 8935648#012workingset_refault_anon 0#012workingset_refault_file 6897#012workingset_activate_anon 0#012workingset_activate_file 165#012workingset_restore_anon 0#012workingset_restore_file 66#012workingset_nodereclaim 0#012pgfault 3036#012pgmajfault 0#012pgrefill 15569388#012pgscan 16919804#012pgsteal 881980#012pgactivate 16030377#012pgdeactivate 15568607#012pglazyfree 0#012pglazyfreed 0#012thp_fault_alloc 99#012thp_collapse_alloc 0
Apr  7 15:19:46 NFS-Client kernel: Tasks state (memory values in pages):
Apr  7 15:19:46 NFS-Client kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Apr  7 15:19:46 NFS-Client kernel: [   1929]     0  1929     3013      104    73728        0             0 bash
Apr  7 15:19:46 NFS-Client kernel: [   2024]     0  2024     3013      108    73728        0             0 sh
Apr  7 15:19:46 NFS-Client kernel: [   2031]     0  2031    56965    51254   503808        0             0 dd
Apr  7 15:19:46 NFS-Client kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=6ec42ac6a3b78b534d4a0376ecd0020c36a2a13e36c7d931aab941cfbd58b138,mems_allowed=0,oom_memcg=/docker/6ec42ac6a3b78b534d4a0376ecd0020c36a2a13e36c7d931aab941cfbd58b138,task_memcg=/docker/6ec42ac6a3b78b534d4a0376ecd0020c36a2a13e36c7d931aab941cfbd58b138,task=dd,pid=2031,uid=0
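
The "Memory cgroup stats" line in the log above is hard to read because syslog has escaped each newline as #012. As a rough sketch, the counters can be split out and compared with the cgroup limit as follows. The stats string is a shortened copy of that line with unchanged values, and get_stat is a hypothetical helper, not an existing tool.

```shell
#!/bin/bash
# Shortened copy of the stats line from the log above; values are unchanged.
stats='anon 210817024#012file 303316992#012file_dirty 62582784#012file_writeback 135168#012pgscan 16919804#012pgsteal 881980'

# Hypothetical helper: print one counter from a "#012"-separated stats string.
get_stat() {
    printf '%s\n' "$1" | awk -v key="$2" '{
        n = split($0, parts, "#012")      # undo the syslog newline escape
        for (i = 1; i <= n; i++) {
            split(parts[i], kv, " ")      # "name value"
            if (kv[1] == key) print kv[2]
        }
    }'
}

anon=$(get_stat "$stats" anon)
file=$(get_stat "$stats" file)
dirty=$(get_stat "$stats" file_dirty)
writeback=$(get_stat "$stats" file_writeback)
pgscan=$(get_stat "$stats" pgscan)
pgsteal=$(get_stat "$stats" pgsteal)

limit_kb=512000                                     # "limit 512000kB" in the log
anon_file_kb=$(( (anon + file) / 1024 ))            # anon + page cache, in kB
dirty_pct=$(( (dirty + writeback) * 100 / file ))   # dirty or in-writeback share of file pages
steal_pct=$(( pgsteal * 100 / pgscan ))             # pages reclaimed per 100 scanned

echo "anon+file: ${anon_file_kb} kB of ${limit_kb} kB limit"
echo "dirty+writeback: ${dirty_pct}% of file pages"
echo "pgsteal/pgscan: ${steal_pct}%"
```

By these counters, anon plus file pages account for nearly the whole limit, roughly a fifth of the file pages are dirty or under writeback, and only about 5 of every 100 scanned pages were reclaimed. This is just a reading of the logged numbers and does not by itself identify the root cause.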
Comment 1 小龙 admin 2024-04-07 16:44:26 UTC
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3015
Comment 2 小龙 admin 2024-04-08 15:03:30 UTC
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3023
Comment 3 Jie_1005 2024-04-08 15:21:04 UTC
(In reply to 小龙 from comment #2)
> The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3023
The code submitted in that PR was wrong; the PR has been closed.
Comment 4 小龙 admin 2024-04-08 17:31:27 UTC
The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3029
Comment 5 Joseph Qi alibaba_cloud_group 2024-04-08 17:59:00 UTC
(In reply to 小龙 from comment #4)
> The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3029

This PR was linked by mistake.
Comment 6 Joseph Qi alibaba_cloud_group 2024-04-08 17:59:17 UTC
(In reply to 小龙 from comment #1)
> The PR Link: https://gitee.com/anolis/cloud-kernel/pulls/3015

merged