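Created attachment 698 [details]
Reproduction steps

Description of problem:
When disk I/O throttling is in effect and disk I/O reaches the configured limit, the throttled process enters state D and other processes on the host are affected. The problem has been observed on both the 4.19 and 5.10 kernels.

How reproducible:

Steps to Reproduce:
1. Enable host I/O throttling as described in https://help.aliyun.com/document_detail/155509.html
2. Follow the attached image to reproduce.
3. Check the process started by the container; its state is D.

Expected results:
Creating other containers on the host is not affected.

Additional info: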
It seems that something is wrong in your test procedure.
(In reply to josephqi from comment #1)
> It seems that something is wrong in your test procedure.

You intended to limit container1, but the limit was actually set on blkcg4. So I guess that container2 was also under throttling when you created it, which is expected behavior.
Created attachment 700 [details]
Reproduction steps 2
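Please take a look at the image I just uploaded as reproduction steps 2. Open two terminals and run the throttled operation in one of them. While throttling is in effect, running docker run in the second terminal hangs immediately.

While throttling is in effect, the writing process is in state D:
root 3345447 9.4 0.0 4396 876 pts/0 D+ 09:15 0:03 dd if=/dev/zero of=/data/testfile bs=4k count=1000000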
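One way to list every task in uninterruptible sleep while the throttled dd is running is to filter the STAT column of ps. This is only a minimal sketch: the helper name `filter_d_state` is hypothetical, and it assumes `ps -eo pid,stat,comm` style rows on stdin.

```bash
# Hypothetical helper: keep only rows whose STAT column starts with "D"
# (uninterruptible sleep). Expects "PID STAT COMMAND" rows on stdin.
filter_d_state() {
    awk '$2 ~ /^D/'
}

# Usage (run in a second terminal while the throttled dd is active):
#   ps -eo pid,stat,comm | filter_d_state
```

Seeing docker-related tasks in this list next to dd would suggest they are blocked on I/O rather than on something else.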
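(In reply to jinriyang from comment #4)
> Please take a look at the image I just uploaded as reproduction steps 2. Open two terminals and run the throttled operation in one of them. While throttling is in effect, running docker run in the second terminal hangs immediately.
> While throttling is in effect, the writing process is in state D:
> root 3345447 9.4 0.0 4396 876 pts/0 D+ 09:15 0:03 dd if=/dev/zero of=/data/testfile bs=4k count=1000000

Please post your exact kernel version (uname -r).

Then post the cgroup.procs of the blkcg you want to throttle, so we can see which processes are in it. Please convert the PIDs to process names to make them easier to identify. The purpose of this step is to check whether any unrelated docker service processes are being throttled.

"docker run hangs immediately": how long does it hang? As an experiment, change dd if=/dev/zero of=/data/testfile bs=4k count=1000000 to count=2560 and try again, to see whether docker run hangs for only about 10 s.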
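The PID-to-name conversion requested here could be done along these lines. This is a sketch under assumptions: the helper name `procs_with_names` is hypothetical, and the cgroup path in the usage comment assumes a cgroup v1 layout with the group named blkcg4 as mentioned earlier in this thread.

```bash
# Hypothetical helper: print "PID NAME" for every PID listed in a
# cgroup.procs file. The second argument (proc root) exists only so the
# function can be exercised against a fake tree; it defaults to /proc.
procs_with_names() (
    procs_file=$1
    proc_root=${2:-/proc}
    while read -r pid; do
        [ -n "$pid" ] || continue
        if [ -r "$proc_root/$pid/comm" ]; then
            name=$(cat "$proc_root/$pid/comm")
        else
            name="?"   # process already exited, or comm is not readable
        fi
        printf '%s %s\n' "$pid" "$name"
    done < "$procs_file"
)

# Usage; the path is an assumption, adjust it to the throttled group:
#   procs_with_names /sys/fs/cgroup/blkio/blkcg4/cgroup.procs
```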
Created attachment 701 [details]
Reproduction 3
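I have hit this problem on several versions; the 5.10 one is an Alibaba Cloud ECS instance.

uname -r
4.19.91-24.8.an8.x86_64
5.10.134-13.al8.x86_64

While dd keeps writing and has reached the throttle limit, docker run hangs. Once the write finishes, docker run works normally again.

If dd if=/dev/zero of=/data/testfile bs=4k count=1000000 is changed to count=2560, docker does not hang for long either; it returns to normal as soon as dd finishes writing.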
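For reference, the count=2560 experiment writes 2560 x 4 KiB = 10 MiB, so the expected flush time is roughly the number of bytes divided by the throttle limit. The sketch below does that arithmetic; the helper name `expected_flush_secs` is hypothetical, and the 1 MiB/s limit is an assumed value chosen only because it matches the "about 10 s" estimate. The limit actually configured is not stated in this thread.

```bash
# Hypothetical helper: whole seconds (rounded up) needed to flush
# bs * count bytes at a throttle of limit_bps bytes per second.
#   $1 = block size in bytes, $2 = block count, $3 = limit in bytes/s
expected_flush_secs() {
    echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# The limit below is an assumption, not a value taken from the report.
#   expected_flush_secs 4096 2560 1048576       # 10 MiB at 1 MiB/s, prints 10
#   expected_flush_secs 4096 1000000 1048576    # the original dd size
```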
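(In reply to jinriyang from comment #6)
> Created attachment 701 [details]
> Reproduction 3

There is a bash process in there. That bash process is not in the same terminal as the one where you later run docker run, is it?

Also, please run the test again together with:

while [ 1 ]; do
    cat /proc/meminfo | grep -i back
    sleep 1
done

Post the output collected while dd is running, about 120 s worth. I mainly want to see whether Writeback keeps growing.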
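A bounded variant of this sampling loop, which stops by itself and timestamps each sample, might look like the following. It is a sketch only: the helper name `sample_writeback` and its arguments are hypothetical, and the third argument exists just so the function can be pointed at a saved copy of /proc/meminfo.

```bash
# Hypothetical helper: print "<epoch-seconds> <Writeback kB>" once per
# interval, for a fixed number of samples instead of looping forever.
sample_writeback() (
    samples=$1
    interval=$2
    meminfo=${3:-/proc/meminfo}   # overridable, e.g. for testing
    i=0
    while [ "$i" -lt "$samples" ]; do
        # Match the exact "Writeback:" key so WritebackTmp is skipped.
        kb=$(awk '$1 == "Writeback:" { print $2 }' "$meminfo")
        printf '%s %s\n' "$(date +%s)" "$kb"
        i=$((i + 1))
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
)

# Usage: 120 one-second samples while dd is running
#   sample_writeback 120 1
```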
(In reply to xiaoguangwang from comment #8)
> (In reply to jinriyang from comment #6)
> > Created attachment 701 [details]
> > Reproduction 3
>
> There is a bash process in there. That bash process is not in the same terminal as the one where you later run docker run, is it?
>
> Also, please run the test again together with:
> while [ 1 ]; do
> cat /proc/meminfo | grep -i back
> sleep 1
> done
>
> Post the output collected while dd is running, about 120 s worth. I mainly want to see whether Writeback keeps growing.

The bash process belongs to the terminal running the dd command, which is a different terminal from the one running the docker command. I ran the test as you asked:

[root@stark05 blkio]# sudo dd if=/dev/zero of=/data/testfile bs=4k count=100000
100000+0 records in
100000+0 records out
409600000 bytes (410 MB, 391 MiB) copied, 38.8928 s, 10.5 MB/s

The output of cat /proc/meminfo | grep -i back is below (one sample per line):

Writeback: 0 kB WritebackTmp: 0 kB
Writeback: 4 kB WritebackTmp: 0 kB
Writeback: 0 kB WritebackTmp: 0 kB
Writeback: 14016 kB WritebackTmp: 0 kB
Writeback: 24392 kB WritebackTmp: 0 kB
Writeback: 34736 kB WritebackTmp: 0 kB
Writeback: 44872 kB WritebackTmp: 0 kB
Writeback: 55204 kB WritebackTmp: 0 kB
Writeback: 64152 kB WritebackTmp: 0 kB
Writeback: 8216 kB WritebackTmp: 0 kB
Writeback: 16292 kB WritebackTmp: 0 kB
Writeback: 26304 kB WritebackTmp: 0 kB
Writeback: 36636 kB WritebackTmp: 0 kB
Writeback: 47012 kB WritebackTmp: 0 kB
Writeback: 51552 kB WritebackTmp: 0 kB
Writeback: 5912 kB WritebackTmp: 0 kB
Writeback: 5872 kB WritebackTmp: 0 kB
Writeback: 5360 kB WritebackTmp: 0 kB
Writeback: 5472 kB WritebackTmp: 0 kB
Writeback: 4768 kB WritebackTmp: 0 kB
Writeback: 5444 kB WritebackTmp: 0 kB
Writeback: 5036 kB WritebackTmp: 0 kB
Writeback: 5332 kB WritebackTmp: 0 kB
Writeback: 5500 kB WritebackTmp: 0 kB
Writeback: 5396 kB WritebackTmp: 0 kB
Writeback: 5100 kB WritebackTmp: 0 kB
Writeback: 4824 kB WritebackTmp: 0 kB
Writeback: 7196 kB WritebackTmp: 0 kB
Writeback: 5436 kB WritebackTmp: 0 kB
Writeback: 4944 kB WritebackTmp: 0 kB
Writeback: 5536 kB WritebackTmp: 0 kB
Writeback: 5640 kB WritebackTmp: 0 kB
Writeback: 5224 kB WritebackTmp: 0 kB
Writeback: 6044 kB WritebackTmp: 0 kB
Writeback: 4824 kB WritebackTmp: 0 kB
Writeback: 5008 kB WritebackTmp: 0 kB
Writeback: 6212 kB WritebackTmp: 0 kB
Writeback: 5580 kB WritebackTmp: 0 kB
Writeback: 5660 kB WritebackTmp: 0 kB
Writeback: 5444 kB WritebackTmp: 0 kB
Writeback: 4896 kB WritebackTmp: 0 kB
Writeback: 2616 kB WritebackTmp: 0 kB
Writeback: 320 kB WritebackTmp: 0 kB
Writeback: 0 kB WritebackTmp: 0 kB
Writeback: 0 kB WritebackTmp: 0 kB
Writeback: 0 kB
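To summarize a capture like the one above without reading every line, the samples can be reduced to a count and a peak. This is a minimal sketch with a hypothetical helper name, `writeback_peak`; it assumes each Writeback sample starts a line, as in normal /proc/meminfo output.

```bash
# Hypothetical helper: read `grep -i back /proc/meminfo` style lines on
# stdin and print the number of Writeback samples and the peak in kB.
writeback_peak() {
    awk '$1 == "Writeback:" { n++; if ($2 > max) max = $2 }
         END { printf "samples=%d peak_kb=%d\n", n, max }'
}

# Usage, assuming the loop output was saved to a file named writeback.log:
#   writeback_peak < writeback.log
```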
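The cause is that docker-related operations issue a global sync on the filesystem, which has to wait for the dirty pages produced by the dd process to reach disk. Because of the throttling, this writeback is very slow, so the docker operations appear to hang.

This is therefore by-design behavior. To improve it, global sync operations on the filesystem should be avoided as far as possible; the docker commands need to be evaluated to see whether the global sync can be avoided.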
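A simple way to observe this effect directly is to time a global sync while the throttled dd is running and compare it with syncing a single file. The sketch below only measures elapsed time. The helper name `elapsed_secs` is hypothetical, the file path in the usage comment is a placeholder, and passing a file argument to sync assumes GNU coreutils 8.24 or later.

```bash
# Hypothetical helper: run a command and print how many whole seconds
# it took, discarding the command's own output.
elapsed_secs() (
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
)

# Usage, in a second terminal while the throttled dd is running:
#   elapsed_secs sync                  # global sync, waits for all dirty data
#   elapsed_secs sync /path/to/file    # single file only (placeholder path)
```

If the global sync takes roughly as long as the remaining dd writeback, that is consistent with the explanation above; the sketch makes no claim about what the single-file case will show on a given filesystem.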
By design.
Refer to the above comments.