Bug 20484 - Cgroup blkio leak
Summary: Cgroup blkio leak
Status: NEW
Alias: None
Product: ANCK 5.10 Dev
Classification: ANCK
Component: block/storage
Version: 5.10.y-13
Hardware: x86_64 Linux
Importance: P3-Medium S2-major
Target Milestone: ---
Assignee: Ferry Meng
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-04-17 19:17 UTC by sunwuhao
Modified: 2025-04-28 13:47 UTC

See Also:


Description sunwuhao 2025-04-17 19:17:27 UTC
Description of problem:
blkio cgroups are leaking.

Version-Release number of selected component (if applicable):

cat /etc/os-release 
NAME="Anolis OS"
VERSION="8.8"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.8"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.8"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

uname -r 
5.10.134-13.1.an8.x86_64

How reproducible:

Steps to Reproduce:
1. Set a throttle limit
  cat /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device
  259:0 1000000000
2. The node runs normally in a k8s cluster

Actual results:

# uname -r
5.10.134-13.1.an8.x86_64
# cat /proc/cgroups
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  10      175     1
cpu     9       235     1
cpuacct 9       235     1
blkio   11      38733   1
memory  4       252     1
devices 2       234     1
freezer 12      174     1
net_cls 8       174     1
perf_event      7       174     1
net_prio        8       174     1
hugetlb 3       174     1
pids    5       241     1
rdma    6       1       1
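
To track the figure that stands out above (38733 blkio cgroups against a few hundred for the other controllers), the relevant column can be pulled out of /proc/cgroups with a small helper. This is only a sketch: num_cgroups_of is a hypothetical name, not an existing tool, and it assumes the four-column layout shown in the output above.

```shell
# Print the num_cgroups column for one controller from a file in
# /proc/cgroups format (columns: subsys_name hierarchy num_cgroups enabled).
# num_cgroups_of is a hypothetical helper name, not an existing tool.
num_cgroups_of() {
    # $1: controller name, e.g. blkio
    # $2: file to read (optional, defaults to /proc/cgroups)
    awk -v name="$1" '$1 == name { print $3 }' "${2:-/proc/cgroups}"
}

# Example call on a live system:
#   num_cgroups_of blkio
```

Sampling this value periodically on an affected node would show whether the count keeps growing or merely stays high.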


Expected results:


Additional info:
Comment 1 Joseph Qi alibaba_cloud_group 2025-04-17 22:37:09 UTC
(In reply to sunwuhao from comment #0)
> Steps to Reproduce:
> 1. Set a throttle limit
>   cat /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device
>   259:0 1000000000
> 2. The node runs normally in a k8s cluster

What operations exactly are being performed?
Also, please provide /proc/cmdline.

Comment 2 sunwuhao 2025-04-18 14:16:56 UTC
(In reply to Joseph Qi from comment #1)
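> What operations exactly are being performed?
> Also, please provide /proc/cmdline.
> 

We run ordinary online workloads (e.g. Java backend services) and offline workloads (e.g. transcoding).

/proc/cmdline is as follows: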

BOOT_IMAGE=/boot/vmlinuz-5.10.134-13.1.an8.x86_64 root=UUID=03294622-c93a-4596-9b53-709af92736c6 ro ipv6.disable=1 crashkernel=2G-8G:192M,8G-128G:256M,128G-:384M rdt=l3cat,l3cdp,cmt,mba,mbmtotal,mbmlocal nodmraid nomodeset biosdevname=0 rhgb quiet cgroup.memory=nokmem


Comment 3 Joseph Qi alibaba_cloud_group 2025-04-18 19:10:32 UTC
(In reply to sunwuhao from comment #2)
> We run ordinary online workloads (e.g. Java backend services) and offline workloads (e.g. transcoding).
> 
> /proc/cmdline is as follows:
> 
> BOOT_IMAGE=/boot/vmlinuz-5.10.134-13.1.an8.x86_64
> root=UUID=03294622-c93a-4596-9b53-709af92736c6 ro ipv6.disable=1
> crashkernel=2G-8G:192M,8G-128G:256M,128G-:384M
> rdt=l3cat,l3cdp,cmt,mba,mbmtotal,mbmlocal nodmraid nomodeset biosdevname=0
> rhgb quiet cgroup.memory=nokmem
> 

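Judging from the cmdline, the cgwb feature is not enabled either, so this is just the most basic direct I/O throttling.
Locally I used a script to create 1000 block cgroups and configure throttling, issued write I/O in each block cgroup, and then deleted the cgroups; the corresponding entries in /proc/cgroups were all removed as expected.
So please check whether processes in your environment still hold references to these cgroups.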
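
A rough way to start that check is to compare the number of blkio cgroup directories still visible under the mount point with the num_cgroups value the kernel reports. The sketch below rests on an assumption: that num_cgroups counts the root cgroup and also counts cgroups whose directory has already been removed but which are still pinned by some reference. The helper names count_cgroup_dirs and hidden_cgroups are hypothetical.

```shell
# Count cgroup directories under a hierarchy mount point, including the
# root directory itself. count_cgroup_dirs is a hypothetical helper name.
count_cgroup_dirs() {
    # $1: mount point of the hierarchy, e.g. /sys/fs/cgroup/blkio
    find "$1" -type d | wc -l | tr -d ' '
}

# Print how many cgroups the kernel reports beyond the visible directories.
# hidden_cgroups is a hypothetical helper name.
hidden_cgroups() {
    # $1: num_cgroups value taken from /proc/cgroups
    # $2: mount point of the hierarchy
    echo $(( $1 - $(count_cgroup_dirs "$2") ))
}

# Example call on a live system:
#   hidden_cgroups "$(awk '$1 == "blkio" { print $3 }' /proc/cgroups)" /sys/fs/cgroup/blkio
```

A result near zero would suggest the cgroups still exist as directories, for example because tasks remain in them. A large positive result would instead point at cgroups that were already removed with rmdir but have not been released.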
Comment 4 fengyehong 2025-04-21 18:06:37 UTC
Environment:
# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.10.134-13.1.an8.x86_64 root=UUID=d973c951-cf65-4658-90ac-d239db05ee7b ro ipv6.disable=1 crashkernel=2G-8G:192M,8G-128G:256M,128G-:384M rdt=l3cat,l3cdp,cmt,mba,mbmtotal,mbmlocal cgwb_v1 nodmraid nomodeset biosdevname=0 rhgb quiet cgroup.memory=nokmem
# ls /sys/fs/cgroup/
blkio  cpu  cpuacct  cpu,cpuacct  cpuset  devices  freezer  hugetlb  memory  memory,blkio  net_cls  net_cls,net_prio  net_prio  perf_event  pids  rdma  systemd

The following script reproduces it:
# Replace the device number here with that of the disk mounted at /data
echo "259:0 1000000000" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device
PARENT=/sys/fs/cgroup/blkio/test
mkdir $PARENT
for i in {1..1000};
do
	mkdir -p $PARENT/t$i
	cgexec -g blkio:test/t$i sh -c 'dd if=/dev/zero of=/data/test bs=1M count=100; sync; rm -f /data/test'
	rmdir $PARENT/t$i
done
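
To quantify the leak from one run of the loop above, the blkio count can be read before and after it. The wrapper below is a minimal sketch, not part of the original report: leak_delta is a hypothetical name, and the CGROUPS_FILE override exists only so the helper can be tried against a sample file instead of /proc/cgroups.

```shell
# Run a command and print how much a controller's num_cgroups changed.
# leak_delta is a hypothetical helper; the body runs in a subshell so its
# variables do not leak into the caller.
leak_delta() (
    # $1: controller name, e.g. blkio; remaining arguments: command to run
    ctrl=$1
    shift
    file=${CGROUPS_FILE:-/proc/cgroups}
    before=$(awk -v n="$ctrl" '$1 == n { print $3 }' "$file")
    # Send the command's stdout to stderr so only the delta is printed on stdout.
    "$@" >&2
    after=$(awk -v n="$ctrl" '$1 == n { print $3 }' "$file")
    echo $(( after - before ))
)

# Example call, assuming the loop above was saved as ./repro.sh (hypothetical file):
#   leak_delta blkio bash ./repro.sh
```

Removed cgroups may be released asynchronously, so a delta read immediately after the loop can overstate the leak; repeating the reading after a short wait gives a more reliable figure.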
Comment 5 fengyehong 2025-04-21 18:18:11 UTC
In addition, we verified that even without cgwb_v1 enabled, replacing sync in the script with vmtouch -e /data/test also triggers the blkio leak.
Comment 6 Ferry Meng alibaba_cloud_group 2025-04-23 15:59:49 UTC
Which hotfixes are loaded in this environment?
The command is kpatch list
Comment 7 fengyehong 2025-04-24 11:56:06 UTC
(In reply to Ferry Meng from comment #6)
> Which hotfixes are loaded in this environment?
> The command is kpatch list

I checked, and the list is empty.

```
# kpatch list
Loaded patch modules:

Installed patch modules:
```
Comment 8 Ferry Meng alibaba_cloud_group 2025-04-28 13:47:31 UTC
https://bugzilla.openanolis.cn/show_bug.cgi?id=20642
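During my investigation I used a kernel from the devel-5.10 branch and did reproduce the same symptom there; it has since been fixed.

For your scenario, however, I could not reproduce the symptom with a kernel built from the 13.1 branch. For your reference.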