5437 – [Anolis23.1][RC1][Software Compatibility] After a cloud ECS instance is launched and Kubernetes is installed, kubeadm init fails with [ERROR SystemVerification]: missing required cgroups: cpu

Status:	RESOLVED WONTFIX

Alias:	None

Product:	Anolis OS 23
Classification:	Anolis OS
Component:	BaseOS Packages
Sub Component:
Version:	23.1
Hardware:	All Linux

Importance:	P3-Medium S3-normal
Target Milestone:	---
Assignee:	happy_orange
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2023-06-07 14:09 UTC by Janos
Modified:	2024-03-29 09:56 UTC
CC List:	4 users

Description Janos alibaba_cloud_group

2023-06-07 14:09:58 UTC

[Defect description]:
  After a cloud ECS instance is launched and Kubernetes is installed, kubeadm init fails with [ERROR SystemVerification]: missing required cgroups: cpu


[Reproduction environment]:
Environment: cloud ECS
OS: Anolis 23 x86_64/aarch64

# cat /etc/os-release
NAME="Anolis OS"
VERSION="23"
ID="anolis"
VERSION_ID="23"
PLATFORM_ID="platform:an23"
PRETTY_NAME="Anolis OS 23"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
BUG_REPORT_URL="https://bugzilla.openanolis.cn/"

Kernel:
# uname -r
5.10.134-14.an23.x86_64


[Reproduction steps]:
Reference SIG document: https://openanolis.cn/sig/third_software_compatibility/doc/426352745466167442

# Install containerd
yum install -y containerd
containerd config default > /etc/containerd/config.toml

# Configure containerd to use the systemd cgroup driver and the Aliyun image registry
sed -i 's|SystemdCgroup = .*|SystemdCgroup = true|g' /etc/containerd/config.toml
sed -i 's|sandbox_image = .*|sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"|g' /etc/containerd/config.toml

# Start containerd
systemctl daemon-reload
systemctl start containerd

# Install Kubernetes
yum install -y kubernetes kubernetes-kubeadm cri-tools

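# Generate the init config file; switch the CRI socket to containerd and the image repository to the Aliyun registry
# (${ip} below must already hold the node's IP address)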
kubeadm config print init-defaults > /root/init.yaml
sed -i 's|/var/run/dockershim.sock|/run/containerd/containerd.sock|g' /root/init.yaml
sed -i "s|k8s.gcr.io|registry.aliyuncs.com/google_containers|g" /root/init.yaml
sed -i "s|advertiseAddress: .*|advertiseAddress: ${ip}|g" /root/init.yaml
sed -i '/serviceSubnet: 10.96.0.0\/12/a\  podSubnet: 10.244.0.0\/16' /root/init.yaml

# Initialize from the config file
kubeadm init --config=/root/init.yaml


[Expected result]:
Kubernetes initializes successfully

[Actual result]:
Kubernetes initialization fails with the following output:

I0607 13:49:43.537965 1111267 checks.go:203] validating availability of port 2379
I0607 13:49:43.537999 1111267 checks.go:203] validating availability of port 2380
I0607 13:49:43.538015 1111267 checks.go:243] validating the existence and emptiness of directory /var/lib/etcd
[preflight] Some fatal errors occurred:
        [ERROR SystemVerification]: missing required cgroups: cpu
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:260
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
        cmd/kubeadm/app/cmd/init.go:111
github.com/spf13/cobra.(*Command).execute
        vendor/github.com/spf13/cobra/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
        vendor/github.com/spf13/cobra/command.go:1040
github.com/spf13/cobra.(*Command).Execute
        vendor/github.com/spf13/cobra/command.go:968
k8s.io/kubernetes/cmd/kubeadm/app.Run
        cmd/kubeadm/app/kubeadm.go:50
main.main
        cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/lib/golang/src/runtime/proc.go:250
runtime.goexit
        /usr/lib/golang/src/runtime/asm_amd64.s:1598

Current cgroup mounts in the environment:
# mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot)
tmpfs on /usr/local/aegis/cgroup type tmpfs (rw,relatime,seclabel,size=51200k)
cgroup on /usr/local/aegis/cgroup/cpu type cgroup (rw,relatime,seclabel,cpu)
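
The listing above shows the cpu controller bound to a cgroup v1 hierarchy under /usr/local/aegis while the system otherwise runs cgroup v2. A controller attached to a v1 hierarchy is not available to the v2 hierarchy, which would be consistent with the kubeadm preflight error. A minimal sketch for spotting such mounts follows; the function name v1_mounts_for_controller is hypothetical, and the sample input is the mount output from this report.

```shell
#!/bin/sh
# Print the mount points of cgroup v1 hierarchies that bind a given controller.
# Reads `mount`-style lines ("<src> on <dir> type <fstype> (<opts>)") from stdin.
# v1_mounts_for_controller is a hypothetical helper name, not an existing tool.
v1_mounts_for_controller() {
    awk -v c="$1" '
        $5 == "cgroup" {                  # cgroup v1 only; v2 has fstype "cgroup2"
            opts = $6
            gsub(/[()]/, "", opts)        # strip the surrounding parentheses
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == c) { print $3; break }
        }'
}

# Sample input: the three lines reported in this bug.
sample='cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot)
tmpfs on /usr/local/aegis/cgroup type tmpfs (rw,relatime,seclabel,size=51200k)
cgroup on /usr/local/aegis/cgroup/cpu type cgroup (rw,relatime,seclabel,cpu)'

printf '%s\n' "$sample" | v1_mounts_for_controller cpu
```

On a live system the same check would be `mount | v1_mounts_for_controller cpu`; any path it prints is a v1 hierarchy holding the cpu controller.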

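[Root cause analysis]:
After rebooting the environment and running the Kubernetes initialization again, the error no longer appears. We suspect that a cgroup directory mount made during ECS provisioning conflicts with Kubernetes.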

Comment 1 yunmeng365524 2023-06-07 17:37:11 UTC

This reproduces consistently on multiple machines; please help confirm.

Comment 2 happy_orange alibaba_cloud_group

2023-06-09 22:54:12 UTC

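This is a bug in aegis. Once the aegis service starts, it mounts the cpu cgroup controller on its own, so cpu is no longer available under the sysfs cgroup directory (/sys/fs/cgroup).
This is a known aegis issue, and the responsible colleagues have been contacted to handle it.
Suggested workaround: stop the aegis service, after which kubeadm init succeeds on the first attempt.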
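
A rough sketch of that workaround, under stated assumptions: the systemd unit name `aegis` is an assumption (the comment only says to stop the aegis service), and the explicit umount is an extra step not mentioned in the thread, using the mount point from the `mount | grep cgroup` output in the description. The script defaults to a dry run that only prints the commands. Even after the v1 hierarchy is unmounted, the kernel may not hand the controller back to cgroup v2 immediately, so the final check may need to be repeated.

```shell
#!/bin/sh
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to run them as root.
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical wrapper that either echoes or executes its arguments.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

release_v1_cpu() {
    # Assumption: the agent is managed by a systemd unit named "aegis".
    run systemctl stop aegis
    # Mount point taken from the mount output in this report.
    run umount /usr/local/aegis/cgroup/cpu
    # Once the v1 hierarchy is released, "cpu" should be listed here again.
    run cat /sys/fs/cgroup/cgroup.controllers
}

release_v1_cpu
```

If `cpu` shows up in cgroup.controllers afterwards, rerunning `kubeadm init --config=/root/init.yaml` would be the next step.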

Comment 3 Janos alibaba_cloud_group

2024-01-08 11:56:27 UTC

This issue still exists on version 23.1.

Comment 4 zhixin01 alibaba_cloud_group

2024-01-08 15:46:49 UTC

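The aegis mount issue causes ctr to fail when creating a container, so the software compatibility test case containerd.py fails (after the aegis mount is removed, the test case passes).

The failure log from the test case: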
Sending CMD: ctr run -d --net-host docker.io/library/nginx:latest nginx22

Jan 08 15:02:41 LEST:MainThre [INFO]: ctr: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: openat2 /sys/fs/cgroup/default/nginx22/cpu.weight: no such file or directory: unknown
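
The missing cpu.weight file in the log above is what one would expect when the cpu controller is not enabled in the cgroup v2 tree. A minimal sketch of a check follows; the helper name has_controller is hypothetical, and the two files it reads are the standard cgroup v2 interface files at the root of the hierarchy.

```shell
#!/bin/sh
# has_controller FILE NAME
# Returns 0 if NAME is listed in the space-separated controller list in FILE,
# 1 if it is not, and 2 if FILE cannot be read. Whole-word comparison is used
# so that "cpuset" is not mistaken for "cpu".
has_controller() {
    file="$1"
    name="$2"
    [ -r "$file" ] || return 2
    for c in $(cat "$file"); do
        [ "$c" = "$name" ] && return 0
    done
    return 1
}

# cgroup.controllers: controllers available to this cgroup.
# cgroup.subtree_control: controllers enabled for its children.
for f in /sys/fs/cgroup/cgroup.controllers /sys/fs/cgroup/cgroup.subtree_control; do
    if has_controller "$f" cpu; then
        echo "$f: cpu present"
    else
        echo "$f: cpu missing or file unreadable"
    fi
done
```

If cpu is absent from cgroup.controllers, child cgroups such as /sys/fs/cgroup/default/nginx22 cannot expose cpu.weight, which matches the runc error.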