Bug 5437 - [Anolis23.1][RC1][Software compatibility] After launching a cloud ECS instance and installing kubernetes, kubeadm init fails with [ERROR SystemVerification]: missing required cgroups: cpu
Status: RESOLVED WONTFIX
Alias: None
Product: Anolis OS 23
Classification: Anolis OS
Component: BaseOS Packages
Version: 23.1
Hardware: All Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: happy_orange
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-07 14:09 UTC by Janos
Modified: 2024-03-29 09:56 UTC
Description Janos alibaba_cloud_group 2023-06-07 14:09:58 UTC
[Defect description]:
  After launching an ECS instance on the cloud and installing kubernetes, kubeadm init fails with [ERROR SystemVerification]: missing required cgroups: cpu


[Reproduction environment]:
Environment: cloud ECS instance
OS: Anolis 23 x86_64/aarch64

# cat /etc/os-release
NAME="Anolis OS"
VERSION="23"
ID="anolis"
VERSION_ID="23"
PLATFORM_ID="platform:an23"
PRETTY_NAME="Anolis OS 23"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
BUG_REPORT_URL="https://bugzilla.openanolis.cn/"

KERNEL:
# uname -r
5.10.134-14.an23.x86_64


[Steps to reproduce]:
Reference SIG document: https://openanolis.cn/sig/third_software_compatibility/doc/426352745466167442

# Install containerd
yum install -y containerd
containerd config default > /etc/containerd/config.toml

# Edit the container runtime config: use the systemd cgroup driver and the aliyun image registry
sed -i 's|SystemdCgroup = .*|SystemdCgroup = true|g' /etc/containerd/config.toml
sed -i 's|sandbox_image = .*|sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"|g' /etc/containerd/config.toml

# Start containerd
systemctl daemon-reload
systemctl start containerd

# Install k8s
yum install -y kubernetes kubernetes-kubeadm cri-tools

# Generate the init config file, switch the container runtime to containerd, and use the aliyun image registry
kubeadm config print init-defaults > /root/init.yaml
sed -i 's|/var/run/dockershim.sock|/run/containerd/containerd.sock|g' /root/init.yaml
sed -i "s|k8s.gcr.io|registry.aliyuncs.com/google_containers|g" /root/init.yaml
# ${ip} must already hold this node's IP address; it is not set anywhere in these steps
sed -i "s|advertiseAddress: .*|advertiseAddress: ${ip}|g" /root/init.yaml
sed -i '/serviceSubnet: 10.96.0.0\/12/a\  podSubnet: 10.244.0.0\/16' /root/init.yaml

# Initialize from the config file
kubeadm init --config=/root/init.yaml


[Expected result]:
k8s initializes successfully

[Actual result]:
k8s initialization fails with the following output:

I0607 13:49:43.537965 1111267 checks.go:203] validating availability of port 2379
I0607 13:49:43.537999 1111267 checks.go:203] validating availability of port 2380
I0607 13:49:43.538015 1111267 checks.go:243] validating the existence and emptiness of directory /var/lib/etcd
[preflight] Some fatal errors occurred:
        [ERROR SystemVerification]: missing required cgroups: cpu
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:260
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
        cmd/kubeadm/app/cmd/init.go:111
github.com/spf13/cobra.(*Command).execute
        vendor/github.com/spf13/cobra/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
        vendor/github.com/spf13/cobra/command.go:1040
github.com/spf13/cobra.(*Command).Execute
        vendor/github.com/spf13/cobra/command.go:968
k8s.io/kubernetes/cmd/kubeadm/app.Run
        cmd/kubeadm/app/kubeadm.go:50
main.main
        cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/lib/golang/src/runtime/proc.go:250
runtime.goexit
        /usr/lib/golang/src/runtime/asm_amd64.s:1598

cgroup information for the current environment:
# mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot)
tmpfs on /usr/local/aegis/cgroup type tmpfs (rw,relatime,seclabel,size=51200k)
cgroup on /usr/local/aegis/cgroup/cpu type cgroup (rw,relatime,seclabel,cpu)
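
The mount output above shows a cgroup v1 hierarchy holding the cpu controller next to the unified cgroup2 mount. A controller attached to a v1 hierarchy is not offered by cgroup v2, which would be consistent with the preflight error. The following is a minimal diagnostic sketch, not part of the original report; the function names are hypothetical, and the optional file arguments exist only so the checks can be exercised against sample data.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the cgroup v2 root offers the "cpu" controller.
# Optional $1: path to a cgroup.controllers file (defaults to the real one).
has_v2_cpu_controller() {
    controllers_file="${1:-/sys/fs/cgroup/cgroup.controllers}"
    [ -r "$controllers_file" ] || return 2
    # cgroup.controllers is one space-separated line of controller names.
    tr ' ' '\n' < "$controllers_file" | grep -qx 'cpu'
}

# Hypothetical helper: print mount points of cgroup v1 hierarchies that hold cpu.
# Optional $1: mount table in /proc/mounts format (defaults to /proc/mounts).
v1_cpu_mounts() {
    mounts_file="${1:-/proc/mounts}"
    [ -r "$mounts_file" ] || return 2
    # Fields: device mountpoint fstype options ...
    awk '$3 == "cgroup" && ("," $4 ",") ~ /,cpu,/ { print $2 }' "$mounts_file"
}
```

On an affected machine one would expect `has_v2_cpu_controller` to fail and `v1_cpu_mounts` to print /usr/local/aegis/cgroup/cpu.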

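[Root cause analysis]:
After rebooting the environment and running the k8s initialization again, the error no longer appears. We suspect that the cgroup directory mount set up during ECS provisioning conflicts with k8s.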
Comment 1 yunmeng365524 2023-06-07 17:37:11 UTC
This reproduces consistently on multiple environments; please help confirm.
Comment 2 happy_orange alibaba_cloud_group 2023-06-09 22:54:12 UTC
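This is a bug in aegis. Once the aegis service starts, it mounts the cpu cgroup on its own, so the cpu controller is no longer available under the sysfs cgroup directory.
This is a known aegis issue, and the responsible colleagues have been contacted to handle it.
Suggested workaround: stop the aegis service, after which kubeadm init succeeds in one pass.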
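
A rough sketch of that workaround follows. It only prints the commands instead of running them, so they can be reviewed first. The systemd unit name `aegis` is an assumption (check the actual name on the image), the function name is hypothetical, and the unmount step is inferred from the mount output in the description rather than confirmed by the aegis maintainers.

```shell
#!/bin/sh
# Assumed unit name; override with AEGIS_UNIT=... if the image uses another one.
AEGIS_UNIT="${AEGIS_UNIT:-aegis}"

# Hypothetical helper: print a dry-run plan that stops the agent, unmounts any
# cgroup v1 hierarchy holding the cpu controller, and re-checks cgroup v2.
# Optional $1: mount table in /proc/mounts format (defaults to /proc/mounts).
print_release_plan() {
    mounts_file="${1:-/proc/mounts}"
    echo "systemctl stop $AEGIS_UNIT"
    awk '$3 == "cgroup" && ("," $4 ",") ~ /,cpu,/ { print "umount " $2 }' "$mounts_file"
    # "cpu" may take a moment to reappear here after the v1 hierarchy is gone.
    echo "cat /sys/fs/cgroup/cgroup.controllers"
}
```

Running `print_release_plan` as root on the affected instance and then executing the printed lines by hand keeps every step visible; if cpu does not come back in cgroup.controllers, a reboot with the agent disabled is the fallback the reporter already observed to work.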
Comment 3 Janos alibaba_cloud_group 2024-01-08 11:56:27 UTC
This issue still exists on version 23.1.
Comment 4 zhixin01 alibaba_cloud_group 2024-01-08 15:46:49 UTC
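The aegis mount problem causes ctr to fail when creating a container dynamically, so the software compatibility test case containerd.py fails (after unmounting the aegis mount, the test case passes).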

The log from the failed test run is as follows:
Sending CMD: ctr run -d --net-host docker.io/library/nginx:latest nginx22

Jan 08 15:02:41 LEST:MainThre [INFO]: ctr: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: openat2 /sys/fs/cgroup/default/nginx22/cpu.weight: no such file or directory: unknown
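
The missing cpu.weight file in this log fits the same picture: in cgroup v2, a child cgroup only gets its cpu.* files when cpu is listed in the parent's cgroup.subtree_control. The sketch below is a hedged diagnostic with a hypothetical function name; it checks only the immediate parent, whereas in practice every ancestor up to the root must have cpu enabled.

```shell
#!/bin/sh
# Hypothetical helper: report why <cgroup_dir>/cpu.weight is or is not there.
# $1: a cgroup v2 directory, e.g. /sys/fs/cgroup/default/nginx22
explain_cpu_weight() {
    cgroup_dir="$1"
    subtree="$(dirname "$cgroup_dir")/cgroup.subtree_control"
    if [ -e "$cgroup_dir/cpu.weight" ]; then
        echo "present"
    elif [ -r "$subtree" ] && tr ' ' '\n' < "$subtree" | grep -qx 'cpu'; then
        # Parent delegates cpu, yet the file is absent: look elsewhere.
        echo "enabled-in-parent-but-missing"
    else
        # Parent does not delegate cpu, so the kernel creates no cpu.* files.
        echo "cpu-not-enabled-in-parent"
    fi
}
```

For the failing container one would call `explain_cpu_weight /sys/fs/cgroup/default/nginx22`; a result of cpu-not-enabled-in-parent points back at the root hierarchy not offering cpu at all.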