[Defect description]:
After an ECS instance is brought up in the cloud and Kubernetes is installed, kubeadm init fails with: [ERROR SystemVerification]: missing required cgroups: cpu

[Reproduction environment]:
Environment: cloud ECS instance
OS: Anolis 23 x86_64/aarch64
# cat /etc/os-release
NAME="Anolis OS"
VERSION="23"
ID="anolis"
VERSION_ID="23"
PLATFORM_ID="platform:an23"
PRETTY_NAME="Anolis OS 23"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"
BUG_REPORT_URL="https://bugzilla.openanolis.cn/"
KERNEL:
# uname -r
5.10.134-14.an23.x86_64

[Reproduction steps]:
Reference SIG document: https://openanolis.cn/sig/third_software_compatibility/doc/426352745466167442
# Install containerd
yum install -y containerd
containerd config default > /etc/containerd/config.toml
# Adjust the containerd configuration: use the systemd cgroup driver and the aliyun image registry
sed -i 's|SystemdCgroup = .*|SystemdCgroup = true|g' /etc/containerd/config.toml
sed -i 's|sandbox_image = .*|sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"|g' /etc/containerd/config.toml
# Start containerd
systemctl daemon-reload
systemctl start containerd
# Install k8s
yum install -y kubernetes kubernetes-kubeadm cri-tools
# Generate the init configuration file, switch the container runtime to containerd, and switch to the aliyun image registry
kubeadm config print init-defaults > /root/init.yaml
sed -i 's|/var/run/dockershim.sock|/run/containerd/containerd.sock|g' /root/init.yaml
sed -i "s|k8s.gcr.io|registry.aliyuncs.com/google_containers|g" /root/init.yaml
sed -i "s|advertiseAddress: .*|advertiseAddress: ${ip}|g" /root/init.yaml
sed -i '/serviceSubnet: 10.96.0.0\/12/a\ podSubnet: 10.244.0.0\/16' /root/init.yaml
# Initialize from the configuration file
kubeadm init --config=/root/init.yaml

[Expected result]:
k8s initialization succeeds.

[Actual result]:
k8s initialization fails with the following output:
I0607 13:49:43.537965 1111267 checks.go:203] validating availability of port 2379
I0607 13:49:43.537999 1111267 checks.go:203] validating availability of port 2380
I0607 13:49:43.538015 1111267 checks.go:243] validating the existence and emptiness of directory /var/lib/etcd
[preflight] Some fatal errors occurred:
[ERROR SystemVerification]: missing required cgroups: cpu
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
cmd/kubeadm/app/cmd/phases/workflow/runner.go:260
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
cmd/kubeadm/app/cmd/init.go:111
github.com/spf13/cobra.(*Command).execute
vendor/github.com/spf13/cobra/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC
vendor/github.com/spf13/cobra/command.go:1040
github.com/spf13/cobra.(*Command).Execute
vendor/github.com/spf13/cobra/command.go:968
k8s.io/kubernetes/cmd/kubeadm/app.Run
cmd/kubeadm/app/kubeadm.go:50
main.main
cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/lib/golang/src/runtime/proc.go:250
runtime.goexit
/usr/lib/golang/src/runtime/asm_amd64.s:1598

Current cgroup information in the environment:
# mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot)
tmpfs on /usr/local/aegis/cgroup type tmpfs (rw,relatime,seclabel,size=51200k)
cgroup on /usr/local/aegis/cgroup/cpu type cgroup (rw,relatime,seclabel,cpu)

[Root cause analysis]:
After rebooting the environment and running the k8s initialization again, the error no longer appears. We suspect that the cgroup directory mounts set up when the ECS instance is provisioned conflict with k8s.
The problem reproduces every time on multiple environments. Please help confirm.
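
The suspected conflict can be checked directly: a controller that is bound to a cgroup v1 hierarchy is not available to the cgroup v2 tree at the same time. The following is a minimal sketch, not part of the original report, that lists cgroup v1 mount points carrying a given controller. The function name find_v1_controller_mounts is hypothetical, and the sketch assumes the usual /proc/mounts field layout (device, mount point, filesystem type, options).

```shell
#!/bin/sh
# Hypothetical helper: list cgroup v1 mount points whose mount options
# include the given controller.
# $1: controller name (for example: cpu)
# $2: mounts table to read (defaults to /proc/mounts)
find_v1_controller_mounts() {
    controller=$1
    mounts_file=${2:-/proc/mounts}
    # /proc/mounts fields: device mountpoint fstype options dump pass
    awk -v c="$controller" '
        $3 == "cgroup" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == c) { print $2; break }
        }' "$mounts_file"
}
```

On a host in the state shown by the mount output in this report, `find_v1_controller_mounts cpu` should print /usr/local/aegis/cgroup/cpu. An empty result would suggest that no cgroup v1 hierarchy is holding the cpu controller.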
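This is an aegis bug. Once the aegis service starts, it mounts a cpu cgroup hierarchy of its own, so cpu is no longer present under the sysfs cgroup directory (/sys/fs/cgroup). This is a known aegis issue, and the responsible colleagues have been contacted to fix it. Suggested workaround: stop the aegis service, after which kubeadm init succeeds on the first attempt.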
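
A rough sketch of the workaround follows, under several assumptions that the thread does not confirm. The systemd unit name aegis.service is hypothetical and should be checked on the host. The mount point is taken from the mount output in this report. The explicit umount is included because stopping the service may not remove the mount by itself. The controller can take a moment to reappear in cgroup v2 after the v1 hierarchy is released, so the sketch polls for it.

```shell
#!/bin/sh
# ASSUMPTION: hypothetical unit name; verify it on the host before use.
AEGIS_UNIT=${AEGIS_UNIT:-aegis.service}
# Mount point as shown by `mount | grep cgroup` in this report.
AEGIS_CPU_MOUNT=${AEGIS_CPU_MOUNT:-/usr/local/aegis/cgroup/cpu}

# Poll until a controller is listed in a cgroup v2 cgroup.controllers file.
# $1: controller; $2: controllers file; $3: max attempts, one second apart.
wait_for_controller() {
    controller=$1
    file=${2:-/sys/fs/cgroup/cgroup.controllers}
    attempts=${3:-10}
    while [ "$attempts" -gt 0 ]; do
        # One controller per line, then match the whole line exactly,
        # so that "cpuset" is not mistaken for "cpu".
        if tr ' ' '\n' < "$file" | grep -qxF "$controller"; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Stop aegis, drop its cgroup v1 cpu mount if still present, then wait
# for cpu to become available in the cgroup v2 tree. Requires root.
release_cpu_controller() {
    systemctl stop "$AEGIS_UNIT" || return 1
    if grep -q " $AEGIS_CPU_MOUNT cgroup " /proc/mounts; then
        umount "$AEGIS_CPU_MOUNT" || return 1
    fi
    wait_for_controller cpu
}
```

If `release_cpu_controller` returns successfully, rerunning `kubeadm init --config=/root/init.yaml` should get past the SystemVerification check. If it times out, rebooting remains the fallback described in the report.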
This problem still exists in version 23.1.
The aegis mount problem also causes ctr to fail when creating a container, so the software compatibility test case containerd.py fails (after the aegis mount is removed, the test case passes).
The log of the failed test case is as follows:
Sending CMD: ctr run -d --net-host docker.io/library/nginx:latest nginx22
Jan 08 15:02:41 LEST:MainThre [INFO]: ctr: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: openat2 /sys/fs/cgroup/default/nginx22/cpu.weight: no such file or directory: unknown
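
The cpu.weight file only exists in a cgroup v2 directory when the cpu controller is enabled for it, which is not possible while the root of the tree does not list cpu in cgroup.controllers. A test case such as containerd.py could therefore check for this condition up front and report it clearly. The sketch below is an illustration only: the function name missing_controllers is hypothetical, and the set of controllers to require is the caller's choice (cpu is the one implicated in the log above).

```shell
#!/bin/sh
# Hypothetical helper: print each required controller that is absent from
# the cgroup.controllers file of a cgroup v2 directory.
# $1: cgroup directory (for example: /sys/fs/cgroup)
# remaining arguments: controller names to require
missing_controllers() {
    dir=$1
    shift
    # Pad with spaces so that each name can be matched as a whole word.
    available=" $(cat "$dir/cgroup.controllers") "
    for c in "$@"; do
        case "$available" in
            *" $c "*) ;;          # controller is available
            *) echo "$c" ;;       # controller is missing
        esac
    done
}
```

For example, `missing_controllers /sys/fs/cgroup cpu` printing cpu before the ctr run would point at the aegis mount rather than at containerd or runc.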