Bug 1630 - [AnolisOS8.6][anck][x86_64][Kubernetes compatibility] kubeadm init fails: "Error getting node" err="node \"localhost.localdomain\" not found"
Summary: [AnolisOS8.6][anck][x86_64][Kubernetes compatibility] kubeadm init fails: "Error getting node"...
Status: NEW
Alias: None
Product: Anolis OS 8
Classification: Anolis OS
Component: Others
Version: 8.6
Hardware: All
OS: Linux
Importance: P3-Medium S3-normal
Target Milestone: ---
Assignee: Jacob
QA Contact: shuming
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-07-06 16:44 UTC by kangjiangbo
Modified: 2022-07-06 16:44 UTC
CC List: 0 users

See Also:


Attachments

Description kangjiangbo 2022-07-06 16:44:14 UTC
Description of problem:
AnolisOS8.6 anck x86_64, kubeadm init fails
# kubeadm init --image-repository='registry.aliyuncs.com/google_containers'
[init] Using Kubernetes version: v1.24.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost.localdomain] and IPs [10.96.0.1 172.16.1.146]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost localhost.localdomain] and IPs [172.16.1.146 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost localhost.localdomain] and IPs [172.16.1.146 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher



Version-Release number of selected component (if applicable):
# cat /etc/os-release
NAME="Anolis OS"
VERSION="8.6"
ID="anolis"
ID_LIKE="rhel fedora centos"
VERSION_ID="8.6"
PLATFORM_ID="platform:an8"
PRETTY_NAME="Anolis OS 8.6"
ANSI_COLOR="0;31"
HOME_URL="https://openanolis.cn/"

How reproducible:
kubeadm init --image-repository='registry.aliyuncs.com/google_containers'

Steps to Reproduce:

Following https://openanolis.cn/sig/third_software_compatibility/doc/426352745466167442

# Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
# Set SELinux to permissive mode
setenforce 0
# Disable swap
swapoff -a
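
# Note: setenforce 0 and swapoff -a only last until the next reboot. A minimal
# persistence sketch, not part of the original reproduction; it assumes the
# stock /etc/selinux/config and /etc/fstab layout:
# Keep SELinux permissive across reboots
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
# Keep swap disabled across reboots by commenting out swap entries
sed -i '/\sswap\s/s/^/#/' /etc/fstab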

# Install Docker
wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo
yum install -y docker-ce-19.03.15 docker-ce-cli containerd.io
systemctl restart docker
systemctl enable docker
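
# Note: these steps install docker-ce 19.03 with its default cgroupfs cgroup
# driver, while kubeadm since v1.22 defaults the kubelet to the systemd cgroup
# driver. The following alignment step is only a hedged sketch and is not part
# of the original reproduction:
# Switch Docker to the systemd cgroup driver to match the kubelet default
cat <<EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl restart docker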

# Install kubectl, kubelet, and kubeadm
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-\$basearch/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

yum install -y kubectl kubelet kubeadm
systemctl enable kubelet

# Initialize the Kubernetes control plane
kubeadm init --image-repository='registry.aliyuncs.com/google_containers'
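
# Note: the failing node name in the title is the default hostname
# localhost.localdomain, which kubeadm picks up as the node name. A hedged
# pre-init step (k8s-master is a purely illustrative name, not from this
# report) is to give the host a distinct name resolvable to its IP 172.16.1.146:
hostnamectl set-hostname k8s-master
echo '172.16.1.146 k8s-master' >> /etc/hosts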


Actual results:
kubeadm init times out in the wait-control-plane phase; the kubelet repeatedly logs "Error getting node" err="node \"localhost.localdomain\" not found" (see Additional info).

Expected results:
kubeadm init completes successfully and the control plane comes up.

Additional info:
# journalctl -xeu kubelet
Jul 06 16:43:14 localhost.localdomain kubelet[10369]: E0706 16:43:14.692599   10369 kubelet.go:2424] "Error getting node" err="node \"localhost.localdomain\" not found"
Jul 06 16:43:14 localhost.localdomain kubelet[10369]: E0706 16:43:14.792979   10369 kubelet.go:2424] "Error getting node" err="node \"localhost.localdomain\" not found"
Jul 06 16:43:14 localhost.localdomain kubelet[10369]: E0706 16:43:14.893479   10369 kubelet.go:2424] "Error getting node" err="node \"localhost.localdomain\" not found"
Jul 06 16:43:14 localhost.localdomain kubelet[10369]: E0706 16:43:14.993868   10369 kubelet.go:2424] "Error getting node" err="node \"localhost.localdomain\" not found"
Jul 06 16:43:15 localhost.localdomain kubelet[10369]: E0706 16:43:15.094141   10369 kubelet.go:2424] "Error getting node" err="node \"localhost.localdomain\" not found"
Jul 06 16:43:15 localhost.localdomain kubelet[10369]: E0706 16:43:15.163217   10369 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://172.16.1.146:6443/apis/coo>
Jul 06 16:43:15 localhost.localdomain kubelet[10369]: E0706 16:43:15.194307   10369 kubelet.go:2424] "Error getting node" err="node \"localhost.localdomain\" not found"
Jul 06 16:43:15 localhost.localdomain kubelet[10369]: E0706 16:43:15.294595   10369 kubelet.go:2424] "Error getting node" err="node \"localhost.localdomain\" not found"
Jul 06 16:43:15 localhost.localdomain kubelet[10369]: E0706 16:43:15.394899   10369 kubelet.go:2424] "Error getting node" err="node \"localhost.localdomain\" not found"
Jul 06 16:43:15 localhost.localdomain kubelet[10369]: I0706 16:43:15.395639   10369
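
A few hedged checks that may help narrow this down on the affected host (standard docker/kubelet tooling, not output taken from this report):

# Compare Docker's cgroup driver (cgroupfs by default on 19.03) ...
docker info 2>/dev/null | grep -i 'cgroup driver'
# ... with the kubelet's configured driver (systemd by default in kubeadm-generated configs since 1.22)
grep -i cgroupdriver /var/lib/kubelet/config.yaml
# Kubelet service state and recent logs
systemctl status kubelet --no-pager

Since Kubernetes 1.24 removed the dockershim, the kubelet also needs a CRI endpoint (containerd's CRI plugin or cri-dockerd); whether that is the trigger here is an assumption, not something the logs above confirm.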