1) kubeadm install of k8s fails because of a misconfigured kernel setting
Error: ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables
[root@node01 data]# kubeadm join <master-ip>:6443 --token abcdef.0123456789abcdef \
> --discovery-token-ca-cert-hash sha256:e7a08a24b68c738cccfcc3ae56b7a433704f3753991c782208b46f20c57cf0c9
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.12. Latest validated version: 19.03
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Solution:
echo "1" >/proc/sys/net/bridge/bridge-nf-call-iptables
echo "1" >/proc/sys/net/bridge/bridge-nf-call-ip6tables
2) Error when adding a node to a kubeadm-installed k8s 1.20.1 cluster
Reference: https://bbs.csdn.net/topics/397412529?page=1
accepts at most 1 arg(s), received 2
To see the stack trace of this error execute with --v=5 or higher
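This error usually means kubeadm join received an extra positional argument, which commonly happens when the multi-line command is pasted with a space or stray character after the trailing backslash. Re-issuing the join on a single line (a sketch, reusing the placeholder values from above) avoids that:
kubeadm join <master-ip>:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:e7a08a24b68c738cccfcc3ae56b7a433704f3753991c782208b46f20c57cf0c9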
After a short wait, the join succeeded:
W0801 19:27:06.500319 12557 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING FileExisting-tc]: tc not found in system path
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
3) Problems encountered while deploying Kubernetes (initialization, etc.) and their fixes
Reference: https://blog.csdn.net/clareeee/article/details/121100431
4) Running kubectl get node on a worker node reports an error:
[root@k8s-node02-17:~]# kubectl get node
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The fix is to follow the steps printed in the kubeadm output after a successful install: make the admin kubeconfig available wherever kubectl runs.
# Without a kubeconfig, kubectl fails like this:
[root@master data]# kubectl get nodes
W1116 20:29:22.881159 20594 loader.go:223] Config not found: /etc/kubernetes/admin.conf
The connection to the server localhost:8080 was refused - did you specify the right host or port?
# Run on the master node:
scp /etc/kubernetes/admin.conf root@node01:/etc/kubernetes/admin.conf
scp /etc/kubernetes/admin.conf root@node02:/etc/kubernetes/admin.conf
# Run on each worker node:
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
source ~/.bash_profile
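Alternatively, the per-user setup that kubeadm init itself prints (copying the config into ~/.kube/config) works just as well; a sketch:
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config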
# Finally, test again; the error is gone
[root@k8s-master01-15 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01-15 NotReady master 20m v1.18.6
k8s-node01-16 NotReady <none> 19m v1.18.6
k8s-node02-17 NotReady <none> 19m v1.18.6
5) kubectl get cs issue
Error: Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused. This happens because kube-controller-manager.yaml and kube-scheduler.yaml under /etc/kubernetes/manifests/ set the port to 0 by default. The fix is to comment out the corresponding --port line:
[root@k8s-master01-15 manifests]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Unhealthy Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager Healthy ok
etcd-0 Healthy {"health":"true","reason":""}
[root@k8s-master01-15 /etc/kubernetes/manifests]# vim kube-scheduler.yaml +19
.....
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    #- --port=0    # comment out this line
[root@k8s-master01-15 /etc/kubernetes/manifests]# vim kube-controller-manager.yaml +26
.....
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-cidr=10.244.0.0/16
    - --cluster-name=kubernetes
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
    #- --port=0    # comment out this line
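The same edit can be scripted instead of done in vim; a minimal sketch, assuming the default manifest paths:
# Comment out the --port=0 flag in both static pod manifests
sed -i 's/^\( *\)- --port=0/\1# - --port=0/' /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/manifests/kube-controller-manager.yaml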
# Restart the kubelet service
systemctl restart kubelet.service
# Check component status again
[root@k8s-master01-15 manifests]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
etcd-0 Healthy {"health":"true","reason":""}
controller-manager Healthy ok
6) coredns issue (while deploying the flannel network plugin)
Error: network: open /run/flannel/subnet.env: no such file or directory
[root@k8s-master01-15 core]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-78fcd69978-6t76m 0/1 ContainerCreating 0 126m <none> node01 <none> <none>
coredns-78fcd69978-nthg8 0/1 ContainerCreating 0 126m <none> node01 <none> <none>
.....
[root@k8s-master01-15 core]# kubectl describe pods coredns-78fcd69978-6t76m -n kube-system
.......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 19m (x4652 over 104m) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 4m26s (x5461 over 104m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8e2992e19a969235ff30271e317565b48ffe57d8261dc86f92249005a5eaaec5" network for pod "coredns-78fcd69978-6t76m": networkPlugin cni failed to set up pod "coredns-78fcd69978-6t76m_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Solution:
Check whether /run/flannel/subnet.env exists on each node. On the master it exists and has content; on any node where it is missing, create it with the same content:
[root@k8s-master01-15:~]# mkdir -p /run/flannel/
cat >> /run/flannel/subnet.env <<EOF
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF
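This file is normally written by the flannel pod on each node, so creating it by hand is only a workaround; also make sure the kube-flannel DaemonSet pod is actually running on the affected node. Once the file exists, the stuck coredns pods can be recreated so they pick up the network; a sketch, assuming the default k8s-app=kube-dns label:
kubectl -n kube-system delete pod -l k8s-app=kube-dns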
7) kube-proxy abnormal, status: CrashLoopBackOff
kube-proxy implements Services: it handles traffic from pods to Services inside the cluster and from NodePorts to Services from outside.
[root@master core]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-78fcd69978-dts5t 0/1 ContainerCreating 0 146m <none> node02 <none> <none>
coredns-78fcd69978-v8g7z 0/1 ContainerCreating 0 146m <none> node02 <none> <none>
etcd-master 1/1 Running 0 147m 172.23.199.15 master <none> <none>
kube-apiserver-master 1/1 Running 0 147m 172.23.199.15 master <none> <none>
kube-controller-manager-master 1/1 Running 0 147m 172.23.199.15 master <none> <none>
kube-proxy-9nxhp 0/1 CrashLoopBackOff 33 (37s ago) 144m 172.23.199.16 node01 <none> <none>
kube-proxy-gqrvl 0/1 CrashLoopBackOff 33 (86s ago) 145m 172.23.199.17 node02 <none> <none>
kube-proxy-p825v 0/1 CrashLoopBackOff 33 (2m54s ago) 146m 172.23.199.15 master <none> <none>
kube-scheduler-master 1/1 Running 0 147m 172.23.199.15 master <none> <none>
# Use kubectl describe to inspect the pod's status details
[root@master core]# kubectl describe pod kube-proxy-9nxhp -n kube-system
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 8s (x695 over 150m) kubelet Back-off restarting failed container
Solution:
Before v1.19, enabling ipvs mode in a kubeadm deployment meant adding the following to the init configuration file:
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates:
  SupportIPVSProxyMode: true
mode: ipvs
With a 1.20+ cluster initialized by kubeadm using this snippet, the deployment itself completes, but kube-proxy fails to come up. Part of the error output:
[root@k8s-master01-15:~]# kubectl get pod -A|grep kube-proxy
kube-system kube-proxy-7vrbv 0/1 CrashLoopBackOff 9 43m
kube-system kube-proxy-ghs7h 0/1 CrashLoopBackOff 9 43m
kube-system kube-proxy-l9twb 0/1 CrashLoopBackOff 1 7s
Check the logs:
[root@k8s-master01-15:~]# kubectl logs kube-proxy-9qbwp -n kube-system
E0216 03:00:11.595910 1 run.go:74] "command failed" err="failed complete: unrecognized feature gate: SupportIPVSProxyMode"
The error shows that kube-proxy no longer recognizes the SupportIPVSProxyMode field. Checking the upstream documentation for the current way to enable ipvs, https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/README.md#check-ipvs-proxy-rules, the official guidance reads:
Cluster Created by Kubeadm
If you are using kubeadm with a configuration file, you have to add mode: ipvs in a KubeProxyConfiguration (separated by --- that is also passed to kubeadm init).
kubeProxy:
  config:
    mode: ipvs
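In recent kubeadm versions this is typically expressed as a separate KubeProxyConfiguration document appended after --- to the kubeadm init configuration, without the removed feature gate; a sketch:
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs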
Since the cluster is already initialized, editing the kubeadm init configuration file is pointless now; the running kube-proxy configuration has to be changed directly.
The kube-proxy pod spec shows that its configuration file is mounted into the container from a ConfigMap, so removing the invalid field from that ConfigMap is enough:
[root@k8s-master01-15:~]# kubectl -o yaml get pod -n kube-system kube-proxy-24tkb
......  # other content omitted
  volumes:
  - configMap:
      defaultMode: 420
      name: kube-proxy
    name: kube-proxy
......  # other content omitted
[root@k8s-master01-15:~]# kubectl get cm -n kube-system
NAME DATA AGE
coredns 1 19h
extension-apiserver-authentication 6 19h
kube-flannel-cfg 2 109m
kube-proxy 2 19h
kube-root-ca.crt 1 19h
kubeadm-config 1 19h
kubelet-config-1.23 1 19h
In the editor, find the following fields, delete them, then save and exit:
[root@k8s-master01-15:~]# kubectl edit cm kube-proxy -n kube-system
featureGates:
  SupportIPVSProxyMode: true
Then delete all the kube-proxy pods so the DaemonSet recreates them, and check that they come back up:
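A sketch of the deletion, assuming the pods carry the default k8s-app=kube-proxy label:
kubectl -n kube-system delete pod -l k8s-app=kube-proxy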
[root@k8s-master01-15:~]# kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-64897985d-c9t7s 1/1 Running 0 19h
coredns-64897985d-knvxg 1/1 Running 0 19h
etcd-master01 1/1 Running 0 19h
kube-apiserver-master01 1/1 Running 0 19h
kube-controller-manager-master01 1/1 Running 0 19h
kube-flannel-ds-6lbmw 1/1 Running 15 (56m ago) 110m
kube-flannel-ds-97mkh 1/1 Running 15 (56m ago) 110m
kube-flannel-ds-fthvm 1/1 Running 15 (56m ago) 110m
kube-proxy-4jj7b 1/1 Running 0 55m
kube-proxy-ksltf 1/1 Running 0 55m
kube-proxy-w8dcr 1/1 Running 0 55m
kube-scheduler-master01 1/1 Running 0 19h
Install ipvsadm on the server and check whether ipvs mode is now in effect:
[root@k8s-master01-15:~]# yum install ipvsadm -y
[root@k8s-master01-15:~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.96.0.1:443 rr
-> 172.23.142.233:6443 Masq 1 3 0
TCP 10.96.0.10:53 rr
-> 10.244.0.2:53 Masq 1 0 0
-> 10.244.0.3:53 Masq 1 0 0
TCP 10.96.0.10:9153 rr
-> 10.244.0.2:9153 Masq 1 0 0
-> 10.244.0.3:9153 Masq 1 0 0
UDP 10.96.0.10:53 rr
-> 10.244.0.2:53 Masq 1 0 0
-> 10.244.0.3:53 Masq 1 0 0
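The proxy mode can also be read back from kube-proxy itself; a sketch, assuming the default metrics port 10249 on the node:
curl http://localhost:10249/proxyMode
# expected output: ipvs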
8) Fixing pod IPs that cannot be pinged
After the cluster install is complete, start a pod:
# Start a pod named nginx-offi running the official Nginx image
[root@k8s-master01-15:~]# kubectl run nginx-offi --image=nginx
pod/nginx-offi created
# Check the pod: the status is "Running", the IP is "10.244.1.7", and it is scheduled on node "k8s-node01-16"
[root@k8s-master01-15:~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-offi 1/1 Running 0 55s 10.244.1.7 k8s-node01-16 <none> <none>
However, accessing this pod from the master k8s-master01-15 or from the other worker k8s-node02-17 fails, and pinging its IP 10.244.1.7 fails too, even though flannel has already been installed.
It turned out to be an iptables rules problem: the rules were cleared during the initial server setup, but some later step (it is not clear whether it was the flannel installation or something else) added rules back into iptables.
# Inspect iptables
(root@k8s-master01-15:~)# iptables -L -n
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- 0.0.0.0/0 0.0.0.0/0
Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-FORWARD all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */
DOCKER-USER all -- 0.0.0.0/0 0.0.0.0/0
DOCKER-ISOLATION-STAGE-1 all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
DOCKER all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- 0.0.0.0/0 0.0.0.0/0
Chain DOCKER (1 references)
target prot opt source destination
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target prot opt source destination
DOCKER-ISOLATION-STAGE-2 all -- 0.0.0.0/0 0.0.0.0/0
RETURN all -- 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target prot opt source destination
DROP all -- 0.0.0.0/0 0.0.0.0/0
RETURN all -- 0.0.0.0/0 0.0.0.0/0
Chain DOCKER-USER (1 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0
Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
DROP all -- !127.0.0.0/8 127.0.0.0/8 /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT
Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination
Chain KUBE-FORWARD (1 references)
target prot opt source destination
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
# Warning: iptables-legacy tables present, use iptables-legacy to see them
The iptables rules need to be flushed again:
iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat
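Note that this flush also removes the chains Docker maintains; if container networking misbehaves afterwards, restarting Docker recreates them (kube-proxy re-syncs its own rules periodically on its own). A sketch:
systemctl restart docker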
Check iptables again:
(root@k8s-master01-15:~)# iptables -L -n
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-FORWARD all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain KUBE-FORWARD (1 references)
target prot opt source destination
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
# Warning: iptables-legacy tables present, use iptables-legacy to see them
Ping or curl the pod again; it now succeeds:
(root@k8s-master01-15:~)# curl 10.244.1.7
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>