Preparation
Two CentOS 7 hosts:
192.168.209.102 node102
192.168.209.103 node103 (master)
How to set the hostname:
hostnamectl set-hostname node102 # run on 192.168.209.102
hostnamectl set-hostname node103 # run on 192.168.209.103
hostnamectl --static # verify the result
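kubeadm and the join command address the nodes by hostname, so both names should resolve on every machine. If you are not using DNS, entries like the following (matching the IPs above) can be added to /etc/hosts on both hosts:

```
192.168.209.102 node102
192.168.209.103 node103
```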
Software versions
docker 18.09.0
kubelet-1.15.4 kubeadm-1.15.4 kubectl-1.15.4
helm 2.14.0
Unless otherwise noted, every step must be executed on all nodes (both the k8s master and the k8s workers).
Environment
Disable the firewall
systemctl stop firewalld.service
systemctl disable firewalld.service
If you would rather keep the firewall enabled, open the ports Kubernetes requires instead; see https://kubernetes.io/docs/setup/independent/install-kubeadm/#check-required-ports
Disable swap
Since Kubernetes 1.8, the kubelet refuses to start when swap is enabled.
Disable the swap partition:
swapoff -a && sed -i '/ swap / s/^/#/' /etc/fstab
This comments out the line /dev/mapper/centos-swap swap swap defaults in /etc/fstab.
Disable SELinux
Set SELinux to permissive mode (effectively disabling it):
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
Install docker
Configure the docker package repository:
curl -o /etc/yum.repos.d/Docker-ce-Ali.repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum makecache fast
List the available docker versions:
yum list docker-ce --showduplicates | sort -r
Install the chosen docker version:
yum install docker-ce-18.09.0-3.el7 -y
After installation, start docker and enable it at boot:
systemctl start docker && systemctl enable docker
If you are not starting from a clean environment, it is best to clean docker up completely first and verify:
docker ps # should be empty
docker ps -a # should be empty
docker network ls # only the three default networks
docker volume ls # should be empty
docker service ls # should be empty (run on a swarm manager node)
Adjust iptables handling
Some users on CentOS 7 have reported traffic being routed incorrectly because iptables was bypassed. Create the file /etc/sysctl.d/k8s.conf with the following content:
cat <<EOF > /etc/sysctl.d/k8s.conf
vm.swappiness = 0
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
Apply the settings:
modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf
Load the ipvs modules
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
# make the script executable, run it, and confirm the modules loaded (this command is a bit long)
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4
Getting started
Deploy Kubernetes with kubeadm
Install kubeadm and kubelet. Note: when installing with yum, take note of the exact Kubernetes version number; you will need it later for kubeadm init.
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kube*
EOF
# Install. Note: check the version number here, because the version passed to kubeadm init must not be lower than the installed Kubernetes packages.
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
# To pin a specific version instead, use kubelet-<version> etc., as below:
yum install kubelet-1.15.4 kubeadm-1.15.4 kubectl-1.15.4 --disableexcludes=kubernetes
# Enable and start kubelet
systemctl enable kubelet.service && systemctl start kubelet.service
# Check kubelet status
[root@node103 ~]# systemctl status kubelet.service
# Inspect error messages
[root@node103 ~]# journalctl -xefu kubelet
After starting kubelet.service, its status shows that it is not running. The logs reveal the cause: the file /var/lib/kubelet/config.yaml does not exist. This can safely be ignored for now; kubeadm init will create it.
Master node
kubeadm init \
--apiserver-advertise-address=192.168.209.103 \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version v1.15.4 \
--pod-network-cidr=10.244.0.0/16 \
--token-ttl 0
--apiserver-advertise-address is the master node's IP.
--image-repository points at the Aliyun mirror, so that images that cannot be pulled from the default registry are still available.
--kubernetes-version disables version detection. Its default value is stable-1, which makes kubeadm fetch the latest version number from https://storage.googleapis.com/kubernetes-release/release/stable-1.txt; pinning the version skips that network request. Once more: it must match the installed Kubernetes version.
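For reference, the same flags can also be expressed as a kubeadm configuration file and passed with kubeadm init --config. This is an untested sketch using the v1beta2 config API that kubeadm 1.15 understands; the values simply mirror the command line above:

```yaml
# kubeadm-config.yaml -- sketch; equivalent to the kubeadm init flags above
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.209.103   # --apiserver-advertise-address
bootstrapTokens:
  - ttl: "0s"                         # --token-ttl 0 (token never expires)
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.15.4            # --kubernetes-version
imageRepository: registry.aliyuncs.com/google_containers   # --image-repository
networking:
  podSubnet: 10.244.0.0/16            # --pod-network-cidr
```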
[init] Using Kubernetes version: v1.15.4
[preflight] Running pre-flight checks
......
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.209.103:6443 --token 9yh5wi.15p63wyw19kkzxsl \
--discovery-token-ca-cert-hash sha256:e6ddd0a8419514ab5e98bceea40b347aaf8b2044db9fceade811da7bec51c362
After initialization succeeds, kubeadm prints the remaining configuration steps, a temporary token, and the command for joining additional nodes.
# A regular user needs to run the following before using the cluster:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# As root you can simply run:
export KUBECONFIG=/etc/kubernetes/admin.conf
# Pick whichever of the two applies; I am working as root here, so I used the export above.
Now check kubelet again: it is in the running state and started successfully.
[root@node103 ~]# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Sat 2019-10-12 14:24:49 CST; 47min ago
Docs: https://kubernetes.io/docs/
Main PID: 2092 (kubelet)
CGroup: /system.slice/kubelet.service
└─2092 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf -...
Oct 12 14:25:11 node103 kubelet[2092]: E1012 14:25:11.090966 2092 kuberuntime_manager.go:692] createPodSandbox for pod "coredns-bccdc95cf...
Oct 12 14:25:11 node103 kubelet[2092]: E1012 14:25:11.091062 2092 pod_workers.go:190] Error syncing pod a6dd2502-9bcf-426e-a5a8-5...a8-53caf
Oct 12 14:25:11 node103 kubelet[2092]: W1012 14:25:11.522357 2092 docker_sandbox.go:384] failed to read pod IP from plugin/docker: Networ...
Oct 12 14:25:11 node103 kubelet[2092]: W1012 14:25:11.536992 2092 pod_container_deletor.go:75] Container "40bb67673e4888b1b69b9f4...ntainers
Oct 12 14:25:11 node103 kubelet[2092]: W1012 14:25:11.547593 2092 cni.go:309] CNI failed to retrieve network namespace path: cann...25eb6a8"
Oct 12 14:25:11 node103 kubelet[2092]: W1012 14:25:11.548785 2092 docker_sandbox.go:384] failed to read pod IP from plugin/docker: Networ...
Oct 12 14:25:11 node103 kubelet[2092]: W1012 14:25:11.562485 2092 pod_container_deletor.go:75] Container "e9e61a73a517720c96508bd...ntainers
Oct 12 14:25:11 node103 kubelet[2092]: W1012 14:25:11.566360 2092 cni.go:309] CNI failed to retrieve network namespace path: cann...bb38bc4"
Oct 12 14:25:12 node103 kubelet[2092]: W1012 14:25:12.806694 2092 pod_container_deletor.go:75] Container "cfb51541812189c587dbdf8...ntainers
Oct 12 14:25:12 node103 kubelet[2092]: W1012 14:25:12.843299 2092 pod_container_deletor.go:75] Container "27626aad865161213e9e2e1...ntainers
Hint: Some lines were ellipsized, use -l to show in full.
Check status
Confirm that every component reports Healthy:
[root@node103 ~]# kubectl get cs
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}
Check node status
[root@node103 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
node103 NotReady master 18h v1.15.4
Install the Pod network (flannel)
A pod network add-on is required for the cluster to work; without it, pods cannot communicate with each other. Kubernetes supports several networking solutions; flannel is used here.
[root@node103 ~]# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
podsecuritypolicy.extensions/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.extensions/kube-flannel-ds-amd64 created
daemonset.extensions/kube-flannel-ds-arm64 created
daemonset.extensions/kube-flannel-ds-arm created
daemonset.extensions/kube-flannel-ds-ppc64le created
daemonset.extensions/kube-flannel-ds-s390x created
Check pod status
[root@node103 ~]# kubectl get pod --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-bccdc95cf-4x6dk 1/1 Running 10 13h 10.244.0.4 node103 <none> <none>
kube-system coredns-bccdc95cf-tjvv9 1/1 Running 10 13h 10.244.0.5 node103 <none> <none>
kube-system etcd-node103 1/1 Running 3 13h 192.168.209.103 node103 <none> <none>
kube-system kube-apiserver-node103 1/1 Running 3 13h 192.168.209.103 node103 <none> <none>
kube-system kube-controller-manager-node103 1/1 Running 3 13h 192.168.209.103 node103 <none> <none>
kube-system kube-flannel-ds-amd64-qxchh 1/1 Running 1 13h 192.168.209.103 node103 <none> <none>
kube-system kube-proxy-kpbkx 1/1 Running 3 13h 192.168.209.103 node103 <none> <none>
kube-system kube-scheduler-node103 1/1 Running 3 13h 192.168.209.103 node103 <none> <none>
Not everything went smoothly: the kube-flannel-ds-amd64-qxchh, coredns-bccdc95cf-tjvv9 and coredns-bccdc95cf-4x6dk pods were unhealthy, and troubleshooting them cost me nearly a full day. Here is a brief summary of the problems I hit and how to fix them.
The best way to track down an error is through the logs. Query a pod's status and events like this:
kubectl describe pod coredns-bccdc95cf-4x6dk --namespace=kube-system
- Images failing to download
Configure the Aliyun mirror and run docker pull manually. In my case this image failed to pull:
docker pull quay.io/coreos/flannel:v0.11.0-amd64
- Error: plugin flannel does not support config version
Edit the CNI configuration file (typically /etc/cni/net.d/10-flannel.conflist) so that it declares a cniVersion:
{
"name": "cbr0",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
systemctl daemon-reload
- Errors when running kubectl commands
[root@node103 ~]# kubectl get pod
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Cause: the cluster was installed with kubeadm, so kubectl must run with the kubernetes-admin credentials.
Fix: point KUBECONFIG at admin.conf as described earlier (if admin.conf does not exist on this node, copy it over from the master).
- dial tcp 10.96.0.1:443: getsockopt: no route to host --- the Kubernetes DNS service keeps restarting
systemctl stop kubelet
systemctl stop docker
iptables --flush
iptables -t nat --flush
systemctl start docker
systemctl start kubelet
Add nodes
Run the following command on the worker node node102:
kubeadm join --token <token> <master-ip>:<master-port> --discovery-token-ca-cert-hash sha256:<hash>
Here <master-ip>:<master-port> is 192.168.209.103:6443.
Tokens normally expire (the default TTL is 24 hours, although this cluster used --token-ttl 0). If yours has expired, create a new one (list tokens with kubeadm token list, create one with kubeadm token create), as shown below:
[root@node103 ~]# kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
9yh5wi.15p63wyw19kkzxsl <forever> <never> authentication,signing The default bootstrap token generated by 'kubeadm init'. system:boot
[root@node103 ~]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
e6ddd0a8419514ab5e98bceea40b347aaf8b2044db9fceade811da7bec51c362
# The full join command, run on the worker node, looks like this:
[root@node102 ~]# kubeadm join 192.168.209.103:6443 --token 9yh5wi.15p63wyw19kkzxsl --discovery-token-ca-cert-hash sha256:e6ddd0a8419514ab5e98bceea40b347aaf8b2044db9fceade811da7bec51c362
Recovering with kubeadm when you have forgotten the token, the token has expired, or you need the --discovery-token-ca-cert-hash value:
# list existing tokens
[root@]# kubeadm token list
# generate a new token
[root@]# kubeadm token create
# list tokens again; a new one has appeared
[root@]# kubeadm token list
# compute the --discovery-token-ca-cert-hash value
[root@]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
# delete a token
[root@]# kubeadm token delete <token>
bootstrap token with id "hpvhe4" deleted
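The value that --discovery-token-ca-cert-hash expects is simply the SHA-256 digest of the cluster CA's DER-encoded public key. If you want to sanity-check the pipeline without a cluster, the sketch below runs it against a throwaway self-signed certificate (the /tmp file names are placeholders; on a real master you would hash /etc/kubernetes/pki/ca.crt instead):

```shell
# Generate a disposable self-signed certificate purely for demonstration
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-ca" -days 1 \
  -keyout /tmp/demo-ca.key -out /tmp/demo-ca.crt 2>/dev/null

# Extract the public key, convert it to DER, and take its SHA-256 digest --
# the same pipeline as above, only the input certificate differs
hash=$(openssl x509 -pubkey -in /tmp/demo-ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //')

echo "sha256:${hash}"
```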
After the join completes, check the nodes:
[root@node103 linux-amd64]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node102 Ready <none> 173m v1.15.4
node103 Ready master 17h v1.15.4
[root@node103 linux-amd64]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-bccdc95cf-4x6dk 1/1 Running 10 17h
coredns-bccdc95cf-tjvv9 1/1 Running 10 17h
etcd-node103 1/1 Running 3 17h
kube-apiserver-node103 1/1 Running 3 17h
kube-controller-manager-node103 1/1 Running 3 17h
kube-flannel-ds-amd64-qxchh 1/1 Running 1 16h
kube-flannel-ds-amd64-txjq8 1/1 Running 0 174m
kube-proxy-4fgbn 1/1 Running 0 174m
kube-proxy-kpbkx 1/1 Running 3 17h
kube-scheduler-node103 1/1 Running 3 17h
tiller-deploy-7bbf796b9c-f4sws 1/1 Running 0 2m51s
Delete a node
After a node has been deleted, it must run kubeadm reset before it can rejoin the cluster with kubeadm join.
[root@node103 ~]# kubectl delete node node102
node102 is the node name. This is not the only way to remove a node; I will not list the alternatives here.
Install helm
helm
Download the desired helm release from the helm GitHub releases page; this article uses helm-v2.14.0-linux-amd64.tar.gz.
[root@node103 ~]# tar -zxvf helm-v2.14.0-linux-amd64.tar.gz
[root@node103 ~]# mv linux-amd64/helm /usr/bin
tiller
Initialize and verify Helm; this automatically installs the server-side component, Tiller. Note: because of network restrictions in China, installing Tiller requires pulling the image gcr.io/kubernetes-helm/tiller:v2.14.0, which will very likely fail. Install Tiller from the Aliyun mirror instead:
[root@node103 linux-amd64]# helm init --upgrade -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.14.0 --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
Creating /root/.helm
Creating /root/.helm/repository
Creating /root/.helm/repository/cache
Creating /root/.helm/repository/local
Creating /root/.helm/plugins
Creating /root/.helm/starters
Creating /root/.helm/cache/archive
Creating /root/.helm/repository/repositories.yaml
Adding stable repo with URL: https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
Adding local repo with URL: http://127.0.0.1:8879/charts
$HELM_HOME has been configured at /root/.helm.
Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation
Starting with Kubernetes 1.6, the API Server enables RBAC authorization. The Tiller deployment does not define an authorized ServiceAccount, so its requests to the API Server are rejected. Grant the Tiller deployment the required permissions as follows:
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
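Equivalently, the ServiceAccount and ClusterRoleBinding can be kept as a manifest and applied with kubectl apply -f; this is a declarative sketch of the first two commands above:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller-cluster-rule
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
```

The kubectl patch step (pointing the tiller-deploy deployment at the new ServiceAccount) is still needed afterwards; alternatively, helm 2 accepts --service-account tiller on helm init.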
If the tiller pod cannot be found afterwards, re-run:
helm init --upgrade -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.14.0 --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
To remove the server-side component, use one of:
helm reset
helm reset -f
To also remove the data helm init created (such as the helm home directory), run helm reset --remove-helm-home.
Problems
- Error: forwarding ports: error upgrading connection: error dialing backend: dial tcp 192.168.209.102:10250: connect: no route to host
Rebooting the virtual machine fixed this for me. It is probably the same iptables issue; see the iptables flush fix described above.