Kubernetes Cluster: Deploying the Master Node

One evening I read an article about cloud native and suddenly felt like deploying a Kubernetes cluster myself, so I just went for it. Compared with the old all-in-one OpenStack deployments, Kubernetes is far simpler. This post covers deploying the Master node directly with kubeadm; in the end every component came up fine and I hit no major problems.

First, the deployment environment:

VMware Fusion: a CentOS 7 VM installed from the Minimal ISO; anything else gets installed on demand

Network: NAT mode; Host-Only should also work in the end

Resources: 2 vCPUs, 4 GB RAM, 20 GB disk. The Mac can spare that much; if I later turn this into a full cluster I'll adjust the allocation.

Next, a few prerequisites:

1. Disable the swap partition by commenting out the following line in /etc/fstab:

#/dev/mapper/centos_2020-swap swap                    swap    defaults        0 0
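Commenting out the fstab entry only takes effect on the next boot; to turn swap off immediately you can also run:

swapoff -a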

2. Disable SELinux by changing the following line in /etc/selinux/config:

SELINUX=disabled
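Again, this only applies after a reboot; to drop SELinux to permissive mode right away:

setenforce 0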

3. Set the iptables-related kernel parameters and enable IP forwarding. If you can't be bothered opening the pile of ports required during deployment, just stop firewalld for now.

[root@2020 lihui]# cat /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward=1
vm.swappiness=0
[root@2020 lihui]# sysctl --system
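If sysctl --system complains that the bridge-nf-call keys don't exist, the br_netfilter module probably isn't loaded yet; load it first and rerun:

modprobe br_netfilter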

Now for the real work: configure the yum repos and install directly.

docker-ce repo:

[root@2020 lihui]# cd /etc/yum.repos.d/
[root@2020 yum.repos.d]# wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
[root@2020 yum.repos.d]# yum clean all
[root@2020 yum.repos.d]# yum repolist

kubernetes repo:

[root@2020 lihui]# cat /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
gpgcheck=0
enabled=1

Install them:

yum install docker-ce kubeadm
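kubelet and kubectl come along as dependencies of kubeadm. If you would rather pin a specific release than take whatever is newest in the repo, something like the following should work (the exact version strings are an assumption; check yum list --showduplicates kubeadm for what the mirror actually offers):

yum install -y docker-ce kubelet-1.17.0 kubeadm-1.17.0 kubectl-1.17.0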

Start the services:

systemctl start docker
systemctl start kubelet
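Don't be alarmed if kubelet immediately starts crash-looping at this point: it has no cluster configuration yet and will keep restarting until kubeadm init generates one for it. Its state can be checked with:

systemctl status kubelet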

At this point docker and kubelet are installed and running. The remaining components are deployed directly as containers, but for reasons that need no elaboration the corresponding docker images cannot be pulled from k8s.gcr.io directly, so you have to pull them some other way and then re-tag them. Before doing that you need to know which component versions are required, otherwise there is nothing to pull; the two steps below print the version list.

[root@2020 ~]# kubeadm config print init-defaults --kubeconfig MasterConfiguration > kubeadm.yaml
W0108 22:36:32.862368    3395 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0108 22:36:32.862424    3395 validation.go:28] Cannot validate kubelet config - no validator is available
[root@2020 ~]# vim kubeadm.yaml
[root@2020 ~]# kubeadm config images list --config kubeadm.yaml
W0108 22:36:51.939616    3416 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0108 22:36:51.939670    3416 validation.go:28] Cannot validate kubelet config - no validator is available
k8s.gcr.io/kube-apiserver:v1.17.0
k8s.gcr.io/kube-controller-manager:v1.17.0
k8s.gcr.io/kube-scheduler:v1.17.0
k8s.gcr.io/kube-proxy:v1.17.0
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.4.3-0
k8s.gcr.io/coredns:1.6.5

With the version list in hand, a simple script can do the downloading:

#!/bin/bash

APISERVER=v1.17.0
MANAGER=v1.17.0
SCHEDULER=v1.17.0
PROXY=v1.17.0
PAUSE=3.1
ETCD=3.4.3-0
COREDNS=1.6.5

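# pull the images from the Aliyun mirror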
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:$APISERVER
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:$MANAGER
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:$SCHEDULER
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:$PROXY
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:$PAUSE
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:$ETCD
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:$COREDNS

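# re-tag them to the k8s.gcr.io names that kubeadm expects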
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:$APISERVER k8s.gcr.io/kube-apiserver:$APISERVER
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:$MANAGER k8s.gcr.io/kube-controller-manager:$MANAGER
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:$SCHEDULER k8s.gcr.io/kube-scheduler:$SCHEDULER
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:$PROXY k8s.gcr.io/kube-proxy:$PROXY
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:$PAUSE k8s.gcr.io/pause:$PAUSE
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:$ETCD k8s.gcr.io/etcd:$ETCD
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:$COREDNS k8s.gcr.io/coredns:$COREDNS

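# drop the now-redundant Aliyun-tagged copies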
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:$APISERVER
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:$MANAGER
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:$SCHEDULER
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:$PROXY
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/pause:$PAUSE
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:$ETCD
docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:$COREDNS

After running it, the re-tagged images are all there:

[root@2020 lihui]# docker image ls
REPOSITORY                           TAG                 IMAGE ID            CREATED             SIZE
k8s.gcr.io/kube-proxy                v1.17.0             7d54289267dc        4 weeks ago         116MB
k8s.gcr.io/kube-apiserver            v1.17.0             0cae8d5cc64c        4 weeks ago         171MB
k8s.gcr.io/kube-controller-manager   v1.17.0             5eb3b7486872        4 weeks ago         161MB
k8s.gcr.io/kube-scheduler            v1.17.0             78c190f736b1        4 weeks ago         94.4MB
k8s.gcr.io/coredns                   1.6.5               70f311871ae1        2 months ago        41.6MB
k8s.gcr.io/etcd                      3.4.3-0             303ce5db0e90        2 months ago        288MB
k8s.gcr.io/pause                     3.1                 da86e6ba6ca1        2 years ago         742kB

Next, edit the YAML file generated above. I made a few basic changes:

advertiseAddress: set to the IP address that the NAT network assigned to this machine

serviceSubnet: I set this to the docker bridge's subnet, guessing purely from the word "service"; if that turns out to be wrong and services misbehave I'll change it later (it is in fact the virtual ClusterIP range for Services, which defaults to 10.96.0.0/12)

imageRepository: changed to the Aliyun registry

apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.16.247.132
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: "2020"
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.17.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 172.17.0.0/16
scheduler: {}
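Incidentally, with imageRepository now pointing at the Aliyun registry, kubeadm can also pre-pull everything itself, which would make the pull-and-retag script above unnecessary; a sketch using the same config file:

kubeadm config images pull --config kubeadm.yaml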

Then run init, which spits out a pile of warnings:

[root@2020 ~]# kubeadm init --config kubeadm.yaml
W0108 23:10:08.023913   15723 validation.go:28] Cannot validate kube-proxy config - no validator is available
W0108 23:10:08.023961   15723 validation.go:28] Cannot validate kubelet config - no validator is available
[init] Using Kubernetes version: v1.17.0
[preflight] Running pre-flight checks
	[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
	[WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
	[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'

Some of these are about ports not being open, others about services not being enabled at boot. In that case, let's step back first: running reset undoes everything init just did.

[root@2020 ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
[reset] Removing info for node "2020" from the ConfigMap "kubeadm-config" in the "kube-system" Namespace
W0108 23:17:01.884181   18453 removeetcdmember.go:61] [reset] failed to remove etcd member: error syncing endpoints with etc: etcdclient: no available endpoints
.Please manually remove this etcd member using etcdctl
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.

Next, just stop firewalld outright; any ports that are really needed can be opened once deployment is done. While we're at it, enable docker and kubelet so they start on boot:

[root@2020 ~]# systemctl stop firewalld
[root@2020 ~]# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
[root@2020 ~]# systemctl enable docker.service
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
[root@2020 ~]# systemctl enable kubelet.service
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /usr/lib/systemd/system/kubelet.service.
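That still leaves the cgroup-driver warning. It is harmless for a toy setup, but if you want to follow the recommendation and switch Docker to the systemd driver, a sketch (assuming /etc/docker/daemon.json does not exist yet; kubelet's cgroup driver has to match):

cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl restart docker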

Run init again:

[root@2020 ~]# kubeadm init --config kubeadm.yaml

At the end come the words we want to see:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 172.16.247.132:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:8bb43d2d4c8aa40f19831be3cf0ae7b8a6a4e78bf40a7d53f20e93db6079f499
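The join token expires after 24 hours (the ttl in the config above), so when worker nodes are added later a fresh join command can be printed with:

kubeadm token create --print-join-command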

The output above also tells us: To start using your cluster, run the following. So let's just do that. The reason is that access to the Kubernetes cluster is authenticated and encrypted, and by default kubectl uses the credentials under this directory to talk to the cluster.

[root@2020 ~]# mkdir -p .kube
[root@2020 ~]# cp -i /etc/kubernetes/admin.conf .kube/config
[root@2020 ~]# chown $(id -u):$(id -g) .kube/config
[root@2020 ~]# ls -l .kube/config
-rw-------. 1 root root 5450 Jan  8 23:25 .kube/config

At this point we can check the node status:

[root@2020 ~]# kubectl get nodes
NAME   STATUS     ROLES    AGE     VERSION
2020   NotReady   master   5m55s   v1.17.0

But the status is NotReady. Looking at the pods on this node, the coredns pods are Pending:

[root@2020 ~]# kubectl get pods -n kube-system
NAME                           READY   STATUS    RESTARTS   AGE
coredns-9d85f5447-p9jbh        0/1     Pending   0          11m
coredns-9d85f5447-tvnd2        0/1     Pending   0          11m
etcd-2020                      1/1     Running   0          11m
kube-apiserver-2020            1/1     Running   0          11m
kube-controller-manager-2020   1/1     Running   0          11m
kube-proxy-4bs8c               1/1     Running   0          11m
kube-scheduler-2020            1/1     Running   0          11m

Running describe on the node shows the reason: no pod network has been deployed yet, which is also why CoreDNS, which depends on it, is stuck in Pending:

[root@2020 ~]# kubectl describe node 2020
Name:               2020
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=2020
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 08 Jan 2020 23:20:30 +0800
Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  2020
  AcquireTime: 
  RenewTime:       Wed, 08 Jan 2020 23:29:34 +0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 08 Jan 2020 23:26:05 +0800   Wed, 08 Jan 2020 23:20:26 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 08 Jan 2020 23:26:05 +0800   Wed, 08 Jan 2020 23:20:26 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 08 Jan 2020 23:26:05 +0800   Wed, 08 Jan 2020 23:20:26 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Wed, 08 Jan 2020 23:26:05 +0800   Wed, 08 Jan 2020 23:20:26 +0800   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

As for the network, a single kubectl apply takes care of it:

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

After running it you have to wait a while; the status stayed the same for so long that I started to think something had gone wrong.
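Rather than re-running the same command over and over, you can watch the pods converge (the -w flag streams updates):

kubectl get pods -n kube-system -w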

Check the component health:

[root@2020 ~]# kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}

And just like that, everything is fine:

[root@2020 ~]# kubectl get pods -n kube-system
NAME                           READY   STATUS    RESTARTS   AGE
coredns-9d85f5447-p9jbh        1/1     Running   0          9h
coredns-9d85f5447-tvnd2        1/1     Running   0          9h
etcd-2020                      1/1     Running   0          9h
kube-apiserver-2020            1/1     Running   0          9h
kube-controller-manager-2020   1/1     Running   0          9h
kube-proxy-4bs8c               1/1     Running   0          9h
kube-scheduler-2020            1/1     Running   0          9h
weave-net-jcn82                2/2     Running   0          9h

Note the newly created Pod, weave-net-jcn82: that is the container network plugin's per-node agent.

Finally, the node status is back to normal:

[root@2020 ~]# kubectl get nodes
NAME   STATUS   ROLES    AGE   VERSION
2020   Ready    master   9h    v1.17.0

The complete picture of the Master node's components now looks like this:

[root@2020 ~]# kubectl describe node 2020
Name:               2020
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=2020
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 08 Jan 2020 23:20:30 +0800
Taints:             node-role.kubernetes.io/master:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  2020
  AcquireTime: 
  RenewTime:       Thu, 09 Jan 2020 08:59:04 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Thu, 09 Jan 2020 00:03:34 +0800   Thu, 09 Jan 2020 00:03:34 +0800   WeaveIsUp                    Weave pod has set this
  MemoryPressure       False   Thu, 09 Jan 2020 08:54:20 +0800   Wed, 08 Jan 2020 23:20:26 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Thu, 09 Jan 2020 08:54:20 +0800   Wed, 08 Jan 2020 23:20:26 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Thu, 09 Jan 2020 08:54:20 +0800   Wed, 08 Jan 2020 23:20:26 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Thu, 09 Jan 2020 08:54:20 +0800   Thu, 09 Jan 2020 00:03:44 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  172.16.247.132
  Hostname:    2020
Capacity:
  cpu:                2
  ephemeral-storage:  17394Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             4026228Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  16415037823
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3923828Ki
  pods:               110
System Info:
  Machine ID:                 4f699d9051a64357b561f19ea582feb9
  System UUID:                86764D56-2F54-C137-9462-7C851A6592CE
  Boot ID:                    3e40ef3c-3d70-40c0-8915-9c7d0b3b1cf6
  Kernel Version:             3.10.0-1062.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.5
  Kubelet Version:            v1.17.0
  Kube-Proxy Version:         v1.17.0
Non-terminated Pods:          (8 in total)
  Namespace                   Name                            CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                            ------------  ----------  ---------------  -------------  ---
  kube-system                 coredns-9d85f5447-p9jbh         100m (5%)     0 (0%)      70Mi (1%)        170Mi (4%)     9h
  kube-system                 coredns-9d85f5447-tvnd2         100m (5%)     0 (0%)      70Mi (1%)        170Mi (4%)     9h
  kube-system                 etcd-2020                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         9h
  kube-system                 kube-apiserver-2020             250m (12%)    0 (0%)      0 (0%)           0 (0%)         9h
  kube-system                 kube-controller-manager-2020    200m (10%)    0 (0%)      0 (0%)           0 (0%)         9h
  kube-system                 kube-proxy-4bs8c                0 (0%)        0 (0%)      0 (0%)           0 (0%)         9h
  kube-system                 kube-scheduler-2020             100m (5%)     0 (0%)      0 (0%)           0 (0%)         9h
  kube-system                 weave-net-jcn82                 20m (1%)      0 (0%)      0 (0%)           0 (0%)         9h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                770m (38%)  0 (0%)
  memory             140Mi (3%)  340Mi (8%)
  ephemeral-storage  0 (0%)      0 (0%)
Events:              

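One last note for a single-node playground like this: the node-role.kubernetes.io/master:NoSchedule taint shown above keeps ordinary workloads off the master, so nothing will schedule until worker nodes join. If you just want to run pods on this node, the taint can be removed with standard kubectl (not part of the original setup):

kubectl taint nodes --all node-role.kubernetes.io/master-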
OVER
