
k8s Setup: A Complete Record

[ Docker ]

Note: this article is a complete record of one installation and includes some trial-and-error steps. If you want to go straight to the correct installation steps, scroll down to the **Consolidated master node deployment procedure** section at the end.

Environment

Host machine: Windows 10. Hypervisor: VMware. Guest VMs: 192.168.20.12/13/14/15. Guest OS: CentOS 7.

.12 serves as the master node; .13/.14/.15 serve as worker nodes.

Environment checks

  1. Since k8s is being installed on VMs that were cloned from one another, they may have ended up with identical MAC addresses, so first make sure every VM's MAC address is unique:
[root@node12 ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:5e:f2:96 brd ff:ff:ff:ff:ff:ff

# Or use the following command
[root@node12 ~]# cat /sys/class/dmi/id/product_uuid
01734D56-6F44-B2E3-E323-0B38105EF296
  2. Make sure the four VMs can reach one another over the network; a quick ping check is sketched below.
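
A minimal connectivity check, assuming the node IPs 192.168.20.12-15 used throughout this article; run it from each VM:

for ip in 192.168.20.12 192.168.20.13 192.168.20.14 192.168.20.15; do
    # one ping per node, 1 second timeout
    ping -c 1 -W 1 $ip > /dev/null && echo "$ip reachable" || echo "$ip UNREACHABLE"
done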

Install Docker

The Docker service is installed by following the official documentation:

[root@node12 firewalld]# cd /etc/yum.repos.d
[root@node12 firewalld]# wget https://download.docker.com/linux/centos/docker-ce.repo
[root@node12 firewalld]# yum repolist
[root@node12 firewalld]# yum install docker-ce docker-ce-cli containerd.io docker-compose-plugin -y

# Enable the service at boot
[root@node12 ~]# systemctl enable docker

# Start the service
[root@node12 ~]# systemctl start docker

# Configure Docker registry mirrors
[root@node12 docker]# cat /etc/docker/daemon.json
{
    "registry-mirrors": [
      "http://hub-mirror.c.163.com",
      "https://docker.mirrors.ustc.edu.cn/",
      "https://reg-mirror.qiniu.com"
    ]
}

# Reload the service
[root@node12 ~]# systemctl reload docker
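
To confirm the daemon actually picked up the mirrors, a quick sanity check can be run (my addition, not in the original notes; the exact output layout varies by Docker version):

# the configured mirrors should appear under "Registry Mirrors:"
[root@node12 ~]# docker info | grep -A 3 "Registry Mirrors"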

Install a container runtime

k8s supports several container runtimes; this deployment uses the common choice, containerd.

[root@node12 docker]# wget https://github.com/containerd/containerd/releases/download/v1.6.6/containerd-1.6.6-linux-amd64.tar.gz

[root@node12 ~]# tar Cxzvf /usr/local containerd-1.6.6-linux-amd64.tar.gz
bin/
bin/containerd-shim
bin/containerd
bin/containerd-shim-runc-v1
bin/containerd-stress
bin/containerd-shim-runc-v2
bin/ctr

[root@node12 docker]#  wget https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
[root@node12 docker]#  mv containerd.service /usr/local/lib/systemd/system/

[root@node12 ~]# systemctl daemon-reload
[root@node12 ~]# systemctl enable containerd
[root@node12 ~]# systemctl start containerd

# Install runc
[root@node12 docker]# wget https://github.com/opencontainers/runc/releases/download/v1.1.3/runc.amd64
[root@node12 ~]# install -m 755 runc.amd64 /usr/local/sbin/runc

# Install the CNI plugins
[root@node12 docker]# wget https://github.com/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz
[root@node12 ~]# mkdir -p /opt/cni/bin
[root@node12 ~]# tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.1.1.tgz
./
./macvlan
./static
./vlan
./portmap
./host-local
./vrf
./bridge
./tuning
./firewall
./host-device
./sbr
./loopback
./dhcp
./ptp
./ipvlan
./bandwidth
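
Before moving on, it may be worth confirming that containerd is really up; a quick check (my addition, not part of the original transcript):

# client/server versions only print if the daemon is reachable
[root@node12 ~]# ctr version
[root@node12 ~]# systemctl status containerd --no-pager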

Install kubeadm

  1. Add the Kubernetes yum repository:

[root@node12 ~]# cat > /etc/yum.repos.d/kubernetes.repo << EOF

[kubernetes]

name=Kubernetes

baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64

enabled=1

gpgcheck=0

repo_gpgcheck=0

gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg

EOF
[root@node12 ~]# yum repolist
  2. Install the base components:
[root@node12 ~]# yum install -y kubeadm kubelet kubectl
  3. Check the installed versions (the "connection to the server localhost:8080 was refused" message from kubectl is expected here, since there is no cluster to talk to yet):
[root@node12 docker]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.2", GitCommit:"f66044f4361b9f1f96f0053dd46cb7dce5e990a8", GitTreeState:"clean", BuildDate:"2022-06-15T14:20:54Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}

[root@node12 docker]# kubectl version --output=yaml
clientVersion:
  buildDate: "2022-06-15T14:22:29Z"
  compiler: gc
  gitCommit: f66044f4361b9f1f96f0053dd46cb7dce5e990a8
  gitTreeState: clean
  gitVersion: v1.24.2
  goVersion: go1.18.3
  major: "1"
  minor: "24"
  platform: linux/amd64
kustomizeVersion: v4.5.4

The connection to the server localhost:8080 was refused - did you specify the right host or port?

[root@node12 docker]# kubelet --version
Kubernetes v1.24.2

Initialize k8s

[root@node12 containerd]# kubeadm init
[init] Using Kubernetes version: v1.24.2
[preflight] Running pre-flight checks
	[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
	[WARNING Swap]: swap is enabled; production deployments should disable swap unless testing the NodeSwap feature gate of the kubelet
	[WARNING Hostname]: hostname "node12" could not be reached
	[WARNING Hostname]: hostname "node12": lookup node12 on 222.88.88.88:53: no such host
	[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Based on these messages, clear the preflight warnings and errors one by one.

First, stop the firewalld service (firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly):

[root@node12 firewalld]# systemctl stop firewalld

Temporarily disable the swap partition (swap is enabled; production deployments should disable swap unless testing the NodeSwap feature gate of the kubelet):

[root@node12 firewalld]# swapoff -a
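
Note that swapoff -a only lasts until the next reboot; the consolidated script at the end of this article also comments out the swap entry in /etc/fstab so the change persists:

# comment out every swap line in /etc/fstab
sed -i 's/\(.*swap.*\)/#&/' /etc/fstab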

Add a hosts entry for node12 (hostname "node12" could not be reached). Note that the consolidated procedure at the end maps the hostname to its real IP 192.168.20.12 rather than 127.0.0.1, which is what a multi-node cluster actually needs:

echo 127.0.0.1 node12 >> /etc/hosts

Enable the kubelet service at boot (kubelet service is not enabled, please run 'systemctl enable kubelet.service'):

[root@node12 sysctl.d]# systemctl enable kubelet
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /usr/lib/systemd/system/kubelet.service.

Update the kernel parameter (/proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1):

echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
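
Writing to /proc also only lasts until reboot. A persistent alternative is a sysctl drop-in, sketched here on the assumption that the file name /etc/sysctl.d/k8s.conf is free to choose; br_netfilter must be loaded for the bridge key to exist (the consolidated script later runs modprobe br_netfilter for the same reason):

modprobe br_netfilter
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
EOF
# reload all sysctl configuration files
sysctl --system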
Now re-run the k8s initialization:

# Re-initialize
[root@node12 containerd]# kubeadm init
[init] Using Kubernetes version: v1.24.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'

This time it just hangs at this point, so re-run the initialization with a more verbose command (only the lines containing warnings or errors are kept below):

[root@node12 containerd]# kubeadm init --v=5
# ...
I0623 17:10:42.586219    3082 checks.go:834] using image pull policy: IfNotPresent
I0623 17:10:42.593875    3082 checks.go:851] pulling: k8s.gcr.io/kube-apiserver:v1.24.2

The log shows it is stuck pulling images: the default k8s component images cannot be reached from inside China, so re-run the initialization with a domestic image repository (again, only lines with warnings or errors are kept):

[root@node12 ~]# kubeadm init --v=5 --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers
# ...
I0624 15:37:42.332030    4668 checks.go:403] checking whether the given node name is valid and reachable using net.LookupHost
	[WARNING Hostname]: hostname "node12" could not be reached
	[WARNING Hostname]: hostname "node12": lookup node12 on 222.88.88.88:53: no such host
# ...
[preflight] Some fatal errors occurred:
	[ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight
# ...

I still have not figured out what causes the [WARNING Hostname] message, so tackle the ERROR first.

Update /proc/sys/net/ipv4/ip_forward ([ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1):

[root@node12 ~]# echo 1 >  /proc/sys/net/ipv4/ip_forward

Run it again (only lines with warnings or errors are kept below):

[root@node12 ~]# kubeadm init --v=5 --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers
# ...
I0624 15:41:37.853772    4689 checks.go:403] checking whether the given node name is valid and reachable using net.LookupHost
	[WARNING Hostname]: hostname "node12" could not be reached
	[WARNING Hostname]: hostname "node12": lookup node12 on 222.88.88.88:53: no such host
# ...
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
	- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
couldn't initialize a Kubernetes cluster
# ...

Look at its first suggestion:

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

It says the problem may be with the kubelet, so check the kubelet's status:

[root@node12 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since 五 2022-06-24 15:41:39 CST; 5min ago
     Docs: https://kubernetes.io/docs/
 Main PID: 4763 (kubelet)
    Tasks: 14
   Memory: 39.0M
   CGroup: /system.slice/kubelet.service
           └─4763 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --contain...

6月 24 15:47:33 node12 kubelet[4763]: E0624 15:47:33.927489    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"
6月 24 15:47:34 node12 kubelet[4763]: E0624 15:47:34.028215    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"
6月 24 15:47:34 node12 kubelet[4763]: E0624 15:47:34.128942    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"
6月 24 15:47:34 node12 kubelet[4763]: E0624 15:47:34.229686    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"
6月 24 15:47:34 node12 kubelet[4763]: E0624 15:47:34.330417    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"
6月 24 15:47:34 node12 kubelet[4763]: E0624 15:47:34.431151    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"

# Here is the key "failed" message
6月 24 15:47:34 node12 kubelet[4763]: W0624 15:47:34.463426    4763 reflector.go:324] vendor/k8s.io/client-go/informers/factory.go:134: failed to list *v1.CSIDriver: Get "ht...tion refused
6月 24 15:47:34 node12 kubelet[4763]: E0624 15:47:34.463478    4763 reflector.go:138] vendor/k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1...
6月 24 15:47:34 node12 kubelet[4763]: E0624 15:47:34.531889    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"
6月 24 15:47:34 node12 kubelet[4763]: E0624 15:47:34.632610    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"
Hint: Some lines were ellipsized, use -l to show in full.

The kubelet service is running, but there is a truncated "failed" message in there; add the -l flag to see the full lines:

[root@node12 ~]# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since 五 2022-06-24 15:41:39 CST; 6min ago
     Docs: https://kubernetes.io/docs/
 Main PID: 4763 (kubelet)
    Tasks: 14
   Memory: 39.0M
   CGroup: /system.slice/kubelet.service
           └─4763 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.7

6月 24 15:48:31 node12 kubelet[4763]: E0624 15:48:31.199534    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"
6月 24 15:48:31 node12 kubelet[4763]: E0624 15:48:31.300274    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"
6月 24 15:48:31 node12 kubelet[4763]: E0624 15:48:31.401027    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"
6月 24 15:48:31 node12 kubelet[4763]: E0624 15:48:31.501733    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"
6月 24 15:48:31 node12 kubelet[4763]: E0624 15:48:31.602478    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"
6月 24 15:48:31 node12 kubelet[4763]: E0624 15:48:31.703196    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"
6月 24 15:48:31 node12 kubelet[4763]: E0624 15:48:31.803954    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"

# From this line it looks like the apiserver did not start successfully
6月 24 15:48:31 node12 kubelet[4763]: E0624 15:48:31.869054    4763 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://192.168.20.12:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/node12?timeout=10s": dial tcp 192.168.20.12:6443: connect: connection refused
6月 24 15:48:31 node12 kubelet[4763]: E0624 15:48:31.904666    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"
6月 24 15:48:32 node12 kubelet[4763]: E0624 15:48:32.005441    4763 kubelet.go:2424] "Error getting node" err="node \"node12\" not found"

Use a different command to get more detailed logs:

[root@node12 ~]# journalctl -u kubelet | grep error

# The log is very long; below is a sample of the messages that keep repeating
6月 24 16:29:01 node12 kubelet[4763]: E0624 16:29:01.260301    4763 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://192.168.20.12:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/node12?timeout=10s": dial tcp 192.168.20.12:6443: connect: connection refused

6月 24 16:29:05 node12 kubelet[4763]: E0624 16:29:05.896088    4763 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"

6月 24 16:29:07 node12 kubelet[4763]: E0624 16:29:07.347839    4763 remote_runtime.go:201] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.6\": failed to pull image \"k8s.gcr.io/pause:3.6\": failed to pull and unpack image \"k8s.gcr.io/pause:3.6\": failed to resolve reference \"k8s.gcr.io/pause:3.6\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.6\": dial tcp 74.125.23.82:443: i/o timeout"

6月 24 16:29:07 node12 kubelet[4763]: E0624 16:29:07.347883    4763 kuberuntime_sandbox.go:70] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.6\": failed to pull image \"k8s.gcr.io/pause:3.6\": failed to pull and unpack image \"k8s.gcr.io/pause:3.6\": failed to resolve reference \"k8s.gcr.io/pause:3.6\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.6\": dial tcp 74.125.23.82:443: i/o timeout" pod="kube-system/etcd-node12"

6月 24 16:29:07 node12 kubelet[4763]: E0624 16:29:07.347912    4763 kuberuntime_manager.go:815] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.6\": failed to pull image \"k8s.gcr.io/pause:3.6\": failed to pull and unpack image \"k8s.gcr.io/pause:3.6\": failed to resolve reference \"k8s.gcr.io/pause:3.6\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.6\": dial tcp 74.125.23.82:443: i/o timeout" pod="kube-system/etcd-node12"

6月 24 16:29:07 node12 kubelet[4763]: E0624 16:29:07.347960    4763 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-node12_kube-system(a24b428583b28071c521b2c26fbf6022)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"etcd-node12_kube-system(a24b428583b28071c521b2c26fbf6022)\\\": rpc error: code = Unknown desc = failed to get sandbox image \\\"k8s.gcr.io/pause:3.6\\\": failed to pull image \\\"k8s.gcr.io/pause:3.6\\\": failed to pull and unpack image \\\"k8s.gcr.io/pause:3.6\\\": failed to resolve reference \\\"k8s.gcr.io/pause:3.6\\\": failed to do request: Head \\\"https://k8s.gcr.io/v2/pause/manifests/3.6\\\": dial tcp 74.125.23.82:443: i/o timeout\"" pod="kube-system/etcd-node12" podUID=a24b428583b28071c521b2c26fbf6022

This log points to two problems:

1. Container runtime network not ready, cni plugin not initialized: the container runtime's network is not ready. The network plugin is only configured after the apiserver is up, so leave this one alone for now.
2. The remaining lines all point to a single issue: failed to get sandbox image k8s.gcr.io/pause:3.6. Solve this one first.

k8s.gcr.io/pause:3.6 is the sandbox image containerd wants to use, but we already pulled registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.7 earlier, so containerd's configuration needs to be changed:

# First dump containerd's default configuration to its config file
[root@node12 ~]# containerd config default > /etc/containerd/config.toml

# Find
# sandbox_image = "k8s.gcr.io/pause:3.6"
# and change it to
# sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.7"

# Restart containerd
[root@node12 ~]# systemctl restart containerd
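
The same edit can be scripted instead of done by hand; this is the sed substitution reused in the consolidated section at the end (run it before restarting containerd):

sed -i 's/k8s.gcr.io\/pause:3.6/registry.cn-hangzhou.aliyuncs.com\/google_containers\/pause:3.7/' /etc/containerd/config.toml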

Now check the kubelet logs again:

[root@node12 ~]# journalctl -f -u kubelet
6月 24 17:00:51 node12 kubelet[4763]: E0624 17:00:51.286205    4763 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"

Only one error is left in the log, so now install the k8s network plugin, kube-flannel:

[root@node12 ~]# kubectl apply -f /root/kube-flannel.yml
[root@node12 ~]# kubectl apply -f kube-flannel.yml
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Installing the network plugin fails, and it still looks like our kube-apiserver never came up. Reset the state left by kubeadm init, and then

re-initialize one more time:

[root@node12 ~]# kubeadm reset
[root@node12 ~]# kubeadm init --v=5 --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers

# ... output like the following means the init succeeded
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.20.12:6443 --token nnos43.fq7rvr8347z29ope \
	--discovery-token-ca-cert-hash sha256:d723c9c4d43d3b2346e97624018ce90f866ce9df328718c81be322f05b7df595

Now run through the instructions printed above:

[root@node12 ~]# mkdir -p $HOME/.kube
[root@node12 ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@node12 ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config
[root@node12 ~]# export KUBECONFIG=/etc/kubernetes/admin.conf
[root@node12 ~]# kubectl apply -f kube-flannel.yml
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

Node information can now be queried normally through kubectl:

[root@node12 ~]# kubectl get node
NAME     STATUS   ROLES           AGE   VERSION
node12   Ready    control-plane   11m   v1.24.2

Consolidated master node deployment procedure

Reviewing the steps above, the "Install Docker" section installed not only the Docker packages but also containerd.io. It turns out that this package alone can replace everything done in the "Install a container runtime" section, so the consolidated script for deploying k8s on the master node is as follows.

Prepare the following files:

# daemon.json
{
    "registry-mirrors": [
      "http://hub-mirror.c.163.com",
      "https://docker.mirrors.ustc.edu.cn/",
      "https://reg-mirror.qiniu.com"
    ]
}

# kube-flannel.yml
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
  - configMap
  - secret
  - emptyDir
  - hostPath
  allowedHostPaths:
  - pathPrefix: "/etc/cni/net.d"
  - pathPrefix: "/etc/kube-flannel"
  - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: ['NET_ADMIN', 'NET_RAW']
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
- apiGroups: ['extensions']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['psp.flannel.unprivileged']
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni-plugin
       #image: flannelcni/flannel-cni-plugin:v1.1.0 for ppc64le and mips64le (dockerhub limitations may apply)
        image: rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.0
        command:
        - cp
        args:
        - -f
        - /flannel
        - /opt/cni/bin/flannel
        volumeMounts:
        - name: cni-plugin
          mountPath: /opt/cni/bin
      - name: install-cni
       #image: flannelcni/flannel:v0.18.1 for ppc64le and mips64le (dockerhub limitations may apply)
        image: rancher/mirrored-flannelcni-flannel:v0.18.1
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
       #image: flannelcni/flannel:v0.18.1 for ppc64le and mips64le (dockerhub limitations may apply)
        image: rancher/mirrored-flannelcni-flannel:v0.18.1
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: EVENT_QUEUE_DEPTH
          value: "5000"
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
        - name: xtables-lock
          mountPath: /run/xtables.lock
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni-plugin
        hostPath:
          path: /opt/cni/bin
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate

# docker-ce.repo
[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg

[docker-ce-stable-debuginfo]
name=Docker CE Stable - Debuginfo $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/debug-$basearch/stable
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg

[docker-ce-stable-source]
name=Docker CE Stable - Sources
baseurl=https://download.docker.com/linux/centos/$releasever/source/stable
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg

[docker-ce-test]
name=Docker CE Test - $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/$basearch/test
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg

[docker-ce-test-debuginfo]
name=Docker CE Test - Debuginfo $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/debug-$basearch/test
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg

[docker-ce-test-source]
name=Docker CE Test - Sources
baseurl=https://download.docker.com/linux/centos/$releasever/source/test
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg

[docker-ce-nightly]
name=Docker CE Nightly - $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/$basearch/nightly
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg

[docker-ce-nightly-debuginfo]
name=Docker CE Nightly - Debuginfo $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/debug-$basearch/nightly
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg

[docker-ce-nightly-source]
name=Docker CE Nightly - Sources
baseurl=https://download.docker.com/linux/centos/$releasever/source/nightly
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg

# k8s.repo
[kubernetes]

name=Kubernetes

baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64

enabled=1

gpgcheck=0

repo_gpgcheck=0

gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg

Then change into the directory containing the files above and run:

systemctl stop firewalld
systemctl disable firewalld
swapoff -a
sed -i 's/\(.*swap.*\)/#&/' /etc/fstab

modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 >  /proc/sys/net/ipv4/ip_forward

cat docker-ce.repo > /etc/yum.repos.d/docker-ce.repo
cat k8s.repo > /etc/yum.repos.d/kubernetes.repo
yum repolist

yum install -y kubeadm kubelet kubectl docker-ce  containerd.io



containerd config default > /etc/containerd/config.toml
sed -i 's/k8s.gcr.io\/pause:3.6/registry.cn-hangzhou.aliyuncs.com\/google_containers\/pause:3.7/' /etc/containerd/config.toml

echo 192.168.20.12 node12 >> /etc/hosts
echo 192.168.20.13 node13 >> /etc/hosts
echo 192.168.20.14 node14 >> /etc/hosts
echo 192.168.20.15 node15 >> /etc/hosts

systemctl daemon-reload
systemctl enable containerd
systemctl start containerd
systemctl enable docker
systemctl start docker
cat daemon.json > /etc/docker/daemon.json

systemctl enable kubelet
systemctl start kubelet

# The flannel network plugin requires the --pod-network-cidr parameter to be passed at init time
kubeadm init --v=5 --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
export KUBECONFIG=/etc/kubernetes/admin.conf

# Before installing the network plugin, check the pod status:
kubectl get pods -A
# NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE
# kube-system   coredns-7f74c56694-69mhm         0/1     Pending   0          23m
# kube-system   coredns-7f74c56694-r55hj         0/1     Pending   0          23m
# kube-system   etcd-node12                      1/1     Running   0          23m
# kube-system   kube-apiserver-node12            1/1     Running   0          23m
# kube-system   kube-controller-manager-node12   1/1     Running   0          23m
# kube-system   kube-proxy-hn2nc                 1/1     Running   0          23m
# kube-system   kube-scheduler-node12            1/1     Running   0          23m

kubectl apply -f kube-flannel.yml

# After installing the network plugin, check the pod status again:
kubectl get pods -A
# NAMESPACE     NAME                             READY   STATUS              RESTARTS      AGE
# kube-system   coredns-7f74c56694-69mhm         0/1     ContainerCreating   0             28m
# kube-system   coredns-7f74c56694-r55hj         0/1     ContainerCreating   0             28m
# kube-system   etcd-node12                      1/1     Running             0             28m
# kube-system   kube-apiserver-node12            1/1     Running             0             28m
# kube-system   kube-controller-manager-node12   1/1     Running             0             28m
# kube-system   kube-flannel-ds-jsf28            0/1     CrashLoopBackOff    3 (46s ago)   2m1s
# kube-system   kube-proxy-hn2nc                 1/1     Running             0             28m
# kube-system   kube-scheduler-node12            1/1     Running             0             28m
echo 'Success'
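
If the kube-flannel pod stays in CrashLoopBackOff as in the output above, its logs usually reveal the cause; two hedged diagnostic commands, assuming the DaemonSet name kube-flannel-ds and the app=flannel label from the manifest above:

kubectl -n kube-system logs daemonset/kube-flannel-ds
kubectl -n kube-system describe pod -l app=flannel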

If nothing else goes wrong, the console should be showing output like this:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Deploy the worker nodes

Every worker node must have a container runtime and kubeadm installed, so deploying a worker is almost identical to deploying the master; only the steps from init onward differ.

[root@node13 ~]# kubeadm join 192.168.20.12:6443 --token 34kksk.ybm4xqq1x5mnayn4 --discovery-token-ca-cert-hash sha256:fa3cc12b200008e74240446443323f6b618061fd42d68775c07a238282c00572

If all goes well, the output after joining the cluster should look like this:

[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

At this point the new worker node can be seen from node12:

[root@node12 ~]# kubectl get node
NAME     STATUS   ROLES           AGE   VERSION
node12   Ready    control-plane   14m   v1.24.2
node13   Ready    <none>          13m   v1.24.2
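
The <none> under ROLES is just a missing label; if a role name is wanted in this output, the worker can be labeled by hand (purely cosmetic, assuming the node name node13):

[root@node12 ~]# kubectl label node node13 node-role.kubernetes.io/worker=worker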

Usually we want kubectl to be able to show node information on both the master and the worker nodes, which requires the following:

# Copy $HOME/.kube/config from the master node to $HOME/.kube/config on the worker node
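# For example (a sketch, assuming root SSH access from the worker to the master):
[root@node13 ~]# mkdir -p $HOME/.kube
[root@node13 ~]# scp root@192.168.20.12:/root/.kube/config $HOME/.kube/config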

[root@node13 ~]# kubectl get node
NAME     STATUS   ROLES           AGE   VERSION
node12   Ready    control-plane   61m   v1.24.2
node13   Ready    <none>          60m   v1.24.2

Install the dashboard

kubectl apply -f kube-dashboard.yaml
kubectl create serviceaccount liyun -n kubernetes-dashboard
kubectl create clusterrolebinding dash-liyun --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:liyun
kubectl create token liyun -n kubernetes-dashboard

Then, with kubectl installed on your local machine, run:

kubectl proxy

and the kube-dashboard UI can then be opened.
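
With kubectl proxy running, the dashboard is typically reachable at a URL like the following (assuming the standard kubernetes-dashboard namespace and service names from the official manifest); the token created above is used to sign in:

http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/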

What to do if the join token is lost or has expired

The token generated by init has a limited lifetime. If it has expired or been lost, a new one can be generated with the following command:

[root@node12 ~]# kubeadm token create --print-join-command
kubeadm join 192.168.20.12:6443 --token c7hct3.axbyfyiuj2xptd74 --discovery-token-ca-cert-hash sha256:bbf1fe9d085838927fb97538e389d03787311eff608af0af1850db4e15ee0839

The generated command can then be used to join the cluster again.
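
The existing tokens and their expiry times can also be listed before creating a new one:

[root@node12 ~]# kubeadm token list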