
K8S Cluster Deployment on CentOS 8 and Common Commands

I. Basic Environment Preparation

Node IP          Role      Configuration
192.168.0.21     master    8C / 16G RAM / 100G disk
192.168.0.22     node01    8C / 32G RAM / 100G disk
192.168.0.23     node02    8C / 32G RAM / 100G disk
192.168.0.24     node03    8C / 32G RAM / 100G disk

1. Set a unique hostname on each node

# Run on master
hostnamectl set-hostname master

# Run on node01
hostnamectl set-hostname node01

# Run on node02
hostnamectl set-hostname node02

# Run on node03
hostnamectl set-hostname node03

Log out of the terminal and log back in to verify that the hostname change took effect.
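
You can also double-check without logging out; `hostnamectl` with no arguments prints the current settings:

# The "Static hostname" field should show the name set above
hostnamectl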

2. Disable the firewalld service

# Stop the firewall
systemctl stop firewalld

# Disable it from starting on boot
systemctl disable firewalld

3. Map hostnames to IPs in /etc/hosts (run on every node)

vim /etc/hosts

192.168.0.21 master
192.168.0.22 node01
192.168.0.23 node02
192.168.0.24 node03

After configuring, `ping <hostname>` each host in turn to verify that name resolution works.
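
As a quick sanity check, a small loop like the following (hostnames assumed to match the /etc/hosts entries above) pings each host once:

# Ping each cluster host once to confirm name resolution and connectivity
for h in master node01 node02 node03; do
    ping -c 1 "$h" > /dev/null && echo "$h OK" || echo "$h FAILED"
done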

4. Configure passwordless SSH login between all machines; the procedure is simple, look it up if needed.

5. Configure time synchronization on all nodes

# Check whether chrony is already installed
rpm -qa |grep chrony

# If it is not installed, run
yum install chrony -y

# Start the chronyd service
systemctl start chronyd

# Enable it on boot
systemctl enable chronyd

# Write the chrony configuration on the `master` node
cat > /etc/chrony.conf<<EOF
server 192.168.0.21 iburst
server 127.0.0.1 iburst
# Sync time from the Aliyun NTP servers
pool ntp1.aliyun.com iburst
pool ntp2.aliyun.com iburst
pool ntp3.aliyun.com iburst
pool ntp4.aliyun.com iburst
pool ntp5.aliyun.com iburst
pool ntp6.aliyun.com iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
# Subnet allowed to sync from this server
allow 192.168.0.0/24
local stratum 10
logdir /var/log/chrony
EOF

# Write the chrony configuration on the `node01/02/03` nodes
cat > /etc/chrony.conf<<EOF
# Use the master node as the NTP server so the nodes stay in sync with the master
server 192.168.0.21 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
local stratum 10
logdir /var/log/chrony
EOF

# Restart chronyd on each machine so the new configuration takes effect
systemctl restart chronyd

# Check on the master and all node machines whether time is synchronized, using `date` or `timedatectl`

timedatectl

The key field to check is `System clock synchronized: yes`.
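
chrony also ships the `chronyc` client, which shows the configured sources directly; on the node machines the master (192.168.0.21) should appear as the selected source (marked `^*` once synced):

# List NTP sources and their sync state (-v prints a legend explaining the columns)
chronyc sources -v

# Show detailed sync information (reference ID, stratum, offset)
chronyc tracking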

II. Install Docker

All steps in this section must be completed successfully on every machine in the k8s cluster (the master and all nodes).

1. Remove any previously installed Docker (skip this step on a fresh machine)

yum remove docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate docker-logrotate docker-engine

2. Configure the yum repository

yum install -y yum-utils
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
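
Optionally, before installing a pinned version, list the docker-ce builds the repo actually provides to confirm 20.10.7 is available:

# List every docker-ce version published in the configured repo, newest first
yum list docker-ce --showduplicates | sort -r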

3. Install a specific Docker version and enable it on boot

yum install -y docker-ce-20.10.7 docker-ce-cli-20.10.7  containerd.io-1.4.6

# Start the Docker service now and enable it on boot
systemctl enable docker --now

If step 3 above fails with an error like:

docker-scan-plugin-0.23.0-3.el FAILED https://mirrors.aliyun.com/docker-ce/linux/centos/7/x86_64/stable/Packages/docker-scan-plugin-0.23.0-3.el7.x86_64.rpm: [Errno -1] Package does not match intended download

try the following workaround:

# Clean the yum cache so the latest package metadata is fetched
yum clean all
# Retry downloading the package
yum install docker-scan-plugin
# If the problem persists, try pulling the package directly from the upstream Docker repository or another working mirror, for example:
yum install --enablerepo=docker-ce-stable docker-scan-plugin

After the error is resolved, rerun step 3 to install Docker.

4. Configure Docker registry mirrors

mkdir -p /etc/docker

# Note: JSON does not allow comments, so keep daemon.json strictly JSON.
# registry-mirrors         - registry mirror accelerators
# insecure-registries      - registries reached over plain HTTP or with untrusted certificates
# max-concurrent-downloads - maximum number of concurrent image layer downloads
cat > /etc/docker/daemon.json <<EOF
{
  "registry-mirrors": ["https://docker.nju.edu.cn/","https://mo7q8ico.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "insecure-registries": ["http://easzlab.io.local:5000","192.168.50.32","http://192.168.50.202"],
  "max-concurrent-downloads": 10,
  "log-driver": "json-file",
  "log-opts": {
      "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

# Reload the systemd configuration
systemctl daemon-reload

# Restart Docker
systemctl restart docker

5. Check that Docker started successfully

systemctl status docker
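
Since daemon.json sets native.cgroupdriver=systemd, and kubelet expects the container runtime to use the same cgroup driver, it is worth confirming Docker picked the setting up:

# Should print "Cgroup Driver: systemd"
docker info | grep -i "cgroup driver"

# Optional: confirm the registry mirrors were applied
docker info | grep -A 2 -i "registry mirrors"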

III. Install kubeadm

All steps in this section must be completed successfully on every machine in the k8s cluster (the master and all nodes).

1. Set up the base environment

# Set SELinux to permissive mode (effectively disabling it)
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

# Disable swap
swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab

# Allow iptables to see bridged traffic
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

sudo sysctl --system
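
A quick way to verify that the module is loaded and the sysctl values took effect:

# Confirm the br_netfilter module is loaded; if nothing is printed, load it now with: modprobe br_netfilter
# (the modules-load.d entry above only takes effect on the next boot)
lsmod | grep br_netfilter

# Both values should print as 1
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables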

2. Install kubelet, kubeadm and kubectl

kubelet: runs on every node in the cluster and is responsible for starting Pods and containers
kubeadm: a tool used to bootstrap (initialize) the cluster
kubectl: the Kubernetes command-line tool; with it you can deploy and manage applications, inspect resources, and create, delete and update components

# Configure the Kubernetes yum repository
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
   http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF

# Install the three components
yum install -y kubelet-1.20.9 kubeadm-1.20.9 kubectl-1.20.9 --disableexcludes=kubernetes

# Enable kubelet on boot and start it now
systemctl enable --now kubelet

3. Verify that kubeadm, kubelet and kubectl were installed successfully

[root@master ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.9", GitCommit:"7a576bc3935a6b555e33346fd73ad77c925e9e4a", GitTreeState:"clean", BuildDate:"2021-07-15T21:00:30Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}
[root@master ~]# kubelet --version
Kubernetes v1.20.9
[root@master ~]# kubectl version --client
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.9", GitCommit:"7a576bc3935a6b555e33346fd73ad77c925e9e4a", GitTreeState:"clean", BuildDate:"2021-07-15T21:01:38Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}

4. Make sure the kubelet service is enabled to start on boot

systemctl enable kubelet

IV. Deploy Kubernetes

This step only needs to be run on the master node.

1. Run the init command

# Template command
kubeadm init --kubernetes-version=1.19.0 --apiserver-advertise-address=<master-ip> --image-repository registry.aliyuncs.com/google_containers --service-cidr=10.96.0.0/12 --pod-network-cidr=10.244.0.0/16

# Command adjusted for this environment
kubeadm init --kubernetes-version=1.20.9 --apiserver-advertise-address=192.168.0.21 --image-repository registry.aliyuncs.com/google_containers --service-cidr=10.96.0.0/12 --pod-network-cidr=10.244.0.0/16

Explanation of the parameters in the command above:

  • --kubernetes-version=1.20.9: the Kubernetes version to install
  • --apiserver-advertise-address=192.168.0.21: the IP address of the master host
  • --image-repository registry.aliyuncs.com/google_containers: the image registry; the default upstream registry is not reachable, so the Aliyun mirror registry.aliyuncs.com/google_containers is used
  • --service-cidr=10.96.0.0/12: the Service CIDR; 10.96.0.0/12 can be reused as-is for future installs and does not need to be changed
  • --pod-network-cidr=10.244.0.0/16: the IP range used by the internal pod network; it must not be the same as service-cidr

On the subnets: the two CIDR ranges must not overlap with each other, and neither should overlap with the subnet the machines themselves are on.

If initializing Kubernetes on the master node fails and kubeadm init needs to be rerun, reset the node first.

Follow these steps:

# Reset the node back to its pre-init state (stops and cleans up the Kubernetes services, removes running Pods, cleans up network interfaces, and deletes the configuration created by kubeadm)
kubeadm reset

# Remove everything under /etc/kubernetes/, which normally holds the Kubernetes configuration files and certificates
rm -rf /etc/kubernetes/*

# List all Docker container IDs with docker ps -aq and pipe them to xargs docker rm -f, which force-removes the containers
docker ps -aq | xargs docker rm -f

# List all Docker image IDs with docker images -q and pipe them to xargs docker rmi -f, which force-removes the images
docker images -q | xargs docker rmi -f

# Disable all swap space (Kubernetes recommends running with swap disabled, since swap can cause performance problems)
swapoff -a

# Reboot so the system comes up cleanly, with no leftover processes or configuration
reboot

2. If you see output ending in "successfully" like the following, congratulations, the init worked

[root@maste ~]# kubeadm init --kubernetes-version=1.20.9 --apiserver-advertise-address=192.168.0.21 --image-repository registry.aliyuncs.com/google_containers --service-cidr=10.96.0.0/12 --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.20.9
[preflight] Running pre-flight checks
    [WARNING FileExisting-tc]: tc not found in system path
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.7. Latest validated version: 19.03
    [WARNING Hostname]: hostname "maste" could not be reached
    [WARNING Hostname]: hostname "maste": lookup maste on [fdee:c72f:89c2::1]:53: no such host
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local maste] and IPs [10.96.0.1 192.168.0.21]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost maste] and IPs [192.168.0.21 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost maste] and IPs [192.168.0.21 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 15.504352 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node maste as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node maste as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: s99dx8.qr05h9xva9udu4x6
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.0.21:6443 --token s99dx8.qr05h9xva9udu4x6 \
    --discovery-token-ca-cert-hash sha256:df312f88d0e723fa7acf03b689cae1873c13d2fe8d31707ed1c36f5d6630a39f

3. Then run the commands shown in the console output on the corresponding machines

  • Master
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
  • Node
# Every node must be joined to the cluster

# Note: remember to remove the "\" line-continuation characters so the command is on a single line

kubeadm join 192.168.0.21:6443 --token s99dx8.qr05h9xva9udu4x6 --discovery-token-ca-cert-hash sha256:df312f88d0e723fa7acf03b689cae1873c13d2fe8d31707ed1c36f5d6630a39f
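
The bootstrap token in the join command is only valid for 24 hours by default; if a node is added later (or the token above has expired), a fresh join command can be printed on the master at any time:

# Run on the master: generates a new token and prints the full kubeadm join command
kubeadm token create --print-join-command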

4. After every node machine has joined, run this command on the master node

# At this point the node statuses should be NotReady
kubectl get nodes

V. Install and Deploy a CNI Network Plugin

1. Pick one network plugin, flannel or calico, and install it on the master. On cloud servers flannel is recommended, because calico can conflict with the cloud network environment; these are physical machines, so calico is installed here.

  • flannel plugin (lightweight, good for quickly standing up a cluster; recommended for beginners)
  • calico plugin (run on the master node; suited to more complex network environments)

Check which calico versions match which Kubernetes versions.

See in particular "Kubernetes requirements" in the Calico v3.20 documentation.

# The link above lists the K8S/Calico compatibility matrix; the K8S version here is 1.20, and Calico v3.20 supports 1.20, so Calico v3.20 is used
kubectl apply -f https://docs.projectcalico.org/v3.20/manifests/calico.yaml
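
Calico needs a minute or two to pull its images and start; you can watch the kube-system pods until the calico-node and calico-kube-controllers pods show Running:

# Watch the system pods until the calico pods are Running (Ctrl-C to stop watching)
kubectl get pods -n kube-system -w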

2. Check the node status

# Run on the master; it may take a little while before the node status changes to Ready
kubectl get nodes 

If a node stays NotReady, troubleshoot with the following steps:

# 1 - Check the kubelet service
systemctl status kubelet

# 2 - Inspect the node's conditions and events for details
kubectl describe node <node-name>

# 3 - Check the network plugin: make sure the plugin (Calico, Flannel, etc.) is deployed and running; network problems can keep a node from registering with the cluster
# 4 - Check firewall and ports: make sure the node's firewall is not blocking the ports Kubernetes uses, in particular 10250 (kubelet default) and 6443 (API server default)
# 5 - Check API server reachability: from the node, hit the API server health endpoint with curl or a similar tool:
curl -k https://<master-ip>:6443/healthz

# 6 - Check certificates: if the cluster uses self-signed certificates, make sure the kubelet on the node has the correct CA certificate so it can talk to the API server securely
# 7 - Restart the kubelet service if none of the steps above turns anything up
systemctl restart kubelet
# 8 - Rejoin the node: if the problem persists, delete the node from the cluster and join it again. Delete it with:
kubectl delete node <node-name>
# then rejoin it with kubeadm join

Once all nodes report Ready:

# Verify the system pods; when every pod's STATUS is Running, the installation succeeded
kubectl get pods -n kube-system

VI. Common K8S Commands

1. Check the status of pods, services, endpoints, secrets, PVCs, nodes, etc.

# e.g. kubectl get pod; add -o wide for more detail, and -n <namespace> to target another namespace
kubectl get <resource-type>

2. Apply and start a resource from a yaml file

# e.g. kubectl apply -f nginx.yaml
# Creates the resource if it does not exist and updates it if it does; more convenient than create
kubectl apply -f xxx.yaml
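
For reference, a minimal nginx.yaml that the command above could apply might look like the following. This is only a sketch; the deployment name, replica count and image tag are arbitrary choices:

# Write a minimal Deployment manifest and apply it
cat > nginx.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
EOF

kubectl apply -f nginx.yaml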

3. Delete and stop a resource defined in a yaml file

kubectl delete -f xxx.yaml

4. Inspect resource status, e.g. when a pod in a deployment fails to come up; typically used to troubleshoot pod scheduling problems

# First list the pods with kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>

5. View pod logs, used to troubleshoot pods that are not ready

# First list the pods with kubectl get pods -n <namespace>
kubectl logs <pod-name> -n <namespace>
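
Two flags that come up often when debugging: -f streams the log output, and --previous shows the log of the prior container instance after a crash/restart:

# Follow the log output in real time
kubectl logs -f <pod-name> -n <namespace>

# Show logs from the previous (crashed) container instance
kubectl logs --previous <pod-name> -n <namespace>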

6. View resource usage (CPU, memory) of nodes or pods

# e.g. kubectl top node
# e.g. kubectl top pod
kubectl top <resource-type>
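
Note that kubectl top only works when the metrics-server addon is installed; without it the command fails with a "Metrics API not available" style error. If it is missing, the upstream manifest can be applied:

# Install metrics-server so kubectl top has a data source
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# On clusters with self-signed kubelet certificates you may also need to add --kubelet-insecure-tls to the metrics-server container args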

7. Open a shell inside a pod

kubectl exec -it <pod-name> -- /bin/bash

8. View the API access token of the K8S cluster

kubectl get secret $(kubectl get sa default -o=jsonpath='{.secrets[0].name}') -o=jsonpath='{.data.token}' | base64 --decode
  • kubectl get sa default -o=jsonpath='{.secrets[0].name}' retrieves the name of the first secret attached to the default service account
  • kubectl get secret -o=jsonpath='{.data.token}' retrieves the token stored in that secret
  • base64 --decode decodes the retrieved token
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')
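
As a usage sketch (the TOKEN variable name is illustrative, and the default service account is only authorized for basic discovery endpoints), the decoded token can be passed as a Bearer token to call the API server directly:

# Capture the default service account token in a shell variable
TOKEN=$(kubectl get secret $(kubectl get sa default -o=jsonpath='{.secrets[0].name}') -o=jsonpath='{.data.token}' | base64 --decode)

# Call the API server with it (replace 192.168.0.21 with your master IP)
curl -k -H "Authorization: Bearer $TOKEN" https://192.168.0.21:6443/api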

9. Add or update a label (role) on a K8S node

kubectl label node node01 kubernetes.io/role=node
  • kubectl: the Kubernetes command-line tool, used to interact with the Kubernetes cluster.

  • label: the kubectl subcommand for adding, updating or removing labels on resources.

  • node: the resource type that the label subcommand operates on, indicating that a node is being labeled.

  • node01: the name of the node the label is applied to.

  • kubernetes.io/role=node: the key/value pair of the label being added; here the key is kubernetes.io/role and the value is node. This label is commonly used to mark a node's role, for example flagging a node as a worker node.
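
Two related commands that often come up alongside labeling: listing the labels currently on the nodes, and removing a label (a trailing "-" after the key removes it):

# Show all nodes along with their labels
kubectl get nodes --show-labels

# Remove the kubernetes.io/role label from node01
kubectl label node node01 kubernetes.io/role-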

10. View the configMap objects in a given namespace

# List the configMaps in the given namespace
kubectl get configmap -n <namespace>

# Show the contents of a specific configMap in the given namespace
kubectl get configmap <configmap-name> -n <namespace> -o yaml
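
To round this out, a configMap can also be created directly from the command line; the name and key/value below are purely illustrative:

# Create a configMap from literal key/value pairs
kubectl create configmap demo-config --from-literal=app.env=prod -n <namespace>

# Dump it back out as yaml to confirm the contents
kubectl get configmap demo-config -n <namespace> -o yaml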