离线高可用部署
Kubernetes 离线高可用部署
本章介绍如何在无互联网连接的环境中部署高可用 Kubernetes 集群,适用于内网生产环境。
部署概述
离线高可用部署结合了离线部署和高可用架构的优势,提供生产级的可靠性和安全性。
架构图:
┌──────────────┐
│ 负载均衡器 │
│ (HAProxy) │
│ VIP: x.x.x.100
└──────┬───────┘
│
┌───────────────────┼───────────────────┐
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ Master 1 │ │ Master 2 │ │ Master 3 │
│ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │
│ │ API │ │ │ │ API │ │ │ │ API │ │
│ │ etcd │◄├────┤ │ etcd │◄├────┤ │ etcd │ │
│ │ Sched │ │ │ │ Sched │ │ │ │ Sched │ │
│ │ Ctrl │ │ │ │ Ctrl │ │ │ │ Ctrl │ │
│ └────────┘ │ │ └────────┘ │ │ └────────┘ │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
└───────────────────┼───────────────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │
│ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │
│ │ kubelet│ │ │ │ kubelet│ │ │ │ kubelet│ │
│ │ Pods │ │ │ │ Pods │ │ │ │ Pods │ │
│ └────────┘ │ │ └────────┘ │ │ └────────┘ │
└─────────────┘ └─────────────┘ └─────────────┘
特点:
- ✅ 高可用(99.9%+ SLA)
- ✅ 完全离线部署
- ✅ 生产级可靠性
- ✅ 符合安全合规要求
- ❌ 准备工作最复杂
- ❌ 成本最高
适用场景:
- 🏛️ 政企生产环境
- 🏦 金融行业关键系统
- 🔒 军工、保密单位
- 🌐 多数据中心部署
一、环境要求
1.1 服务器规划
| 节点类型 | 数量 | CPU | 内存 | 磁盘 | IP 地址示例 |
|---|---|---|---|---|---|
| 负载均衡器 | 2 | 2 核 | 4GB | 50GB | 192.168.1.101-102 |
| Master 节点 | 3 | 4 核 | 8GB | 100GB SSD | 192.168.1.10-12 |
| Worker 节点 | 3+ | 8 核 | 16GB | 200GB SSD | 192.168.1.20-22 |
最小配置(小型生产):
- 2 LB + 3 Master + 3 Worker = 8 台服务器
推荐配置(中型生产):
- 2 LB + 5 Master + 5 Worker = 12 台服务器
1.2 网络规划
网络配置:
节点网络: 192.168.1.0/24
VIP(虚拟IP): 192.168.1.100
负载均衡器:
- 192.168.1.100 (VIP)
- 192.168.1.101 (lb-1 主)
- 192.168.1.102 (lb-2 备)
Master 节点:
- 192.168.1.10 (master-1)
- 192.168.1.11 (master-2)
- 192.168.1.12 (master-3)
Worker 节点:
- 192.168.1.20 (worker-1)
- 192.168.1.21 (worker-2)
- 192.168.1.22 (worker-3)
Pod 网络: 10.244.0.0/16
Service 网络: 10.96.0.0/12
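提示:离线环境通常没有内部 DNS,可以按上述规划把主机名解析写入每台节点的 /etc/hosts(下面是一个示例,其中 k8s-vip 等主机名为示意,按实际命名调整):
cat <<EOF | sudo tee -a /etc/hosts
192.168.1.100  k8s-vip
192.168.1.101  lb-1
192.168.1.102  lb-2
192.168.1.10   master-1
192.168.1.11   master-2
192.168.1.12   master-3
192.168.1.20   worker-1
192.168.1.21   worker-2
192.168.1.22   worker-3
EOF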
二、准备离线资源包(联网环境)
2.1 创建工作目录
mkdir -p ~/k8s-offline-ha/{images,packages,scripts,configs}
cd ~/k8s-offline-ha
2.2 设置版本变量
export K8S_VERSION=1.30.0
export CONTAINERD_VERSION=1.7.11
export RUNC_VERSION=1.1.10
export CNI_VERSION=1.4.0
export CRICTL_VERSION=1.30.0
export CALICO_VERSION=3.27.0
export HAPROXY_VERSION=2.8
export KEEPALIVED_VERSION=2.2
# 保存版本信息
cat <<EOF > versions.txt
K8S_VERSION=$K8S_VERSION
CONTAINERD_VERSION=$CONTAINERD_VERSION
RUNC_VERSION=$RUNC_VERSION
CNI_VERSION=$CNI_VERSION
CRICTL_VERSION=$CRICTL_VERSION
CALICO_VERSION=$CALICO_VERSION
HAPROXY_VERSION=$HAPROXY_VERSION
KEEPALIVED_VERSION=$KEEPALIVED_VERSION
EOF
2.3 下载所有二进制文件
cd packages
# Kubernetes 组件
wget https://dl.k8s.io/release/v${K8S_VERSION}/bin/linux/amd64/{kubeadm,kubelet,kubectl}
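# 可选:同时下载官方 .sha256 校验文件并核对二进制完整性(dl.k8s.io 为每个二进制提供对应的校验文件)
for bin in kubeadm kubelet kubectl; do
  wget -q https://dl.k8s.io/release/v${K8S_VERSION}/bin/linux/amd64/${bin}.sha256
  echo "$(cat ${bin}.sha256)  ${bin}" | sha256sum --check
done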
# 容器运行时
wget https://github.com/containerd/containerd/releases/download/v${CONTAINERD_VERSION}/containerd-${CONTAINERD_VERSION}-linux-amd64.tar.gz
wget https://github.com/opencontainers/runc/releases/download/v${RUNC_VERSION}/runc.amd64
wget https://github.com/containernetworking/plugins/releases/download/v${CNI_VERSION}/cni-plugins-linux-amd64-v${CNI_VERSION}.tgz
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/v${CRICTL_VERSION}/crictl-v${CRICTL_VERSION}-linux-amd64.tar.gz
# systemd 服务文件
wget -O containerd.service https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
wget -O kubelet.service https://raw.githubusercontent.com/kubernetes/release/master/cmd/krel/templates/latest/kubelet/kubelet.service
wget -O 10-kubeadm.conf https://raw.githubusercontent.com/kubernetes/release/master/cmd/krel/templates/latest/kubeadm/10-kubeadm.conf
# HAProxy 和 Keepalived(下载 deb/rpm 包)
# Ubuntu/Debian
apt-get download haproxy keepalived
# 或 CentOS/RHEL
# yumdownloader haproxy keepalived
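# 注意:apt-get download 只下载指定的包本身,不包含依赖。
# 若目标系统是最小化安装,可在与目标系统同版本的干净联网机器上先把依赖一并缓存下来(示例,按实际发行版调整):
sudo apt-get install --download-only -y haproxy keepalived
cp /var/cache/apt/archives/*.deb .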
cd ..
2.4 导出容器镜像
cd images
# 生成完整镜像列表(需要联网机器上已安装同版本的 kubeadm)
kubeadm config images list --kubernetes-version v${K8S_VERSION} > image-list.txt
cat <<EOF >> image-list.txt
docker.io/calico/cni:v${CALICO_VERSION}
docker.io/calico/node:v${CALICO_VERSION}
docker.io/calico/kube-controllers:v${CALICO_VERSION}
quay.io/tigera/operator:v1.32.0
EOF
# 批量导出镜像(需要联网机器上已运行 containerd,且 crictl 已指向其 CRI 套接字)
while IFS= read -r image; do
  echo "处理: $image"
  sudo crictl pull "$image"
  image_file=$(echo "$image" | tr '/:' '_').tar
  sudo ctr -n k8s.io image export "$image_file" "$image"
  echo "✅ $image 已导出"
done < image-list.txt
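# 导出完成后做一次完整性检查:确认 tar 文件数量与镜像清单一致,并记录校验和,方便在目标环境核对
ls -lh *.tar
sha256sum *.tar > images.sha256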
cd ..
2.5 下载配置文件
cd configs
# Calico 配置
wget https://raw.githubusercontent.com/projectcalico/calico/v${CALICO_VERSION}/manifests/calico.yaml
cd ..
2.6 创建完整部署脚本集
系统准备脚本(所有节点):
cat <<'EOF' > scripts/prepare-system.sh
#!/bin/bash
set -e
GREEN='\033[0;32m'
NC='\033[0m'
echo -e "${GREEN}[1/6] 禁用 swap...${NC}"
swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
echo -e "${GREEN}[2/6] 加载内核模块...${NC}"
cat <<EOFMOD | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOFMOD
modprobe overlay
modprobe br_netfilter
echo -e "${GREEN}[3/6] 配置内核参数...${NC}"
cat <<EOFSYS | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOFSYS
sysctl --system > /dev/null
echo -e "${GREEN}[4/6] 关闭防火墙...${NC}"
systemctl stop firewalld 2>/dev/null || ufw disable 2>/dev/null || true
systemctl disable firewalld 2>/dev/null || true
echo -e "${GREEN}[5/6] 禁用 SELinux...${NC}"
setenforce 0 2>/dev/null || true
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config 2>/dev/null || true
echo -e "${GREEN}[6/6] 验证配置...${NC}"
lsmod | grep br_netfilter
lsmod | grep overlay
echo -e "${GREEN}✅ 系统准备完成${NC}"
EOF
chmod +x scripts/prepare-system.sh
负载均衡器安装脚本:
cat <<'EOF' > scripts/install-lb.sh
#!/bin/bash
# 安装 HAProxy 和 Keepalived
set -e
WORK_DIR=/root/k8s-offline-ha
GREEN='\033[0;32m'
NC='\033[0m'
echo -e "${GREEN}[1/3] 安装 HAProxy...${NC}"
cd ${WORK_DIR}/packages
dpkg -i haproxy*.deb 2>/dev/null || rpm -ivh haproxy*.rpm 2>/dev/null
echo -e "${GREEN}[2/3] 安装 Keepalived...${NC}"
dpkg -i keepalived*.deb 2>/dev/null || rpm -ivh keepalived*.rpm 2>/dev/null
echo -e "${GREEN}[3/3] 验证安装...${NC}"
haproxy -v
keepalived -v
echo -e "${GREEN}✅ 负载均衡器组件安装完成${NC}"
EOF
chmod +x scripts/install-lb.sh
配置 HAProxy 脚本:
cat <<'EOF' > scripts/config-haproxy.sh
#!/bin/bash
# 配置 HAProxy
set -e
MASTER_1=${1:-192.168.1.10}
MASTER_2=${2:-192.168.1.11}
MASTER_3=${3:-192.168.1.12}
cat <<EOFCFG > /etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend kubernetes-apiserver
    bind *:6443
    mode tcp
    option tcplog
    default_backend kubernetes-apiserver

backend kubernetes-apiserver
    mode tcp
    option tcp-check
    balance roundrobin
    server master-1 ${MASTER_1}:6443 check fall 3 rise 2
    server master-2 ${MASTER_2}:6443 check fall 3 rise 2
    server master-3 ${MASTER_3}:6443 check fall 3 rise 2

listen stats
    bind *:8080
    mode http
    stats enable
    stats uri /stats
    stats refresh 30s
    stats realm HAProxy\ Statistics
    # 生产环境请修改默认的统计页面账号密码
    stats auth admin:admin123
EOFCFG
systemctl enable haproxy
systemctl restart haproxy
echo "✅ HAProxy 配置完成"
EOF
chmod +x scripts/config-haproxy.sh
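配置写入后,可以先做一次语法检查再重启服务,避免配置错误导致 HAProxy 起不来(示例):
haproxy -c -f /etc/haproxy/haproxy.cfg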
配置 Keepalived 脚本:
cat <<'EOF' > scripts/config-keepalived.sh
#!/bin/bash
# 配置 Keepalived
set -e
ROLE=${1:-MASTER} # MASTER 或 BACKUP
VIP=${2:-192.168.1.100}
INTERFACE=${3:-eth0}
PRIORITY=${4:-100}
cat <<EOFCFG > /etc/keepalived/keepalived.conf
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_haproxy {
    script "/usr/bin/killall -0 haproxy"
    interval 2
    # 负权重:haproxy 检测失败时将本机优先级降低 20,
    # 保证主节点(100-20=80)低于备节点(90),VIP 才会漂移
    weight -20
}
vrrp_instance VI_1 {
    state ${ROLE}
    interface ${INTERFACE}
    virtual_router_id 51
    priority ${PRIORITY}
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1234
    }
    virtual_ipaddress {
        ${VIP}
    }
    track_script {
        check_haproxy
    }
}
EOFCFG
systemctl enable keepalived
systemctl restart keepalived
echo "✅ Keepalived 配置完成 (${ROLE})"
EOF
chmod +x scripts/config-keepalived.sh
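两台负载均衡器上的典型调用方式如下(VIP、网卡名、优先级按实际环境调整,取值与后文 deploy-ha.sh 中一致):
# lb-1(主)
./scripts/config-keepalived.sh MASTER 192.168.1.100 eth0 100
# lb-2(备)
./scripts/config-keepalived.sh BACKUP 192.168.1.100 eth0 90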
安装 Kubernetes 组件脚本(复用单节点脚本):
# 复制并修改之前的脚本
cp ~/k8s-offline/scripts/install-containerd.sh scripts/
cp ~/k8s-offline/scripts/install-k8s.sh scripts/
cp ~/k8s-offline/scripts/load-images.sh scripts/
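如果手头没有上一章的脚本,可参考下面这个最小化的 load-images.sh 示意(假设离线包解压在 /root/k8s-offline-ha,镜像 tar 位于其 images/ 目录):
cat <<'EOF' > scripts/load-images.sh
#!/bin/bash
# 将 images/ 目录下的所有镜像 tar 导入 containerd 的 k8s.io 命名空间
set -e
WORK_DIR=${WORK_DIR:-/root/k8s-offline-ha}
for tarball in ${WORK_DIR}/images/*.tar; do
  echo "导入: ${tarball}"
  ctr -n k8s.io images import "${tarball}"
done
# 确认镜像已可被 kubelet 使用
crictl images
EOF
chmod +x scripts/load-images.sh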
初始化第一个 Master 脚本:
cat <<'EOF' > scripts/init-first-master.sh
#!/bin/bash
# 初始化第一个 Master 节点
set -e
VIP=${1:-192.168.1.100}
MASTER_IP=${2:-192.168.1.10}
cat <<EOFCFG > /tmp/kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: ${MASTER_IP}
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.30.0
controlPlaneEndpoint: "${VIP}:6443"
networking:
  serviceSubnet: "10.96.0.0/12"
  podSubnet: "10.244.0.0/16"
  dnsDomain: "cluster.local"
apiServer:
  certSANs:
  - "${VIP}"
  - "192.168.1.10"
  - "192.168.1.11"
  - "192.168.1.12"
  extraArgs:
    authorization-mode: "Node,RBAC"
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0"
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
etcd:
  local:
    dataDir: "/var/lib/etcd"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOFCFG
echo "初始化第一个 Master 节点..."
kubeadm init --config=/tmp/kubeadm-config.yaml --upload-certs | tee /tmp/kubeadm-init.log
echo "配置 kubectl..."
mkdir -p /root/.kube
cp -i /etc/kubernetes/admin.conf /root/.kube/config
echo "✅ 第一个 Master 初始化完成"
echo ""
echo "请保存以下命令:"
echo ""
grep -A 2 "kubeadm join" /tmp/kubeadm-init.log
EOF
chmod +x scripts/init-first-master.sh
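初始化输出中会包含两条 join 命令,格式大致如下(token、hash、certificate-key 以实际输出为准):
# 其他 Master 加入(带 --control-plane 与 --certificate-key):
kubeadm join 192.168.1.100:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <certificate-key>
# Worker 加入:
kubeadm join 192.168.1.100:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>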
一键部署脚本:
cat <<'EOF' > scripts/deploy-ha.sh
#!/bin/bash
# 高可用集群一键部署脚本
set -e
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
echo_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
echo_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
echo_error() { echo -e "${RED}[ERROR]${NC} $1"; }
if [ "$EUID" -ne 0 ]; then
echo_error "请使用 root 用户运行"
exit 1
fi
echo ""
echo "==========================================="
echo " Kubernetes 高可用离线部署向导"
echo "==========================================="
echo ""
echo "请选择节点角色:"
echo " 1) 负载均衡器(主)"
echo " 2) 负载均衡器(备)"
echo " 3) Master 节点(第一个)"
echo " 4) Master 节点(其他)"
echo " 5) Worker 节点"
echo ""
read -p "请选择 [1-5]: " ROLE
case $ROLE in
  1|2)
    read -p "VIP 地址 [192.168.1.100]: " VIP
    VIP=${VIP:-192.168.1.100}
    read -p "网卡名称 [eth0]: " IFACE
    IFACE=${IFACE:-eth0}
    read -p "Master1 IP [192.168.1.10]: " M1
    M1=${M1:-192.168.1.10}
    read -p "Master2 IP [192.168.1.11]: " M2
    M2=${M2:-192.168.1.11}
    read -p "Master3 IP [192.168.1.12]: " M3
    M3=${M3:-192.168.1.12}
    ./scripts/install-lb.sh
    ./scripts/config-haproxy.sh $M1 $M2 $M3
    if [ "$ROLE" == "1" ]; then
      ./scripts/config-keepalived.sh MASTER $VIP $IFACE 100
    else
      ./scripts/config-keepalived.sh BACKUP $VIP $IFACE 90
    fi
    echo_info "负载均衡器部署完成"
    ;;
  3)
    read -p "本节点 IP: " NODE_IP
    read -p "VIP [192.168.1.100]: " VIP
    VIP=${VIP:-192.168.1.100}
    ./scripts/prepare-system.sh
    ./scripts/install-containerd.sh
    ./scripts/load-images.sh
    ./scripts/install-k8s.sh
    ./scripts/init-first-master.sh $VIP $NODE_IP
    echo_info "第一个 Master 部署完成"
    ;;
  4|5)
    read -p "请输入 join 命令: " JOIN_CMD
    ./scripts/prepare-system.sh
    ./scripts/install-containerd.sh
    ./scripts/load-images.sh
    ./scripts/install-k8s.sh
    eval "sudo $JOIN_CMD"
    if [ "$ROLE" == "4" ]; then
      mkdir -p /root/.kube
      scp root@192.168.1.10:/root/.kube/config /root/.kube/config
    fi
    echo_info "节点加入完成"
    ;;
  *)
    echo_error "无效选择"
    exit 1
    ;;
esac
echo ""
echo_info "部署完成!"
EOF
chmod +x scripts/deploy-ha.sh
2.7 打包离线资源
cd ~
# 创建 README
cat <<EOF > k8s-offline-ha/README.md
# Kubernetes ${K8S_VERSION} 高可用离线部署包
## 版本信息
$(cat k8s-offline-ha/versions.txt)
## 架构
- 2 个负载均衡器(HAProxy + Keepalived)
- 3 个 Master 节点
- 3+ Worker 节点
## 快速部署
\`\`\`bash
# 解压
tar -xzf k8s-offline-ha-v${K8S_VERSION}.tar.gz
cd k8s-offline-ha
# 在每个节点执行
sudo ./scripts/deploy-ha.sh
\`\`\`
## 部署顺序
1. 部署两台负载均衡器
2. 部署第一个 Master
3. 安装网络插件
4. 部署其他 Master 节点
5. 部署 Worker 节点
EOF
# 打包
echo "正在打包..."
tar -czf k8s-offline-ha-v${K8S_VERSION}.tar.gz k8s-offline-ha/
# 校验
md5sum k8s-offline-ha-v${K8S_VERSION}.tar.gz > k8s-offline-ha-v${K8S_VERSION}.tar.gz.md5
sha256sum k8s-offline-ha-v${K8S_VERSION}.tar.gz > k8s-offline-ha-v${K8S_VERSION}.tar.gz.sha256
echo "✅ 离线包制作完成"
ls -lh k8s-offline-ha-v${K8S_VERSION}.tar.gz*
三、离线部署流程
3.1 部署负载均衡器
步骤 1: 传输离线包到两台负载均衡器
scp k8s-offline-ha-v1.30.0.tar.gz root@192.168.1.101:/root/
scp k8s-offline-ha-v1.30.0.tar.gz root@192.168.1.102:/root/
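建议把打包时生成的 .sha256 校验文件一并拷贝,在目标机上解压前先核对完整性(示例):
scp k8s-offline-ha-v1.30.0.tar.gz.sha256 root@192.168.1.101:/root/
sha256sum -c k8s-offline-ha-v1.30.0.tar.gz.sha256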
步骤 2: 部署主负载均衡器(192.168.1.101)
cd /root
tar -xzf k8s-offline-ha-v1.30.0.tar.gz
cd k8s-offline-ha
# 执行一键部署
sudo ./scripts/deploy-ha.sh
# 选择: 1 (负载均衡器主)
# VIP: 192.168.1.100
# 网卡: eth0
# Master IPs: 192.168.1.10, 192.168.1.11, 192.168.1.12
步骤 3: 部署备份负载均衡器(192.168.1.102)
cd /root/k8s-offline-ha
sudo ./scripts/deploy-ha.sh
# 选择: 2 (负载均衡器备)
# 其他参数同主负载均衡器
验证负载均衡器:
# 检查 VIP
ip addr show eth0 | grep 192.168.1.100
# 检查 HAProxy
systemctl status haproxy
# 注意:此时 Master 尚未部署,6443 后端没有可用服务,下面的请求失败属正常现象,待 Master 就绪后再复查
curl -k https://192.168.1.100:6443/healthz
# 检查 Keepalived
systemctl status keepalived
3.2 部署第一个 Master 节点
# 传输离线包
scp k8s-offline-ha-v1.30.0.tar.gz root@192.168.1.10:/root/
# SSH 到 192.168.1.10
tar -xzf k8s-offline-ha-v1.30.0.tar.gz
cd k8s-offline-ha
# 执行部署
sudo ./scripts/deploy-ha.sh
# 选择: 3 (第一个 Master)
# 本节点 IP: 192.168.1.10
# VIP: 192.168.1.100
# 保存输出的 join 命令!
3.3 安装网络插件
在第一个 Master 节点执行:
kubectl apply -f configs/calico.yaml
# 等待网络插件启动
kubectl wait --for=condition=Ready pods --all -n kube-system --timeout=600s
# 验证
kubectl get nodes
kubectl get pods -n kube-system
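离线部署时 Calico 镜像必须已经通过 load-images.sh 预先导入;可以单独确认 calico-node 是否在每个节点上正常运行(示例):
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide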
3.4 部署其他 Master 节点
# 在 Master-2 (192.168.1.11) 和 Master-3 (192.168.1.12) 执行
cd /root/k8s-offline-ha
sudo ./scripts/deploy-ha.sh
# 选择: 4 (其他 Master)
# 输入保存的 Master join 命令
# 配置 kubectl(可选)
mkdir -p /root/.kube
scp root@192.168.1.10:/root/.kube/config /root/.kube/config
3.5 部署 Worker 节点
# 在所有 Worker 节点执行
cd /root/k8s-offline-ha
sudo ./scripts/deploy-ha.sh
# 选择: 5 (Worker)
# 输入保存的 Worker join 命令
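如果此前保存的 join 命令中的 token 已过期(默认 24 小时有效),可在 master-1 上重新生成 Worker 的 join 命令(示例):
kubeadm token create --print-join-command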
3.6 验证集群
# 查看所有节点
kubectl get nodes -o wide
# 输出示例
NAME STATUS ROLES AGE VERSION
master-1 Ready control-plane 15m v1.30.0
master-2 Ready control-plane 10m v1.30.0
master-3 Ready control-plane 8m v1.30.0
worker-1 Ready <none> 5m v1.30.0
worker-2 Ready <none> 5m v1.30.0
worker-3 Ready <none> 5m v1.30.0
# 检查系统 Pod
kubectl get pods --all-namespaces
四、验证高可用
4.1 验证 etcd 集群
sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list
# 应该显示 3 个 etcd 成员
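如果节点上没有单独安装 etcdctl 二进制,也可以直接在 etcd 静态 Pod 内执行同样的命令(Pod 名为 etcd-<节点名>,以实际为准):
kubectl -n kube-system exec etcd-master-1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list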
4.2 验证负载均衡
# 测试 VIP 访问
kubectl --server=https://192.168.1.100:6443 get nodes
# 模拟一个 Master 故障,测试故障转移
# SSH 到 master-1(注意:只停止 kubelet 并不会停掉已在运行的 kube-apiserver 静态 Pod,
# 可临时移走其静态 Pod 清单来模拟 API Server 故障)
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
# 在其他节点测试,应该仍可访问
kubectl get nodes
# 恢复 master-1
sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/
4.3 测试负载均衡器故障转移
# 停止主负载均衡器
# SSH 到 lb-1 (192.168.1.101)
sudo systemctl stop haproxy
sudo systemctl stop keepalived
# VIP 应该自动转移到 lb-2
# 在任意节点测试
kubectl get nodes # 应该仍然正常
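可以在 lb-2 上确认 VIP 已经漂移过来(示例):
ip addr show eth0 | grep 192.168.1.100
journalctl -u keepalived --since "5 minutes ago" | tail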
五、生产环境配置
5.1 etcd 备份自动化
cat <<'EOF' > /usr/local/bin/backup-etcd.sh
#!/bin/bash
BACKUP_DIR="/backup/etcd"
DATE=$(date +%Y%m%d-%H%M%S)
mkdir -p ${BACKUP_DIR}
ETCDCTL_API=3 etcdctl snapshot save ${BACKUP_DIR}/etcd-snapshot-${DATE}.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
find ${BACKUP_DIR} -name "etcd-snapshot-*.db" -mtime +7 -delete
echo "✅ Backup: etcd-snapshot-${DATE}.db"
EOF
chmod +x /usr/local/bin/backup-etcd.sh
# 设置定时任务(追加到现有 crontab,避免 "| crontab -" 直接覆盖已有条目)
(crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/backup-etcd.sh") | crontab -
5.2 健康检查脚本
cat <<'EOF' > /usr/local/bin/check-cluster-health.sh
#!/bin/bash
# 集群健康检查
echo "=== 节点状态 ==="
kubectl get nodes
echo ""
echo "=== etcd 健康状态 ==="
ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint health
echo ""
echo "=== 系统 Pod 状态 ==="
kubectl get pods -n kube-system --no-headers | grep -vE 'Running|Completed' || echo "所有 Pod 正常"
echo ""
echo "=== 负载均衡器状态 ==="
curl -s -k https://192.168.1.100:6443/healthz || echo "VIP 不可达"
EOF
chmod +x /usr/local/bin/check-cluster-health.sh
5.3 监控告警(使用 Prometheus)
详见监控与日志章节。
六、离线升级
6.1 准备新版本离线包
在联网环境制作新版本离线包,步骤同第二章。
6.2 升级流程
# 1. 传输新版本离线包到所有节点
# 2. 升级第一个 Master
cd /root/k8s-offline-ha-v1.30.1
sudo ./scripts/load-images.sh
sudo install -m 755 packages/kubeadm /usr/local/bin/
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.30.1
sudo install -m 755 packages/kubelet /usr/local/bin/
sudo install -m 755 packages/kubectl /usr/local/bin/
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# 3. 升级其他 Master 节点(同样先导入新镜像、安装新版 kubeadm,再执行 upgrade node)
sudo ./scripts/load-images.sh
sudo install -m 755 packages/kubeadm /usr/local/bin/
sudo kubeadm upgrade node
sudo install -m 755 packages/kubelet /usr/local/bin/
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# 4. 升级 Worker 节点(逐个升级)
kubectl drain worker-1 --ignore-daemonsets
# SSH 到 worker-1 执行升级
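# 在 worker-1 上执行升级(示意,步骤与上文 Master 升级一致):
cd /root/k8s-offline-ha-v1.30.1
sudo ./scripts/load-images.sh
sudo install -m 755 packages/kubeadm /usr/local/bin/
sudo kubeadm upgrade node
sudo install -m 755 packages/kubelet /usr/local/bin/
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# 回到控制平面节点后再执行 uncordon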
kubectl uncordon worker-1
七、故障排查
7.1 VIP 无法访问
# 检查 Keepalived
systemctl status keepalived
journalctl -u keepalived -f
# 检查 VIP
ip addr show | grep 192.168.1.100
# 检查 HAProxy
systemctl status haproxy
journalctl -u haproxy -f
7.2 etcd 集群不健康
# 查看 etcd 成员状态(需携带证书参数,同上文 4.1 节)
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list
# 查看 etcd 日志(kubeadm 部署的 etcd 以静态 Pod 运行,没有独立的 systemd 服务)
kubectl logs -n kube-system etcd-master-1
# 删除故障成员并重新加入(同样需携带上述证书参数)
ETCDCTL_API=3 etcdctl member remove <member-id>
7.3 Master 节点加入失败
# 重新生成 certificate-key
sudo kubeadm init phase upload-certs --upload-certs
# 生成新的 join 命令
kubeadm token create --print-join-command --certificate-key <new-key>
八、安全加固
8.1 网络隔离
# 限制 API Server 访问(仅允许内网)
iptables -A INPUT -p tcp --dport 6443 -s 192.168.1.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 6443 -j DROP
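# 注意:上述 iptables 规则重启后不会保留,需按发行版做持久化。
# 示例(Debian/Ubuntu,需提前把 iptables-persistent 包放入离线包并安装):
iptables-save > /etc/iptables/rules.v4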
8.2 启用审计日志
在 kubeadm 配置中添加审计配置(参考在线高可用部署章节)。
8.3 定期安全扫描
# 使用 Trivy 扫描镜像(离线环境需提前在联网机器上下载漏洞库,并随离线包分发到目标环境)
trivy image --severity HIGH,CRITICAL nginx:latest
# 使用 kube-bench 检查安全配置
kube-bench run --targets master,node
九、最佳实践
9.1 部署规范
- ✅ 提前规划IP地址和主机名
- ✅ 使用统一的操作系统版本
- ✅ 保持版本一致性
- ✅ 制作详细的部署文档
- ✅ 定期备份 etcd
9.2 运维规范
- ✅ 建立变更管理流程
- ✅ 定期健康检查
- ✅ 监控告警及时响应
- ✅ 定期进行灾难恢复演练
- ✅ 保留多个版本的离线包
9.3 安全规范
- ✅ 离线包加密存储
- ✅ 限制管理员访问权限
- ✅ 启用审计日志
- ✅ 定期安全扫描
- ✅ 及时更新安全补丁
小结
本章介绍了离线高可用部署:
✅ 完整架构:2 LB + 3 Master + 3 Worker
✅ 高可用保障:VIP + etcd 集群 + 多 Master
✅ 完全离线:不依赖互联网
✅ 生产级可靠性:99.9%+ SLA
✅ 安全合规:适合政企、金融行业
✅ 自动化部署:一键部署脚本
✅ 运维友好:备份、升级、监控完善
准备时间:2-3 小时(联网环境)
部署时间:2-4 小时(目标环境)
最小规模:8 台服务器
可用性:99.9%+ SLA
适用场景:
- 🏛️ 政企生产环境
- 🏦 金融行业核心系统
- 🔒 军工、保密单位
- 📊 关键业务系统
下一步: