# Advanced Blue-Green Deployment Scenarios

A closer look at how to run blue-green deployments in complex, real-world scenarios.
## Blue-Green Deployments for Stateful Applications

Stateful applications need special handling in a blue-green deployment because data persistence and state synchronization are involved.

### Scenario 1: StatefulSets

Challenges:

- Pods have unique identities and stable network names
- Pods must be started and stopped in order
- Persistent data must be migrated or shared

Solution:
```yaml
# Blue StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database-blue
spec:
  serviceName: database-blue
  replicas: 3
  selector:
    matchLabels:
      app: database
      version: blue
  template:
    metadata:
      labels:
        app: database
        version: blue
    spec:
      containers:
      - name: postgres
        image: postgres:14-blue
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        env:
        - name: POSTGRES_DB
          value: mydb
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 10Gi
---
# Headless Service
apiVersion: v1
kind: Service
metadata:
  name: database-blue
spec:
  clusterIP: None
  selector:
    app: database
    version: blue
  ports:
  - port: 5432
    targetPort: 5432
---
# Main Service (the switch point)
apiVersion: v1
kind: Service
metadata:
  name: database
spec:
  selector:
    app: database
    version: blue  # flip to "green" to cut over
  ports:
  - port: 5432
    targetPort: 5432
```
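With the manifests above in place, the cutover itself is just a selector patch. A minimal sketch, assuming a matching `database-green` StatefulSet and headless Service have already been applied:

```bash
#!/bin/bash
set -e

# Wait for every green replica to become ready
kubectl rollout status statefulset/database-green --timeout=10m

# Flip the main Service's selector from blue to green
kubectl patch service database \
  -p '{"spec":{"selector":{"app":"database","version":"green"}}}'

# Rolling back is the same patch with version set back to "blue"
```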
Data synchronization strategies:

- Shared storage (recommended where the workload supports it; note that two PostgreSQL instances must never write to the same data directory, so this fits read-mostly workloads or applications explicitly designed for shared access):

  ```yaml
  # Use ReadWriteMany storage
  volumeClaimTemplates:
  - metadata:
      name: shared-data
    spec:
      accessModes: [ "ReadWriteMany" ]
      storageClassName: nfs
      resources:
        requests:
          storage: 100Gi
  ```
- Data replication:

  ```bash
  #!/bin/bash
  # Copy data from the blue environment to the green environment

  # 1. Stop writes (switch blue to read-only mode)
  kubectl exec database-blue-0 -- psql -U postgres -c "
  ALTER SYSTEM SET default_transaction_read_only = on;
  SELECT pg_reload_conf();
  "

  # 2. Take a backup
  kubectl exec database-blue-0 -- pg_dump -U postgres mydb > /tmp/backup.sql

  # 3. Restore into the green environment (-i is required to stream stdin)
  kubectl exec -i database-green-0 -- psql -U postgres mydb < /tmp/backup.sql

  # 4. Spot-check data consistency
  kubectl exec database-blue-0 -- psql -U postgres -c "SELECT COUNT(*) FROM users;"
  kubectl exec database-green-0 -- psql -U postgres -c "SELECT COUNT(*) FROM users;"
  ```
- Primary/standby replication:

  ```yaml
  # Green environment running as a standby. Note: recovery.conf was removed
  # in PostgreSQL 12; replication settings now live in postgresql.conf, and
  # standby mode is enabled by an empty standby.signal file in PGDATA.
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: postgres-green-config
  data:
    postgresql.conf: |
      # Standby configuration
      hot_standby = on
      wal_level = replica
      max_wal_senders = 10
      wal_keep_size = 1GB   # replaces wal_keep_segments (removed in PostgreSQL 13)
      # Replication connection back to the blue primary
      primary_conninfo = 'host=database-blue-0.database-blue port=5432 user=replicator'
      promote_trigger_file = '/tmp/promote_to_master'
  ```
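When the standby has caught up and traffic is ready to move, the green instance still has to be promoted to primary. A minimal sketch against the configuration above (`SELECT pg_promote();` is the trigger-file-free alternative on PostgreSQL 12+):

```bash
# Promote the green standby by creating the configured trigger file
kubectl exec database-green-0 -- touch /tmp/promote_to_master

# Confirm the instance has left recovery (should return "f")
kubectl exec database-green-0 -- psql -U postgres -c "SELECT pg_is_in_recovery();"
```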
### Scenario 2: Distributed Caches (Redis Cluster)

Blue-green deployment for a Redis cluster:
```yaml
# Redis blue environment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-blue
spec:
  serviceName: redis-blue
  replicas: 6  # 3 primaries + 3 replicas
  selector:
    matchLabels:
      app: redis
      version: blue
  template:
    metadata:
      labels:
        app: redis
        version: blue
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
        - containerPort: 16379  # cluster bus
        command:
        - redis-server
        args:
        # Each args entry is a single argv element, so a flag and its
        # value must be separate entries
        - /etc/redis/redis.conf
        - --cluster-enabled
        - "yes"
        - --cluster-config-file
        - nodes.conf
        - --cluster-node-timeout
        - "5000"
        volumeMounts:
        - name: data
          mountPath: /data
        - name: config
          mountPath: /etc/redis
      volumes:
      - name: config
        configMap:
          name: redis-blue-config  # assumed to hold redis.conf
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
```
---
# 初始化 Job
apiVersion: batch/v1
kind: Job
metadata:
name: redis-green-init
spec:
template:
spec:
containers:
- name: init
image: redis:7-alpine
command:
- /bin/sh
- -c
- |
# 等待所有节点就绪
sleep 30
# 创建集群
redis-cli --cluster create \
redis-green-0.redis-green:6379 \
redis-green-1.redis-green:6379 \
redis-green-2.redis-green:6379 \
redis-green-3.redis-green:6379 \
redis-green-4.redis-green:6379 \
redis-green-5.redis-green:6379 \
--cluster-replicas 1 \
--cluster-yes
# 从蓝环境导入数据
for i in 0 1 2; do
redis-cli -h redis-blue-$i.redis-blue \
--rdb /tmp/dump-$i.rdb
redis-cli -h redis-green-$i.redis-green \
--pipe < /tmp/dump-$i.rdb
done
restartPolicy: Never
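Before pointing clients at the green cluster it is worth verifying cluster health and comparing keyspaces. A minimal sketch, assuming `kubectl exec` access to the pods:

```bash
#!/bin/bash
# Cluster state should be "ok" with all 16384 slots covered
kubectl exec redis-green-0 -- redis-cli cluster info | grep cluster_state
kubectl exec redis-green-0 -- redis-cli --cluster check redis-green-0.redis-green:6379

# Compare key counts between blue and green masters (per-node counts only
# line up if slot assignments match; otherwise compare the totals)
for i in 0 1 2; do
  echo "blue-$i:  $(kubectl exec redis-blue-$i -- redis-cli dbsize)"
  echo "green-$i: $(kubectl exec redis-green-$i -- redis-cli dbsize)"
done
```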
## Multi-Cluster Blue-Green Deployments

Blue-green deployments that span multiple Kubernetes clusters.

### Architecture

```text
          ┌─────────────────┐
          │    Global LB    │
          │    (DNS/CDN)    │
          └────────┬────────┘
                   │
          ┌────────┴────────┐
          │                 │
    ┌─────▼────┐      ┌─────▼────┐
    │Cluster A │      │Cluster B │
    │  (Blue)  │      │ (Green)  │
    └──────────┘      └──────────┘
```
### Using ExternalDNS with Route53

1. Install ExternalDNS:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-dns
rules:
- apiGroups: [""]
  resources: ["services", "endpoints", "pods", "nodes"]
  verbs: ["get", "watch", "list"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "watch", "list"]
---
# Bind the ClusterRole to the controller's ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: external-dns
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      serviceAccountName: external-dns
      containers:
      - name: external-dns
        image: registry.k8s.io/external-dns/external-dns:v0.14.0
        args:
        - --source=service
        - --source=ingress
        - --domain-filter=example.com
        - --provider=aws
        - --policy=sync
        - --aws-zone-type=public
        - --registry=txt
        - --txt-owner-id=my-cluster
```
2. Configure the Service:

```yaml
# Cluster A (Blue)
apiVersion: v1
kind: Service
metadata:
  name: myapp
  annotations:
    external-dns.alpha.kubernetes.io/hostname: myapp.example.com
    external-dns.alpha.kubernetes.io/ttl: "60"
spec:
  type: LoadBalancer
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 8080
```
3. Switch traffic:

```bash
# Cut over to Cluster B (Green): create the same Service in Cluster B and
# ExternalDNS updates the DNS record automatically.

# Or use weighted routing. For Route53 weighted records, each Service also
# needs an external-dns.alpha.kubernetes.io/set-identifier annotation so
# the two records can coexist.
kubectl annotate service myapp \
  external-dns.alpha.kubernetes.io/aws-weight="100" \
  -n default --overwrite

# Set Cluster A's weight to 0
kubectl annotate service myapp \
  external-dns.alpha.kubernetes.io/aws-weight="0" \
  -n default --overwrite \
  --kubeconfig=cluster-a-config
```
### Using Istio Multi-Cluster

1. Install Istio on both clusters:

```bash
# Cluster A
istioctl install --set profile=default \
  --set values.global.meshID=mesh1 \
  --set values.global.multiCluster.clusterName=cluster-a \
  --set values.global.network=network-a

# Cluster B
istioctl install --set profile=default \
  --set values.global.meshID=mesh1 \
  --set values.global.multiCluster.clusterName=cluster-b \
  --set values.global.network=network-b
```
2. 配置流量路由
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: myapp
spec:
hosts:
- myapp.example.com
http:
- match:
- headers:
x-version:
exact: "v2"
route:
- destination:
host: myapp.cluster-b.svc.cluster.local
weight: 100
- route:
- destination:
host: myapp.cluster-a.svc.cluster.local
weight: 100
- destination:
host: myapp.cluster-b.svc.cluster.local
weight: 0
---
# 逐步切换
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: myapp
spec:
hosts:
- myapp.example.com
http:
- route:
- destination:
host: myapp.cluster-a.svc.cluster.local
weight: 50 # 50% 流量
- destination:
host: myapp.cluster-b.svc.cluster.local
weight: 50 # 50% 流量
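The gradual switchover can also be scripted by patching the weights in place. A minimal sketch against the second VirtualService above, where route index 0 is cluster-a and index 1 is cluster-b:

```bash
#!/bin/bash
set -e

for GREEN in 10 25 50 75 100; do
  BLUE=$((100 - GREEN))
  kubectl patch virtualservice myapp --type=json -p "[
    {\"op\": \"replace\", \"path\": \"/spec/http/0/route/0/weight\", \"value\": $BLUE},
    {\"op\": \"replace\", \"path\": \"/spec/http/0/route/1/weight\", \"value\": $GREEN}
  ]"
  echo "cluster-b now takes ${GREEN}%; monitoring before the next step"
  sleep 120
done
```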
## Blue-Green Deployment for Microservice Architectures

Coordinated blue-green deployment across multiple microservices.

### Scenario: Upgrading Interdependent Services Together

```text
┌─────────────────────────────────────┐
│            Frontend (v2)            │
└──────────────────┬──────────────────┘
                   │
            ┌──────┴──────┐
            │             │
       ┌────▼────┐   ┌────▼────┐
       │  User   │   │  Order  │
       │ Service │   │ Service │
       │   v2    │   │   v2    │
       └────┬────┘   └────┬────┘
            │             │
            └──────┬──────┘
                   │
             ┌─────▼─────┐
             │  Payment  │
             │  Service  │
             │    v2     │
             └───────────┘
```
Coordinated release script:

```bash
#!/bin/bash
set -e

# Ordered by dependency: lowest layer first
SERVICES=("payment-service" "order-service" "user-service" "frontend")
NEW_VERSION="v2.0"
NAMESPACE="production"

echo "🚀 Starting microservice blue-green release"

# 1. Deploy the green environments in dependency order
for SERVICE in "${SERVICES[@]}"; do
  echo "📦 Deploying green environment for $SERVICE..."

  kubectl set image "deployment/$SERVICE-green" \
    "$SERVICE=myregistry/$SERVICE:$NEW_VERSION" \
    -n "$NAMESPACE"

  # Wait until the rollout completes
  kubectl rollout status "deployment/$SERVICE-green" \
    -n "$NAMESPACE" --timeout=5m

  echo "✅ Green environment for $SERVICE is ready"
done

# 2. Run end-to-end tests against the green stack
echo "🧪 Running end-to-end tests..."
./scripts/e2e-test.sh green

# 3. Switch traffic in dependency order (bottom layer first)
for SERVICE in "${SERVICES[@]}"; do
  echo "🔄 Switching $SERVICE traffic to green..."
  kubectl patch service "$SERVICE-service" \
    -n "$NAMESPACE" \
    -p '{"spec":{"selector":{"version":"green"}}}'

  # Watch for 30 seconds
  echo "📊 Monitoring $SERVICE..."
  sleep 30

  # Crude health check: count ERROR lines in recent logs
  ERROR_COUNT=$(kubectl logs -l "app=$SERVICE,version=green" \
    -n "$NAMESPACE" --tail=1000 | grep ERROR | wc -l)

  if [ "$ERROR_COUNT" -gt 10 ]; then
    echo "❌ Too many errors in $SERVICE, rolling back"
    # Point every service back at blue (a no-op for services not yet switched)
    for ROLLBACK_SERVICE in "${SERVICES[@]}"; do
      kubectl patch service "$ROLLBACK_SERVICE-service" \
        -n "$NAMESPACE" \
        -p '{"spec":{"selector":{"version":"blue"}}}'
    done
    exit 1
  fi

  echo "✅ $SERVICE switched successfully"
done

echo "🎉 All microservices switched"
```
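The grep-based check above is deliberately crude; a metrics query is more reliable. A hedged sketch, assuming an in-cluster Prometheus at `prometheus.monitoring:9090` and an `http_requests_total` metric with `status` and `version` labels (adjust both to your setup):

```bash
#!/bin/bash
# Query the 5xx rate of the green pods over the last 5 minutes
QUERY='sum(rate(http_requests_total{version="green",status=~"5.."}[5m]))'
RATE=$(curl -s -G "http://prometheus.monitoring:9090/api/v1/query" \
  --data-urlencode "query=$QUERY" | jq -r '.data.result[0].value[1] // "0"')

echo "green 5xx rate: $RATE req/s"
```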
### Managing Service Dependencies

Using Helm to manage dependencies:

```yaml
# Chart.yaml
apiVersion: v2
name: microservices
version: 1.0.0
dependencies:
- name: payment-service
  version: "2.0.0"
  repository: "https://charts.example.com"
- name: order-service
  version: "2.0.0"
  repository: "https://charts.example.com"
  condition: order-service.enabled
- name: user-service
  version: "2.0.0"
  repository: "https://charts.example.com"
```
values.yaml:

```yaml
global:
  version: blue  # or green
  namespace: production

payment-service:
  enabled: true
  version: blue
  replicas: 3

order-service:
  enabled: true
  version: blue
  replicas: 3
  # Informational only; Helm does not interpret this field
  dependencies:
  - payment-service

user-service:
  enabled: true
  version: blue
  replicas: 3
```
Switch commands:

```bash
# Roll out the green release
helm upgrade microservices ./charts/microservices \
  --set global.version=green \
  --namespace production

# Switch traffic (assumes the chart wires global.activeVersion into the
# Service selectors)
helm upgrade microservices ./charts/microservices \
  --set global.activeVersion=green \
  --namespace production
```
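A quick way to sanity-check the switch before running it against the cluster is to render the chart locally and inspect the selectors it would produce, assuming the templates consume `global.activeVersion` as described:

```bash
helm template microservices ./charts/microservices \
  --set global.activeVersion=green \
  --namespace production | grep -B2 -A3 'selector:'
```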
## Compliance and Auditing

### Release Approval Workflow

Using Argo CD with Notifications. The Argo CD Application spec has no built-in approval or notifications fields; the approval gate comes from disabling automated sync, and approvers are alerted through notification subscription annotations:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-prod
  annotations:
    # Notify the approvers on deployment (assumes an "email" notification
    # service and the standard on-deployed trigger are configured)
    notifications.argoproj.io/subscribe.on-deployed.email: ops-team@example.com
spec:
  project: production
  syncPolicy:
    # No `automated` block: every sync must be triggered manually,
    # which acts as the approval gate
    syncOptions:
    - CreateNamespace=false
    - ApplyOutOfSyncOnly=true
```
审批 Webhook:
// approval-webhook.go
func ApprovalHandler(w http.ResponseWriter, r *http.Request) {
var req ApprovalRequest
json.NewDecoder(r.Body).Decode(&req)
// 记录审计日志
auditLog := AuditLog{
Timestamp: time.Now(),
Application: req.AppName,
Version: req.Version,
Approver: req.Approver,
Environment: req.Environment,
Status: "approved",
}
// 保存到数据库
db.Save(&auditLog)
// 触发发布
triggerDeployment(req.AppName, req.Version)
w.WriteHeader(http.StatusOK)
}
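A sample approval call, assuming the handler is served at `/approve` and `ApprovalRequest` uses the lower-camel-case JSON field names shown (both are assumptions about the surrounding application):

```bash
curl -X POST http://approval-webhook.tooling:8080/approve \
  -H 'Content-Type: application/json' \
  -d '{
    "appName": "myapp",
    "version": "v2.0",
    "approver": "ops-user@example.com",
    "environment": "production"
  }'
```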
### Change Records

Automatically generated changelog:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: deployment-changelog
  annotations:
    kubernetes.io/change-cause: "Deploy v2.0 with blue-green strategy"
data:
  changelog: |
    Version: v2.0
    Date: 2024-01-10 15:00:00
    Strategy: Blue-Green
    Deployed By: ops-user@example.com
    Changes:
    - Added new payment gateway
    - Fixed memory leak in order service
    - Updated database schema
    Rollback Command:
      kubectl patch service myapp-service -p '{"spec":{"selector":{"version":"blue"}}}'
```
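A changelog like this does not have to be written by hand. A minimal sketch that records the essentials at deploy time (the deploying identity is read from the current kubeconfig):

```bash
#!/bin/bash
set -e
VERSION="$1"

kubectl create configmap "deployment-changelog-$VERSION" \
  --from-literal=version="$VERSION" \
  --from-literal=date="$(date '+%Y-%m-%d %H:%M:%S')" \
  --from-literal=strategy="Blue-Green" \
  --from-literal=deployed-by="$(kubectl config view --minify -o jsonpath='{.users[0].name}')" \
  --dry-run=client -o yaml | kubectl apply -f -
```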
## Cost Optimization

### Automatically Cleaning Up the Idle Environment

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-old-deployments
spec:
  schedule: "0 2 * * *"  # 2:00 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cleanup
            image: bitnami/kubectl:latest
            command:
            - /bin/bash
            - -c
            - |
              # Find the currently active version
              ACTIVE_VERSION=$(kubectl get service myapp-service \
                -o jsonpath='{.spec.selector.version}')

              # Work out the inactive version
              if [ "$ACTIVE_VERSION" == "blue" ]; then
                INACTIVE="green"
              else
                INACTIVE="blue"
              fi

              # Delete it only once the retention period (7 days) has passed
              DEPLOY_TIME=$(kubectl get deployment "myapp-$INACTIVE" \
                -o jsonpath='{.metadata.creationTimestamp}')
              if [ "$(date -d "$DEPLOY_TIME" +%s)" -lt "$(date -d '7 days ago' +%s)" ]; then
                echo "Deleting old deployment: myapp-$INACTIVE"
                kubectl delete deployment "myapp-$INACTIVE"
              fi
          restartPolicy: OnFailure
          serviceAccountName: cleanup-sa
```
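The CronJob references a `cleanup-sa` ServiceAccount that is not defined above; it needs permission to read the Service and delete Deployments. A minimal sketch using imperative commands (note that `kubectl create role` applies every verb to every resource, so this is slightly broader than strictly necessary):

```bash
kubectl create serviceaccount cleanup-sa

kubectl create role cleanup-role \
  --verb=get,delete \
  --resource=services,deployments

kubectl create rolebinding cleanup-binding \
  --role=cleanup-role \
  --serviceaccount=default:cleanup-sa
```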
资源预留策略
# 使用 PriorityClass 确保关键服务
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: production-high
value: 1000000
globalDefault: false
description: "High priority for production services"
---
# 蓝环境(生产)
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-blue
spec:
template:
spec:
priorityClassName: production-high
containers:
- name: myapp
resources:
requests:
cpu: "200m"
memory: "256Mi"
limits:
cpu: "500m"
memory: "512Mi"
---
# 绿环境(测试)- 较低优先级
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-green
spec:
template:
spec:
priorityClassName: production-medium
containers:
- name: myapp
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "200m"
memory: "256Mi"
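The green Deployment references a `production-medium` PriorityClass that is not defined in the manifest above; it can be created the same way as `production-high`, for example:

```bash
kubectl create priorityclass production-medium \
  --value=500000 \
  --description="Medium priority for the idle blue-green environment"
```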
## Summary

Key points for blue-green deployment in advanced scenarios:

Stateful applications:
- Data synchronization strategies
- Primary/standby replication
- Shared storage

Multi-cluster:
- DNS-based traffic switching
- Istio multi-cluster mesh
- Cross-region deployment

Microservices:
- Managing dependency order
- End-to-end testing
- Coordinated releases

Compliance:
- Approval workflows
- Audit logs
- Change records

Cost optimization:
- Automatic cleanup
- Resource priorities
- Elastic scaling

Blue-green deployment is simple in principle, but complex scenarios call for careful planning and solid automation.