Advanced Blue-Green Deployment Scenarios

A deep dive into blue-green deployment practices in complex scenarios.

Blue-Green Deployment for Stateful Applications

Blue-green deployments of stateful applications need special handling because data persistence and state synchronization are involved.

Scenario 1: Using StatefulSets

Challenges:

  • Pods have unique identities and stable network identities
  • Pods must be started and stopped in order
  • Persistent data must be migrated or shared

Solution:

# Blue environment StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database-blue
spec:
  serviceName: database-blue
  replicas: 3
  selector:
    matchLabels:
      app: database
      version: blue
  template:
    metadata:
      labels:
        app: database
        version: blue
    spec:
      containers:
      - name: postgres
        image: postgres:14-blue
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        env:
        - name: POSTGRES_DB
          value: mydb
        # The official postgres image refuses to start without a password;
        # the Secret name here is an assumed example
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-credentials
              key: password
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 10Gi

---
# Headless Service
apiVersion: v1
kind: Service
metadata:
  name: database-blue
spec:
  clusterIP: None
  selector:
    app: database
    version: blue
  ports:
  - port: 5432
    targetPort: 5432

---
# Main Service (used for switching)
apiVersion: v1
kind: Service
metadata:
  name: database
spec:
  selector:
    app: database
    version: blue  # the switch point
  ports:
  - port: 5432
    targetPort: 5432
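
With both StatefulSets running, the actual cutover is a selector change on the main `database` Service. A minimal sketch of the switch plus verification:

#!/bin/bash
# Cut the main Service over from blue to green
kubectl patch service database \
  -p '{"spec":{"selector":{"app":"database","version":"green"}}}'

# The selector should now report green...
kubectl get service database -o jsonpath='{.spec.selector.version}'; echo

# ...and the Service endpoints should be the green pods
kubectl get endpoints database

# Rolling back is the same patch with "blue"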

Data synchronization strategies:

  1. Shared storage (recommended):
# Use ReadWriteMany storage so blue and green mount the same data
volumeClaimTemplates:
- metadata:
    name: shared-data
  spec:
    accessModes: [ "ReadWriteMany" ]
    storageClassName: nfs
    resources:
      requests:
        storage: 100Gi
  2. Data copy:
#!/bin/bash
# Copy data from the blue environment to the green environment

# 1. Stop writes (switch to read-only mode)
kubectl exec database-blue-0 -- psql -U postgres -c "
  ALTER SYSTEM SET default_transaction_read_only = on;
  SELECT pg_reload_conf();
"

# 2. Create a backup
kubectl exec database-blue-0 -- pg_dump -U postgres mydb > /tmp/backup.sql

# 3. Restore into the green environment (-i is required so psql reads stdin)
kubectl exec -i database-green-0 -- psql -U postgres mydb < /tmp/backup.sql

# 4. Verify data consistency
kubectl exec database-blue-0 -- psql -U postgres -c "SELECT COUNT(*) FROM users;"
kubectl exec database-green-0 -- psql -U postgres -c "SELECT COUNT(*) FROM users;"
  3. Primary/replica replication:
# Green environment runs as a replica of blue
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-green-config
data:
  postgresql.conf: |
    # Replica configuration. PostgreSQL 12+ removed recovery.conf:
    # create an empty standby.signal file on the replica and put the
    # connection settings in postgresql.conf instead.
    hot_standby = on
    wal_level = replica
    max_wal_senders = 10
    wal_keep_size = '1GB'
    primary_conninfo = 'host=database-blue-0.database-blue port=5432 user=replicator'
    promote_trigger_file = '/tmp/promote_to_master'
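
To cut over with this approach, stop writes on blue, promote the green standby, then flip the Service. A minimal sketch, assuming the settings above and the main `database` Service as the switch point:

#!/bin/bash
# Promote the green standby and switch traffic to it

# 1. Stop writes on blue (see the read-only switch in the data-copy script)

# 2. Create the trigger file configured in promote_trigger_file
kubectl exec database-green-0 -- touch /tmp/promote_to_master

# 3. Wait until green is no longer in recovery
until [ "$(kubectl exec database-green-0 -- \
    psql -U postgres -tAc 'SELECT pg_is_in_recovery();')" = "f" ]; do
  sleep 2
done

# 4. Flip the switch-point Service to green, as shown earlier
kubectl patch service database \
  -p '{"spec":{"selector":{"app":"database","version":"green"}}}'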

Scenario 2: Distributed Cache (Redis Cluster)

Blue-green deployment of a Redis cluster:

# Redis blue environment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-blue
spec:
  serviceName: redis-blue
  replicas: 6  # 3 primaries + 3 replicas
  selector:
    matchLabels:
      app: redis
      version: blue
  template:
    metadata:
      labels:
        app: redis
        version: blue
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
        - containerPort: 16379  # cluster bus port
        command:
        - redis-server
        args:
        - /etc/redis/redis.conf
        # Each flag and its value must be separate args
        - --cluster-enabled
        - "yes"
        - --cluster-config-file
        - nodes.conf
        - --cluster-node-timeout
        - "5000"
        volumeMounts:
        - name: data
          mountPath: /data
        - name: config
          mountPath: /etc/redis
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi

---
# Initialization Job for the green cluster
apiVersion: batch/v1
kind: Job
metadata:
  name: redis-green-init
spec:
  template:
    spec:
      containers:
      - name: init
        image: redis:7-alpine
        command:
        - /bin/sh
        - -c
        - |
          # Wait for all nodes to become ready (a fixed sleep is crude;
          # polling node readiness would be more robust)
          sleep 30

          # Create the green cluster
          redis-cli --cluster create \
            redis-green-0.redis-green:6379 \
            redis-green-1.redis-green:6379 \
            redis-green-2.redis-green:6379 \
            redis-green-3.redis-green:6379 \
            redis-green-4.redis-green:6379 \
            redis-green-5.redis-green:6379 \
            --cluster-replicas 1 \
            --cluster-yes

          # Import data from each blue master. Note that piping an RDB
          # snapshot into `redis-cli --pipe` does not work: --pipe expects
          # RESP commands, not RDB binary. Copy keys with MIGRATE instead
          # (assumes blue and green have identical slot layouts, which
          # holds when both clusters were created the same way).
          for i in 0 1 2; do
            redis-cli -h redis-blue-$i.redis-blue --scan | while read key; do
              redis-cli -h redis-blue-$i.redis-blue \
                MIGRATE redis-green-$i.redis-green 6379 "$key" 0 5000 COPY REPLACE
            done
          done
      restartPolicy: Never
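
Before switching clients over, verify the green cluster's health and data. A quick check, assuming the Service names above:

#!/bin/bash
# Verify the green Redis cluster before switching traffic

# Slot coverage and replica assignment
redis-cli --cluster check redis-green-0.redis-green:6379

# Every node must report cluster_state:ok
for i in 0 1 2 3 4 5; do
  redis-cli -h redis-green-$i.redis-green CLUSTER INFO | grep cluster_state
done

# Spot-check key counts against the blue masters
for i in 0 1 2; do
  echo "blue-$i:  $(redis-cli -h redis-blue-$i.redis-blue DBSIZE)"
  echo "green-$i: $(redis-cli -h redis-green-$i.redis-green DBSIZE)"
done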

Multi-Cluster Blue-Green Deployment

Blue-green deployment across multiple Kubernetes clusters.

Architecture

        ┌─────────────────┐
        │  Global LB      │
        │  (DNS/CDN)      │
        └────────┬────────┘
                 │
        ┌────────┴────────┐
        │                 │
   ┌────▼─────┐     ┌────▼─────┐
   │Cluster A │     │Cluster B │
   │  (Blue)  │     │ (Green)  │
   └──────────┘     └──────────┘

Using External DNS + Route53

1. Install External DNS

apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
  namespace: kube-system

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-dns
rules:
- apiGroups: [""]
  resources: ["services","endpoints","pods"]
  verbs: ["get","watch","list"]
- apiGroups: ["extensions","networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get","watch","list"]

---
# Bind the ClusterRole to the ServiceAccount; without this binding
# External DNS cannot watch Services or Ingresses
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: kube-system

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: external-dns
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      serviceAccountName: external-dns
      containers:
      - name: external-dns
        image: registry.k8s.io/external-dns/external-dns:v0.14.0
        args:
        - --source=service
        - --source=ingress
        - --domain-filter=example.com
        - --provider=aws
        - --policy=sync
        - --aws-zone-type=public
        - --registry=txt
        - --txt-owner-id=my-cluster

2. Configure the Service

# Cluster A (Blue)
apiVersion: v1
kind: Service
metadata:
  name: myapp
  annotations:
    external-dns.alpha.kubernetes.io/hostname: myapp.example.com
    external-dns.alpha.kubernetes.io/ttl: "60"
spec:
  type: LoadBalancer
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 8080

3. Switch Traffic

# Switch to Cluster B (Green):
# create the same Service in Cluster B, and
# External DNS will update the DNS records automatically.

# Or use weighted routing. Weighted Route53 records also need a
# set-identifier so the two clusters' records can coexist.
kubectl annotate service myapp \
  external-dns.alpha.kubernetes.io/set-identifier="cluster-b" \
  external-dns.alpha.kubernetes.io/aws-weight="100" \
  -n default --overwrite

# Set Cluster A's weight to 0
kubectl annotate service myapp \
  external-dns.alpha.kubernetes.io/set-identifier="cluster-a" \
  external-dns.alpha.kubernetes.io/aws-weight="0" \
  -n default --overwrite \
  --kubeconfig=cluster-a-config
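
Because DNS changes propagate on TTL, verify the cutover from outside the clusters. A small check, assuming the 60-second TTL configured above:

#!/bin/bash
# Watch the DNS answer move to Cluster B's load balancer
for i in $(seq 1 10); do
  echo "--- attempt $i ---"
  dig +short myapp.example.com
  sleep 30
done

# Probe the application through the public hostname
curl -fsS -o /dev/null -w "%{http_code}\n" http://myapp.example.com/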

Using Istio Multi-Cluster

1. Install Istio on Both Clusters

# Cluster A
istioctl install --set profile=default \
  --set values.global.meshID=mesh1 \
  --set values.global.multiCluster.clusterName=cluster-a \
  --set values.global.network=network-a

# Cluster B
istioctl install --set profile=default \
  --set values.global.meshID=mesh1 \
  --set values.global.multiCluster.clusterName=cluster-b \
  --set values.global.network=network-b
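
For the two meshes to discover each other's endpoints, each cluster also needs a remote secret for its peer (on separate networks, east-west gateways are needed as well). A sketch, assuming kubeconfig contexts named cluster-a and cluster-b:

# Exchange remote secrets between the clusters
istioctl create-remote-secret --context=cluster-a --name=cluster-a | \
  kubectl apply -f - --context=cluster-b

istioctl create-remote-secret --context=cluster-b --name=cluster-b | \
  kubectl apply -f - --context=cluster-a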

2. Configure Traffic Routing

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - myapp.example.com
  http:
  - match:
    - headers:
        x-version:
          exact: "v2"
    route:
    - destination:
        host: myapp.cluster-b.svc.cluster.local
      weight: 100
  - route:
    - destination:
        host: myapp.cluster-a.svc.cluster.local
      weight: 100
    - destination:
        host: myapp.cluster-b.svc.cluster.local
      weight: 0

---
# Gradual switchover
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - myapp.example.com
  http:
  - route:
    - destination:
        host: myapp.cluster-a.svc.cluster.local
      weight: 50  # 50% of traffic
    - destination:
        host: myapp.cluster-b.svc.cluster.local
      weight: 50  # 50% of traffic
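
Rather than editing weights by hand, the shift can be scripted. A sketch that moves traffic to cluster-b in 25% steps by patching the VirtualService above (the sleep stands in for real metric checks):

#!/bin/bash
# Progressively shift VirtualService weights from cluster-a to cluster-b
set -e

for WEIGHT in 25 50 75 100; do
  kubectl patch virtualservice myapp --type=json -p "[
    {\"op\":\"replace\",\"path\":\"/spec/http/0/route/0/weight\",\"value\":$((100 - WEIGHT))},
    {\"op\":\"replace\",\"path\":\"/spec/http/0/route/1/weight\",\"value\":$WEIGHT}
  ]"
  echo "cluster-b now receives ${WEIGHT}% of traffic; observing..."
  sleep 120   # replace with real metric checks before continuing
done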

Blue-Green Deployment for Microservices

Coordinated blue-green deployment across multiple microservices.

Scenario: Upgrading Dependent Services Together

┌─────────────────────────────────────┐
│            Frontend (v2)            │
└──────────┬──────────────────────────┘
           │
    ┌──────┴──────┐
    │             │
┌───▼────┐   ┌────▼───┐
│  User  │   │ Order  │
│Service │   │Service │
│   v2   │   │   v2   │
└───┬────┘   └────┬───┘
    │             │
    └──────┬──────┘
           │
     ┌─────▼─────┐
     │  Payment  │
     │Service v2 │
     └───────────┘

Coordinated release script:

#!/bin/bash
set -e

# Ordered from the bottom of the dependency graph to the top
SERVICES=("payment-service" "order-service" "user-service" "frontend")
NEW_VERSION="v2.0"
NAMESPACE="production"

echo "🚀 Starting microservice blue-green release"

# 1. Deploy the green environments in dependency order
for SERVICE in "${SERVICES[@]}"; do
    echo "📦 Deploying green environment for $SERVICE..."

    kubectl set image deployment/$SERVICE-green \
        $SERVICE=myregistry/$SERVICE:$NEW_VERSION \
        -n $NAMESPACE

    # Wait until the rollout completes
    kubectl rollout status deployment/$SERVICE-green \
        -n $NAMESPACE --timeout=5m

    echo "✅ Green environment for $SERVICE is ready"
done

# 2. End-to-end tests
echo "🧪 Running end-to-end tests..."
./scripts/e2e-test.sh green

# 3. Switch traffic in dependency order (bottom layer first)
for SERVICE in "${SERVICES[@]}"; do
    echo "🔄 Switching $SERVICE traffic to green..."

    kubectl patch service $SERVICE-service \
        -n $NAMESPACE \
        -p '{"spec":{"selector":{"version":"green"}}}'

    # Observe for 30 seconds
    echo "📊 Monitoring $SERVICE..."
    sleep 30

    # Count recent ERROR log lines as a crude health signal
    ERROR_COUNT=$(kubectl logs -l app=$SERVICE,version=green \
        -n $NAMESPACE --tail=1000 | grep -c ERROR || true)

    if [ "$ERROR_COUNT" -gt 10 ]; then
        echo "❌ Error count too high for $SERVICE, rolling back"
        # Point every service back at blue (a no-op for services not yet switched)
        for ROLLBACK_SERVICE in "${SERVICES[@]}"; do
            kubectl patch service $ROLLBACK_SERVICE-service \
                -n $NAMESPACE \
                -p '{"spec":{"selector":{"version":"blue"}}}'
        done
        exit 1
    fi

    echo "✅ $SERVICE switched successfully"
done

echo "🎉 All microservices switched"

Service Dependency Management

Using Helm to manage dependencies:

# Chart.yaml
apiVersion: v2
name: microservices
version: 1.0.0
dependencies:
- name: payment-service
  version: "2.0.0"
  repository: "https://charts.example.com"
- name: order-service
  version: "2.0.0"
  repository: "https://charts.example.com"
  condition: order-service.enabled
- name: user-service
  version: "2.0.0"
  repository: "https://charts.example.com"

values.yaml:

global:
  version: blue  # or green
  namespace: production

payment-service:
  enabled: true
  version: blue
  replicas: 3

order-service:
  enabled: true
  version: blue
  replicas: 3
  dependencies:
    - payment-service

user-service:
  enabled: true
  version: blue
  replicas: 3

Switch commands:

# Deploy the green environment
helm upgrade microservices ./charts/microservices \
  --set global.version=green \
  --namespace production

# Switch traffic (assumes the chart's Service templates key off
# global.activeVersion, distinct from global.version above)
helm upgrade microservices ./charts/microservices \
  --set global.activeVersion=green \
  --namespace production
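
Helm keeps release history, so an unhealthy switch can be undone either by flipping the value back or by rolling the release back to its previous revision:

# Flip traffic back to blue
helm upgrade microservices ./charts/microservices \
  --set global.activeVersion=blue \
  --namespace production

# Or roll the release back to the previous revision
helm history microservices -n production
helm rollback microservices -n production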

Compliance and Auditing

Release Approval Workflow

Using Argo CD + Notifications:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-prod
  annotations:
    # Notify the approvers on deployments (assumes an "email" service
    # is configured for Argo CD Notifications)
    notifications.argoproj.io/subscribe.on-deployed.email: ops-team@example.com
spec:
  project: production
  syncPolicy:
    # No `automated` block: every sync must be triggered manually,
    # which is where the approval step happens
    syncOptions:
    - CreateNamespace=false
    - ApplyOutOfSyncOnly=true

Approval webhook:

// approval-webhook.go
package main

import (
    "encoding/json"
    "net/http"
    "time"
)

// ApprovalRequest, AuditLog, db and triggerDeployment are assumed to be
// defined elsewhere in the package.
func ApprovalHandler(w http.ResponseWriter, r *http.Request) {
    var req ApprovalRequest
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        http.Error(w, "invalid request body", http.StatusBadRequest)
        return
    }

    // Record an audit log entry
    auditLog := AuditLog{
        Timestamp:   time.Now(),
        Application: req.AppName,
        Version:     req.Version,
        Approver:    req.Approver,
        Environment: req.Environment,
        Status:      "approved",
    }

    // Persist it to the database
    db.Save(&auditLog)

    // Trigger the deployment
    triggerDeployment(req.AppName, req.Version)

    w.WriteHeader(http.StatusOK)
}
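
A hypothetical invocation of the webhook (the /approve path and the JSON field names are illustrative, mirroring the ApprovalRequest struct above):

curl -X POST http://approval-webhook.internal/approve \
  -H "Content-Type: application/json" \
  -d '{
    "appName": "myapp",
    "version": "v2.0",
    "approver": "ops-user@example.com",
    "environment": "production"
  }'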

Change Records

An auto-generated changelog:

apiVersion: v1
kind: ConfigMap
metadata:
  name: deployment-changelog
  annotations:
    kubernetes.io/change-cause: "Deploy v2.0 with blue-green strategy"
data:
  changelog: |
    Version: v2.0
    Date: 2024-01-10 15:00:00
    Strategy: Blue-Green
    Deployed By: ops-user@example.com
    
    Changes:
    - Added new payment gateway
    - Fixed memory leak in order service
    - Updated database schema
    
    Rollback Command:
    kubectl patch service myapp-service -p '{"spec":{"selector":{"version":"blue"}}}'
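
The same change cause can also be stamped onto the deployment itself so that it shows up in the rollout history:

# Record the change cause on the green deployment
kubectl annotate deployment myapp-green \
  kubernetes.io/change-cause="Deploy v2.0 with blue-green strategy" \
  --overwrite

# It then appears in:
kubectl rollout history deployment/myapp-green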

Cost Optimization

Automatic Cleanup of the Old Environment

apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-old-deployments
spec:
  schedule: "0 2 * * *"  # every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cleanup
            image: bitnami/kubectl:latest
            command:
            - /bin/bash
            - -c
            - |
              # Determine the currently active version
              ACTIVE_VERSION=$(kubectl get service myapp-service \
                -o jsonpath='{.spec.selector.version}')

              # The other environment is the inactive one
              if [ "$ACTIVE_VERSION" == "blue" ]; then
                INACTIVE="green"
              else
                INACTIVE="blue"
              fi

              # Only delete once it is past the retention period (7 days)
              DEPLOY_TIME=$(kubectl get deployment myapp-$INACTIVE \
                -o jsonpath='{.metadata.creationTimestamp}')

              if [ $(date -d "$DEPLOY_TIME" +%s) -lt $(date -d "7 days ago" +%s) ]; then
                echo "Deleting old deployment: myapp-$INACTIVE"
                kubectl delete deployment myapp-$INACTIVE
              fi
          restartPolicy: OnFailure
          serviceAccountName: cleanup-sa
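
The Job runs as a cleanup-sa ServiceAccount that the manifest above does not define; it needs to read Services and read/delete Deployments. One way to create it imperatively, as a sketch (the default namespace is assumed):

#!/bin/bash
# ServiceAccount plus minimal RBAC for the cleanup job
kubectl create serviceaccount cleanup-sa

kubectl create role cleanup-role \
  --verb=get,list,delete --resource=deployments
kubectl create role cleanup-svc-role \
  --verb=get,list --resource=services

kubectl create rolebinding cleanup-binding \
  --role=cleanup-role --serviceaccount=default:cleanup-sa
kubectl create rolebinding cleanup-svc-binding \
  --role=cleanup-svc-role --serviceaccount=default:cleanup-sa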

Resource Reservation Strategy

# Use a PriorityClass to protect critical services
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: production-high
value: 1000000
globalDefault: false
description: "High priority for production services"

---
# Blue environment (production)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  template:
    spec:
      priorityClassName: production-high
      containers:
      - name: myapp
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

---
# Green environment (under test) - lower priority so it yields
# resources to blue under contention
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  template:
    spec:
      # Assumes a production-medium PriorityClass defined like
      # production-high above, with a smaller value (e.g. 500000)
      priorityClassName: production-medium
      containers:
      - name: myapp
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"

Summary

Key points for blue-green deployment in advanced scenarios:

Stateful applications:

  • Data synchronization strategy
  • Primary/replica replication
  • Shared storage

Multi-cluster:

  • DNS-based traffic switching
  • Istio multi-cluster mesh
  • Cross-region deployment

Microservices:

  • Dependency-ordered rollout
  • End-to-end testing
  • Coordinated releases

Compliance:

  • Approval workflow
  • Audit logging
  • Change records

Cost optimization:

  • Automatic cleanup
  • Resource priorities
  • Elastic scaling

Blue-green deployment is simple in principle, but complex scenarios demand careful planning and automation.