Application Deployment and Load Balancing

This chapter covers how to deploy microservice applications in an EKS cluster, including Deployment, Service, and Ingress configuration, as well as advanced release strategies such as canary releases and blue-green deployments.

Microservice Architecture Design

Application Architecture

Example e-commerce platform architecture:

                Internet Users
                      ↓
              ┌──────────────┐
              │   Route 53   │
              │ (DNS Routing)│
              └──────┬───────┘
                     ↓
              ┌──────────────┐
              │  CloudFront  │
              │   (CDN)      │
              └──────┬───────┘
                     ↓
              ┌──────────────┐
              │     ALB      │
              │ (Ingress)    │
              └──────┬───────┘
                     ↓
       ┌─────────────┴─────────────┐
       ↓                           ↓
┌─────────────┐            ┌─────────────┐
│  Frontend   │            │  API Gateway│
│  (Next.js)  │            │  (BFF)      │
└─────────────┘            └──────┬──────┘
                                  ↓
              ┌───────────────────┴────────────────┐
              ↓                   ↓                ↓
       ┌──────────┐        ┌──────────┐    ┌──────────┐
       │  User    │        │ Product  │    │  Order   │
       │ Service  │        │ Service  │    │ Service  │
       └────┬─────┘        └────┬─────┘    └────┬─────┘
            ↓                   ↓                ↓
       ┌──────────┐        ┌──────────┐    ┌──────────┐
       │ RDS      │        │  Redis   │    │ RDS      │
       │(Users DB)│        │ (Cache)  │    │(Order DB)│
       └──────────┘        └──────────┘    └──────────┘

Service Inventory

Core services:

1. frontend-service
   ├─ Framework: Next.js
   ├─ Replicas: 3
   ├─ Resources: CPU 500m, Memory 512Mi
   └─ Port: 3000

2. api-gateway-service
   ├─ Framework: Node.js (Express)
   ├─ Replicas: 3
   ├─ Resources: CPU 1000m, Memory 1Gi
   └─ Port: 8080

3. user-service
   ├─ Language: Go
   ├─ Replicas: 3
   ├─ Resources: CPU 500m, Memory 512Mi
   ├─ Port: 9001
   └─ Database: RDS PostgreSQL

4. product-service
   ├─ Language: Python (FastAPI)
   ├─ Replicas: 3
   ├─ Resources: CPU 500m, Memory 512Mi
   ├─ Port: 9002
   └─ Cache: Redis

5. order-service
   ├─ Language: Java (Spring Boot)
   ├─ Replicas: 3
   ├─ Resources: CPU 1000m, Memory 1Gi
   ├─ Port: 9003
   └─ Database: RDS PostgreSQL

Kubernetes Resource Configuration

Namespace Organization

# namespaces.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production
    team: platform

---
apiVersion: v1
kind: Namespace
metadata:
  name: staging
  labels:
    environment: staging
    team: platform

---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    environment: production
    purpose: monitoring

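After applying the manifest, a quick check confirms the namespaces and the labels used for environment-scoped selection (a minimal helper sketch; the filename is illustrative):

#!/bin/bash
# apply-namespaces.sh
kubectl apply -f namespaces.yaml

# Print the namespaces with their environment/team labels as columns
kubectl get namespaces -L environment,team
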
ConfigMap Configuration

# api-gateway-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-gateway-config
  namespace: production
data:
  # Application settings
  NODE_ENV: "production"
  LOG_LEVEL: "info"
  
  # Service discovery
  USER_SERVICE_URL: "http://user-service:9001"
  PRODUCT_SERVICE_URL: "http://product-service:9002"
  ORDER_SERVICE_URL: "http://order-service:9003"
  
  # Feature flags
  ENABLE_CACHE: "true"
  ENABLE_RATE_LIMIT: "true"
  
  # Timeout settings
  REQUEST_TIMEOUT: "30000"
  CIRCUIT_BREAKER_TIMEOUT: "5000"
  
  # Application config file
  app.json: |
    {
      "rateLimit": {
        "windowMs": 900000,
        "max": 100
      },
      "cors": {
        "origin": ["https://example.com"],
        "credentials": true
      }
    }

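The flat keys are most conveniently injected with envFrom, while the app.json entry is better mounted as a file. A throwaway pod illustrating both consumption styles (the pod name, image, and mount path are illustrative, not part of the platform manifests):

#!/bin/bash
# configmap-consume-demo.sh
kubectl apply -n production -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: configmap-demo
spec:
  restartPolicy: Never
  containers:
  - name: demo
    image: busybox:1.36
    command: ["sh", "-c", "env | grep _SERVICE_URL; cat /etc/app/app.json"]
    envFrom:
    - configMapRef:
        name: api-gateway-config   # injects every flat key as an env var
    volumeMounts:
    - name: app-config
      mountPath: /etc/app
  volumes:
  - name: app-config
    configMap:
      name: api-gateway-config
      items:
      - key: app.json
        path: app.json             # mounts only app.json as a file
EOF

# Once the pod has completed:
kubectl logs -n production configmap-demo
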
Secret Management (with External Secrets)

Install the External Secrets Operator:

#!/bin/bash
# install-external-secrets.sh

CLUSTER_NAME="production-eks-cluster"
REGION="us-east-1"
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

echo "安装 External Secrets Operator..."

# 1. Create the IAM policy
cat > external-secrets-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ],
      "Resource": "arn:aws:secretsmanager:*:*:secret:production/*"
    }
  ]
}
EOF

aws iam create-policy \
  --policy-name ExternalSecretsPolicy \
  --policy-document file://external-secrets-policy.json

# 2. Create the IRSA-backed service account
eksctl create iamserviceaccount \
  --name external-secrets \
  --namespace kube-system \
  --cluster $CLUSTER_NAME \
  --attach-policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/ExternalSecretsPolicy \
  --approve \
  --region $REGION

# 3. Install the operator
helm repo add external-secrets https://charts.external-secrets.io
helm repo update

helm install external-secrets \
  external-secrets/external-secrets \
  -n kube-system \
  --set installCRDs=true \
  --set serviceAccount.create=false \
  --set serviceAccount.name=external-secrets

echo "✓ External Secrets Operator 已安装"

rm -f external-secrets-policy.json

Configure the secret store. Because the IRSA service account lives in kube-system and a namespaced SecretStore cannot reference a service account in another namespace, a cluster-scoped ClusterSecretStore is used:

# clustersecretstore.yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secretsmanager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets
            namespace: kube-system

Create an ExternalSecret:

# database-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager
    kind: ClusterSecretStore
  target:
    name: database-credentials
    creationPolicy: Owner
  data:
  - secretKey: username
    remoteRef:
      key: production/database
      property: username
  - secretKey: password
    remoteRef:
      key: production/database
      property: password
  - secretKey: host
    remoteRef:
      key: production/database
      property: host
  - secretKey: database
    remoteRef:
      key: production/database
      property: database

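Once applied, the operator should report both resources as ready and materialize a regular Kubernetes Secret. A quick verification sketch:

#!/bin/bash
# verify-external-secret.sh
# The ClusterSecretStore should report STATUS Valid
kubectl get clustersecretstore aws-secretsmanager

# The ExternalSecret's Ready condition should be True (reason: SecretSynced)
kubectl get externalsecret database-credentials -n production

# List the mirrored secret's keys (describe never prints the values)
kubectl describe secret database-credentials -n production
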
Deployment Configuration

User Service Deployment:

# user-service-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  namespace: production
  labels:
    app: user-service
    version: v1
spec:
  replicas: 3
  revisionHistoryLimit: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
        version: v1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: user-service
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      
      # Affinity (spread replicas across AZs)
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - user-service
              topologyKey: topology.kubernetes.io/zone
      
      # Containers
      containers:
      - name: user-service
        image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/user-service:v1.2.3
        imagePullPolicy: IfNotPresent
        
        ports:
        - name: http
          containerPort: 9001
          protocol: TCP
        - name: metrics
          containerPort: 9090
          protocol: TCP
        
        # Environment variables
        env:
        - name: PORT
          value: "9001"
        - name: LOG_LEVEL
          value: "info"
        - name: DB_HOST
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: host
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: username
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: password
        - name: DB_NAME
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: database
        
        # Resource limits
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 1000m
            memory: 1Gi
        
        # Health checks
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        
        readinessProbe:
          httpGet:
            path: /ready
            port: http
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        
        # Startup probe (for slow-starting apps)
        startupProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 0
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 30
        
        # Security context
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
        
        # Volume mounts
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /app/cache
      
      # Volumes
      volumes:
      - name: tmp
        emptyDir: {}
      - name: cache
        emptyDir: {}
      
      # Termination grace period
      terminationGracePeriodSeconds: 60

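Note that the pod spec references a user-service ServiceAccount, which must exist before the pods can be created. A minimal definition (uncomment the annotation if the service needs AWS API access via IRSA):

#!/bin/bash
# create-user-service-sa.sh
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: user-service
  namespace: production
  # annotations:
  #   eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/user-service
EOF
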
Service Configuration

# user-service-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: user-service
  namespace: production
  labels:
    app: user-service
spec:
  type: ClusterIP
  sessionAffinity: None
  selector:
    app: user-service
  ports:
  - name: http
    protocol: TCP
    port: 9001
    targetPort: http
  - name: metrics
    protocol: TCP
    port: 9090
    targetPort: metrics

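Because the Service is a plain ClusterIP, it is only reachable from inside the cluster; a throwaway curl pod can confirm DNS resolution and the HTTP port (a quick sketch, the image is illustrative):

#!/bin/bash
# test-service-dns.sh
kubectl run curl-test -n production --rm -it --restart=Never \
  --image=curlimages/curl:8.8.0 -- \
  curl -s http://user-service.production.svc.cluster.local:9001/health
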
HorizontalPodAutoscaler (HPA)

# user-service-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: user-service
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
  # CPU utilization
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  # Memory utilization
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  # Custom metric (requests per second)
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
      - type: Pods
        value: 2
        periodSeconds: 60
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
      - type: Pods
        value: 1
        periodSeconds: 60
      selectPolicy: Min

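The Resource metrics require metrics-server, and the http_requests_per_second Pods metric only works if a custom-metrics adapter (for example prometheus-adapter) is installed and exposing it. A quick prerequisite check (a sketch; adapter setup itself is out of scope here):

#!/bin/bash
# check-hpa-prereqs.sh
# metrics-server backs the cpu/memory Resource metrics
kubectl get apiservice v1beta1.metrics.k8s.io

# A custom-metrics adapter must serve custom.metrics.k8s.io for the Pods metric
kubectl get apiservices | grep custom.metrics || echo "no custom metrics adapter installed"

# Watch observed vs. target values once metrics are flowing
kubectl get hpa user-service -n production --watch
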
PodDisruptionBudget (PDB)

# user-service-pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: user-service
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: user-service

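With minAvailable: 2 and three replicas, at most one pod can be evicted voluntarily at a time; node drains (for example during node group upgrades) wait rather than violate the budget. To observe it (the node name is a placeholder):

#!/bin/bash
# observe-pdb.sh
# ALLOWED DISRUPTIONS shows how many pods may currently be evicted
kubectl get pdb user-service -n production

# A drain that would drop below minAvailable retries until the budget allows it
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
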
Ingress and ALB Configuration

Ingress Resource

# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: production-ingress
  namespace: production
  annotations:
    # ALB configuration
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    
    # Listeners
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    
    # SSL/TLS
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/xxx
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01
    
    # Health checks
    alb.ingress.kubernetes.io/healthcheck-path: /health
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: '15'
    alb.ingress.kubernetes.io/healthcheck-timeout-seconds: '5'
    alb.ingress.kubernetes.io/healthy-threshold-count: '2'
    alb.ingress.kubernetes.io/unhealthy-threshold-count: '2'
    
    # Access logs
    alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket=production-alb-logs,access_logs.s3.prefix=ingress
    
    # Tags
    alb.ingress.kubernetes.io/tags: Environment=production,Team=platform
    
    # Security groups
    alb.ingress.kubernetes.io/security-groups: sg-0123456789abcdef0
    
    # Subnets
    alb.ingress.kubernetes.io/subnets: subnet-public-1a,subnet-public-1b,subnet-public-1c

spec:
  rules:
  # API traffic
  - host: api.example.com
    http:
      paths:
      - path: /api/users
        pathType: Prefix
        backend:
          service:
            name: user-service
            port:
              number: 9001
      
      - path: /api/products
        pathType: Prefix
        backend:
          service:
            name: product-service
            port:
              number: 9002
      
      - path: /api/orders
        pathType: Prefix
        backend:
          service:
            name: order-service
            port:
              number: 9003
  
  # Frontend traffic
  - host: www.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 3000

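After the controller reconciles the Ingress, it publishes the ALB's DNS name on the resource status. A verification sketch that smoke-tests host-based routing via the Host header before any Route 53 changes (the health paths are assumptions about the backends):

#!/bin/bash
# verify-ingress.sh
# The AWS Load Balancer Controller writes the ALB hostname here once provisioned
ALB=$(kubectl get ingress production-ingress -n production \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "ALB: $ALB"

# Exercise routing without touching DNS yet
curl -s -o /dev/null -w "%{http_code}\n" -H "Host: www.example.com" "http://${ALB}/"
curl -s -o /dev/null -w "%{http_code}\n" -H "Host: api.example.com" "http://${ALB}/api/users/health"
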
Multi-Environment Ingress

Production:

# ingress-production.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: production-ingress
  namespace: production
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/group.name: production
    alb.ingress.kubernetes.io/group.order: '10'
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-gateway-service
            port:
              number: 8080

Canary:

# ingress-canary.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: canary-ingress
  namespace: production
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/group.name: production
    alb.ingress.kubernetes.io/group.order: '5'
    alb.ingress.kubernetes.io/conditions.api-gateway-service-canary: |
      [{"field":"http-header","httpHeaderConfig":{"httpHeaderName":"X-Canary","values":["true"]}}]
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-gateway-service-canary
            port:
              number: 8080

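Because both Ingresses share group.name: production, they merge into one ALB; the canary rule has the lower group.order, so it is evaluated first, but it only matches requests carrying the X-Canary header. A quick check of both paths:

#!/bin/bash
# test-canary-routing.sh
# Matches the canary condition and reaches api-gateway-service-canary
curl -s -H "X-Canary: true" https://api.example.com/

# Without the header, the canary rule does not match and traffic falls
# through to the production rule (group.order 10)
curl -s https://api.example.com/
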
Advanced Release Strategies

Rolling Update (Default)

Example configuration:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # extra pods that may be created during the update
      maxUnavailable: 0  # pods that may be unavailable during the update

Release script:

#!/bin/bash
# rolling-update.sh

NAMESPACE="production"
DEPLOYMENT="user-service"
IMAGE="123456789012.dkr.ecr.us-east-1.amazonaws.com/user-service:v1.2.4"

echo "Performing rolling update..."
echo "Deployment: $DEPLOYMENT"
echo "Image: $IMAGE"

# Update the image
kubectl set image deployment/$DEPLOYMENT \
  user-service=$IMAGE \
  -n $NAMESPACE

# Watch the rollout progress
kubectl rollout status deployment/$DEPLOYMENT -n $NAMESPACE

# Check pod status
kubectl get pods -n $NAMESPACE -l app=$DEPLOYMENT

echo "✓ Rolling update complete"

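The Deployment keeps ten ReplicaSet revisions (revisionHistoryLimit: 10), so a bad rollout can be reverted in place rather than re-deployed:

#!/bin/bash
# rollback.sh
# Inspect the recorded revisions
kubectl rollout history deployment/user-service -n production

# Revert to the previous revision...
kubectl rollout undo deployment/user-service -n production

# ...or to a specific one
kubectl rollout undo deployment/user-service -n production --to-revision=2

kubectl rollout status deployment/user-service -n production
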
Canary Release

Istio VirtualService:

# canary-virtualservice.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: user-service
  namespace: production
spec:
  hosts:
  - user-service
  http:
  - match:
    - headers:
        X-Canary:
          exact: "true"
    route:
    - destination:
        host: user-service
        subset: canary
  - route:
    - destination:
        host: user-service
        subset: stable
      weight: 90
    - destination:
        host: user-service
        subset: canary
      weight: 10

DestinationRule:

# canary-destinationrule.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: user-service
  namespace: production
spec:
  host: user-service
  subsets:
  - name: stable
    labels:
      version: v1
  - name: canary
    labels:
      version: v2

Canary release script:

#!/bin/bash
# canary-deployment.sh

NAMESPACE="production"
SERVICE="user-service"
STABLE_VERSION="v1.2.3"
CANARY_VERSION="v1.2.4"

echo "================================================"
echo "Canary release: $SERVICE"
echo "Stable version: $STABLE_VERSION"
echo "Canary version: $CANARY_VERSION"
echo "================================================"

# 1. Deploy the canary version (10% of traffic)
echo ""
echo "1. Deploying canary version..."
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${SERVICE}-canary
  namespace: $NAMESPACE
spec:
  replicas: 1
  selector:
    matchLabels:
      app: $SERVICE
      version: v2
  template:
    metadata:
      labels:
        app: $SERVICE
        version: v2
    spec:
      containers:
      - name: $SERVICE
        image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/${SERVICE}:${CANARY_VERSION}
        # ... remaining configuration identical to the stable version
EOF

# 2. Configure traffic splitting
echo ""
echo "2. Configuring traffic split (10% canary)..."
kubectl apply -f canary-virtualservice.yaml
kubectl apply -f canary-destinationrule.yaml

# 3. Monitor metrics
echo ""
echo "3. Monitoring canary metrics..."
echo "   Check the Grafana dashboards"
echo "   Command: kubectl port-forward -n monitoring svc/grafana 3000:80"

read -p "Is the canary healthy? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
  # 4. Gradually increase traffic
  echo ""
  echo "4. Increasing canary traffic to 50%..."
  kubectl patch virtualservice $SERVICE -n $NAMESPACE --type merge -p '
  {
    "spec": {
      "http": [{
        "route": [
          {"destination": {"host": "'$SERVICE'", "subset": "stable"}, "weight": 50},
          {"destination": {"host": "'$SERVICE'", "subset": "canary"}, "weight": 50}
        ]
      }]
    }
  }'
  
  sleep 300  # observe for 5 minutes
  
  read -p "Promote to 100%? (y/n) " -n 1 -r
  echo
  if [[ $REPLY =~ ^[Yy]$ ]]; then
    # 5. Full cutover
    echo ""
    echo "5. Cutting over to the new version..."
    # Route all traffic back to the stable subset first, so deleting the
    # canary Deployment does not leave half the traffic with no endpoints
    kubectl patch virtualservice $SERVICE -n $NAMESPACE --type merge -p '
    {
      "spec": {
        "http": [{
          "route": [
            {"destination": {"host": "'$SERVICE'", "subset": "stable"}, "weight": 100}
          ]
        }]
      }
    }'
    kubectl set image deployment/$SERVICE \
      $SERVICE=123456789012.dkr.ecr.us-east-1.amazonaws.com/${SERVICE}:${CANARY_VERSION} \
      -n $NAMESPACE
    
    kubectl delete deployment ${SERVICE}-canary -n $NAMESPACE
    
    echo "✓ Canary release succeeded"
  fi
else
  # Rollback
  echo ""
  echo "Rolling back the canary deployment..."
  kubectl delete deployment ${SERVICE}-canary -n $NAMESPACE
  kubectl delete virtualservice $SERVICE -n $NAMESPACE
  echo "✓ Rolled back"
fi

echo "================================================"

Blue-Green Deployment

Deployment script:

#!/bin/bash
# blue-green-deployment.sh

NAMESPACE="production"
SERVICE="user-service"
CURRENT_VERSION="blue"  # or "green"
NEW_VERSION="green"     # or "blue"
IMAGE_TAG="v1.2.4"

echo "================================================"
echo "Blue-green deployment: $SERVICE"
echo "Current version: $CURRENT_VERSION"
echo "New version: $NEW_VERSION"
echo "================================================"

# 1. Deploy the new (green) version
echo ""
echo "1. Deploying the $NEW_VERSION environment..."
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${SERVICE}-${NEW_VERSION}
  namespace: $NAMESPACE
spec:
  replicas: 3
  selector:
    matchLabels:
      app: $SERVICE
      version: $NEW_VERSION
  template:
    metadata:
      labels:
        app: $SERVICE
        version: $NEW_VERSION
    spec:
      containers:
      - name: $SERVICE
        image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/${SERVICE}:${IMAGE_TAG}
        # ... remaining configuration
EOF

# 2. Wait for the new version to become ready
echo ""
echo "2. Waiting for the new version to become ready..."
kubectl rollout status deployment/${SERVICE}-${NEW_VERSION} -n $NAMESPACE

# 3. Run smoke tests
echo ""
echo "3. Running smoke tests..."
NEW_POD=$(kubectl get pods -n $NAMESPACE -l app=$SERVICE,version=$NEW_VERSION -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n $NAMESPACE $NEW_POD -- curl -s http://localhost:9001/health

# 4. Switch traffic
echo ""
read -p "Switch traffic to $NEW_VERSION? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
  echo "Patching the Service selector..."
  kubectl patch service $SERVICE -n $NAMESPACE -p '
  {
    "spec": {
      "selector": {
        "app": "'$SERVICE'",
        "version": "'$NEW_VERSION'"
      }
    }
  }'
  
  echo "✓ Traffic switched to $NEW_VERSION"
  
  # 5. Clean up the old version
  echo ""
  read -p "Delete the $CURRENT_VERSION deployment? (y/n) " -n 1 -r
  echo
  if [[ $REPLY =~ ^[Yy]$ ]]; then
    kubectl delete deployment ${SERVICE}-${CURRENT_VERSION} -n $NAMESPACE
    echo "✓ Old version deleted"
  fi
else
  echo "Keeping the current version"
  kubectl delete deployment ${SERVICE}-${NEW_VERSION} -n $NAMESPACE
fi

echo "================================================"

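One advantage of blue-green is that rollback is just the selector flip in reverse; as long as the old Deployment has not been deleted, traffic can be returned instantly (a sketch, assuming blue was the previous color):

#!/bin/bash
# blue-green-rollback.sh
kubectl patch service user-service -n production -p \
  '{"spec":{"selector":{"app":"user-service","version":"blue"}}}'
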
Deployment Verification

Health Checks

#!/bin/bash
# verify-deployment.sh

NAMESPACE="production"
SERVICE="user-service"

echo "================================================"
echo "Verifying deployment: $SERVICE"
echo "================================================"

# 1. Deployment status
echo ""
echo "1. Deployment status"
kubectl get deployment $SERVICE -n $NAMESPACE

# 2. Pod status
echo ""
echo "2. Pod status"
kubectl get pods -n $NAMESPACE -l app=$SERVICE -o wide

# 3. Replica counts
echo ""
echo "3. Replica status"
DESIRED=$(kubectl get deployment $SERVICE -n $NAMESPACE -o jsonpath='{.spec.replicas}')
READY=$(kubectl get deployment $SERVICE -n $NAMESPACE -o jsonpath='{.status.readyReplicas}')
echo "Desired: $DESIRED, Ready: $READY"

if [ "$DESIRED" == "$READY" ]; then
  echo "✓ All replicas ready"
else
  echo "✗ Replica count mismatch"
  exit 1
fi

# 4. Service endpoints
echo ""
echo "4. Service endpoints"
kubectl get endpoints $SERVICE -n $NAMESPACE

# 5. Log check
echo ""
echo "5. Recent logs (newest pod)"
LATEST_POD=$(kubectl get pods -n $NAMESPACE -l app=$SERVICE --sort-by=.metadata.creationTimestamp -o jsonpath='{.items[-1].metadata.name}')
kubectl logs -n $NAMESPACE $LATEST_POD --tail=20

# 6. Health check
echo ""
echo "6. Health check"
kubectl exec -n $NAMESPACE $LATEST_POD -- curl -s http://localhost:9001/health | jq

echo ""
echo "================================================"
echo "Verification complete"
echo "================================================"

Best Practices Summary

1. Resource Configuration

✓ Set requests and limits appropriately
✓ Configure HPA for automatic scaling
✓ Protect availability with PodDisruptionBudgets
✓ Configure affinity and anti-affinity
✓ Use Spot instance node groups for non-critical services

2. Health Checks

✓ Configure liveness, readiness, and startup probes
✓ Set timeouts and failure thresholds appropriately
✓ Keep probes lightweight so they do not affect performance
✓ Use dedicated health-check endpoints
✓ Log the reasons for health-check failures

3. Configuration Management

✓ Manage configuration with ConfigMaps
✓ Manage sensitive data with External Secrets
✓ Keep configuration separate from code
✓ Maintain environment-specific configuration
✓ Keep configuration files under version control

4. Release Strategy

✓ Use rolling updates by default
✓ Use canary releases for critical services
✓ Use blue-green deployments for zero-downtime cutovers
✓ Always have a rollback plan
✓ Automate the release process

5. Monitoring and Logging

✓ Expose Prometheus metrics
✓ Emit structured logs
✓ Enable access logging
✓ Define alerting rules
✓ Use distributed tracing

Next: continue with the Database and Caching Layer chapter.