Precise Traffic Control with Ingress

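Why Use Ingress

Deployment-based canary releases have one limitation: traffic can only be split coarsely, because the canary's share of traffic equals its share of the Pods.

Problem: you want to send 5% of traffic to the canary
- 10 Pods total, 1 canary Pod = 10% ❌ too much
- 20 Pods total, 1 canary Pod = 5% ✅ but wastes resources
- 100 Pods total, 5 canary Pods = 5% ✅ but expensive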
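
The arithmetic behind these numbers can be sketched in a few lines of shell. This is only an illustration: it assumes traffic is spread evenly across Pods and that the target is a whole percentage, and the function names `gcd` and `min_total_pods` are made up for this example.

```shell
#!/usr/bin/env bash
# Smallest total Pod count at which a whole number of canary Pods
# gives exactly the target percentage: total = 100 / gcd(100, percent).

gcd() {
  local a=$1 b=$2 t
  while (( b != 0 )); do
    t=$(( a % b ))
    a=$b
    b=$t
  done
  echo "$a"
}

min_total_pods() {
  local percent=$1   # integer from 1 to 100
  echo $(( 100 / $(gcd 100 "$percent") ))
}

min_total_pods 10   # prints 10
min_total_pods 5    # prints 20
min_total_pods 1    # prints 100
```

A 1% canary therefore needs 100 Pods under the Deployment-based approach, which is the cost the Ingress approach avoids.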

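The Ingress Approach

✅ Precise control over the traffic percentage (1%, 5%, 10%...)
✅ Independent of the number of Pods
✅ Supports rule-based routing

Canary Releases with the Nginx Ingress Controller

How It Works

Nginx Ingress implements canary releases through annotations. A second Ingress for the same host and path is marked as a canary and takes a share of the traffic from the main Ingress. Each main Ingress rule accepts at most one canary Ingress, so the weight, header, and cookie variants in the following sections are alternatives to apply one at a time, not side by side.

┌─────────────────┐
│ Ingress (main)  │  90% of traffic → myapp-stable
└─────────────────┘

┌─────────────────┐
│ Ingress (canary)│  10% of traffic → myapp-canary
│ canary: true    │
│ canary-weight:10│
└─────────────────┘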
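
As a mental model of the weight rule in the diagram, the sketch below picks a backend per request from a random roll. It is a conceptual illustration only, not the controller's actual implementation; `pick_backend` and `simulate` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Conceptual model of canary-weight: each request draws a number in [0, 100)
# and goes to the canary when the number is below the weight.

pick_backend() {
  local weight=$1 roll=$2
  if (( roll < weight )); then
    echo canary
  else
    echo stable
  fi
}

# Count how many of N simulated requests land on the canary.
# RANDOM % 100 has a tiny bias, which is irrelevant for an illustration.
simulate() {
  local weight=$1 requests=$2 canary=0 i
  for (( i = 0; i < requests; i++ )); do
    if [[ $(pick_backend "$weight" $(( RANDOM % 100 ))) == canary ]]; then
      canary=$(( canary + 1 ))
    fi
  done
  echo "$canary"
}
```

Calling `simulate 10 1000` should print a number near 100, with run-to-run variation, which is why small test samples rarely show an exact 90/10 split.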

Complete Example

1. Create two separate Services

# services.yaml
---
# Stable Service
apiVersion: v1
kind: Service
metadata:
  name: myapp-stable
  labels:
    app: myapp
    version: stable
spec:
  selector:
    app: myapp
    version: stable
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
  type: ClusterIP

---
# Canary Service
apiVersion: v1
kind: Service
metadata:
  name: myapp-canary
  labels:
    app: myapp
    version: canary
spec:
  selector:
    app: myapp
    version: canary
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
  type: ClusterIP

2. Main Ingress (stable version)

# ingress-stable.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-stable
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-stable
            port:
              number: 80

3. Canary Ingress

# ingress-canary.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary
  annotations:
    kubernetes.io/ingress.class: nginx
    # Enable canary mode
    nginx.ingress.kubernetes.io/canary: "true"
    # Weight: 10% of traffic goes to the canary
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 80

Apply the configuration:

# Deploy the Deployments (stable and canary)
kubectl apply -f deployments.yaml

# Create the Services
kubectl apply -f services.yaml

# Create the Ingresses
kubectl apply -f ingress-stable.yaml
kubectl apply -f ingress-canary.yaml

# Verify
kubectl get ingress
kubectl describe ingress myapp-canary

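4. Test the traffic split

#!/bin/bash
# Test script: send 100 requests and count the responses per version
for i in {1..100}; do
  VERSION=$(curl -s http://myapp.example.com | grep -o 'Version: v[0-9.]*' | cut -d' ' -f2)
  echo "$VERSION"
done | sort | uniq -c

# Expected output (approximate, since each request is routed at random):
# 90 v1.0
# 10 v2.0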
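
To read the result as percentages rather than raw counts, the loop output can be piped through a small summarizer. The sketch below assumes one version string per line on stdin; `summarize_versions` is a name chosen for this example.

```shell
#!/usr/bin/env bash
# Reads one version string per line and prints "version count percent",
# sorted by version. Blank lines (failed requests) are skipped.

summarize_versions() {
  awk 'NF { count[$1]++; total++ }
       END {
         for (v in count)
           printf "%s %d %.1f%%\n", v, count[v], 100 * count[v] / total
       }' | sort
}

# Small self-contained demo with fixed input instead of live curl output
printf 'v1.0\nv1.0\nv2.0\nv1.0\n' | summarize_versions
# prints:
# v1.0 3 75.0%
# v2.0 1 25.0%
```

In the test script above, replacing `sort | uniq -c` with `summarize_versions` would give the measured canary share directly.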

Adjusting Traffic

Increase to 25%

kubectl patch ingress myapp-canary -p \
  '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"25"}}}'

# Verify
kubectl get ingress myapp-canary -o jsonpath='{.metadata.annotations}'

Increase to 50%

kubectl patch ingress myapp-canary -p \
  '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"50"}}}'
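
Because the patch payload is the same apart from the number, it may help to generate it with a small function that also rejects invalid weights before they reach the cluster. This is a sketch; `canary_weight_patch` is a hypothetical helper, and the 0 to 100 range assumes the default weight total of 100.

```shell
#!/usr/bin/env bash
# Builds the JSON patch for the canary-weight annotation.
# Returns non-zero for anything that is not an integer from 0 to 100.

canary_weight_patch() {
  local weight=$1
  if ! [[ $weight =~ ^[0-9]+$ ]] || (( 10#$weight > 100 )); then
    echo "weight must be an integer between 0 and 100: $weight" >&2
    return 1
  fi
  printf '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"%s"}}}\n' "$weight"
}

# Usage (needs cluster access, so it is left as a comment here):
#   kubectl patch ingress myapp-canary -p "$(canary_weight_patch 25)"
```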

Full Cutover

# Option 1: set the canary weight to 100
kubectl patch ingress myapp-canary -p \
  '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"100"}}}'

# Option 2: point the main Ingress at the canary Service
kubectl patch ingress myapp-stable --type=json -p='[
  {"op": "replace", "path": "/spec/rules/0/http/paths/0/backend/service/name", "value": "myapp-canary"}
]'

# Then delete the canary Ingress
kubectl delete ingress myapp-canary

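Header-Based Canary

Use Cases

  • Internal employee testing: employees reach the canary by sending a specific header
  • Beta users: requests from beta users are routed to the canary
  • Debug mode: developers reach the canary while debugging

Configuration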

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary-header
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    # Route based on a request header
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "true"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 80

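Testing

# Plain request → stable version
curl http://myapp.example.com
# Response: Version: v1.0

# Request with the header → canary version
curl -H "X-Canary: true" http://myapp.example.com
# Response: Version: v2.0

# Header value does not match → stable version
# (the header is ignored, and this Ingress defines no other canary rule)
curl -H "X-Canary: false" http://myapp.example.com
# Response: Version: v1.0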

Regular Expression Matching

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary-header-pattern
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "User-Agent"
    # Match specific User-Agent values
    nginx.ingress.kubernetes.io/canary-by-header-pattern: ".*iOS.*"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 80

Test:

# iOS device → canary
curl -H "User-Agent: MyApp/1.0 (iOS 15.0)" http://myapp.example.com

# Android device → stable version
curl -H "User-Agent: MyApp/1.0 (Android 12)" http://myapp.example.com
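
Before deploying a pattern, it can be checked locally against sample header values. The sketch below uses bash's built-in regex matching, which is POSIX ERE rather than the PCRE dialect NGINX uses; for a simple pattern such as `.*iOS.*` the two should agree, but that equivalence is an assumption for anything more complex. The function names are made up for this example.

```shell
#!/usr/bin/env bash
# Local dry run of a canary-by-header-pattern value against sample headers.

matches_canary_pattern() {
  local pattern=$1 header_value=$2
  [[ $header_value =~ $pattern ]]
}

ua_backend() {
  if matches_canary_pattern '.*iOS.*' "$1"; then
    echo canary
  else
    echo stable
  fi
}

ua_backend 'MyApp/1.0 (iOS 15.0)'      # prints canary
ua_backend 'MyApp/1.0 (Android 12)'    # prints stable
# A typical Safari User-Agent says "iPhone OS", not "iOS", so it does not match:
ua_backend 'Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)'   # prints stable
```

The last case suggests that the pattern above fits an app that sends its own User-Agent string; browser traffic would probably need a pattern such as `.*(iPhone|iPad).*`.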

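Cookie-Based Canary

Use Cases

  • User groups: a specific group of users gets the canary
  • A/B testing: split traffic by user ID
  • Gradual rollout users: select a subset of users to take part in testing

Configuration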

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary-cookie
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    # Route based on a cookie
    nginx.ingress.kubernetes.io/canary-by-cookie: "canary-user"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 80

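Cookie Value Rules

# Cookie value "always" → canary
curl -b "canary-user=always" http://myapp.example.com

# Cookie value "never" → stable version
curl -b "canary-user=never" http://myapp.example.com

# No cookie or any other value → falls through to canary-weight;
# this Ingress sets no weight, so the request goes to the stable version
curl http://myapp.example.com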

Implementing a Per-User Gradual Rollout

// Front-end code: set the cookie based on the user ID
function setCanaryCookie(userId) {
  // User IDs ending in 0-4 → canary
  const lastDigit = userId % 10;
  const canaryValue = lastDigit < 5 ? 'always' : 'never';

  document.cookie = `canary-user=${canaryValue}; path=/; max-age=86400`;
}

// Usage
setCanaryCookie(12345); // last digit is 5 → stable version
setCanaryCookie(12340); // last digit is 0 → canary version
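
The same bucketing rule can be reproduced on the command line to test how a given user would be routed. This sketch mirrors the front-end logic above and assumes numeric user IDs; `canary_cookie_value` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Mirrors setCanaryCookie: user IDs ending in 0-4 get "always", the rest "never".

canary_cookie_value() {
  local user_id=$1
  # 10# forces base 10 so IDs with leading zeros are not read as octal
  if (( 10#$user_id % 10 < 5 )); then
    echo always
  else
    echo never
  fi
}

canary_cookie_value 12345   # prints never
canary_cookie_value 12340   # prints always

# Usage against the cluster (left as a comment, needs network access):
#   curl -b "canary-user=$(canary_cookie_value 12340)" http://myapp.example.com
```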

Combined Strategies

Header + Weight

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary-combined
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    # 1. First: check the header
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "true"
    # 2. Fallback: split by weight
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 80

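Routing Logic

  1. If the request carries X-Canary: true → canary
  2. Any other X-Canary value (for example false) is ignored, and the request falls through to the weight rule
  3. Requests without the header are likewise split by the 10% weight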

Cookie + Header

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary-multi
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    # 1. Highest precedence: header (X-Canary: always / never)
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    # 2. Next: cookie (canary-user=always / never)
    nginx.ingress.kubernetes.io/canary-by-cookie: "canary-user"
    # 3. Fallback: weight
    nginx.ingress.kubernetes.io/canary-weight: "5"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 80
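
The evaluation order for an Ingress like the one above (header, then cookie, then weight) can be written out as a small decision function. This is a sketch of the documented precedence, not the controller's code: `route_request` is a hypothetical name, the header and cookie use the built-in `always` / `never` values, and `roll` stands in for the per-request random draw.

```shell
#!/usr/bin/env bash
# route_request HEADER_VALUE COOKIE_VALUE WEIGHT ROLL
# Pass an empty string for a missing header or cookie. ROLL is an integer in [0, 100).

route_request() {
  local header=$1 cookie=$2 weight=$3 roll=$4

  # 1. Header: always / never decide immediately, any other value is ignored
  case $header in
    always) echo canary; return ;;
    never)  echo stable; return ;;
  esac

  # 2. Cookie: same always / never convention
  case $cookie in
    always) echo canary; return ;;
    never)  echo stable; return ;;
  esac

  # 3. Weight: the random draw decides
  if (( roll < weight )); then
    echo canary
  else
    echo stable
  fi
}

route_request always never 0 50   # prints canary (header beats cookie)
route_request "" never 100 0      # prints stable (cookie beats weight)
route_request "" "" 5 4           # prints canary (weight rule, roll below 5)
```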

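Multi-Path Canary

Scenario

Run the canary release on specific paths only:

/api/v1/*  → stable version
/api/v2/*  → 20% canary version, 80% stable version

Configuration

# Stable version: main Ingress for both paths.
# A canary Ingress only takes effect on a host and path that a main Ingress also serves.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-stable-v1
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api/v1
        pathType: Prefix
        backend:
          service:
            name: myapp-stable
            port:
              number: 80
      - path: /api/v2
        pathType: Prefix
        backend:
          service:
            name: myapp-stable
            port:
              number: 80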

---
# Canary version - API v2
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary-v2
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "20"
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api/v2
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 80
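
To check which request paths a `pathType: Prefix` rule covers, the sketch below models the element-wise prefix matching described in the Kubernetes Ingress specification. It is an approximation for reasoning about paths; the controller's generated NGINX locations may differ in edge cases, and `prefix_matches` is a name chosen for this example.

```shell
#!/usr/bin/env bash
# prefix_matches RULE_PATH REQUEST_PATH
# Succeeds when REQUEST_PATH equals RULE_PATH or continues it at a "/" boundary.

prefix_matches() {
  local prefix=${1%/} path=$2   # a trailing slash on the rule path is ignored
  [[ $path == "$prefix" || $path == "$prefix"/* ]]
}

prefix_matches /api/v2 /api/v2/users && echo "match"      # prints match
prefix_matches /api/v2 /api/v22      || echo "no match"   # prints no match
```

With the canary rule on `/api/v2`, a request to `/api/v22` is therefore not part of the canary split.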

Automation Scripts

Progressive Traffic Shift Script

#!/bin/bash
# progressive-rollout.sh

set -euo pipefail

INGRESS_NAME="myapp-canary"
NAMESPACE="default"
STAGES=(1 5 10 25 50 75 100)
WAIT_TIME=300        # wait 5 minutes per stage
MAX_ERROR_RATE=0.01  # roll back above 1% errors
# Assumption: the Prometheus HTTP API is reachable here, for example through kubectl port-forward
PROM_URL="${PROM_URL:-http://localhost:9090}"
# Share of 5xx responses from the canary Service over the last 5 minutes
QUERY='sum(rate(nginx_ingress_controller_requests{service="myapp-canary",status=~"5.."}[5m])) / sum(rate(nginx_ingress_controller_requests{service="myapp-canary"}[5m]))'

set_weight() {
  kubectl patch ingress "$INGRESS_NAME" -n "$NAMESPACE" -p \
    "{\"metadata\":{\"annotations\":{\"nginx.ingress.kubernetes.io/canary-weight\":\"$1\"}}}"
}

echo "🚀 Starting progressive canary rollout..."

for WEIGHT in "${STAGES[@]}"; do
  echo ""
  echo "📊 Setting canary weight to ${WEIGHT}%..."
  set_weight "$WEIGHT"

  echo "⏳ Waiting ${WAIT_TIME}s for metrics..."
  sleep "$WAIT_TIME"

  echo "🔍 Checking metrics..."
  # An empty result (no canary traffic recorded) is treated as an error rate of 0
  if ! ERROR_RATE=$(curl -sfG "${PROM_URL}/api/v1/query" --data-urlencode "query=${QUERY}" | \
      jq -r '.data.result[0].value[1] // "0"'); then
    echo "❌ Metrics query failed! Rolling back..."
    set_weight 0
    exit 1
  fi

  echo "Error rate: ${ERROR_RATE}"

  # Roll back if the error rate is too high
  if awk -v r="$ERROR_RATE" -v max="$MAX_ERROR_RATE" 'BEGIN { exit !(r + 0 > max + 0) }'; then
    echo "❌ Error rate too high! Rolling back..."
    set_weight 0
    exit 1
  fi

  echo "✅ Weight ${WEIGHT}% is stable"
done

echo ""
echo "🎉 Canary rollout completed successfully!"
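
The promote-or-rollback decision is the part of a rollout script most worth testing without a cluster. The sketch below isolates it as a pure function; `check_stage` and its three result words are hypothetical, and the default threshold of 0.01 is only an example value.

```shell
#!/usr/bin/env bash
# check_stage ERROR_RATE [THRESHOLD]
# Prints "promote", "rollback", or "no-data" for one rollout stage.

check_stage() {
  local error_rate=$1 threshold=${2:-0.01}

  # Missing metrics are reported separately instead of being read as healthy
  if [[ -z $error_rate || $error_rate == null || $error_rate == NaN ]]; then
    echo no-data
    return
  fi

  # awk handles the floating-point comparison that bash arithmetic cannot
  if awk -v r="$error_rate" -v t="$threshold" 'BEGIN { exit !(r + 0 > t + 0) }'; then
    echo rollback
  else
    echo promote
  fi
}

check_stage 0.02    # prints rollback
check_stage 0.005   # prints promote
check_stage null    # prints no-data
```

Whether `no-data` should pause, retry, or roll back is a policy choice; at the 1% stage of a low-traffic service, an empty result is fairly likely.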

Header Test Script

#!/bin/bash
# test-header-routing.sh
# Written for the combined "Header + Weight" Ingress (X-Canary: true, weight 10)

HOST="myapp.example.com"

echo "Testing header-based routing..."
echo ""

echo "1️⃣ Request without header (follows the 10% weight, usually stable):"
curl -s "$HOST" | grep Version

echo ""
echo "2️⃣ Request with X-Canary: true (should go to canary):"
curl -s -H "X-Canary: true" "$HOST" | grep Version

echo ""
echo "3️⃣ Request with X-Canary: false (header ignored, follows the 10% weight):"
curl -s -H "X-Canary: false" "$HOST" | grep Version

echo ""
echo "4️⃣ Load test (100 requests, about 10% should go to canary):"
for i in {1..100}; do
  curl -s "$HOST" | grep -o 'v[0-9.]*'
done | sort | uniq -c

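Monitoring and Alerting

Prometheus Metrics

# ServiceMonitor for Nginx Ingress
# (the label selector depends on how the controller was installed; adjust it to match the metrics Service)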
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-ingress
spec:
  selector:
    matchLabels:
      app: nginx-ingress
  endpoints:
  - port: metrics
    interval: 30s

Grafana Dashboard Queries

# Canary share of traffic
sum(rate(nginx_ingress_controller_requests{service="myapp-canary"}[5m]))
/
sum(rate(nginx_ingress_controller_requests{service=~"myapp-(stable|canary)"}[5m]))

# Canary error rate (both sides are aggregated so that their label sets match)
sum(rate(nginx_ingress_controller_requests{service="myapp-canary",status=~"5.."}[5m]))
/
sum(rate(nginx_ingress_controller_requests{service="myapp-canary"}[5m]))

# Canary P99 response time
histogram_quantile(0.99,
  sum by (le) (rate(nginx_ingress_controller_request_duration_seconds_bucket{service="myapp-canary"}[5m]))
)

Alerting Rules

# PrometheusRule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: canary-alerts
spec:
  groups:
  - name: canary
    interval: 30s
    rules:
    - alert: CanaryHighErrorRate
      expr: |
        sum(rate(nginx_ingress_controller_requests{service="myapp-canary",status=~"5.."}[5m]))
        /
        sum(rate(nginx_ingress_controller_requests{service="myapp-canary"}[5m]))
        > 0.05
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Canary error rate is high"
        description: "Canary version has {{ $value | humanizePercentage }} error rate"
    
    - alert: CanaryHighLatency
      expr: |
        histogram_quantile(0.99,
          sum by (le) (rate(nginx_ingress_controller_request_duration_seconds_bucket{service="myapp-canary"}[5m]))
        ) > 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Canary P99 latency is high"
        description: "Canary P99 latency is {{ $value }}s"

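Troubleshooting

Problem 1: The canary has no effect

# Check the Ingress annotations
kubectl get ingress myapp-canary -o yaml | grep -A 5 annotations

# Check the Nginx Ingress Controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller

# Verify the loaded configuration
# (canary rules are applied dynamically, so inspect the backends rather than nginx.conf)
kubectl exec -n ingress-nginx <ingress-pod> -- \
  curl -s http://127.0.0.1:10246/configuration/backends | grep -i trafficShapingPolicy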

Problem 2: All traffic goes to the canary

Possible causes:

  • canary-weight is set to 100
  • The main Ingress does not exist
  • Service endpoint problems

# Check the weight
kubectl get ingress myapp-canary -o jsonpath='{.metadata.annotations}'

# Check the main Ingress
kubectl get ingress myapp-stable

# Check the Service endpoints
kubectl get endpoints myapp-stable myapp-canary

Problem 3: Header routing does not work

# Show the headers while testing
curl -v -H "X-Canary: true" http://myapp.example.com

# Check the Ingress configuration
kubectl describe ingress myapp-canary | grep canary-by-header

# Inspect the backends loaded by the controller and check the header settings on the canary backend
kubectl exec -n ingress-nginx <ingress-pod> -- \
  curl -s http://127.0.0.1:10246/configuration/backends

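Summary

Advantages of Ingress-Based Canary

Precise control: traffic can be split in 1% steps
Flexible routing: by header, cookie, or weight
Resource efficient: independent of the number of Pods
Easy to integrate: compatible with existing Ingress configuration

Recommended Approaches

Scenario                        Recommended approach
Precise traffic control         Ingress Weight
Internal employee testing       Ingress Header
Per-user gradual rollout        Ingress Cookie
Small-traffic validation        Ingress Weight (1-5%)
Multi-environment deployment    Separate Ingresses

The next section covers fully automated canary releases with Flagger.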