Precise Traffic Control with Ingress
Why Use Ingress
Canary releases based on Deployments alone have one limitation: traffic is split at a coarse granularity.
The problem: you want to send 5% of traffic to the canary
- 10 Pods in total, 1 canary Pod = 10% ❌ too much
- 20 Pods in total, 1 canary Pod = 5% ✅ but wastes resources
- 100 Pods in total, 5 canary Pods = 5% ✅ but expensive
The Ingress approach:
✅ Precise control of the traffic percentage (1%, 5%, 10%, ...)
✅ Not constrained by the number of Pods
✅ Supports rule-based routing
Canary Releases with the Nginx Ingress Controller
How It Works
The Nginx Ingress Controller implements canary releases through annotations:
```
┌──────────────────────┐
│ Ingress (main)       │  90% of traffic → myapp-stable
└──────────────────────┘
┌──────────────────────┐
│ Ingress (canary)     │  10% of traffic → myapp-canary
│ canary: true         │
│ canary-weight: 10    │
└──────────────────────┘
```
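These annotations are specific to the community Nginx Ingress Controller (ingress-nginx), so it is worth confirming the controller is running before relying on them. A quick check, assuming the default ingress-nginx namespace and deployment name (adjust for your installation):

```bash
# Assumes a default ingress-nginx installation -- namespace and names may differ in yours
kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller

# The image tag shows the controller version (very old releases do not support the canary annotations)
kubectl get deploy -n ingress-nginx ingress-nginx-controller \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```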
Complete Example
1. Create two separate Services
```yaml
# services.yaml
---
# Stable Service
apiVersion: v1
kind: Service
metadata:
  name: myapp-stable
  labels:
    app: myapp
    version: stable
spec:
  selector:
    app: myapp
    version: stable
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
  type: ClusterIP
---
# Canary Service
apiVersion: v1
kind: Service
metadata:
  name: myapp-canary
  labels:
    app: myapp
    version: canary
spec:
  selector:
    app: myapp
    version: canary
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
  type: ClusterIP
```
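Both Services select Pods by the `version` label, so the stable and canary Deployments must carry matching labels. The `deployments.yaml` applied in a later step is not listed in full in this section; a minimal sketch, with image names and replica counts as illustrative placeholders:

```yaml
# deployments.yaml (sketch -- image names and replica counts are placeholders)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: stable
  template:
    metadata:
      labels:
        app: myapp
        version: stable
    spec:
      containers:
      - name: myapp
        image: myapp:v1.0        # stable version
        ports:
        - containerPort: 8080    # matches the Service targetPort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      version: canary
  template:
    metadata:
      labels:
        app: myapp
        version: canary
    spec:
      containers:
      - name: myapp
        image: myapp:v2.0        # canary version
        ports:
        - containerPort: 8080
```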
2. Main Ingress (stable version)
```yaml
# ingress-stable.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-stable
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-stable
            port:
              number: 80
```
3. Canary Ingress
```yaml
# ingress-canary.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary
  annotations:
    kubernetes.io/ingress.class: nginx
    # Enable canary mode
    nginx.ingress.kubernetes.io/canary: "true"
    # Weight: 10% of traffic goes to the canary
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 80
```
Apply the configuration:
```bash
# Deploy the Deployments (stable and canary)
kubectl apply -f deployments.yaml

# Create the Services
kubectl apply -f services.yaml

# Create the Ingresses
kubectl apply -f ingress-stable.yaml
kubectl apply -f ingress-canary.yaml

# Verify
kubectl get ingress
kubectl describe ingress myapp-canary
```
4. Test the traffic split
```bash
#!/bin/bash
# Test script
for i in {1..100}; do
  VERSION=$(curl -s http://myapp.example.com | grep -o 'Version: v[0-9.]*' | cut -d' ' -f2)
  echo $VERSION
done | sort | uniq -c

# Expected output:
#  90 v1.0
#  10 v2.0
```
Adjusting Traffic
Increase to 25%
```bash
kubectl patch ingress myapp-canary -p \
  '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"25"}}}'

# Verify
kubectl get ingress myapp-canary -o jsonpath='{.metadata.annotations}'
```
Increase to 50%
```bash
kubectl patch ingress myapp-canary -p \
  '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"50"}}}'
```
Full Cutover
```bash
# Option 1: set the canary weight to 100
kubectl patch ingress myapp-canary -p \
  '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"100"}}}'

# Option 2: point the main Ingress at the canary Service
kubectl patch ingress myapp-stable --type=json -p='[
  {"op": "replace", "path": "/spec/rules/0/http/paths/0/backend/service/name", "value": "myapp-canary"}
]'
# Then delete the canary Ingress
kubectl delete ingress myapp-canary
```
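Order matters in option 2: if the canary Ingress is removed while the main Ingress still points at `myapp-stable`, all traffic instantly falls back to the old version. A sketch of the safe sequence, assuming the resource names used above (the Deployment name follows the sketch earlier in this section):

```bash
#!/bin/bash
# promote-canary.sh -- sketch of a safe cutover using option 2
set -e

# 1. Repoint the main Ingress at the canary Service first
kubectl patch ingress myapp-stable --type=json -p='[
  {"op": "replace", "path": "/spec/rules/0/http/paths/0/backend/service/name", "value": "myapp-canary"}
]'

# 2. Only then remove the canary Ingress
kubectl delete ingress myapp-canary

# 3. Optionally scale down the old Deployment once traffic looks healthy
kubectl scale deployment myapp-stable --replicas=0
```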
Header-Based Canary
Use Cases
- Internal employee testing: employees reach the canary by sending a specific header
- Beta users: requests from beta users are routed to the canary
- Debug mode: developers hit the canary while debugging
Configuration
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary-header
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    # Route based on a header
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "true"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 80
```
Testing
```bash
# Normal request → stable version
curl http://myapp.example.com
# Response: Version: v1.0

# Request with the header → canary version
curl -H "X-Canary: true" http://myapp.example.com
# Response: Version: v2.0

# Header value does not match → stable version
curl -H "X-Canary: false" http://myapp.example.com
# Response: Version: v1.0
```
Regular Expression Matching
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary-header-pattern
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "User-Agent"
    # Match a specific User-Agent
    nginx.ingress.kubernetes.io/canary-by-header-pattern: ".*iOS.*"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 80
```
Testing:
```bash
# iOS device → canary
curl -H "User-Agent: MyApp/1.0 (iOS 15.0)" http://myapp.example.com

# Android device → stable version
curl -H "User-Agent: MyApp/1.0 (Android 12)" http://myapp.example.com
```
Cookie-Based Canary
Use Cases
- User grouping: a specific group of users is served by the canary
- A/B testing: traffic is split by user ID
- Gradual rollout: a selected subset of users participates in testing
Configuration
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary-cookie
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    # Route based on a cookie
    nginx.ingress.kubernetes.io/canary-by-cookie: "canary-user"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 80
```
Cookie Value Rules
```bash
# Cookie value "always" → canary
curl -b "canary-user=always" http://myapp.example.com

# Cookie value "never" → stable version
curl -b "canary-user=never" http://myapp.example.com

# No cookie, or any other value → split according to canary-weight
curl http://myapp.example.com
```
Rolling Out to a Subset of Users
```javascript
// Frontend code: set the cookie based on the user ID
function setCanaryCookie(userId) {
  // Last digit of the user ID is 0-4 → canary
  const lastDigit = userId % 10;
  const canaryValue = lastDigit < 5 ? 'always' : 'never';
  document.cookie = `canary-user=${canaryValue}; path=/; max-age=86400`;
}

// Usage
setCanaryCookie(12345); // last digit is 5 → stable version
setCanaryCookie(12340); // last digit is 0 → canary version
```
Combined Strategies
Header + Weight
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary-combined
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    # 1. First: check the header
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "true"
    # 2. Fallback: split by weight
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 80
```
Routing logic:
- Requests with `X-Canary: true` → canary
- Requests with any other `X-Canary` value → the header does not match, so it is ignored and the request falls through to the weight rule
- All remaining traffic → split by the 10% canary weight
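A quick way to confirm this precedence is to compare runs with and without the header; a sketch, reusing the hostname and version strings from the earlier examples:

```bash
# With the header, every request should hit the canary, regardless of the 10% weight
for i in {1..20}; do
  curl -s -H "X-Canary: true" http://myapp.example.com | grep -o 'v[0-9.]*'
done | sort | uniq -c

# Without the header, roughly 10% of requests should hit the canary
for i in {1..100}; do
  curl -s http://myapp.example.com | grep -o 'v[0-9.]*'
done | sort | uniq -c
```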
Cookie + Header
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary-multi
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    # 1. Highest precedence: header (canary-by-header is evaluated before canary-by-cookie)
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    # 2. Next: cookie
    nginx.ingress.kubernetes.io/canary-by-cookie: "canary-user"
    # 3. Fallback: weight
    nginx.ingress.kubernetes.io/canary-weight: "5"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 80
```
Multi-Path Canary
Scenario
Run the canary only on specific paths:
```
/api/v1/* → stable version
/api/v2/* → canary version
```
Configuration
```yaml
# Stable version - API v1
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-stable-v1
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api/v1
        pathType: Prefix
        backend:
          service:
            name: myapp-stable
            port:
              number: 80
---
# Canary version - API v2
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary-v2
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "20"
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api/v2
        pathType: Prefix
        backend:
          service:
            name: myapp-canary
            port:
              number: 80
```
Automation Scripts
Progressive Traffic Adjustment Script
```bash
#!/bin/bash
# progressive-rollout.sh
set -e

INGRESS_NAME="myapp-canary"
NAMESPACE="default"
STAGES=(1 5 10 25 50 75 100)
WAIT_TIME=300   # wait 5 minutes per stage

echo "🚀 Starting progressive canary rollout..."

for WEIGHT in "${STAGES[@]}"; do
  echo ""
  echo "📊 Setting canary weight to ${WEIGHT}%..."
  kubectl patch ingress $INGRESS_NAME -n $NAMESPACE -p \
    "{\"metadata\":{\"annotations\":{\"nginx.ingress.kubernetes.io/canary-weight\":\"${WEIGHT}\"}}}"

  echo "⏳ Waiting ${WAIT_TIME}s for metrics..."
  sleep $WAIT_TIME

  echo "🔍 Checking metrics..."
  # Integrate a Prometheus query here; the URL below is an assumption
  # (an in-cluster address or a local port-forward) -- adjust it to your setup.
  ERROR_RATE=$(curl -sG "http://prometheus.monitoring.svc:9090/api/v1/query" \
    --data-urlencode 'query=sum(rate(nginx_ingress_controller_requests{service="myapp-canary",status=~"5.."}[5m])) / sum(rate(nginx_ingress_controller_requests{service="myapp-canary"}[5m]))' \
    | jq -r '.data.result[0].value[1] // "0"')
  echo "Error rate: ${ERROR_RATE}"

  # Roll back if the error rate is too high
  if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
    echo "❌ Error rate too high! Rolling back..."
    kubectl patch ingress $INGRESS_NAME -n $NAMESPACE -p \
      '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"0"}}}'
    exit 1
  fi

  echo "✅ Weight ${WEIGHT}% is stable"
done

echo ""
echo "🎉 Canary rollout completed successfully!"
```
Header Routing Test Script
```bash
#!/bin/bash
# test-header-routing.sh

HOST="myapp.example.com"

echo "Testing header-based routing..."
echo ""

echo "1️⃣ Request without header (should go to stable):"
curl -s $HOST | grep Version
echo ""

echo "2️⃣ Request with X-Canary: true (should go to canary):"
curl -s -H "X-Canary: true" $HOST | grep Version
echo ""

echo "3️⃣ Request with X-Canary: false (should go to stable):"
curl -s -H "X-Canary: false" $HOST | grep Version
echo ""

echo "4️⃣ Load test (100 requests, 10% should go to canary):"
for i in {1..100}; do
  curl -s $HOST | grep -o 'v[0-9.]*'
done | sort | uniq -c
```
Monitoring and Alerting
Prometheus Metrics
```yaml
# ServiceMonitor for Nginx Ingress
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-ingress
spec:
  selector:
    matchLabels:
      app: nginx-ingress
  endpoints:
  - port: metrics
    interval: 30s
```
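The ServiceMonitor only has something to scrape if the controller actually exposes Prometheus metrics. Assuming the controller was installed with the ingress-nginx Helm chart, metrics (and, optionally, a chart-managed ServiceMonitor) can be enabled roughly like this; the `release` label is an assumption and must match your Prometheus Operator's selector:

```yaml
# values.yaml for the ingress-nginx Helm chart (sketch)
controller:
  metrics:
    enabled: true               # expose the /metrics endpoint
    serviceMonitor:
      enabled: true             # let the chart create the ServiceMonitor for you
      additionalLabels:
        release: prometheus     # assumed label; match your Prometheus Operator's serviceMonitorSelector
```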
Grafana Dashboard Queries
```promql
# Share of traffic going to the canary
sum(rate(nginx_ingress_controller_requests{service="myapp-canary"}[5m]))
/
sum(rate(nginx_ingress_controller_requests{service=~"myapp-(stable|canary)"}[5m]))

# Canary error rate
rate(nginx_ingress_controller_requests{service="myapp-canary",status=~"5.."}[5m])
/
rate(nginx_ingress_controller_requests{service="myapp-canary"}[5m])

# Canary P99 response time
histogram_quantile(0.99,
  rate(nginx_ingress_controller_request_duration_seconds_bucket{service="myapp-canary"}[5m])
)
```
Alerting Rules
```yaml
# PrometheusRule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: canary-alerts
spec:
  groups:
  - name: canary
    interval: 30s
    rules:
    - alert: CanaryHighErrorRate
      expr: |
        rate(nginx_ingress_controller_requests{service="myapp-canary",status=~"5.."}[5m])
        /
        rate(nginx_ingress_controller_requests{service="myapp-canary"}[5m])
        > 0.05
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Canary error rate is high"
        description: "Canary version has {{ $value | humanizePercentage }} error rate"
    - alert: CanaryHighLatency
      expr: |
        histogram_quantile(0.99,
          rate(nginx_ingress_controller_request_duration_seconds_bucket{service="myapp-canary"}[5m])
        ) > 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Canary P99 latency is high"
        description: "Canary P99 latency is {{ $value }}s"
```
Troubleshooting
Problem 1: The canary has no effect
```bash
# Check the Ingress annotations
kubectl get ingress myapp-canary -o yaml | grep -A 5 annotations

# Check the Nginx Ingress Controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller

# Verify the generated configuration
kubectl exec -n ingress-nginx <ingress-pod> -- cat /etc/nginx/nginx.conf | grep canary
```
Problem 2: All traffic goes to the canary
Possible causes:
- `canary-weight` is set to 100
- The main Ingress does not exist
- Service endpoint problems
```bash
# Check the weight
kubectl get ingress myapp-canary -o jsonpath='{.metadata.annotations}'

# Check the main Ingress
kubectl get ingress myapp-stable

# Check the Service endpoints
kubectl get endpoints myapp-stable myapp-canary
```
Problem 3: Header-based routing does not work
```bash
# Show the header while testing
curl -v -H "X-Canary: true" http://myapp.example.com

# Check the Ingress configuration
kubectl describe ingress myapp-canary | grep canary-by-header

# Inspect the controller's dynamic backend configuration
kubectl exec -n ingress-nginx <ingress-pod> -- \
  curl -s -H "X-Canary: true" http://127.0.0.1:10246/configuration/backends
```
Summary
Advantages of Ingress-Based Canary
✅ Precise control: traffic splits down to 1% granularity
✅ Flexible routing: by header, cookie, or weight
✅ Resource efficient: not constrained by the number of Pods
✅ Easy to integrate: compatible with existing Ingress configuration
When to Use What

| Scenario | Recommended approach |
|---|---|
| Precise traffic control needed | Ingress weight |
| Internal employee testing | Ingress header |
| User-group gradual rollout | Ingress cookie |
| Small-traffic validation | Ingress weight (1-5%) |
| Multi-environment deployment | Separate Ingresses |

The next section covers fully automated canary releases with Flagger.