Blue-Green Deployment Automation Tools
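This section surveys tools and platforms that automate the blue-green deployment process.
Argo Rollouts in Depth
Argo Rollouts is a progressive delivery controller built specifically for Kubernetes.
A Complete Rollout Configuration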
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 5
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myregistry/myapp:v1.0
          ports:
            - containerPort: 8080
              protocol: TCP
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
  strategy:
    blueGreen:
      # Active Service (production traffic)
      activeService: myapp-active
      # Preview Service (test traffic)
      previewService: myapp-preview
      # Auto-promotion settings
      autoPromotionEnabled: false
      autoPromotionSeconds: 30
      # Scale-down settings
      scaleDownDelaySeconds: 300        # scale down the old version after 5 minutes
      scaleDownDelayRevisionLimit: 2    # keep 2 old revisions
      # Anti-affinity (keeps blue and green on different nodes)
      antiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution: {}
      # Pre-promotion analysis
      prePromotionAnalysis:
        templates:
          - templateName: smoke-test
        args:
          - name: service-name
            value: myapp-preview
      # Post-promotion analysis
      postPromotionAnalysis:
        templates:
          - templateName: load-test
          - templateName: error-rate-check
        args:
          - name: service-name
            value: myapp-active
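The Rollout refers to two Services, myapp-active and myapp-preview, and both must exist before the Rollout is applied; the controller then manages their selectors by adding a pod-template-hash label. A minimal sketch of a helper that prints both manifests follows. The function name render_services and the Service port 80 are assumptions for illustration, not values from the configuration above.

```shell
#!/bin/sh
# render_services APP NAMESPACE
# Prints the two Service manifests that a blueGreen strategy references
# (APP-active and APP-preview).
# Assumption: Service port 80 forwards to containerPort 8080.
render_services() {
  app="$1"
  ns="$2"
  for role in active preview; do
    cat <<EOF
---
apiVersion: v1
kind: Service
metadata:
  name: ${app}-${role}
  namespace: ${ns}
spec:
  selector:
    app: ${app}
  ports:
    - port: 80
      targetPort: 8080
EOF
  done
}

# Usage (requires cluster access):
#   render_services myapp production | kubectl apply -f -
```

Both Services start with the same selector; which ReplicaSet each one points at is decided by the controller at promotion time.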
Analysis Templates
1. Smoke test:
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: smoke-test
spec:
  args:
    - name: service-name
  metrics:
    - name: smoke-test
      initialDelay: 10s
      interval: 30s
      count: 3
      # A job metric passes when the Job exits with code 0, so no successCondition is set.
      provider:
        job:
          spec:
            template:
              spec:
                containers:
                  - name: smoke-test
                    image: curlimages/curl:latest
                    command:
                      - /bin/sh
                      - -c
                      - |
                        # Health check
                        curl -fsS http://{{args.service-name}}/health || exit 1
                        # API test
                        curl -fsS http://{{args.service-name}}/api/test || exit 1
                        # Login test (the curl image does not ship jq, so sed extracts the token)
                        TOKEN=$(curl -fsS -X POST http://{{args.service-name}}/api/login \
                          -H 'Content-Type: application/json' \
                          -d '{"username":"test","password":"test"}' \
                          | sed -n 's/.*"token" *: *"\([^"]*\)".*/\1/p')
                        if [ -z "$TOKEN" ]; then
                          echo "Login failed"
                          exit 1
                        fi
                        echo "success"
                restartPolicy: Never
            backoffLimit: 1
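The login step depends on pulling a token field out of a JSON response. In a minimal image without jq, a sed expression can do this for flat JSON. The sketch below assumes a response shaped like {"token":"..."}; both that shape and the helper name extract_token are assumptions.

```shell
#!/bin/sh
# extract_token: reads JSON on stdin and prints the value of a "token"
# string field, or nothing if the field is absent or not a string.
# This is a pattern match, not a JSON parser; it assumes the value
# contains no escaped quotes.
extract_token() {
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example: a missing or null token yields an empty string,
# so a later [ -z "$TOKEN" ] check catches the failure.
TOKEN=$(printf '%s\n' '{"user":"test","token":"abc123"}' | extract_token)
echo "token=$TOKEN"
```

Unlike `jq -r '.token'`, which prints the literal string null when the field is missing, this helper prints nothing, so an emptiness test is enough.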
2. Error-rate check:
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  args:
    - name: service-name
  metrics:
    - name: error-rate
      initialDelay: 30s
      interval: 60s
      count: 5
      # The query returns a vector, so the first sample is compared: error rate below 5%
      successCondition: result[0] < 0.05
      failureLimit: 2
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              status=~"5.."
            }[5m]))
            /
            sum(rate(http_requests_total{
              service="{{args.service-name}}"
            }[5m]))
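The query divides the 5xx request rate by the total request rate, and the analysis passes while that ratio stays below 0.05. The same arithmetic can be sketched in shell for checking a threshold by hand. The function name error_rate_ok and the choice to treat zero traffic as a failure are assumptions made for this example, not Argo Rollouts behavior.

```shell
#!/bin/sh
# error_rate_ok ERRORS TOTAL [THRESHOLD]
# Returns 0 when ERRORS/TOTAL is strictly below THRESHOLD (default 0.05),
# and 1 otherwise.
# Assumption: zero traffic counts as a failure because the ratio is undefined.
error_rate_ok() {
  awk -v e="$1" -v t="$2" -v th="${3:-0.05}" 'BEGIN {
    if (t <= 0) exit 1            # no requests: nothing to divide by
    exit (((e / t) < th) ? 0 : 1) # pass only when the ratio is below the threshold
  }'
}

# Example: 3 errors out of 100 requests is a 3% error rate.
if error_rate_ok 3 100; then
  echo "pass"
else
  echo "fail"
fi
```

With no traffic the PromQL division yields no usable value, so it is worth deciding explicitly how an empty result should be treated before relying on the check.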
3. Load test:
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: load-test
spec:
  args:
    - name: service-name
  metrics:
    - name: load-test
      initialDelay: 60s
      interval: 120s
      count: 3
      # A job metric has no result value: it passes when the Job exits with code 0.
      # k6 exits non-zero when the threshold below (P95 latency < 500 ms) is crossed.
      provider:
        job:
          spec:
            template:
              spec:
                containers:
                  - name: k6-load-test
                    image: grafana/k6:latest
                    # The container stdin field is a boolean, so the script is piped in through a heredoc.
                    command:
                      - /bin/sh
                      - -c
                      - |
                        k6 run - <<'EOF'
                        import http from 'k6/http';
                        import { check } from 'k6';
                        export const options = {
                          stages: [
                            { duration: '30s', target: 50 },
                            { duration: '1m', target: 100 },
                            { duration: '30s', target: 0 },
                          ],
                          thresholds: {
                            http_req_duration: ['p(95)<500'],
                          },
                        };
                        export default function () {
                          const res = http.get('http://{{args.service-name}}/api/test');
                          check(res, {
                            'status is 200': (r) => r.status === 200,
                          });
                        }
                        EOF
                restartPolicy: Never
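Using the Argo Rollouts kubectl Plugin
# Add -n production to each command unless it is your current namespace
# Watch the Rollout status
kubectl argo rollouts get rollout myapp -w
# Promote (switch traffic to the new version)
kubectl argo rollouts promote myapp
# Abort the rollout
kubectl argo rollouts abort myapp
# Retry an aborted rollout
kubectl argo rollouts retry rollout myapp
# Restart the pods
kubectl argo rollouts restart myapp
# View revisions (the plugin has no history subcommand; get rollout lists them)
kubectl argo rollouts get rollout myapp
# Roll back to a specific revision
kubectl argo rollouts undo myapp --to-revision=2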
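When these commands are scripted, the Rollout's phase (the .status.phase field, one of Healthy, Paused, Progressing or Degraded) is a convenient thing to branch on. The sketch below maps a phase to a suggested next step; the function name action_for_phase and the policy itself are assumptions chosen for illustration.

```shell
#!/bin/sh
# action_for_phase PHASE
# Maps a Rollout phase to a suggested next step.
# The mapping is one possible policy, not something the plugin prescribes.
action_for_phase() {
  case "$1" in
    Healthy)     echo "done" ;;     # new version is live and stable
    Paused)      echo "promote" ;;  # waiting for a promote command
    Degraded)    echo "abort" ;;    # rollout failed or timed out
    Progressing) echo "wait" ;;     # still rolling out
    *)           echo "unknown" ;;
  esac
}

# Usage (requires cluster access):
#   phase=$(kubectl get rollout myapp -n production -o jsonpath='{.status.phase}')
#   action_for_phase "$phase"
```

Keeping the decision in a pure function makes the policy easy to review separately from the kubectl calls that act on it.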
Argo Rollouts Dashboard
# Install the Dashboard
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/dashboard-install.yaml
# Port-forward
kubectl port-forward -n argo-rollouts svc/argo-rollouts-dashboard 3100:3100
# Open http://localhost:3100
Flagger Blue-Green Deployment
Flagger is another popular progressive delivery tool.
Installing Flagger
# Add the Helm repository
helm repo add flagger https://flagger.app
# Install Flagger (NGINX Ingress)
helm upgrade -i flagger flagger/flagger \
  --namespace ingress-nginx \
  --set meshProvider=nginx \
  --set metricsServer=http://prometheus.monitoring:9090
# Install the Grafana dashboards
helm upgrade -i flagger-grafana flagger/grafana \
  --namespace ingress-nginx \
  --set url=http://prometheus.monitoring:9090
Flagger Canary Resource (Blue/Green Mode)
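apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
  namespace: production
spec:
  # Plain Kubernetes provider for Blue/Green; overrides the global meshProvider.
  # With the nginx provider, an ingressRef would be needed instead.
  provider: kubernetes
  # Deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  # Autoscaler reference
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: myapp
  # Service configuration
  service:
    port: 80
    targetPort: 8080
    portDiscovery: true
  # Blue/Green strategy
  analysis:
    interval: 1m
    threshold: 5
    # Setting iterations (with no stepWeight or maxWeight) selects Blue/Green mode
    iterations: 10
    # Session affinity; this concerns weighted canary routing, so it likely has no effect in Blue/Green mode
    sessionAffinity:
      cookieName: flagger-cookie
      maxAge: 21600
    # Metrics
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m
    # Webhooks
    webhooks:
      # Runs once before traffic reaches the new version
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.production/
        timeout: 10s
        metadata:
          type: bash
          cmd: "curl -sd 'anon' http://myapp-canary/token | jq ."
      # Generates traffic during each analysis iteration
      - name: load-test
        type: rollout
        url: http://flagger-loadtester.production/
        timeout: 5s
        metadata:
          type: cmd
          cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary/"
      # /gate/approve always approves; point this at /gate/check to hold promotion until the gate is opened
      - name: promotion-gate
        type: confirm-promotion
        url: http://flagger-loadtester.production/gate/approve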
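A confirm-promotion webhook holds promotion until its URL returns HTTP 200. If the hook points at the load tester's /gate/check endpoint rather than /gate/approve, the gate is opened or closed per canary by POSTing the canary's name and namespace to /gate/open or /gate/close. A minimal sketch follows; the helper names gate_payload and set_gate are hypothetical, and the base URL assumes the load tester Service used in this section.

```shell
#!/bin/sh
# gate_payload NAME NAMESPACE
# Prints the JSON body identifying a canary to the load tester's gate endpoints.
gate_payload() {
  printf '{"name":"%s","namespace":"%s"}\n' "$1" "$2"
}

# set_gate open|close NAME NAMESPACE [BASE_URL]
# Opens or closes the promotion gate for one canary.
# Assumption: it runs somewhere that can resolve the load tester Service.
set_gate() {
  curl -fsS -d "$(gate_payload "$2" "$3")" \
    "${4:-http://flagger-loadtester.production}/gate/$1"
}

# Usage (from a pod inside the cluster):
#   set_gate open myapp production    # allow promotion
#   set_gate close myapp production   # block it again
```

Building the payload in its own function keeps the request body easy to check without a cluster.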
Flagger Loadtester
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flagger-loadtester
  namespace: production
spec:
  selector:
    matchLabels:
      app: flagger-loadtester
  template:
    metadata:
      labels:
        app: flagger-loadtester
    spec:
      containers:
        - name: loadtester
          image: ghcr.io/fluxcd/flagger-loadtester:latest
          ports:
            - containerPort: 8080
          command:
            - ./loadtester
            - -port=8080
            - -log-level=info
            - -timeout=1h
---
apiVersion: v1
kind: Service
metadata:
  name: flagger-loadtester
  namespace: production
spec:
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: flagger-loadtester
Spinnaker Blue-Green Deployment
Spinnaker is a continuous delivery platform open-sourced by Netflix.
Spinnaker Pipeline Configuration
{
  "application": "myapp",
  "name": "Blue-Green Deployment",
  "stages": [
    {
      "type": "deployManifest",
      "name": "Deploy Green",
      "account": "my-k8s-account",
      "cloudProvider": "kubernetes",
      "manifests": [
        {
          "apiVersion": "apps/v1",
          "kind": "Deployment",
          "metadata": {
            "name": "myapp-green"
          },
          "spec": {
            "replicas": 3,
            "selector": {
              "matchLabels": {
                "app": "myapp",
                "version": "green"
              }
            },
            "template": {
              "metadata": {
                "labels": {
                  "app": "myapp",
                  "version": "green"
                }
              },
              "spec": {
                "containers": [
                  {
                    "name": "myapp",
                    "image": "${ parameters.image }",
                    "ports": [
                      {
                        "containerPort": 8080
                      }
                    ]
                  }
                ]
              }
            }
          }
        }
      ],
      "source": "text"
    },
    {
      "type": "manualJudgment",
      "name": "Manual Approval",
      "instructions": "Review green environment before switching traffic",
      "judgmentInputs": [
        {
          "value": "approve"
        },
        {
          "value": "reject"
        }
      ]
    },
    {
      "type": "patchManifest",
      "name": "Switch Traffic to Green",
      "account": "my-k8s-account",
      "cloudProvider": "kubernetes",
      "manifestName": "service myapp-service",
      "options": {
        "mergeStrategy": "json"
      },
      "patchBody": [
        {
          "op": "replace",
          "path": "/spec/selector/version",
          "value": "green"
        }
      ]
    },
    {
      "type": "wait",
      "name": "Monitor Green",
      "waitTime": 300
    },
    {
      "type": "deleteManifest",
      "name": "Delete Blue",
      "account": "my-k8s-account",
      "cloudProvider": "kubernetes",
      "manifestName": "deployment myapp-blue"
    }
  ],
  "triggers": [
    {
      "type": "webhook",
      "source": "github",
      "enabled": true
    }
  ],
  "parameterConfig": [
    {
      "name": "image",
      "default": "myregistry/myapp:latest",
      "description": "Docker image to deploy"
    }
  ]
}
GitLab CI/CD Blue-Green Deployment
.gitlab-ci.yml
stages:
  - build
  - deploy-green
  - test-green
  - switch-traffic
  - cleanup
variables:
  KUBE_NAMESPACE: production
  APP_NAME: myapp
build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    # Log in to the GitLab registry before pushing
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  only:
    - main
deploy-green:
  stage: deploy-green
  # The image's default entrypoint is kubectl, so it is cleared to let GitLab run a shell
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  script:
    # Determine the current color
    - CURRENT_COLOR=$(kubectl get service $APP_NAME-service -n $KUBE_NAMESPACE -o jsonpath='{.spec.selector.version}' || echo "blue")
    - |
      if [ "$CURRENT_COLOR" = "blue" ]; then
        NEW_COLOR="green"
      else
        NEW_COLOR="blue"
      fi
    - echo "Current=$CURRENT_COLOR, New=$NEW_COLOR"
    - echo $NEW_COLOR > .color
    # Deploy the new color
    - kubectl set image deployment/$APP_NAME-$NEW_COLOR $APP_NAME=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA -n $KUBE_NAMESPACE
    - kubectl rollout status deployment/$APP_NAME-$NEW_COLOR -n $KUBE_NAMESPACE --timeout=5m
  artifacts:
    paths:
      - .color
  only:
    - main
test-green:
  stage: test-green
  image: curlimages/curl:latest
  script:
    - NEW_COLOR=$(cat .color)
    - echo "Testing $NEW_COLOR environment"
    # Smoke tests
    - curl -f http://$APP_NAME-$NEW_COLOR-test.$KUBE_NAMESPACE.svc.cluster.local/health || exit 1
    - curl -f http://$APP_NAME-$NEW_COLOR-test.$KUBE_NAMESPACE.svc.cluster.local/api/test || exit 1
    # Basic load check: 100 sequential requests (seq is used because the image's shell has no {1..100} expansion)
    - |
      for i in $(seq 1 100); do
        curl -s -o /dev/null -w "%{http_code}\n" http://$APP_NAME-$NEW_COLOR-test.$KUBE_NAMESPACE.svc.cluster.local/
      done | grep -v 200 | wc -l > errors.txt
    - |
      if [ $(cat errors.txt) -gt 5 ]; then
        echo "Too many errors"
        exit 1
      fi
  dependencies:
    - deploy-green
  only:
    - main
switch-traffic:
  stage: switch-traffic
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  script:
    - NEW_COLOR=$(cat .color)
    - echo "Switching traffic to $NEW_COLOR"
    - kubectl patch service $APP_NAME-service -n $KUBE_NAMESPACE -p "{\"spec\":{\"selector\":{\"version\":\"$NEW_COLOR\"}}}"
    # Verify
    - sleep 10
    - kubectl get service $APP_NAME-service -n $KUBE_NAMESPACE -o yaml
  # The .color artifact is produced by deploy-green, so that is the job to depend on
  dependencies:
    - deploy-green
  when: manual
  only:
    - main
cleanup:
  stage: cleanup
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  script:
    - NEW_COLOR=$(cat .color)
    - |
      if [ "$NEW_COLOR" = "blue" ]; then
        OLD_COLOR="green"
      else
        OLD_COLOR="blue"
      fi
    - echo "Cleaning up $OLD_COLOR environment"
    - kubectl delete service $APP_NAME-$NEW_COLOR-test -n $KUBE_NAMESPACE || true
    # Keep the old version for 7 days: trigger this manual job once the retention period is over.
    # A `sleep 604800` inside the job would outlast typical CI job timeouts.
    - kubectl delete deployment $APP_NAME-$OLD_COLOR -n $KUBE_NAMESPACE
  dependencies:
    - deploy-green
  when: manual
  only:
    - main
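The pipeline relies on two small pieces of logic: toggling between the two colors and building the selector patch that moves traffic. Pulled out as shell functions, they can be checked without a cluster. This is a minimal sketch; the names next_color and selector_patch are hypothetical helpers, not part of the pipeline above.

```shell
#!/bin/sh
# next_color CURRENT
# Returns the color to deploy next. Anything other than "blue"
# (including an empty value when the Service has no selector yet)
# maps to "blue", mirroring the if/else in the pipeline.
next_color() {
  if [ "$1" = "blue" ]; then
    echo "green"
  else
    echo "blue"
  fi
}

# selector_patch COLOR
# Prints the merge patch that points the Service selector at COLOR.
selector_patch() {
  printf '{"spec":{"selector":{"version":"%s"}}}\n' "$1"
}

# Usage (requires cluster access):
#   NEW_COLOR=$(next_color "$CURRENT_COLOR")
#   kubectl patch service myapp-service -n production -p "$(selector_patch "$NEW_COLOR")"
```

Generating the patch with printf also avoids the escaped quotes needed when the JSON is written inline in a YAML script line.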
Jenkins Pipeline Blue-Green Deployment
Jenkinsfile
pipeline {
    agent any
    parameters {
        string(name: 'IMAGE_TAG', defaultValue: 'latest', description: 'Docker image tag')
        choice(name: 'ENVIRONMENT', choices: ['dev', 'staging', 'production'], description: 'Target environment')
    }
    environment {
        APP_NAME = 'myapp'
        KUBE_NAMESPACE = "${params.ENVIRONMENT}"
        REGISTRY = 'myregistry'
    }
    stages {
        stage('Build') {
            steps {
                script {
                    docker.build("${REGISTRY}/${APP_NAME}:${params.IMAGE_TAG}")
                    docker.withRegistry('https://myregistry', 'docker-credentials') {
                        docker.image("${REGISTRY}/${APP_NAME}:${params.IMAGE_TAG}").push()
                    }
                }
            }
        }
        stage('Determine Color') {
            steps {
                script {
                    def currentColor = sh(
                        script: "kubectl get service ${APP_NAME}-service -n ${KUBE_NAMESPACE} -o jsonpath='{.spec.selector.version}' || echo 'blue'",
                        returnStdout: true
                    ).trim()
                    env.CURRENT_COLOR = currentColor
                    env.NEW_COLOR = currentColor == 'blue' ? 'green' : 'blue'
                    echo "Current: ${env.CURRENT_COLOR}, New: ${env.NEW_COLOR}"
                }
            }
        }
        stage('Deploy Green') {
            steps {
                script {
                    sh """
                        kubectl set image deployment/${APP_NAME}-${env.NEW_COLOR} \
                            ${APP_NAME}=${REGISTRY}/${APP_NAME}:${params.IMAGE_TAG} \
                            -n ${KUBE_NAMESPACE}
                        kubectl rollout status deployment/${APP_NAME}-${env.NEW_COLOR} \
                            -n ${KUBE_NAMESPACE} --timeout=5m
                    """
                }
            }
        }
        stage('Test Green') {
            steps {
                script {
                    sh """
                        kubectl run test-pod --rm -i --restart=Never \
                            --image=curlimages/curl:latest \
                            -n ${KUBE_NAMESPACE} -- \
                            curl -f http://${APP_NAME}-${env.NEW_COLOR}:80/health
                    """
                }
            }
        }
        stage('Approval') {
            when {
                expression { params.ENVIRONMENT == 'production' }
            }
            steps {
                input message: "Switch traffic to ${env.NEW_COLOR}?",
                      ok: 'Deploy',
                      submitter: 'ops-team'
            }
        }
        stage('Switch Traffic') {
            steps {
                script {
                    sh """
                        kubectl patch service ${APP_NAME}-service \
                            -n ${KUBE_NAMESPACE} \
                            -p '{"spec":{"selector":{"version":"${env.NEW_COLOR}"}}}'
                    """
                    // Observe the new version for a minute
                    sleep 60
                }
            }
        }
        stage('Cleanup') {
            steps {
                script {
                    timeout(time: 7, unit: 'DAYS') {
                        input message: "Delete ${env.CURRENT_COLOR} environment?",
                              ok: 'Delete'
                    }
                    sh "kubectl delete deployment ${APP_NAME}-${env.CURRENT_COLOR} -n ${KUBE_NAMESPACE}"
                }
            }
        }
    }
    post {
        failure {
            script {
                // Roll back, but only if the previous color is known
                // (the failure may have happened before 'Determine Color' ran)
                if (env.CURRENT_COLOR) {
                    sh """
                        kubectl patch service ${APP_NAME}-service \
                            -n ${KUBE_NAMESPACE} \
                            -p '{"spec":{"selector":{"version":"${env.CURRENT_COLOR}"}}}'
                    """
                }
            }
            // Send a notification
            emailext(
                subject: "Pipeline Failed: ${env.JOB_NAME}",
                body: "Build ${env.BUILD_NUMBER} failed. Please check Jenkins.",
                to: 'ops-team@example.com'
            )
        }
        success {
            emailext(
                subject: "Pipeline Success: ${env.JOB_NAME}",
                body: "Deployed ${params.IMAGE_TAG} to ${params.ENVIRONMENT}",
                to: 'ops-team@example.com'
            )
        }
    }
}
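The Test Green stage runs a single curl, so one slow pod start can fail the whole build. A retry wrapper called from the sh step is one way to make the check more tolerant. The sketch below is an assumption about how that could look; retry and its arguments are hypothetical names, not part of the Jenkinsfile.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed,
# sleeping DELAY_SECONDS between attempts.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while :; do
    "$@" && return 0                      # success: stop retrying
    [ "$n" -ge "$attempts" ] && return 1  # out of attempts
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage (inside the cluster):
#   retry 5 3 curl -f http://myapp-green:80/health
```

A fixed delay keeps the sketch short; a real pipeline might prefer a growing delay and an overall time limit.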
Monitoring and Observability Integration
Prometheus Alerting Rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: blue-green-alerts
spec:
  groups:
    - name: blue-green-deployment
      interval: 30s
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{status=~"5..", app="myapp"}[5m]))
            /
            sum(rate(http_requests_total{app="myapp"}[5m]))
            > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate detected"
            description: "Error rate is {{ $value | humanizePercentage }}"
        - alert: HighLatency
          expr: |
            histogram_quantile(0.95,
              sum(rate(http_request_duration_seconds_bucket{app="myapp"}[5m])) by (le)
            ) > 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High latency detected"
            description: "P95 latency is {{ $value }}s"
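Summary
Comparison of automation tools:
| Tool | Strengths | Weaknesses | Best suited for |
|---|---|---|---|
| Argo Rollouts | Kubernetes-native, feature-rich | Learning curve | Most Kubernetes workloads |
| Flagger | Highly automated, supports multiple service meshes | Depends on Prometheus | Service mesh environments |
| Spinnaker | Comprehensive, multi-cloud support | Complex, resource-hungry | Enterprise, multi-cloud |
| GitLab CI/CD | Tightly integrated, easy to use | Limited flexibility | GitLab users |
| Jenkins | Highly customizable, rich ecosystem | Blue-green logic must be written by hand | Traditional enterprises |
Recommendations:
- Kubernetes-native: Argo Rollouts
- Service mesh: Flagger + Istio
- Enterprise multi-cloud: Spinnaker
- Quick start: GitLab CI/CD
- Heavy customization: Jenkins
With these tools, the blue-green deployment process can be automated end to end.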