Application Deployment and Load Balancing
This chapter covers deploying microservice applications on an EKS cluster, including Deployment, Service, and Ingress configuration, as well as advanced release strategies such as canary releases and blue-green deployments.
Microservice Architecture Design
Application Architecture
Example e-commerce platform architecture:
Internet users
↓
┌──────────────┐
│ Route 53 │
│ (DNS Routing)│
└──────┬───────┘
↓
┌──────────────┐
│ CloudFront │
│ (CDN) │
└──────┬───────┘
↓
┌──────────────┐
│ ALB │
│ (Ingress) │
└──────┬───────┘
↓
┌─────────────┴─────────────┐
↓ ↓
┌─────────────┐ ┌─────────────┐
│ Frontend │ │ API Gateway│
│ (Next.js) │ │ (BFF) │
└─────────────┘ └──────┬──────┘
↓
┌───────────────────┴────────────────┐
↓ ↓ ↓
┌──────────┐ ┌──────────┐ ┌──────────┐
│ User │ │ Product │ │ Order │
│ Service │ │ Service │ │ Service │
└────┬─────┘ └────┬─────┘ └────┬─────┘
↓ ↓ ↓
┌──────────┐ ┌──────────┐ ┌──────────┐
│ RDS │ │ Redis │ │ RDS │
│(Users DB)│ │ (Cache) │ │(Orders DB)│
└──────────┘ └──────────┘ └──────────┘
Service Inventory
Core services:
1. frontend-service
├─ Framework: Next.js
├─ Replicas: 3
├─ Resources: CPU 500m, Memory 512Mi
└─ Port: 3000
2. api-gateway-service
├─ Framework: Node.js (Express)
├─ Replicas: 3
├─ Resources: CPU 1000m, Memory 1Gi
└─ Port: 8080
3. user-service
├─ Language: Go
├─ Replicas: 3
├─ Resources: CPU 500m, Memory 512Mi
├─ Port: 9001
└─ Database: RDS PostgreSQL
4. product-service
├─ Language: Python (FastAPI)
├─ Replicas: 3
├─ Resources: CPU 500m, Memory 512Mi
├─ Port: 9002
└─ Cache: Redis
5. order-service
├─ Language: Java (Spring Boot)
├─ Replicas: 3
├─ Resources: CPU 1000m, Memory 1Gi
├─ Port: 9003
└─ Database: RDS PostgreSQL
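Summing the requests above gives the steady-state capacity the cluster must be able to schedule before any autoscaling: (500m + 1000m + 500m + 500m + 1000m) × 3 replicas = 10.5 vCPU, and (512Mi + 1Gi + 512Mi + 512Mi + 1Gi) × 3 = 10.5 GiB of memory.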
Kubernetes Resource Configuration
Namespace Organization
# namespaces.yaml
---
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
environment: production
team: platform
---
apiVersion: v1
kind: Namespace
metadata:
name: staging
labels:
environment: staging
team: platform
---
apiVersion: v1
kind: Namespace
metadata:
name: monitoring
labels:
environment: production
purpose: monitoring
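A minimal sketch for applying and verifying these namespaces:
# Apply the manifests and confirm the labels were set.
kubectl apply -f namespaces.yaml
kubectl get namespaces -l team=platform --show-labels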
ConfigMap Configuration
# api-gateway-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: api-gateway-config
namespace: production
data:
  # Application settings
NODE_ENV: "production"
LOG_LEVEL: "info"
  # Service discovery
USER_SERVICE_URL: "http://user-service:9001"
PRODUCT_SERVICE_URL: "http://product-service:9002"
ORDER_SERVICE_URL: "http://order-service:9003"
  # Feature flags
ENABLE_CACHE: "true"
ENABLE_RATE_LIMIT: "true"
  # Timeouts
REQUEST_TIMEOUT: "30000"
CIRCUIT_BREAKER_TIMEOUT: "5000"
  # Application config file
app.json: |
{
"rateLimit": {
"windowMs": 900000,
"max": 100
},
"cors": {
"origin": ["https://example.com"],
"credentials": true
}
}
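A quick sanity check after applying the ConfigMap; the jq call (assuming jq is installed locally) confirms the embedded app.json is valid JSON:
# Apply and spot-check a scalar key.
kubectl apply -f api-gateway-configmap.yaml
kubectl get configmap api-gateway-config -n production -o jsonpath='{.data.NODE_ENV}{"\n"}'
# Dotted keys must be escaped in jsonpath; validate the embedded JSON.
kubectl get configmap api-gateway-config -n production -o jsonpath='{.data.app\.json}' | jq .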
Secret Management (with External Secrets)
Install the External Secrets Operator:
#!/bin/bash
# install-external-secrets.sh
CLUSTER_NAME="production-eks-cluster"
REGION="us-east-1"
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
echo "安装 External Secrets Operator..."
# 1. 创建 IAM 策略
cat > external-secrets-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret"
],
"Resource": "arn:aws:secretsmanager:*:*:secret:production/*"
}
]
}
EOF
aws iam create-policy \
--policy-name ExternalSecretsPolicy \
--policy-document file://external-secrets-policy.json
# 2. Create the IRSA service account
eksctl create iamserviceaccount \
--name external-secrets \
--namespace kube-system \
--cluster $CLUSTER_NAME \
--attach-policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/ExternalSecretsPolicy \
--approve \
--region $REGION
# 3. Install the operator
helm repo add external-secrets https://charts.external-secrets.io
helm repo update
helm install external-secrets \
external-secrets/external-secrets \
-n kube-system \
--set installCRDs=true \
--set serviceAccount.create=false \
--set serviceAccount.name=external-secrets
echo "✓ External Secrets Operator 已安装"
rm -f external-secrets-policy.json
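Before creating any stores, confirm the operator is healthy (the Pod label below assumes the Helm chart's default app.kubernetes.io/name label):
# Controller Pods should be Running; CRDs should be registered.
kubectl get pods -n kube-system -l app.kubernetes.io/name=external-secrets
kubectl get crds | grep external-secrets.io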
Configure the secret store. Because the service account created above lives in kube-system while the application namespace is production, and a namespaced SecretStore cannot reference a ServiceAccount in another namespace, a ClusterSecretStore is used here:
# clustersecretstore.yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secretsmanager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets
            namespace: kube-system
Create the ExternalSecret:
# database-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: database-credentials
namespace: production
spec:
refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager
    kind: ClusterSecretStore
target:
name: database-credentials
creationPolicy: Owner
data:
- secretKey: username
remoteRef:
key: production/database
property: username
- secretKey: password
remoteRef:
key: production/database
property: password
- secretKey: host
remoteRef:
key: production/database
property: host
- secretKey: database
remoteRef:
key: production/database
property: database
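After applying both manifests, check that the store is ready and the secret has synced; the ExternalSecret status should report SecretSynced:
# Store readiness and ExternalSecret reconciliation status.
kubectl get clustersecretstore aws-secretsmanager
kubectl get externalsecret database-credentials -n production
# The generated Kubernetes Secret should contain the four mapped keys.
kubectl get secret database-credentials -n production -o jsonpath='{.data}' | jq 'keys'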
Deployment Configuration
User Service Deployment:
# user-service-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: user-service
namespace: production
labels:
app: user-service
version: v1
spec:
replicas: 3
revisionHistoryLimit: 10
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: user-service
template:
metadata:
labels:
app: user-service
version: v1
annotations:
prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
prometheus.io/path: "/metrics"
spec:
serviceAccountName: user-service
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
      # Affinity (spread replicas across AZs)
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- user-service
topologyKey: topology.kubernetes.io/zone
      # Containers
containers:
- name: user-service
image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/user-service:v1.2.3
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 9001
protocol: TCP
- name: metrics
containerPort: 9090
protocol: TCP
        # Environment variables
env:
- name: PORT
value: "9001"
- name: LOG_LEVEL
value: "info"
- name: DB_HOST
valueFrom:
secretKeyRef:
name: database-credentials
key: host
- name: DB_USER
valueFrom:
secretKeyRef:
name: database-credentials
key: username
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: database-credentials
key: password
- name: DB_NAME
valueFrom:
secretKeyRef:
name: database-credentials
key: database
        # Resource requests and limits
resources:
requests:
cpu: 500m
memory: 512Mi
limits:
cpu: 1000m
memory: 1Gi
        # Health checks
livenessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: http
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
        # Startup probe (for slow-starting apps)
startupProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 0
periodSeconds: 10
timeoutSeconds: 3
failureThreshold: 30
        # Security context
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
        # Volume mounts
volumeMounts:
- name: tmp
mountPath: /tmp
- name: cache
mountPath: /app/cache
      # Volume definitions
volumes:
- name: tmp
emptyDir: {}
- name: cache
emptyDir: {}
      # Termination grace period
terminationGracePeriodSeconds: 60
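Apply the Deployment and verify the rollout and the zone spread produced by the anti-affinity rule:
# Wait for all replicas to become ready.
kubectl apply -f user-service-deployment.yaml
kubectl rollout status deployment/user-service -n production
# Show which node each Pod landed on to confirm the spread.
kubectl get pods -n production -l app=user-service \
  -o custom-columns='POD:.metadata.name,NODE:.spec.nodeName'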
Service Configuration
# user-service-svc.yaml
apiVersion: v1
kind: Service
metadata:
name: user-service
namespace: production
labels:
app: user-service
spec:
type: ClusterIP
sessionAffinity: None
selector:
app: user-service
ports:
- name: http
protocol: TCP
port: 9001
targetPort: http
- name: metrics
protocol: TCP
port: 9090
targetPort: metrics
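A quick in-cluster smoke test of the Service DNS name using a throwaway curl Pod (the image choice is an assumption):
# Calls the Service through cluster DNS, then deletes the Pod.
kubectl run curl-test --rm -it --restart=Never \
  --image=curlimages/curl -n production \
  -- curl -s http://user-service:9001/health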
HorizontalPodAutoscaler (HPA)
# user-service-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: user-service
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: user-service
minReplicas: 3
maxReplicas: 10
metrics:
  # CPU utilization
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
  # Memory utilization
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
  # Custom metric: requests per second
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "1000"
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 50
periodSeconds: 60
- type: Pods
value: 2
periodSeconds: 60
selectPolicy: Max
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
- type: Pods
value: 1
periodSeconds: 60
selectPolicy: Min
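Note that the http_requests_per_second metric only resolves if a metrics adapter (for example prometheus-adapter) is serving the custom metrics API. Once applied, watch the HPA's decisions:
# TARGETS shows current vs. target values for each metric (Ctrl+C to stop).
kubectl get hpa user-service -n production -w
# Scaling events and metric-collection errors appear here.
kubectl describe hpa user-service -n production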
PodDisruptionBudget (PDB)
# user-service-pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: user-service
namespace: production
spec:
minAvailable: 2
selector:
matchLabels:
app: user-service
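With minAvailable: 2 against three replicas, voluntary disruptions such as node drains may evict at most one user-service Pod at a time. Verify the budget:
# ALLOWED DISRUPTIONS should read 1 while all three replicas are healthy.
kubectl get pdb user-service -n production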
Ingress and ALB Configuration
Ingress Resource
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: production-ingress
namespace: production
annotations:
    # ALB configuration
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: ip
alb.ingress.kubernetes.io/backend-protocol: HTTP
    # Listeners
alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
alb.ingress.kubernetes.io/ssl-redirect: '443'
# SSL/TLS
alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/xxx
alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01
    # Health checks
alb.ingress.kubernetes.io/healthcheck-path: /health
alb.ingress.kubernetes.io/healthcheck-interval-seconds: '15'
alb.ingress.kubernetes.io/healthcheck-timeout-seconds: '5'
alb.ingress.kubernetes.io/healthy-threshold-count: '2'
alb.ingress.kubernetes.io/unhealthy-threshold-count: '2'
    # Access logs
alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket=production-alb-logs,access_logs.s3.prefix=ingress
    # Tags
alb.ingress.kubernetes.io/tags: Environment=production,Team=platform
    # Security groups
alb.ingress.kubernetes.io/security-groups: sg-0123456789abcdef0
    # Subnets
alb.ingress.kubernetes.io/subnets: subnet-public-1a,subnet-public-1b,subnet-public-1c
spec:
rules:
  # API traffic
- host: api.example.com
http:
paths:
- path: /api/users
pathType: Prefix
backend:
service:
name: user-service
port:
number: 9001
- path: /api/products
pathType: Prefix
backend:
service:
name: product-service
port:
number: 9002
- path: /api/orders
pathType: Prefix
backend:
service:
name: order-service
port:
number: 9003
  # Frontend traffic
- host: www.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: frontend-service
port:
number: 3000
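Once the AWS Load Balancer Controller has provisioned the ALB, its hostname appears in the Ingress status; a rough probe before DNS is switched over (the -k flag skips certificate checks because we connect via the raw ALB hostname):
# Fetch the ALB hostname published on the Ingress.
ALB_DNS=$(kubectl get ingress production-ingress -n production \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "ALB: $ALB_DNS"
# Send a request with the expected Host header and report the status code.
curl -sk -o /dev/null -w '%{http_code}\n' -H "Host: www.example.com" "https://${ALB_DNS}/"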
Multi-Environment Ingress
Production environment:
# ingress-production.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: production-ingress
namespace: production
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/group.name: production
alb.ingress.kubernetes.io/group.order: '10'
spec:
rules:
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api-gateway-service
port:
number: 8080
Canary environment:
# ingress-canary.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: canary-ingress
namespace: production
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/group.name: production
alb.ingress.kubernetes.io/group.order: '5'
    # The conditions name must match the backend service name in the rules
    alb.ingress.kubernetes.io/conditions.api-gateway-service-canary: |
      [{"field":"http-header","httpHeaderConfig":{"httpHeaderName":"X-Canary","values":["true"]}}]
spec:
rules:
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api-gateway-service-canary
port:
number: 8080
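Because of group.order, the canary Ingress (order 5) is evaluated before the production one (order 10), so only requests carrying X-Canary: true reach the canary service. Once DNS points at the ALB, a quick check:
# Without the header: falls through to the stable api-gateway-service.
curl -s -o /dev/null -w '%{http_code}\n' https://api.example.com/
# With the header: matched by the canary condition first.
curl -s -o /dev/null -w '%{http_code}\n' -H "X-Canary: true" https://api.example.com/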
Advanced Release Strategies
Rolling Update (Default)
Example configuration:
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # extra Pods that may be created above the desired count
      maxUnavailable: 0  # Pods that may be unavailable during the update
With replicas: 3 these settings mean the rollout briefly runs a fourth Pod and never drops below three ready ones. Rollout script:
#!/bin/bash
# rolling-update.sh
NAMESPACE="production"
DEPLOYMENT="user-service"
IMAGE="123456789012.dkr.ecr.us-east-1.amazonaws.com/user-service:v1.2.4"
echo "执行滚动更新..."
echo "部署: $DEPLOYMENT"
echo "镜像: $IMAGE"
# 更新镜像
kubectl set image deployment/$DEPLOYMENT \
user-service=$IMAGE \
-n $NAMESPACE
# Watch rollout progress
kubectl rollout status deployment/$DEPLOYMENT -n $NAMESPACE
# Check Pod status
kubectl get pods -n $NAMESPACE -l app=$DEPLOYMENT
echo "✓ 滚动更新完成"
Canary Release
Istio VirtualService:
# canary-virtualservice.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: user-service
namespace: production
spec:
hosts:
- user-service
http:
- match:
- headers:
X-Canary:
exact: "true"
route:
- destination:
host: user-service
subset: canary
- route:
- destination:
host: user-service
subset: stable
weight: 90
- destination:
host: user-service
subset: canary
weight: 10
DestinationRule:
# canary-destinationrule.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: user-service
namespace: production
spec:
host: user-service
subsets:
- name: stable
labels:
version: v1
- name: canary
labels:
version: v2
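Before shifting traffic it is worth validating the mesh configuration; istioctl analyze flags mismatches such as subsets that select no Pods:
# Static analysis of the Istio resources in the namespace.
istioctl analyze -n production
# Both subsets need matching Pods (version=v1 stable, version=v2 canary).
kubectl get pods -n production -l app=user-service --show-labels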
Canary release script:
#!/bin/bash
# canary-deployment.sh
NAMESPACE="production"
SERVICE="user-service"
STABLE_VERSION="v1.2.3"
CANARY_VERSION="v1.2.4"
echo "================================================"
echo "金丝雀发布: $SERVICE"
echo "稳定版本: $STABLE_VERSION"
echo "金丝雀版本: $CANARY_VERSION"
echo "================================================"
# 1. Deploy the canary version (10% of traffic)
echo ""
echo "1. Deploying canary version..."
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: ${SERVICE}-canary
namespace: $NAMESPACE
spec:
replicas: 1
selector:
matchLabels:
app: $SERVICE
version: v2
template:
metadata:
labels:
app: $SERVICE
version: v2
spec:
containers:
- name: $SERVICE
image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/${SERVICE}:${CANARY_VERSION}
        # ... remaining configuration matches the stable Deployment
EOF
# 2. Configure traffic splitting
echo ""
echo "2. Configuring traffic split (10% canary)..."
kubectl apply -f canary-virtualservice.yaml
kubectl apply -f canary-destinationrule.yaml
# 3. Monitor metrics
echo ""
echo "3. Monitoring canary metrics..."
echo "   Check the Grafana dashboards"
echo "   Command: kubectl port-forward -n monitoring svc/grafana 3000:80"
read -p "Is the canary healthy? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
# 4. Gradually increase traffic
echo ""
echo "4. Increasing canary traffic to 50%..."
kubectl patch virtualservice $SERVICE -n $NAMESPACE --type merge -p '
{
"spec": {
"http": [{
"route": [
{"destination": {"host": "'$SERVICE'", "subset": "stable"}, "weight": 50},
{"destination": {"host": "'$SERVICE'", "subset": "canary"}, "weight": 50}
]
}]
}
}'
sleep 300 # observe for 5 minutes
read -p "Proceed to 100%? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
# 5. Full cutover
echo ""
echo "5. Cutting over fully to the new version..."
kubectl set image deployment/$SERVICE \
$SERVICE=123456789012.dkr.ecr.us-east-1.amazonaws.com/${SERVICE}:${CANARY_VERSION} \
-n $NAMESPACE
kubectl delete deployment ${SERVICE}-canary -n $NAMESPACE
echo "✓ 金丝雀发布成功"
fi
else
# Rollback
echo ""
echo "Rolling back canary deployment..."
kubectl delete deployment ${SERVICE}-canary -n $NAMESPACE
kubectl delete virtualservice $SERVICE -n $NAMESPACE
echo "✓ 已回滚"
fi
echo "================================================"
Blue-Green Deployment
Deployment script:
#!/bin/bash
# blue-green-deployment.sh
NAMESPACE="production"
SERVICE="user-service"
CURRENT_VERSION="blue"   # or "green"
NEW_VERSION="green"      # or "blue"
IMAGE_TAG="v1.2.4"
echo "================================================"
echo "蓝绿部署: $SERVICE"
echo "当前版本: $CURRENT_VERSION"
echo "新版本: $NEW_VERSION"
echo "================================================"
# 1. Deploy the new (green) version
echo ""
echo "1. Deploying the $NEW_VERSION environment..."
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: ${SERVICE}-${NEW_VERSION}
namespace: $NAMESPACE
spec:
replicas: 3
selector:
matchLabels:
app: $SERVICE
version: $NEW_VERSION
template:
metadata:
labels:
app: $SERVICE
version: $NEW_VERSION
spec:
containers:
- name: $SERVICE
image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/${SERVICE}:${IMAGE_TAG}
        # ... remaining configuration
EOF
# 2. Wait for the new version to become ready
echo ""
echo "2. Waiting for the new version to become ready..."
kubectl rollout status deployment/${SERVICE}-${NEW_VERSION} -n $NAMESPACE
# 3. Run smoke tests
echo ""
echo "3. Running smoke tests..."
NEW_POD=$(kubectl get pods -n $NAMESPACE -l app=$SERVICE,version=$NEW_VERSION -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n $NAMESPACE $NEW_POD -- curl -s http://localhost:9001/health
# 4. Switch traffic
echo ""
read -p "Switch traffic to $NEW_VERSION? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
echo "切换 Service 选择器..."
kubectl patch service $SERVICE -n $NAMESPACE -p '
{
"spec": {
"selector": {
"app": "'$SERVICE'",
"version": "'$NEW_VERSION'"
}
}
}'
echo "✓ 流量已切换到 $NEW_VERSION"
# 5. 清理旧版本
echo ""
read -p "删除 $CURRENT_VERSION 部署?(y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
kubectl delete deployment ${SERVICE}-${CURRENT_VERSION} -n $NAMESPACE
echo "✓ 旧版本已删除"
fi
else
echo "保持当前版本"
kubectl delete deployment ${SERVICE}-${NEW_VERSION} -n $NAMESPACE
fi
echo "================================================"
Deployment Verification
Health Checks
#!/bin/bash
# verify-deployment.sh
NAMESPACE="production"
SERVICE="user-service"
echo "================================================"
echo "验证部署: $SERVICE"
echo "================================================"
# 1. Deployment status
echo ""
echo "1. Deployment status"
kubectl get deployment $SERVICE -n $NAMESPACE
# 2. Pod status
echo ""
echo "2. Pod status"
kubectl get pods -n $NAMESPACE -l app=$SERVICE -o wide
# 3. Replica counts
echo ""
echo "3. Replica status"
DESIRED=$(kubectl get deployment $SERVICE -n $NAMESPACE -o jsonpath='{.spec.replicas}')
READY=$(kubectl get deployment $SERVICE -n $NAMESPACE -o jsonpath='{.status.readyReplicas}')
echo "期望: $DESIRED, 就绪: $READY"
if [ "$DESIRED" == "$READY" ]; then
echo "✓ 所有副本就绪"
else
echo "✗ 副本数不匹配"
exit 1
fi
# 4. Service endpoints
echo ""
echo "4. Service endpoints"
kubectl get endpoints $SERVICE -n $NAMESPACE
# 5. Log check
echo ""
echo "5. Recent logs (newest Pod)"
LATEST_POD=$(kubectl get pods -n $NAMESPACE -l app=$SERVICE --sort-by=.metadata.creationTimestamp -o jsonpath='{.items[-1].metadata.name}')
kubectl logs -n $NAMESPACE $LATEST_POD --tail=20
# 6. Health check
echo ""
echo "6. Health check"
kubectl exec -n $NAMESPACE $LATEST_POD -- curl -s http://localhost:9001/health | jq
echo ""
echo "================================================"
echo "验证完成"
echo "================================================"
Best Practices Summary
1. Resource configuration
✓ Set sensible requests and limits
✓ Configure HPA for automatic scaling
✓ Protect availability with PDBs
✓ Configure affinity and anti-affinity
✓ Use Spot instance node groups (non-critical services)
2. Health checks
✓ Configure liveness, readiness, and startup probes
✓ Set sensible timeouts and failure thresholds
✓ Keep probes lightweight so they do not hurt performance
✓ Use dedicated health-check endpoints
✓ Log the reason for health-check failures
3. Configuration management
✓ Manage configuration with ConfigMaps
✓ Manage sensitive data with External Secrets
✓ Keep configuration separate from code
✓ Maintain environment-specific configuration
✓ Keep configuration files under version control
4. Release strategy
✓ Default to rolling updates
✓ Use canary releases for critical services
✓ Use blue-green deployment for zero-downtime cutover
✓ Always have a rollback plan
✓ Automate the release process
5. Monitoring and logging
✓ Expose Prometheus metrics
✓ Emit structured logs
✓ Enable access logging
✓ Define alerting rules
✓ Use distributed tracing
Next: continue with the Databases and Caching chapter.