Operations and Best Practices
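This section covers day-to-day operations for the microservices application and best practices for running it in production.
Rolling Updates
Updating the Image Version
# Update the image of a single service
kubectl set image deployment/product-service \
  product-service=ecommerce/product-service:v1.1.0 \
  -n ecommerce
# Watch the rolling update until it completes
kubectl rollout status deployment/product-service -n ecommerce
# Show the revision history
kubectl rollout history deployment/product-service -n ecommerce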
Rolling Back to a Previous Version
# Roll back to the previous revision
kubectl rollout undo deployment/product-service -n ecommerce
# Roll back to a specific revision
kubectl rollout undo deployment/product-service --to-revision=2 -n ecommerce
# Pause and resume a rollout
kubectl rollout pause deployment/product-service -n ecommerce
kubectl rollout resume deployment/product-service -n ecommerce
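The update and rollback commands above can be chained so that a rollout that fails to complete is reverted automatically. The following is a minimal sketch, not a definitive deployment script: the function name `deploy_or_rollback` and the 120-second default timeout are assumptions made for illustration, and it assumes the container is named after the Deployment, as in the example above.

```shell
# Hypothetical helper: update an image, wait for the rollout, undo it on failure.
# Usage: deploy_or_rollback <deployment> <image> [namespace] [timeout]
deploy_or_rollback() {
  local deployment="$1" image="$2" namespace="${3:-ecommerce}" timeout="${4:-120s}"

  # Assumes the container name equals the Deployment name.
  kubectl set image "deployment/${deployment}" "${deployment}=${image}" \
    -n "${namespace}" || return 1

  # "rollout status" exits non-zero if the rollout does not finish within the timeout.
  if kubectl rollout status "deployment/${deployment}" -n "${namespace}" \
       --timeout="${timeout}"; then
    echo "rollout succeeded"
  else
    echo "rollout failed, rolling back" >&2
    kubectl rollout undo "deployment/${deployment}" -n "${namespace}"
    return 1
  fi
}
```

For example, `deploy_or_rollback product-service ecommerce/product-service:v1.1.0` would update the image and return a non-zero status after rolling back if the new Pods never become ready.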
Scaling
Manual Scaling
# Scale out to 5 replicas
kubectl scale deployment product-service --replicas=5 -n ecommerce
# Check the replica count
kubectl get deployment product-service -n ecommerce
Autoscaling with the HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: product-service-hpa
  namespace: ecommerce
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: product-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
    # Utilization targets are percentages of the containers' resource requests
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    # Per-Pod custom metric; requires a custom metrics adapter in the cluster
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
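To make the effect of the targets above concrete, the sketch below evaluates the replica formula from the Kubernetes HPA documentation for a single metric: desired = ceil(currentReplicas × currentValue / targetValue), clamped to `minReplicas` and `maxReplicas`. The function name `hpa_desired_replicas` is hypothetical, and the sketch deliberately ignores the controller's tolerance band and stabilization window. When several metrics are configured, the controller computes a value per metric and uses the largest.

```shell
# Sketch of the HPA replica formula for one metric (integer inputs only).
# Usage: hpa_desired_replicas <current_replicas> <current_value> <target_value> <min> <max>
hpa_desired_replicas() {
  local current="$1" value="$2" target="$3" min="$4" max="$5"

  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b
  local desired=$(( (current * value + target - 1) / target ))

  # Clamp to the configured bounds
  [ "$desired" -lt "$min" ] && desired="$min"
  [ "$desired" -gt "$max" ] && desired="$max"
  echo "$desired"
}

# 3 replicas at 140% average CPU utilization against a 70% target
hpa_desired_replicas 3 140 70 3 20   # prints 6
```

With the same bounds, 10 replicas at 300% utilization would compute 43 and be capped at the `maxReplicas` value of 20.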
Log Management
Viewing Logs
# Stream logs in real time
kubectl logs -f product-service-abc123 -n ecommerce
# Show the last 100 lines
kubectl logs --tail=100 product-service-abc123 -n ecommerce
# Show logs from a specific container
kubectl logs product-service-abc123 -c product-service -n ecommerce
# Show logs from the previous container instance (after a restart)
kubectl logs product-service-abc123 --previous -n ecommerce
# Show logs from all replicas
kubectl logs -l app=product-service -n ecommerce --tail=50
Aggregating Logs with stern
# Install stern (Homebrew)
brew install stern
# Tail the logs of every product-service Pod
stern product-service -n ecommerce
# Select Pods by label
stern -l app=product-service -n ecommerce
# Tail matching Pods across all namespaces
stern product-service --all-namespaces
Monitoring and Alerting
ServiceMonitor Configuration
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ecommerce-services
  namespace: ecommerce
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      tier: backend
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
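PrometheusRule Alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ecommerce-alerts
  namespace: ecommerce
spec:
  groups:
    - name: ecommerce.rules
      interval: 30s
      rules:
        - alert: HighErrorRate
          # Aggregate per app so the 5xx rate is divided by the total request rate;
          # without aggregation the status_code label keeps the two sides from matching
          expr: |
            sum by (app) (rate(http_requests_total{status_code=~"5.."}[5m]))
              /
            sum by (app) (rate(http_requests_total[5m])) > 0.05
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Service error rate is too high"
            description: "{{ $labels.app }} error rate is {{ $value | humanizePercentage }}"
        - alert: PodCrashLooping
          # increase() yields the restart count over the window; rate() would yield restarts per second
          expr: increase(kube_pod_container_status_restarts_total{namespace="ecommerce"}[15m]) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod is restarting frequently"
            description: "{{ $labels.pod }} restarted {{ $value }} times in the last 15 minutes"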
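As a worked illustration of the error-rate condition above, the sketch below applies the same comparison (5xx request rate divided by total request rate, against a 0.05 threshold) to two plain numbers. The function name `error_rate_exceeds` is hypothetical; it only mirrors the arithmetic and does not query Prometheus.

```shell
# Exits 0 when 5xx_rate / total_rate is strictly greater than the threshold.
# Usage: error_rate_exceeds <5xx_rate> <total_rate> [threshold]
error_rate_exceeds() {
  awk -v e="$1" -v t="$2" -v th="${3:-0.05}" \
    'BEGIN { if (t > 0 && e / t > th) exit 0; exit 1 }'
}

# 6 failing requests per second out of 100 is a 6% error rate
if error_rate_exceeds 6 100; then
  echo "alert condition met"
fi
```

Because the comparison is strict, a rate of exactly 5% does not meet the condition, and a service with no traffic never does.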
Tuning Health Checks
Liveness Probe
livenessProbe:
  httpGet:
    path: /health
    port: 8080
    httpHeaders:
      - name: X-Health-Check
        value: "liveness"
  initialDelaySeconds: 60  # Time the application needs to start
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3      # Restart the container after 3 consecutive failures
  successThreshold: 1
Readiness Probe
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3      # Remove the Pod from Service endpoints after 3 consecutive failures
  successThreshold: 1
Startup Probe (for Slow-Starting Applications)
startupProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 30     # Allow up to 300 seconds (30 × 10 s) for startup
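The time a probe tolerates before Kubernetes acts follows from its settings. The sketch below computes the approximate upper bound `initialDelaySeconds + periodSeconds × failureThreshold`; the function name `probe_failure_window` is hypothetical, and the estimate ignores `timeoutSeconds` and scheduling jitter.

```shell
# Approximate seconds from container start until a never-passing probe exhausts its failures.
# Usage: probe_failure_window <initialDelaySeconds> <periodSeconds> <failureThreshold>
probe_failure_window() {
  echo $(( $1 + $2 * $3 ))
}

probe_failure_window 0 10 30    # startup probe above: prints 300
probe_failure_window 60 10 3    # liveness probe above: prints 90
```

When choosing values, the startup window should comfortably exceed the slowest observed startup time so that the container is not restarted while it is still initializing.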
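Configuration Management Best Practices
Environment Variable Injection and Precedence
containers:
  - name: app
    # 1. Inject every key of a ConfigMap as environment variables
    envFrom:
      - configMapRef:
          name: app-config
    # 2. Entries in env override same-named keys from envFrom.
    #    A container may declare only one env list, so literal values
    #    and Secret references go in the same list.
    env:
      # Literal value
      - name: APP_ENV
        value: "production"
      # Single value from a Secret
      - name: DB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: db-secret
            key: password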
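Updating Configuration
# Update the ConfigMap in place
kubectl create configmap app-config \
  --from-file=config.json \
  -n ecommerce \
  --dry-run=client -o yaml | kubectl apply -f -
# Restart the Pods so they pick up the new configuration
# (values injected as environment variables are only read at container start)
kubectl rollout restart deployment/product-service -n ecommerce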
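An alternative to restarting unconditionally is to record a checksum of the configuration file in a Pod template annotation, so that a rollout happens only when the content actually changed. The following is a sketch of that pattern under stated assumptions: the function names and the annotation key `checksum/config` are illustrative conventions rather than Kubernetes built-ins, and `sha256sum` from GNU coreutils is assumed to be available.

```shell
# Print the SHA-256 hex digest of a file.
# Usage: config_checksum <file>
config_checksum() {
  sha256sum "$1" | cut -d ' ' -f 1
}

# Write the digest into a Pod template annotation. Changing the Pod template
# triggers a rolling update; an unchanged digest leaves the Deployment as is.
# Usage: restart_on_config_change <deployment> <file> [namespace]
restart_on_config_change() {
  local sum
  sum="$(config_checksum "$2")"
  [ -n "$sum" ] || return 1
  kubectl patch deployment "$1" -n "${3:-ecommerce}" --type=merge \
    -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"checksum/config\":\"${sum}\"}}}}}"
}
```

A call such as `restart_on_config_change product-service config.json` would then follow the ConfigMap update.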
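Resource Optimization
Recommended Requests and Limits
resources:
  requests:
    # Set to the minimum the application needs to run normally
    cpu: 200m      # 0.2 core
    memory: 256Mi  # 256 MiB
  limits:
    # Set to the ceiling at peak load (2-4x the requests is a reasonable starting point)
    cpu: 500m      # 0.5 core
    memory: 512Mi  # 512 MiB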
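QoS Classes
# Guaranteed (evicted last under node resource pressure)
# requests == limits for both CPU and memory in every container
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m
    memory: 512Mi
# Burstable (evicted after BestEffort Pods)
# requests are set and limits > requests
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
# BestEffort (evicted first)
# no requests or limits set on any container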
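The classification rules above can be expressed as a small decision function. This is a simplified sketch for a single-container Pod: the function name `qos_class` is hypothetical, and quantities are compared as strings, so equivalent spellings such as `500m` and `0.5` are treated as different values.

```shell
# Classify a single-container Pod. Pass "" for any value that is not set.
# Usage: qos_class <cpu_request> <memory_request> <cpu_limit> <memory_limit>
qos_class() {
  local cpu_req="$1" mem_req="$2" cpu_lim="$3" mem_lim="$4"

  # Nothing set at all -> BestEffort
  if [ -z "${cpu_req}${mem_req}${cpu_lim}${mem_lim}" ]; then
    echo "BestEffort"
    return
  fi

  # An omitted request defaults to the limit for that resource
  cpu_req="${cpu_req:-$cpu_lim}"
  mem_req="${mem_req:-$mem_lim}"

  # Both limits set and equal to the requests -> Guaranteed
  if [ -n "$cpu_lim" ] && [ -n "$mem_lim" ] &&
     [ "$cpu_req" = "$cpu_lim" ] && [ "$mem_req" = "$mem_lim" ]; then
    echo "Guaranteed"
  else
    echo "Burstable"
  fi
}

qos_class 500m 512Mi 500m 512Mi   # prints Guaranteed
qos_class 200m 256Mi 500m 512Mi   # prints Burstable
qos_class "" "" "" ""             # prints BestEffort
```

On a running cluster the assigned class can be read from the Pod's `status.qosClass` field.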
Data Backup
MongoDB Backup CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mongodb-backup
  namespace: ecommerce
spec:
  schedule: "0 2 * * *"  # Every day at 02:00
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: mongo:6.0
              command:
                - /bin/sh
                - -c
                - |
                  # Stop at the first failure so old backups are not pruned after a failed dump
                  set -e
                  DATE=$(date +%Y%m%d-%H%M%S)
                  mongodump \
                    --host=mongodb-0.mongodb \
                    --username=admin \
                    --password="$MONGO_PASSWORD" \
                    --authenticationDatabase=admin \
                    --gzip \
                    --archive=/backup/dump-$DATE.gz
                  # Delete backups older than 7 days
                  find /backup -name "dump-*.gz" -mtime +7 -delete
              env:
                - name: MONGO_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: app-secrets
                      key: mongo-password
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: backup-pvc
          restartPolicy: OnFailure
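A backup is only useful if it can be located and restored. Because the archive names embed a `YYYYMMDD-HHMMSS` timestamp, the newest file is also the last one in lexical order. The sketch below picks it; the function name `latest_backup` is hypothetical, and the restore command in the comment is an example that reuses the connection settings of the CronJob above and would have to run in a Pod that mounts the same volume.

```shell
# Print the newest dump-*.gz archive in a directory; returns non-zero if there is none.
# Usage: latest_backup [directory]
latest_backup() {
  local dir="${1:-/backup}" latest="" f

  # Globs expand in sorted order, so the last match has the latest timestamp
  for f in "$dir"/dump-*.gz; do
    [ -e "$f" ] || continue   # pattern matched nothing
    latest="$f"
  done

  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Example restore, run where /backup is mounted:
#   mongorestore --host=mongodb-0.mongodb --username=admin \
#     --password="$MONGO_PASSWORD" --authenticationDatabase=admin \
#     --gzip --archive="$(latest_backup /backup)"
```

Restoring into a scratch database on a regular basis is a reasonable way to confirm that the archives are actually usable.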
Security Hardening
Pod Security Context
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  runAsGroup: 3000
  fsGroup: 2000
  seccompProfile:
    type: RuntimeDefault
containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL
        # Only needed when the process binds a port below 1024
        add:
          - NET_BIND_SERVICE
Network Policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: product-service-policy
  namespace: ecommerce
spec:
  podSelector:
    matchLabels:
      app: product-service
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Only allow requests from the API Gateway
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # Allow access to MongoDB
    - to:
        - podSelector:
            matchLabels:
              app: mongodb
      ports:
        - protocol: TCP
          port: 27017
    # Allow DNS lookups (UDP, plus TCP for responses too large for UDP)
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
Troubleshooting Checklist
# 1. Check Pod status
kubectl get pods -n ecommerce
kubectl describe pod <pod-name> -n ecommerce
# 2. Review events
kubectl get events -n ecommerce --sort-by='.lastTimestamp'
# 3. Check the logs
kubectl logs <pod-name> -n ecommerce --tail=100
# 4. Check resource usage
kubectl top pods -n ecommerce
kubectl top nodes
# 5. Check network connectivity: start a temporary Pod, then run wget inside its shell
#    (a NetworkPolicy that restricts ingress may block this test Pod)
kubectl run test -it --rm --image=busybox -n ecommerce -- sh
wget -O- http://product-service:8080/health
# 6. Check configuration
kubectl get configmap -n ecommerce
kubectl get secret -n ecommerce
# 7. Check RBAC permissions
kubectl auth can-i --list -n ecommerce
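Summary
This section covered operating the microservices application:
✅ Update management: rolling updates, rollbacks, pausing and resuming rollouts
✅ Scaling: manual scaling, HPA autoscaling
✅ Log management: viewing and aggregating logs
✅ Monitoring and alerting: ServiceMonitor, PrometheusRule
✅ Resource optimization: requests and limits, QoS classes
✅ Security hardening: Security Context, Network Policy
✅ Backup: scheduled MongoDB backups
This completes the microservices deployment tutorial! 🎉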