HPA - Autoscaling
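What is HPA
The HPA (Horizontal Pod Autoscaler) automatically adjusts the number of Pod replicas based on observed resource usage.
How it works
1. The HPA controller queries metrics periodically (every 15 seconds by default)
2. It calculates the required number of replicas
3. It updates replicas on the target Deployment/StatefulSet
4. It waits for the next cycle
Formula
desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]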
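As a worked example of the formula above, the following Python sketch computes the desired replica count. The function name desired_replicas and its parameters are illustrative. The 10% tolerance mirrors the controller's documented default (--horizontal-pod-autoscaler-tolerance=0.1). The real controller also accounts for missing metrics and Pods that are not yet ready, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Apply desiredReplicas = ceil(currentReplicas * current / target)."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    print(desired_replicas(1, 250, 50))  # 1 replica at 250% against a 50% target
    print(desired_replicas(5, 52, 50))   # ratio 1.04 is within tolerance
    print(desired_replicas(5, 20, 50))   # load dropped, so scale down
```

With 1 replica at 250% CPU and a 50% target, the result is 5, which matches the replica count reached in the load test later in this chapter.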
Prerequisites
Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify
kubectl get deployment metrics-server -n kube-system
kubectl top nodes
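Configure resource requests
Utilization targets are computed as a percentage of each container's requests, so the Pods must set CPU/memory requests:
resources:
  requests:
    cpu: 100m
    memory: 128Mi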
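To show why requests matter, here is a minimal sketch of how a utilization percentage can be derived from usage and requests. The helper names cpu_to_millicores and average_utilization are hypothetical, and the sketch assumes every Pod has the same single-container request. The actual controller reads usage from the metrics API and handles more quantity formats.

```python
from __future__ import annotations


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '200m', '2' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def average_utilization(usages: list[str], request: str) -> int:
    """Total usage as an integer percentage of total requests across Pods."""
    total_usage = sum(cpu_to_millicores(u) for u in usages)
    total_request = cpu_to_millicores(request) * len(usages)
    return total_usage * 100 // total_request


if __name__ == "__main__":
    print(average_utilization(["90m", "90m"], "200m"))  # two Pods using 90m each
    print(average_utilization(["500m"], "200m"))        # one Pod using 500m
```

Because the percentage is relative to the request and not the limit, a Pod with a 200m request that uses 500m reports 250%. Without a request there is no denominator, and a utilization target cannot be evaluated.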
CPU-based HPA
Create the Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m
          limits:
            cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
spec:
  ports:
  - port: 80
  selector:
    app: php-apache
Create the HPA
Method 1: command line
kubectl autoscale deployment php-apache \
  --cpu-percent=50 \
  --min=1 \
  --max=10
Method 2: YAML
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Test scale-up
# View the HPA
kubectl get hpa
# Output
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          1m
# Generate load
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
# Watch the scale-up
kubectl get hpa -w
# Output
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS
php-apache   Deployment/php-apache   0%/50%     1         10        1
php-apache   Deployment/php-apache   250%/50%   1         10        1
php-apache   Deployment/php-apache   250%/50%   1         10        5
php-apache   Deployment/php-apache   50%/50%    1         10        5
Memory-based HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
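Multi-metric HPA
Scale on both CPU and memory. The HPA computes a replica count for each metric and uses the largest one: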
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 4
        periodSeconds: 30
      selectPolicy: Max
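When several metrics are listed, each one yields its own proposed replica count and the largest proposal is used. The sketch below illustrates that rule for the CPU and memory targets in the manifest above. The function names and the sample utilization figures are made up for illustration, and the tolerance band and behavior limits are left out to keep it short.

```python
import math


def proposal(current_replicas: int, current: float, target: float) -> int:
    """Replica count one metric would ask for on its own."""
    return math.ceil(current_replicas * current / target)


def desired_replicas_multi(current_replicas: int, metrics: dict) -> tuple:
    """metrics maps a metric name to (current, target); the largest proposal wins."""
    proposals = {name: proposal(current_replicas, cur, tgt)
                 for name, (cur, tgt) in metrics.items()}
    return max(proposals.values()), proposals


if __name__ == "__main__":
    # Hypothetical readings: CPU at 80% (target 50%), memory at 35% (target 70%).
    desired, per_metric = desired_replicas_multi(
        4, {"cpu": (80, 50), "memory": (35, 70)})
    print(per_metric)
    print(desired)
```

Here CPU asks for 7 replicas and memory for 2, so the workload scales to 7. A single hot metric is enough to trigger a scale-up, while a scale-down only happens once every metric agrees.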
Custom-metric HPA
Based on Prometheus metrics
Requires Prometheus Adapter to be installed:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
Based on external metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: external-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: worker-tasks
      target:
        type: AverageValue
        averageValue: "30"
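With an AverageValue target, the external metric's total is divided by the number of Pods before it is compared to the target, so the general formula reduces to total / averageValue. The sketch below works through the queue example above under that reading. The function name and the queue depths are hypothetical, and the tolerance band is ignored.

```python
import math


def replicas_for_average_value(total_metric: float, average_value: float,
                               min_replicas: int, max_replicas: int) -> int:
    """ceil(total / averageValue), clamped to the HPA's min/max bounds."""
    raw = math.ceil(total_metric / average_value)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # averageValue "30" with minReplicas 1 and maxReplicas 10, as in the manifest.
    for queue_depth in (0, 240, 1000):
        print(queue_depth, replicas_for_average_value(queue_depth, 30, 1, 10))
```

A backlog of 240 messages gives 8 replicas. An empty queue falls back to minReplicas, and a backlog of 1000 is capped at maxReplicas.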
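HPA behavior control
Scale-up behavior
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0  # scale up immediately
    policies:
    - type: Percent
      value: 100                   # add at most 100% of the current replicas per period
      periodSeconds: 15
    - type: Pods
      value: 4                     # add at most 4 Pods per period
      periodSeconds: 15
    selectPolicy: Max              # use whichever policy allows the larger change
Scale-down behavior
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # 5-minute stabilization window
    policies:
    - type: Percent
      value: 50                      # remove at most 50% of the current replicas per period
      periodSeconds: 60
    - type: Pods
      value: 2                       # remove at most 2 Pods per period
      periodSeconds: 60
    selectPolicy: Min                # use whichever policy allows the smaller change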
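The interaction between Percent policies, Pods policies and selectPolicy is easier to see with numbers. The sketch below is a simplified model of the rate limits, assuming no scaling event has already happened in the current period. The function names are illustrative, and selectPolicy: Disabled is not modelled.

```python
import math


def scale_up_limit(current: int, policies: list, select_policy: str = "Max") -> int:
    """Highest replica count the scaleUp policies allow in this period."""
    proposals = []
    for p in policies:
        if p["type"] == "Pods":
            proposals.append(current + p["value"])
        else:  # "Percent", rounded up so that a small workload can still grow
            proposals.append(math.ceil(current * (1 + p["value"] / 100)))
    return max(proposals) if select_policy == "Max" else min(proposals)


def scale_down_limit(current: int, policies: list, select_policy: str = "Max") -> int:
    """Lowest replica count the scaleDown policies allow in this period."""
    proposals = []
    for p in policies:
        if p["type"] == "Pods":
            proposals.append(current - p["value"])
        else:  # "Percent", truncated toward zero
            proposals.append(int(current * (1 - p["value"] / 100)))
    # "Max" means the largest change, which for scale-down is the lowest floor.
    return min(proposals) if select_policy == "Max" else max(proposals)


if __name__ == "__main__":
    up = [{"type": "Percent", "value": 100}, {"type": "Pods", "value": 4}]
    down = [{"type": "Percent", "value": 50}, {"type": "Pods", "value": 2}]
    print(scale_up_limit(10, up, "Max"))      # 100% of 10 beats +4 Pods
    print(scale_up_limit(2, up, "Max"))       # +4 Pods beats 100% of 2
    print(scale_down_limit(10, down, "Min"))  # -2 Pods is the smaller change
```

Starting from 10 replicas, the scale-up policies allow up to 20 in one period, while the scale-down policies with selectPolicy: Min allow going no lower than 8. The Pods policy matters most for small workloads: from 2 replicas it permits 6, where the Percent policy alone would permit only 4.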
Hands-on examples
Autoscaling a web application
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: webapp:v1
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Pods
        value: 2
        periodSeconds: 30
Autoscaling an API service
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
VPA - Vertical scaling
The VPA (Vertical Pod Autoscaler) automatically adjusts a Pod's resource requests and limits.
Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
Create a VPA
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: webapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  updatePolicy:
    updateMode: "Auto"  # Auto | Initial | Off
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
Update modes:
- Off: only produces recommendations
- Initial: applies recommendations only when a Pod is created
- Auto: applies recommendations automatically and restarts Pods
Cluster Autoscaler
Automatically adjusts the number of nodes in the cluster.
Cloud provider configuration
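Cluster Autoscaler reads its settings from command-line flags on its own container, and each --nodes value has the form min:max:node-group-name.
AWS (fragment of the cluster-autoscaler container spec):
command:
- ./cluster-autoscaler
- --cloud-provider=aws
- --nodes=1:10:k8s-worker-nodes
GCP:
command:
- ./cluster-autoscaler
- --cloud-provider=gce
- --nodes=1:10:k8s-worker-pool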
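How it works
1. Detects Pods stuck in Pending
2. Evaluates whether a new node is needed
3. Asks the cloud provider to create a node
4. Waits for the node to become ready
5. The Pods are scheduled onto the new node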
Monitoring HPA
Check HPA status
# List HPAs
kubectl get hpa
# Show details
kubectl describe hpa webapp-hpa
# Watch in real time
kubectl get hpa -w
# Show events
kubectl get events --field-selector involvedObject.name=webapp-hpa
HPA metrics
# Show the current metrics
kubectl get hpa webapp-hpa -o yaml
# Output
status:
  conditions:
  - lastTransitionTime: "2024-01-08T03:00:00Z"
    message: "the HPA was able to successfully calculate a replica count"
    reason: ValidMetricFound
    status: "True"
    type: ScalingActive
  currentMetrics:
  - resource:
      current:
        averageUtilization: 45
        averageValue: 90m
      name: cpu
    type: Resource
  currentReplicas: 5
  desiredReplicas: 5
Common commands
# Create an HPA
kubectl autoscale deployment webapp --cpu-percent=50 --min=1 --max=10
# View HPAs
kubectl get hpa
kubectl describe hpa <name>
# Edit an HPA
kubectl edit hpa <name>
# Delete an HPA
kubectl delete hpa <name>
# Show Pod resource usage
kubectl top pods
kubectl top pods --sort-by=cpu
kubectl top pods --sort-by=memory
# Show node resource usage
kubectl top nodes
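Best practices
1. Set a sensible minimum replica count
minReplicas: 2  # at least 2 for high availability
2. Set a sensible target utilization
averageUtilization: 60  # leave headroom; do not set it too high
3. Configure resource requests
resources:
  requests:
    cpu: 200m      # required for utilization targets
    memory: 256Mi
4. Set a stabilization window
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # avoid frequent scale-downs
5. Monitoring and alerting
# Alert on:
# - the HPA being unable to compute metrics
# - the HPA reaching its maximum replica count
# - Pod resource usage staying high
6. Test scaling
# Load-testing tools
# - Apache Bench
# - wrk
# - Locust
# Watch the scaling process
kubectl get hpa -w
kubectl get pods -w
7. HPA + VPA
Do not let HPA and VPA act on the same resource (CPU or memory) at the same time.
Recommendation:
- HPA: CPU
- VPA: memory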
Common problems
HPA is not working
kubectl describe hpa <name>
# Possible causes
# 1. Metrics Server is not installed
kubectl get deployment metrics-server -n kube-system
# 2. Pods have no requests set
kubectl describe pod <pod-name> | grep -A 5 "Requests"
# 3. Metrics cannot be fetched
kubectl top pods
Frequent scaling up and down
Lengthen the stabilization window:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 600  # increase to 10 minutes
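To see why a longer window dampens flapping, recall that for scale-down the controller acts on the highest recommendation it has seen within the stabilization window. The sketch below models only that rule. The function name and the recommendation history are invented for illustration.

```python
def stabilized_scale_down(history: list, now: int, window_seconds: int) -> int:
    """history holds (timestamp_seconds, recommended_replicas), newest included.

    Returns the highest recommendation made within the last window_seconds.
    """
    recent = [replicas for ts, replicas in history if now - ts <= window_seconds]
    return max(recent)


if __name__ == "__main__":
    # Hypothetical recommendations over 8 minutes of fluctuating load.
    history = [(0, 10), (120, 6), (300, 4), (480, 5)]
    print(stabilized_scale_down(history, now=480, window_seconds=300))
    print(stabilized_scale_down(history, now=480, window_seconds=600))
```

With a 300-second window, the recommendation of 10 has aged out and the workload may drop to 5. With a 600-second window, it stays at 10 until the earlier peak leaves the window, so a brief dip in load no longer removes Pods that will be needed again shortly.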
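Slow scale-up
Adjust the scale-up policy:
behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 100  # allows doubling every 15 s, which matches the default Percent policy; raise it for faster scale-up
      periodSeconds: 15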
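A rough way to judge whether a Percent policy is aggressive enough is to count how many periods it takes to grow from the current replica count to the one the load requires. The sketch below is a back-of-the-envelope estimate. It assumes the metric keeps demanding the target count and that one full step is taken per period, and it ignores Pod startup time. The function name is hypothetical.

```python
import math


def periods_to_reach(start: int, target: int, percent: int,
                     period_seconds: int) -> tuple:
    """Return (periods, seconds) needed to grow from start to target replicas."""
    if start < 1 or percent <= 0:
        raise ValueError("start must be >= 1 and percent must be > 0")
    replicas, periods = start, 0
    while replicas < target:
        # Each period may add at most `percent` of the current replicas.
        replicas = min(target, math.ceil(replicas * (1 + percent / 100)))
        periods += 1
    return periods, periods * period_seconds


if __name__ == "__main__":
    print(periods_to_reach(2, 20, 100, 15))  # doubling: 2, 4, 8, 16, 20
    print(periods_to_reach(2, 20, 50, 15))   # 2, 3, 5, 8, 12, 18, 20
```

Under these assumptions, going from 2 to 20 replicas takes about 4 periods (roughly 60 seconds) at 100% and about 6 periods (roughly 90 seconds) at 50%. If that is too slow for a traffic spike, raise value, add a Pods policy, or keep a higher minReplicas.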
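Summary
HPA is the core of elastic scaling in Kubernetes:
Core concepts:
- HPA: horizontal scaling (adjusts the replica count)
- VPA: vertical scaling (adjusts resource settings)
- CA: cluster scaling (adjusts the node count)
How it works:
- Check metrics periodically
- Calculate the desired replica count
- Update the Deployment
Best practices:
- Set sensible minimum/maximum replica counts
- Configure resource requests
- Set a stabilization window
- Monitor and alert
Caveats:
- Metrics Server is a prerequisite
- Avoid conflicts between HPA and VPA
- Test scaling behavior
This completes the advanced part! The next chapter begins the hands-on part.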