HPA - Autoscaling

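What is HPA

The HPA (Horizontal Pod Autoscaler) automatically adjusts the number of Pod replicas based on observed resource usage.

How it works

1. The HPA controller queries metrics periodically (every 15 seconds by default)
2. It computes the desired number of replicas
3. It updates replicas on the target Deployment/StatefulSet
4. It waits for the next cycle

Formula

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]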
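As a worked illustration of the formula above, here is a minimal Python sketch. The function name desired_replicas is hypothetical, and the 10% tolerance band is an assumption modeled on the controller's default tolerance; the real controller also handles unready Pods and missing metrics, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Apply desiredReplicas = ceil(currentReplicas * current / target)."""
    ratio = current_metric / target_metric
    # Assumption: when the ratio is close enough to 1.0, no scaling happens.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(desired_replicas(1, 250, 50))  # 1 replica at 250% against a 50% target
print(desired_replicas(4, 20, 50))   # 4 replicas at 20% against a 50% target
```

With 1 replica at 250% utilization and a 50% target, the sketch yields 5 replicas, which is the same jump from 1 to 5 seen in the php-apache walkthrough in this chapter.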

Prerequisites

Install Metrics Server

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify
kubectl get deployment metrics-server -n kube-system
kubectl top nodes

Configure resource requests

Utilization targets are computed as a percentage of each container's requests, so the Pods must declare CPU/memory requests:

resources:
  requests:
    cpu: 100m
    memory: 128Mi
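To show why requests matter, the sketch below computes utilization as a percentage of the request. It is a simplified model: the function name average_utilization is hypothetical, and it assumes every Pod has the same CPU request.

```python
def average_utilization(usage_millicores, request_millicores):
    """Average CPU usage across Pods as a percentage of the per-Pod request."""
    if request_millicores <= 0:
        # Without a request there is no denominator, so utilization is undefined.
        raise ValueError("a positive CPU request is required to compute utilization")
    average_usage = sum(usage_millicores) / len(usage_millicores)
    return 100 * average_usage / request_millicores


# Three Pods using 80m, 100m and 90m, each requesting 200m.
print(average_utilization([80, 100, 90], 200))
```

An average usage of 90m against a 200m request gives 45% utilization. A Pod without a request has nothing to divide by, which is why the HPA cannot compute a Utilization target for it.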

CPU-based HPA

Create the Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m
          limits:
            cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
spec:
  ports:
  - port: 80
  selector:
    app: php-apache

Create the HPA

Option 1: command line

kubectl autoscale deployment php-apache \
  --cpu-percent=50 \
  --min=1 \
  --max=10

Option 2: YAML

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Test scale-up

# Check the HPA
kubectl get hpa

# Output
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          1m

# Generate load
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

# Watch the scale-up
kubectl get hpa -w

# Output
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS
php-apache   Deployment/php-apache   0%/50%     1         10        1
php-apache   Deployment/php-apache   250%/50%   1         10        1
php-apache   Deployment/php-apache   250%/50%   1         10        5
php-apache   Deployment/php-apache   50%/50%    1         10        5

Memory-based HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70

Multi-metric HPA

Scaling on both CPU and memory:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 4
        periodSeconds: 30
      selectPolicy: Max
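When several metrics are configured, the HPA computes a replica count for each metric and uses the largest one. The sketch below illustrates that rule under simplifying assumptions: the function name replicas_for_metrics is hypothetical, and tolerance and behavior policies are left out.

```python
import math


def replicas_for_metrics(current_replicas, metrics, min_replicas, max_replicas):
    """metrics is a list of (current_utilization, target_utilization) pairs."""
    # One proposal per metric, using the same formula as for a single metric.
    proposals = [math.ceil(current_replicas * cur / tgt) for cur, tgt in metrics]
    # The largest proposal wins, then it is clamped to [minReplicas, maxReplicas].
    desired = max(proposals)
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas, CPU at 80% (target 50%), memory at 35% (target 70%).
print(replicas_for_metrics(4, [(80, 50), (35, 70)], 2, 10))
```

Here CPU proposes 7 replicas and memory proposes 2, so the result is 7. Taking the maximum means a single hot metric is enough to trigger a scale-up.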

Custom-metric HPA

Based on Prometheus metrics

This requires the Prometheus Adapter to be installed:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"

Based on external metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: external-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready
        selector:
          matchLabels:
            queue: worker-tasks
      target:
        type: AverageValue
        averageValue: "30"
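With an AverageValue target, the external metric's total is divided across the Pods, so the replica count is roughly the total divided by the per-Pod target. A minimal sketch, assuming the metric reports the total number of ready messages in the queue (the function name replicas_for_queue is hypothetical, and tolerance is ignored):

```python
import math


def replicas_for_queue(total_messages, average_value_target, min_replicas, max_replicas):
    """Replicas needed so that each Pod handles at most the target share."""
    desired = math.ceil(total_messages / average_value_target)
    # Clamp to the HPA's minReplicas/maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))


for backlog in (0, 100, 240, 450):
    print(backlog, replicas_for_queue(backlog, 30, 1, 10))
```

With the averageValue of "30" and the bounds of 1 and 10 from the manifest above, a backlog of 240 messages asks for 8 replicas, while 450 messages is capped at maxReplicas.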

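HPA behavior control

Scale-up behavior

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0  # scale up immediately
    policies:
    - type: Percent
      value: 100  # add at most 100% of current replicas per period
      periodSeconds: 15
    - type: Pods
      value: 4    # add at most 4 Pods per period
      periodSeconds: 15
    selectPolicy: Max  # use the policy that allows the larger change

Scale-down behavior

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # 5-minute stabilization window
    policies:
    - type: Percent
      value: 50  # remove at most 50% of current replicas per period
      periodSeconds: 60
    - type: Pods
      value: 2   # remove at most 2 Pods per period
      periodSeconds: 60
    selectPolicy: Min  # use the policy that allows the smaller change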
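The interaction between the Percent and Pods policies and selectPolicy can be sketched as follows. This is a simplified model, not the controller's code: the function names are hypothetical, and it assumes no other scaling happened earlier in the same period, so the replica count at the start of the period equals the current count.

```python
import math


def scale_up_limit(current, percent, pods, select_policy="Max"):
    """Highest replica count allowed after one period of scale-up."""
    by_percent = current + math.ceil(current * percent / 100)
    by_pods = current + pods
    # Max picks the policy allowing the larger change, Min the smaller one.
    pick = max if select_policy == "Max" else min
    return pick(by_percent, by_pods)


def scale_down_limit(current, percent, pods, select_policy="Min"):
    """Lowest replica count allowed after one period of scale-down."""
    by_percent = current - math.ceil(current * percent / 100)
    by_pods = current - pods
    # For scale-down, the smaller change corresponds to the higher floor.
    pick = max if select_policy == "Min" else min
    return max(pick(by_percent, by_pods), 0)


print(scale_up_limit(2, 100, 4))     # Percent allows 4, Pods allows 6
print(scale_down_limit(10, 50, 2))   # Percent allows 5, Pods allows 8
```

With the values from the snippets above, 2 replicas may grow to 6 in one period (the Pods policy wins under Max), and 10 replicas may shrink only to 8 in one period (the Pods policy wins under Min).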

Worked examples

Autoscaling a web application

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: webapp:v1
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Pods
        value: 2
        periodSeconds: 30

Autoscaling an API service

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60

VPA - Vertical autoscaling

The VPA (Vertical Pod Autoscaler) automatically adjusts the resource requests and limits of Pods.

Install VPA

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Create a VPA

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: webapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  updatePolicy:
    updateMode: "Auto"  # Auto | Initial | Off
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi

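Update modes

  • Off: recommendations only, nothing is applied
  • Initial: resources are set only when a Pod is created
  • Auto: resources are updated automatically, which restarts the affected Pods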

Cluster Autoscaler

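Cluster Autoscaler automatically adjusts the number of nodes in the cluster.

Cloud provider configuration

The --cloud-provider and --nodes=<min>:<max>:<node group name> options shown below are cluster-autoscaler command-line flags, normally passed as container arguments in its Deployment.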

AWS

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler
  namespace: kube-system
data:
  cluster-autoscaler: |
    --cloud-provider=aws
    --nodes=1:10:k8s-worker-nodes

GCP

--cloud-provider=gce
--nodes=1:10:k8s-worker-pool

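How it works

1. Detect Pods stuck in Pending
2. Evaluate whether new nodes are needed
3. Ask the cloud provider to create nodes
4. Wait for the nodes to become ready
5. Schedule the Pods onto the new nodes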

Monitoring HPA

Check HPA status

# List HPAs
kubectl get hpa

# Detailed information
kubectl describe hpa webapp-hpa

# Watch in real time
kubectl get hpa -w

# View events
kubectl get events --field-selector involvedObject.name=webapp-hpa

HPA metrics

# View current metrics
kubectl get hpa webapp-hpa -o yaml

# Output
status:
  conditions:
  - lastTransitionTime: "2024-01-08T03:00:00Z"
    message: "the HPA was able to successfully calculate a replica count"
    reason: ValidMetricFound
    status: "True"
    type: ScalingActive
  currentMetrics:
  - resource:
      current:
        averageUtilization: 45
        averageValue: 90m
      name: cpu
    type: Resource
  currentReplicas: 5
  desiredReplicas: 5

Common commands

# Create an HPA
kubectl autoscale deployment webapp --cpu-percent=50 --min=1 --max=10

# View HPAs
kubectl get hpa
kubectl describe hpa <name>

# Edit an HPA
kubectl edit hpa <name>

# Delete an HPA
kubectl delete hpa <name>

# View Pod resource usage
kubectl top pods
kubectl top pods --sort-by=cpu
kubectl top pods --sort-by=memory

# View node resource usage
kubectl top nodes

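Best practices

1. Set a sensible minimum replica count

minReplicas: 2  # at least 2 for high availability

2. Set a sensible target utilization

averageUtilization: 60  # leave headroom, do not set it too high

3. Configure resource requests

resources:
  requests:
    cpu: 200m    # required
    memory: 256Mi

4. Set a stabilization window

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # avoid frequent scale-downs

5. Monitoring and alerting

# Set up alerts for:
# - HPA unable to compute metrics
# - HPA at its maximum replica count
# - Pod resource usage staying high

6. Test scaling

# Load-testing tools
# - Apache Bench
# - wrk
# - Locust

# Watch the scaling process
kubectl get hpa -w
kubectl get pods -w

7. HPA + VPA

Do not let HPA and VPA act on the same resource (CPU or memory) at the same time, because the two would work against each other.

Recommendation:

  • HPA: scale on CPU
  • VPA: manage memory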

Common issues

HPA is not working

kubectl describe hpa <name>

# Possible causes
# 1. Metrics Server is not installed
kubectl get deployment metrics-server -n kube-system

# 2. Pods have no requests set
kubectl describe pod <pod-name> | grep -A 5 "Requests"

# 3. Metrics cannot be fetched
kubectl top pods

Frequent scaling up and down

Lengthen the stabilization window:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 600  # raised to 10 minutes
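To see why a longer window damps flapping, the sketch below models scale-down stabilization as "use the highest recommendation seen within the window". It is a simplified illustration under that assumption; the function name stabilized_scale_down and the sample data are hypothetical.

```python
def stabilized_scale_down(recommendations, now, window_seconds, current_replicas):
    """recommendations is a list of (timestamp_seconds, desired_replicas) pairs."""
    # Keep only the recommendations made inside the stabilization window.
    recent = [replicas for ts, replicas in recommendations if now - ts <= window_seconds]
    if not recent:
        return current_replicas
    # Scale down only as far as the highest recent recommendation allows.
    return min(current_replicas, max(recent))


# Hypothetical history: a load spike at t=0 asked for 8 replicas, later readings want fewer.
history = [(0, 8), (120, 4), (400, 3), (550, 3)]
print(stabilized_scale_down(history, now=600, window_seconds=300, current_replicas=8))
print(stabilized_scale_down(history, now=600, window_seconds=600, current_replicas=8))
```

With a 300-second window only the last two readings count, so the workload drops to 3 replicas. With a 600-second window the earlier spike is still inside the window, so the replica count stays at 8 until the spike ages out.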

Slow scale-up

Adjust the scale-up policy:

behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 100  # more aggressive scale-up
      periodSeconds: 15

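Summary

HPA is the core of elastic scaling in Kubernetes:

Core concepts

  • HPA: horizontal scaling (adjusts the replica count)
  • VPA: vertical scaling (adjusts resource settings)
  • CA: cluster scaling (adjusts the node count)

How it works

  • Check metrics periodically
  • Compute the desired replica count
  • Update the Deployment

Best practices

  • Set sensible minimum and maximum replica counts
  • Configure resource requests
  • Set a stabilization window
  • Set up monitoring and alerting

Things to watch

  • Metrics Server is a prerequisite
  • Avoid conflicts between HPA and VPA
  • Test the scaling behavior

That completes the advanced section. The next chapter begins the hands-on section.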