Prometheus Monitoring System
Prometheus is the de facto standard for cloud-native monitoring.
Architecture
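┌─────────────┐     Pull      ┌──────────────┐
│ Application │◄──────────────│  Prometheus  │
│  /metrics   │               │    Server    │
└─────────────┘               └──────┬───────┘
                                     │
         ┌───────────────────────────┼───────────────┐
         │                           │               │
┌────────▼────────┐           ┌──────▼───────┐   ┌───▼────┐
│     Grafana     │           │ Alertmanager │   │  TSDB  │
│ (visualization) │           │  (alerting)  │   │        │
└─────────────────┘           └──────────────┘   └────────┘
Prometheus periodically pulls (scrapes) each application's /metrics endpoint and stores the samples in its local TSDB; firing alerts are sent to Alertmanager, and Grafana queries Prometheus to draw dashboards.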
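To make the pull model concrete, here is a minimal stdlib-only sketch of what an application's /metrics endpoint returns: plain text in the Prometheus text exposition format, fetched by the server on every scrape. It reuses the `http_requests_total` counter and `status_code` label from the alert rule later in this section; the counter values, the `method` label, and port 8000 are made-up assumptions, and a real service would normally use an official client library rather than formatting the text by hand.

```python
import http.server

# Hypothetical in-process counter values: (method, status_code) -> count.
REQUEST_COUNTS = {("GET", "200"): 1027, ("GET", "500"): 3}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, code), value in sorted(counts.items()):
        lines.append(
            f'http_requests_total{{method="{method}",status_code="{code}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(http.server.BaseHTTPRequestHandler):
    """Answers Prometheus scrapes on GET /metrics; every other path is a 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        # Content type of the text exposition format, version 0.0.4.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving http://<host>:<port>/metrics for Prometheus to pull."""
    http.server.HTTPServer(("", port), MetricsHandler).serve_forever()
```

After calling `serve()`, a request to `http://localhost:8000/metrics` should return one HELP line, one TYPE line, and one sample line per label combination; nothing is pushed, Prometheus simply reads this text on its own schedule.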
Installing kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
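The `retention=30d` and `50Gi` values above have to fit together. The Prometheus storage documentation gives a rough capacity formula, needed_disk_space = retention_time_seconds × ingested_samples_per_second × bytes_per_sample, with about 1 to 2 bytes per sample after compression. The sketch below applies it; the 2 bytes per sample default is a conservative assumption, and headroom for the WAL and compaction is not included.

```python
# Rough Prometheus TSDB sizing, following the formula in the storage docs:
#   needed_disk_space = retention_seconds * samples_per_second * bytes_per_sample
GIB = 1024 ** 3
DAY_SECONDS = 24 * 60 * 60


def needed_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Estimated bytes of block storage for a retention period and ingest rate."""
    return retention_days * DAY_SECONDS * samples_per_second * bytes_per_sample


def max_samples_per_second(disk_gib, retention_days, bytes_per_sample=2.0):
    """Highest sustained ingest rate whose full retention fits in disk_gib."""
    return disk_gib * GIB / (retention_days * DAY_SECONDS * bytes_per_sample)


def samples_per_second(active_series, scrape_interval_seconds):
    """Each active series contributes one sample per scrape interval."""
    return active_series / scrape_interval_seconds
```

For example, `max_samples_per_second(50, 30)` comes out at roughly 10,356 samples/s, which at a 30 s scrape interval is about 310,000 active series; if the cluster produces more, the volume has to grow or the retention has to shrink.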
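ServiceMonitor configuration
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-monitor
  namespace: production
  labels:
    # Must match the Helm release name: with default chart values, Prometheus
    # only selects ServiceMonitors that carry release=<release name>
    release: prometheus
spec:
  selector:
    matchLabels:
      app: myapp        # selects Services (not Pods) in this namespace
  endpoints:
    - port: http        # name of the Service port, not a port number
      path: /metrics
      interval: 30s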
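Two label relationships in this ServiceMonitor are easy to get wrong: `spec.selector.matchLabels` picks Services (in the ServiceMonitor's own namespace unless `spec.namespaceSelector` is set), and the chart's Prometheus in turn only picks ServiceMonitors whose labels satisfy its own selector, here `release: prometheus`. Both follow Kubernetes `matchLabels` semantics, sketched below: every key/value pair in the selector must be present on the object, extra labels are ignored, and an empty selector matches everything. The Service names and their labels are hypothetical, and `matchExpressions` is not modelled.

```python
def matches(match_labels, labels):
    """Kubernetes matchLabels semantics: every selector pair must be present."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Hypothetical Services in the "production" namespace.
services = {
    "myapp": {"app": "myapp", "tier": "web"},
    "myapp-worker": {"app": "myapp-worker"},
}

# ServiceMonitor -> Services: only exact label matches are scraped.
service_monitor_selector = {"app": "myapp"}
selected = [name for name, labels in services.items()
            if matches(service_monitor_selector, labels)]

# Prometheus -> ServiceMonitors: the release label decides whether
# this ServiceMonitor is loaded at all.
prometheus_selector = {"release": "prometheus"}
monitor_labels = {"release": "prometheus"}
picked_up = matches(prometheus_selector, monitor_labels)
```

Here `selected` is `["myapp"]` (a similar prefix such as `myapp-worker` does not count), and dropping the `release` label from the ServiceMonitor would make `picked_up` false, which in practice shows up as a target that never appears in Prometheus.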
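Alerting with PrometheusRule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-alerts
  namespace: monitoring
  labels:
    release: prometheus   # matched by the chart's default ruleSelector
spec:
  groups:
    - name: myapp.rules
      interval: 30s
      rules:
        - alert: HighErrorRate
          # Share of 5xx responses among all responses. Summing by job removes
          # the status_code label; without it, each 5xx series would be divided
          # by itself and the ratio would always be 1.
          expr: |
            sum by (job) (rate(http_requests_total{status_code=~"5.."}[5m]))
              / sum by (job) (rate(http_requests_total[5m])) > 0.05
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Error rate too high"
            description: "5xx ratio for {{ $labels.job }} is {{ $value | humanizePercentage }} (threshold 5%)."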
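The `for: 5m` clause means the expression must stay true across consecutive evaluations for five minutes before the alert fires; until then it is only pending, and a single false evaluation resets it. The sketch below simulates that inactive, pending, firing progression for the 30 s group interval used here. It is a simplified model (one series, no `keep_firing_for`, no Alertmanager delays), and the error ratios fed into it are made-up inputs.

```python
INACTIVE, PENDING, FIRING = "inactive", "pending", "firing"


def evaluate_alert(ratios, threshold=0.05, interval_s=30, for_s=300):
    """Return the alert state after each evaluation of `ratio > threshold`.

    ratios: error ratios seen at successive rule evaluations, interval_s apart.
    """
    states = []
    active_since = None  # evaluation time at which the condition became true
    for i, ratio in enumerate(ratios):
        now = i * interval_s
        if ratio > threshold:
            if active_since is None:
                active_since = now
            held = now - active_since
            states.append(FIRING if held >= for_s else PENDING)
        else:
            active_since = None  # any false evaluation resets the timer
            states.append(INACTIVE)
    return states
```

With a constant ratio of 0.08, the first ten evaluations are pending and the eleventh (300 s after the condition first held) is firing; a brief dip below 5% anywhere in between starts the five-minute wait over.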
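Common PromQL queries
# CPU usage per pod, in cores (container!="" drops the pod-level cgroup totals)
sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
# Memory usage as a fraction of the limit, using the working set that kubelet and kubectl top report
# (containers without a memory limit are skipped instead of producing +Inf)
container_memory_working_set_bytes{container!=""} / (container_spec_memory_limit_bytes{container!=""} > 0)
# HTTP QPS per job
sum by (job) (rate(http_requests_total[5m]))
# P95 latency per job (the le label must survive aggregation for histogram_quantile)
histogram_quantile(0.95, sum by (le, job) (rate(http_request_duration_seconds_bucket[5m])))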
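Two functions do most of the work in these queries. `rate()` turns an ever-growing counter into a per-second rate, treating any drop in value as a counter reset; `histogram_quantile()` never sees individual requests and instead estimates the quantile from cumulative `le` bucket counts by linear interpolation inside the bucket that contains the target rank. The sketch below reproduces the core of both on plain Python lists. It is simplified: `rate()`'s extrapolation to the window edges is omitted, as is some of the input cleanup Prometheus performs on buckets, and the sample values and bucket boundaries are hypothetical.

```python
import bisect
import math


def simple_rate(samples):
    """Per-second increase of a counter from [(t_seconds, value), ...].

    A drop in value is treated as a counter reset to zero, as Prometheus does,
    but the extrapolation rate() applies at the window edges is left out.
    """
    if len(samples) < 2:
        return math.nan
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    buckets must be sorted by upper bound and end with math.inf, like the
    per-le series of a histogram after rate().
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    bounds = [b for b, _ in buckets]
    counts = [c for _, c in buckets]
    if counts[-1] == 0:
        return math.nan
    rank = q * counts[-1]
    # First bucket whose cumulative count reaches the rank.
    i = bisect.bisect_left(counts, rank)
    if i == len(buckets) - 1:
        # Rank lands in the +Inf bucket: report the highest finite bound.
        return bounds[-2]
    if i == 0 and bounds[0] <= 0:
        return bounds[0]
    lower = bounds[i - 1] if i > 0 else 0.0
    below = counts[i - 1] if i > 0 else 0.0
    in_bucket = counts[i] - below
    if in_bucket == 0:
        return math.nan  # mirrors the 0/0 case in Prometheus
    # Linear interpolation inside the bucket that contains the rank.
    return lower + (bounds[i] - lower) * (rank - below) / in_bucket
```

With buckets `[(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (math.inf, 100)]`, the 0.95 quantile is 0.5 because rank 95 falls exactly at the top of the 0.25 to 0.5 bucket. The estimate can never be more precise than the bucket boundaries, which is why bucket layout matters when defining latency histograms.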
Summary
✅ Installing and configuring Prometheus
✅ Collecting metrics with ServiceMonitor
✅ Alerting rules with PrometheusRule
✅ Common PromQL queries
Next section: Grafana visualization.