# DaemonSet and Job

This chapter covers two special workload types, DaemonSet and Job, along with Job's scheduled variant, CronJob.
## DaemonSet

A DaemonSet ensures that every eligible node runs one copy of a Pod.

### Use cases

- Log collection: Fluentd, Filebeat
- Monitoring agents: Node Exporter, cAdvisor
- Network agents: kube-proxy, Calico
- Storage daemons: Ceph, GlusterFS
### Basic example

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluentd:v1.14
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
```
```bash
kubectl apply -f daemonset.yaml

# Check the DaemonSet
kubectl get daemonset
kubectl get pods -l app=fluentd -o wide

# The output shows one Pod per node
NAME           READY   STATUS    NODE
fluentd-abcd   1/1     Running   node-1
fluentd-efgh   1/1     Running   node-2
fluentd-ijkl   1/1     Running   node-3
```
### Node selector

Run only on specific nodes:

```yaml
spec:
  template:
    spec:
      nodeSelector:
        disktype: ssd  # only nodes labeled disktype=ssd
```
### Node affinity

More flexible node selection:

```yaml
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
```
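Besides the hard `required...` form, node affinity also has a soft `preferred...` form that merely weights candidate nodes instead of filtering them. A minimal sketch (the `disktype=ssd` label is illustrative):

```yaml
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 50            # 1-100; higher weight, stronger preference
            preference:
              matchExpressions:
              - key: disktype     # example label, adjust to your cluster
                operator: In
                values:
                - ssd
```

If no node matches a preferred term, the Pod is still scheduled; a required term would leave it Pending instead.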
### Tolerating taints

Run on tainted nodes as well:

```yaml
spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule  # allow scheduling onto control-plane nodes
```

Note: older clusters (before v1.24) may still use the legacy taint key `node-role.kubernetes.io/master`.
### Update strategies

#### RollingUpdate (default)

```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # at most 1 Pod unavailable at a time
```

#### OnDelete

```yaml
spec:
  updateStrategy:
    type: OnDelete  # Pods are updated only after you delete them manually
```
### Worked example: Node Exporter

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.5.0
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: metrics
        resources:
          limits:
            cpu: 200m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 50Mi
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
      tolerations:
      - effect: NoSchedule
        operator: Exists
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
```
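To let a scraper such as Prometheus discover these Pods, a headless Service over the DaemonSet's label is a common companion. A minimal sketch (the Service name is illustrative; namespace and labels follow the example above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-exporter     # illustrative name, not from this chapter
  namespace: monitoring
  labels:
    app: node-exporter
spec:
  clusterIP: None         # headless: one DNS/endpoint record per Pod
  selector:
    app: node-exporter
  ports:
  - name: metrics
    port: 9100
    targetPort: 9100
```

The headless form avoids load-balancing across nodes, so each node's metrics endpoint stays individually addressable.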
## Job

A Job creates one or more Pods and ensures that a specified number of them terminate successfully.
### Basic example

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4  # number of retries after failure
```
```bash
kubectl apply -f job.yaml

# Check the Job
kubectl get jobs

# Output
NAME   COMPLETIONS   DURATION   AGE
pi     1/1           5s         10s

# Check the Pod
kubectl get pods -l job-name=pi

# View the logs
kubectl logs -l job-name=pi
```
### Parallel execution

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-job
spec:
  completions: 10   # 10 successful completions required in total
  parallelism: 3    # run up to 3 Pods at a time
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command:
        - /bin/sh
        - -c
        - echo "Processing task"; sleep 10
      restartPolicy: Never
```

Execution timeline:

```
t=0s:  3 Pods start (0, 1, 2)
t=10s: Pod 0 finishes, Pod 3 starts
t=10s: Pod 1 finishes, Pod 4 starts
t=10s: Pod 2 finishes, Pod 5 starts
...
```
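Leaving `completions` unset while setting `parallelism` gives the work-queue pattern: the Job is complete as soon as any Pod exits successfully, so the workers must coordinate through an external queue. A hedged sketch (`fetch-task` and `process` are hypothetical placeholders for your queue client, and `queue:6379` is an illustrative address):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-workers        # illustrative name
spec:
  parallelism: 3             # completions unset: work-queue mode
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command:
        - /bin/sh
        - -c
        # loop pulling tasks from a shared queue until it is drained;
        # fetch-task/process stand in for a real queue client
        - while task=$(fetch-task queue:6379); do process "$task"; done
      restartPolicy: OnFailure
```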
### Completion modes

#### NonIndexed (default)

Plain mode; Pods carry no index:

```yaml
spec:
  completionMode: NonIndexed
  completions: 5
```

#### Indexed

Each Pod gets a unique index (0 through completions-1):

```yaml
spec:
  completionMode: Indexed
  completions: 5
  parallelism: 2
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command:
        - /bin/sh
        - -c
        - echo "Task index: $JOB_COMPLETION_INDEX"; sleep 5
      restartPolicy: Never
```

A Pod can read its own index from the JOB_COMPLETION_INDEX environment variable.
### Timeouts and cleanup

```yaml
spec:
  activeDeadlineSeconds: 100      # overall deadline for the Job
  ttlSecondsAfterFinished: 3600   # auto-delete 1 hour after finishing
```
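It helps to spell out how these knobs interact: `activeDeadlineSeconds` applies to the Job as a whole and takes precedence over `backoffLimit`, so once the deadline passes the Job is marked failed even if retries remain. A combined sketch (name and values are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: bounded-job            # illustrative name
spec:
  backoffLimit: 3              # up to 3 retries on Pod failure
  activeDeadlineSeconds: 300   # hard cap: the Job fails after 5 minutes,
                               # regardless of remaining retries
  ttlSecondsAfterFinished: 600 # delete the Job object 10 minutes after it finishes
  template:
    spec:
      containers:
      - name: main
        image: busybox
        command: ["/bin/sh", "-c", "echo working; sleep 30"]
      restartPolicy: Never
```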
### Worked example: data migration

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  completions: 1
  template:
    spec:
      containers:
      - name: migrator
        image: mysql:8.0
        command:
        - /bin/bash
        - -c
        - |
          # dump from the old database, then load into the new one
          mysqldump -h old-db -u root -p$DB_PASSWORD olddb > /tmp/dump.sql
          mysql -h new-db -u root -p$DB_PASSWORD newdb < /tmp/dump.sql
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: password
      restartPolicy: OnFailure
  backoffLimit: 3
```
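A migration Job often fails for a mundane reason: the target database is not yet accepting connections. An initContainer can gate the main container on reachability. A sketch (the `new-db` host and port are illustrative and match the example above):

```yaml
spec:
  template:
    spec:
      initContainers:
      - name: wait-for-db
        image: busybox
        command:
        - /bin/sh
        - -c
        # poll until the target DB port accepts TCP connections
        - until nc -z new-db 3306; do echo "waiting for db"; sleep 2; done
      containers:
      - name: migrator
        image: mysql:8.0
        # ... migration command as in the example above
```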
## CronJob

A CronJob creates Jobs on a schedule written in Cron format.
### Basic example

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup
spec:
  schedule: "0 2 * * *"  # every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: backup-tool:v1
            command:
            - /bin/sh
            - -c
            - /scripts/backup.sh
            volumeMounts:
            - name: backup-volume
              mountPath: /backup
          volumes:
          - name: backup-volume
            persistentVolumeClaim:
              claimName: backup-pvc
          restartPolicy: OnFailure
```
### Cron expressions

```
* * * * *
│ │ │ │ │
│ │ │ │ └─── day of week (0-6)
│ │ │ └───── month (1-12)
│ │ └─────── day of month (1-31)
│ └───────── hour (0-23)
└─────────── minute (0-59)
```

Common schedules:

```yaml
# every minute
schedule: "* * * * *"
# every hour, on the hour
schedule: "0 * * * *"
# every day at 02:00
schedule: "0 2 * * *"
# Sundays at 03:00
schedule: "0 3 * * 0"
# the 1st of every month at 04:00
schedule: "0 4 1 * *"
# weekdays at 09:00
schedule: "0 9 * * 1-5"
# every 6 hours
schedule: "0 */6 * * *"
```
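By default the schedule is interpreted in the kube-controller-manager's local time zone. On recent Kubernetes (stable since v1.27, beta since v1.25) the `spec.timeZone` field pins this explicitly; the zone below is illustrative:

```yaml
spec:
  schedule: "0 2 * * *"
  timeZone: "Asia/Shanghai"  # IANA time zone name; requires a recent Kubernetes version
```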
### Concurrency policy

```yaml
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Allow  # Allow | Forbid | Replace
```

- Allow: concurrent runs are allowed (default)
- Forbid: no concurrency; a new run is skipped while the previous one is still active
- Replace: the currently running Job is cancelled and replaced by the new one
### History limits

```yaml
spec:
  successfulJobsHistoryLimit: 3  # keep 3 successful Jobs
  failedJobsHistoryLimit: 1      # keep 1 failed Job
```
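If the controller misses a scheduled time (for example, during a controller outage), `startingDeadlineSeconds` bounds how late a run may still start; without it, a CronJob that accumulates more than 100 missed runs stops being scheduled until the state is cleared. A sketch:

```yaml
spec:
  schedule: "0 2 * * *"
  startingDeadlineSeconds: 300  # a missed run may still start up to 5 minutes late;
                                # any later and it counts as missed
```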
### Worked example: database backup

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-backup
spec:
  schedule: "0 2 * * *"  # every day at 02:00
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 7
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: mysql:8.0
            command:
            - /bin/bash
            - -c
            - |
              BACKUP_FILE="/backup/mysql-$(date +%Y%m%d-%H%M%S).sql"
              mysqldump -h mysql -u root -p$MYSQL_ROOT_PASSWORD --all-databases > $BACKUP_FILE
              gzip $BACKUP_FILE
              echo "Backup completed: $BACKUP_FILE.gz"
              # delete backups older than 7 days
              find /backup -name "*.sql.gz" -mtime +7 -delete
            env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: root-password
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: backup-pvc
          restartPolicy: OnFailure
```
### Worked example: log cleanup

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: log-cleanup
spec:
  schedule: "0 0 * * *"  # every day at midnight
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cleanup
            image: busybox
            command:
            - /bin/sh
            - -c
            - |
              echo "Starting log cleanup..."
              find /var/log -name "*.log" -mtime +30 -delete
              echo "Cleanup completed"
            volumeMounts:
            - name: logs
              mountPath: /var/log
          volumes:
          - name: logs
            hostPath:
              path: /var/log
          restartPolicy: OnFailure
      backoffLimit: 3
```

Note that each run's Pod lands on a single node, so this cleans /var/log only on whichever node it happens to be scheduled to.
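To clean every node's logs rather than one node per run, a common workaround is a DaemonSet whose container loops with a sleep. A hedged sketch (the name is illustrative; the cleanup command matches the CronJob above):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-cleanup-daemon   # illustrative name
spec:
  selector:
    matchLabels:
      app: log-cleanup
  template:
    metadata:
      labels:
        app: log-cleanup
    spec:
      containers:
      - name: cleanup
        image: busybox
        command:
        - /bin/sh
        - -c
        # run the cleanup roughly once a day on each node
        - while true; do find /var/log -name "*.log" -mtime +30 -delete; sleep 86400; done
        volumeMounts:
        - name: logs
          mountPath: /var/log
      volumes:
      - name: logs
        hostPath:
          path: /var/log
```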
## Handy commands

```bash
# DaemonSet
kubectl get daemonset
kubectl describe daemonset <name>
kubectl delete daemonset <name>

# Job
kubectl create job test --image=busybox -- echo "Hello"
kubectl get jobs
kubectl describe job <name>
kubectl logs -l job-name=<name>
kubectl delete job <name>

# CronJob
kubectl create cronjob test --schedule="*/1 * * * *" --image=busybox -- echo "Hello"
kubectl get cronjobs
kubectl describe cronjob <name>
kubectl delete cronjob <name>

# Trigger a CronJob manually
kubectl create job --from=cronjob/<cronjob-name> <job-name>

# Suspend a CronJob
kubectl patch cronjob <name> -p '{"spec":{"suspend":true}}'

# Resume a CronJob
kubectl patch cronjob <name> -p '{"spec":{"suspend":false}}'
```
## Best practices

### DaemonSet

1. Set resource limits:

```yaml
resources:
  requests:
    cpu: 100m
    memory: 100Mi
  limits:
    cpu: 200m
    memory: 200Mi
```

2. Use hostNetwork sparingly:

```yaml
spec:
  hostNetwork: true  # only when truly necessary
```

3. Configure tolerations:

```yaml
tolerations:
- effect: NoSchedule
  operator: Exists
```

### Job

1. Set timeouts:

```yaml
spec:
  activeDeadlineSeconds: 300
  backoffLimit: 3
```

2. Choose an appropriate restartPolicy:

```yaml
spec:
  template:
    spec:
      restartPolicy: OnFailure  # Never | OnFailure
```

3. Clean up automatically:

```yaml
spec:
  ttlSecondsAfterFinished: 3600
```

### CronJob

1. Avoid concurrency:

```yaml
spec:
  concurrencyPolicy: Forbid
```

2. Keep a sensible amount of history:

```yaml
spec:
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
```

3. Make tasks idempotent: every run must be safe to repeat.
## Summary

DaemonSet:
- Runs one Pod on every node
- Suited to system-level daemons
- Log collection, monitoring, network agents

Job:
- Runs one-off tasks
- Ensures successful completion
- Data migration, batch processing

CronJob:
- Runs tasks on a schedule
- Driven by Cron expressions
- Backups, cleanup, reports

In the next chapter we will look at Ingress for HTTP routing and load balancing.