Cost Optimization and Best Practices
This chapter covers how to optimize AWS costs in detail, including cost analysis, Spot instance strategies, Reserved Instances, resource right-sizing, and FinOps best practices.
Cost Analysis Architecture
Cost Composition
Total cost
├─ Compute (40-50%)
│   ├─ EKS control plane: $0.10/hour per cluster
│   ├─ EC2 instances (worker nodes)
│   │   ├─ On-Demand: standard pricing
│   │   ├─ Reserved Instances: save 40-60%
│   │   └─ Spot Instances: save 70-90%
│   └─ NAT Gateway: $0.045/hour + data processing fees
│
├─ Storage (15-20%)
│   ├─ EBS volumes
│   │   ├─ gp3: $0.08/GB-month
│   │   ├─ gp2: $0.10/GB-month
│   │   └─ io2: $0.125/GB-month + IOPS charges
│   ├─ S3
│   │   ├─ Standard: $0.023/GB-month
│   │   ├─ IA: $0.0125/GB-month
│   │   └─ Glacier: $0.004/GB-month
│   └─ EFS: $0.30/GB-month
│
├─ Database (20-30%)
│   ├─ RDS: instances + storage + backups
│   ├─ ElastiCache: node charges
│   └─ DynamoDB: on-demand or provisioned capacity
│
├─ Network (5-10%)
│   ├─ Data transfer out of AWS
│   ├─ Cross-AZ data transfer
│   ├─ Cross-region data transfer
│   └─ NAT Gateway data processing
│
└─ Other (5-10%)
    ├─ CloudWatch Logs
    ├─ ALB/NLB
    ├─ Route 53
    └─ KMS
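The percentage bands above can also be used in reverse: given one component's monthly cost and its assumed share of the bill, you can sanity-check the expected total. A minimal sketch (the 45% compute share is an assumed midpoint of the 40-50% band, not a fixed ratio):

```shell
#!/bin/sh
# estimate_total.sh -- back-of-envelope check: infer total monthly spend
# from one known component cost and its assumed share of the bill.
estimate_total() {
  component_cost=$1   # monthly cost of the component, in USD
  share_pct=$2        # assumed share of the total bill, e.g. 45 for compute
  awk -v c="$component_cost" -v p="$share_pct" \
    'BEGIN { printf "%.0f\n", c * 100 / p }'
}

# If compute (assumed 45% of the bill) costs $4500/month,
# the whole bill should land near $10000/month.
estimate_total 4500 45
```

If the actual bill diverges far from this estimate, one of the other component shares is out of line and worth investigating in Cost Explorer.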
Cost Visualization
┌─────────────────────────────────────────────────────┐
│                 AWS Cost Explorer                   │
│  ┌──────────────────────────────────────────────┐   │
│  │  Group by service                            │   │
│  │  Group by tag (environment, team, project)   │   │
│  │  Trends over time                            │   │
│  └──────────────────────────────────────────────┘   │
└───────────────────┬─────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────────────┐
│           AWS Cost Anomaly Detection                │
│  ┌──────────────────────────────────────────────┐   │
│  │  Automatically detect anomalous spend        │   │
│  │  Send alert notifications                    │   │
│  └──────────────────────────────────────────────┘   │
└───────────────────┬─────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────────────┐
│                   AWS Budgets                       │
│  ┌──────────────────────────────────────────────┐   │
│  │  Set budget thresholds                       │   │
│  │  Over-budget alerts                          │   │
│  │  Forecast-based alerts                       │   │
│  └──────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────┘
Tagging Strategy
Cost Allocation Tags
Tag schema:
# Required tags
Environment: production | staging | development
Team: platform | data | backend | frontend
Project: user-service | order-service | payment-service
CostCenter: engineering | product | marketing
Owner: team-email@company.com
# Optional tags
Application: api-gateway | database | cache
Component: compute | storage | network
ManagedBy: terraform | helm | manual
Backup: enabled | disabled
Tagging script:
#!/bin/bash
# tag-resources.sh
REGION="us-east-1"

echo "================================================"
echo "Applying cost allocation tags to resources"
echo "================================================"

# 1. Tag EC2 instances
echo ""
echo "1. Tagging EC2 instances..."
aws ec2 describe-instances \
  --filters "Name=tag:kubernetes.io/cluster/production-eks-cluster,Values=owned" \
  --query 'Reservations[].Instances[].InstanceId' \
  --output text | \
  xargs -I {} aws ec2 create-tags \
    --resources {} \
    --tags \
      Key=Environment,Value=production \
      Key=Team,Value=platform \
      Key=CostCenter,Value=engineering \
    --region $REGION

# 2. Tag EBS volumes
echo ""
echo "2. Tagging EBS volumes..."
aws ec2 describe-volumes \
  --filters "Name=tag:kubernetes.io/cluster/production-eks-cluster,Values=owned" \
  --query 'Volumes[].VolumeId' \
  --output text | \
  xargs -I {} aws ec2 create-tags \
    --resources {} \
    --tags \
      Key=Environment,Value=production \
      Key=Team,Value=platform \
      Key=CostCenter,Value=engineering \
    --region $REGION

# 3. Tag RDS instances
echo ""
echo "3. Tagging RDS instances..."
# Show existing tags first
aws rds list-tags-for-resource \
  --resource-name arn:aws:rds:us-east-1:123456789012:db:production-postgres-users \
  --region $REGION
aws rds add-tags-to-resource \
  --resource-name arn:aws:rds:us-east-1:123456789012:db:production-postgres-users \
  --tags \
    Key=Environment,Value=production \
    Key=Team,Value=backend \
    Key=CostCenter,Value=engineering \
  --region $REGION

# 4. Tag S3 buckets
echo ""
echo "4. Tagging S3 buckets..."
aws s3api put-bucket-tagging \
  --bucket production-app-assets-123456789012 \
  --tagging 'TagSet=[
    {Key=Environment,Value=production},
    {Key=Team,Value=platform},
    {Key=CostCenter,Value=engineering}
  ]'

echo ""
echo "================================================"
echo "Tagging complete!"
echo "================================================"
Kubernetes resource labels:
# deployment-with-cost-tags.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  namespace: production
  labels:
    app: user-service
    environment: production
    team: backend
    cost-center: engineering
    project: user-management
spec:
  template:
    metadata:
      labels:
        app: user-service
        environment: production
        team: backend
        cost-center: engineering
Compute Cost Optimization
Spot Instance Strategy
Spot node group configuration:
#!/bin/bash
# create-spot-node-group-advanced.sh
CLUSTER_NAME="production-eks-cluster"
REGION="us-east-1"

echo "Creating a multi-instance-type Spot node group..."

# Instance type strategy:
# - Use several instance types to improve Spot availability
# - Pick instance types with similar specs
# - Managed Spot node groups use the capacity-optimized allocation strategy
aws eks create-nodegroup \
  --cluster-name $CLUSTER_NAME \
  --nodegroup-name spot-mixed-nodes \
  --node-role $EKS_NODE_ROLE_ARN \
  --subnets $PRIVATE_APP_SUBNET_1A $PRIVATE_APP_SUBNET_1B $PRIVATE_APP_SUBNET_1C \
  --instance-types m5.xlarge m5a.xlarge m5n.xlarge c5.xlarge c5a.xlarge \
  --capacity-type SPOT \
  --scaling-config minSize=3,maxSize=30,desiredSize=6 \
  --update-config maxUnavailable=1 \
  --labels \
    workload-type=stateless,capacity-type=spot,environment=production \
  --taints \
    key=spot,value=true,effect=NO_SCHEDULE \
  --tags \
    Environment=production,CapacityType=SPOT,CostOptimized=true \
  --region $REGION

echo "✓ Spot node group created"
Spot interruption handling:
# spot-interrupt-handler.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-node-termination-handler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: aws-node-termination-handler
  template:
    metadata:
      labels:
        app: aws-node-termination-handler
    spec:
      serviceAccountName: aws-node-termination-handler
      hostNetwork: true
      containers:
        - name: aws-node-termination-handler
          image: public.ecr.aws/aws-ec2/aws-node-termination-handler:v1.19.0
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ENABLE_SPOT_INTERRUPTION_DRAINING
              value: "true"
            - name: ENABLE_SCHEDULED_EVENT_DRAINING
              value: "true"
            - name: DELETE_LOCAL_DATA
              value: "true"
            - name: IGNORE_DAEMON_SETS
              value: "true"
            - name: POD_TERMINATION_GRACE_PERIOD
              value: "30"
            - name: WEBHOOK_URL
              value: "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK"
Workloads suited to Spot instances:
# stateless-deployment-spot.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-processor
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: batch-processor
  template:
    metadata:
      labels:
        app: batch-processor
    spec:
      # Tolerate the Spot taint so pods can schedule onto Spot nodes
      tolerations:
        - key: spot
          value: "true"
          effect: NoSchedule
      nodeSelector:
        capacity-type: spot
      # Graceful shutdown window for Spot interruptions
      terminationGracePeriodSeconds: 120
      containers:
        - name: processor
          image: batch-processor:v1.0.0
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]
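To see what moving a workload like batch-processor to Spot is worth, you can estimate monthly savings from replica count, On-Demand price, and the observed Spot discount. A rough sketch; the $0.192/hour m5.xlarge On-Demand rate and the 70% discount are illustrative assumptions, so check current pricing for your region:

```shell
#!/bin/sh
# spot_savings.sh -- rough monthly savings from running replicas on Spot.
# Assumed inputs (illustrative): On-Demand hourly price and the average
# Spot discount percentage; 720 hours per month.
spot_savings() {
  replicas=$1; od_hourly=$2; discount_pct=$3
  awk -v r="$replicas" -v h="$od_hourly" -v d="$discount_pct" \
    'BEGIN { printf "%.2f\n", r * h * 720 * d / 100 }'
}

# 10 replicas on m5.xlarge-class nodes at an assumed 70% Spot discount
spot_savings 10 0.192 70
```

The real figure depends on how many replicas actually land on Spot capacity and how often interruptions push pods back to On-Demand nodes.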
Reserved Instances Strategy
RI purchase analysis:
#!/bin/bash
# analyze-ri-opportunities.sh
REGION="us-east-1"
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

echo "================================================"
echo "Analyzing Reserved Instance purchase opportunities"
echo "================================================"

# 1. Instance usage over the past 30 days (adjust the dates as needed)
echo ""
echo "1. Analyzing instance usage patterns..."
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-01-31 \
  --granularity DAILY \
  --metrics UnblendedCost UsageQuantity \
  --group-by Type=DIMENSION,Key=INSTANCE_TYPE \
  --filter '{
    "Dimensions": {
      "Key": "SERVICE",
      "Values": ["Amazon Elastic Compute Cloud - Compute"]
    }
  }' \
  --region $REGION

# 2. RI recommendations
echo ""
echo "2. Fetching AWS RI purchase recommendations..."
aws ce get-reservation-purchase-recommendation \
  --service "Amazon Elastic Compute Cloud - Compute" \
  --lookback-period-in-days SIXTY_DAYS \
  --term-in-years ONE_YEAR \
  --payment-option PARTIAL_UPFRONT \
  --region $REGION

# 3. Current RI utilization
echo ""
echo "3. Current RI utilization..."
aws ce get-reservation-utilization \
  --time-period Start=2024-01-01,End=2024-01-31 \
  --region $REGION

echo ""
echo "================================================"
echo "Recommendations:"
echo "  1. Use RIs for node groups with steady usage"
echo "  2. Prefer 1-year Partial Upfront terms"
echo "  3. Review RI utilization regularly"
echo "================================================"
RI purchase script:
#!/bin/bash
# purchase-reserved-instances.sh
REGION="us-east-1"

echo "Purchasing Reserved Instances..."

# Example: buy 3 m5.xlarge reservations, 1-year term, Partial Upfront
aws ec2 purchase-reserved-instances-offering \
  --reserved-instances-offering-id <OFFERING_ID> \
  --instance-count 3 \
  --region $REGION

echo "✓ RI purchase complete"
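Before committing to a purchase like the one above, it helps to quantify the expected monthly benefit. A sketch with assumed, illustrative numbers (m5.xlarge On-Demand at $0.192/hour and an effective 40% discount for a 1-year Partial Upfront RI; verify both against the current price list):

```shell
#!/bin/sh
# ri_savings.sh -- rough monthly savings of Reserved Instances over
# On-Demand, given an assumed effective discount percentage.
ri_monthly_savings() {
  count=$1; od_hourly=$2; discount_pct=$3
  awk -v c="$count" -v h="$od_hourly" -v d="$discount_pct" \
    'BEGIN { printf "%.2f\n", c * h * 720 * d / 100 }'
}

# 3 reserved m5.xlarge instances at an assumed 40% effective discount
ri_monthly_savings 3 0.192 40
```

Remember that RI savings only materialize while matching instances are actually running; an idle reservation is pure cost.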
Savings Plans
Savings Plans analysis:
#!/bin/bash
# analyze-savings-plans.sh
REGION="us-east-1"

echo "Analyzing Savings Plans opportunities..."

# Fetch Savings Plans recommendations
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option PARTIAL_UPFRONT \
  --lookback-period-in-days SIXTY_DAYS \
  --region $REGION

echo ""
echo "Compute Savings Plans vs EC2 Reserved Instances:"
echo "  Compute SP: more flexible; applies across instance types, regions, and operating systems"
echo "  EC2 RI: larger discount, less flexibility"
echo ""
echo "Recommended strategy:"
echo "  1. Baseline capacity: EC2 RI (maximum discount)"
echo "  2. Elastic capacity: Compute SP (flexibility)"
echo "  3. Peak capacity: On-Demand + Spot"
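The three-tier strategy above can be evaluated as a blended discount across the whole fleet. A sketch with assumed, illustrative discounts (RI 40%, Compute SP 30%, Spot 70%, On-Demand 0%; real discounts vary by instance family, term, and market):

```shell
#!/bin/sh
# blended_discount.sh -- effective fleet-wide discount of a capacity mix.
# Discount percentages below are assumptions for illustration only.
blended_discount() {
  # args: RI, SP, and Spot shares as fractions of usage; remainder is On-Demand
  awk -v ri="$1" -v sp="$2" -v spot="$3" 'BEGIN {
    od = 1 - ri - sp - spot
    printf "%.1f\n", ri*40 + sp*30 + spot*70 + od*0
  }'
}

# 50% RI, 20% SP, 20% Spot, 10% On-Demand
blended_discount 0.5 0.2 0.2
```

Pushing the Spot share up raises the blended discount fastest, but only for workloads that tolerate interruption; the RI/SP base should cover the floor of usage that never goes away.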
Storage Cost Optimization
EBS Optimization
EBS volume analysis:
#!/bin/bash
# analyze-ebs-volumes.sh
REGION="us-east-1"

echo "================================================"
echo "Analyzing EBS volume optimization opportunities"
echo "================================================"

# 1. Find unattached volumes
echo ""
echo "1. Unattached EBS volumes:"
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[].{ID:VolumeId,Size:Size,Type:VolumeType,CreateTime:CreateTime}' \
  --output table \
  --region $REGION

# 2. Find low-utilization volumes
echo ""
echo "2. Volumes averaging < 100 IOPS over the past 7 days:"
aws cloudwatch get-metric-statistics \
  --namespace AWS/EBS \
  --metric-name VolumeReadOps \
  --dimensions Name=VolumeId,Value=vol-xxxxx \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 86400 \
  --statistics Average \
  --region $REGION

# 3. Find gp2 volumes (candidates for gp3)
echo ""
echo "3. gp2 volumes that can be migrated to gp3:"
aws ec2 describe-volumes \
  --filters Name=volume-type,Values=gp2 \
  --query 'Volumes[].{ID:VolumeId,Size:Size,State:State}' \
  --output table \
  --region $REGION

echo ""
echo "================================================"
echo "Optimization recommendations:"
echo "  1. Delete unattached volumes (100% saved)"
echo "  2. Migrate gp2 to gp3 (about 20% cheaper, better baseline performance)"
echo "  3. Shrink over-provisioned volumes (requires recreating the volume)"
echo "================================================"
Migrating to gp3:
#!/bin/bash
# migrate-gp2-to-gp3.sh
REGION="us-east-1"

echo "Migrating gp2 volumes to gp3..."

# Collect all gp2 volumes
GP2_VOLUMES=$(aws ec2 describe-volumes \
  --filters Name=volume-type,Values=gp2 \
  --query 'Volumes[].VolumeId' \
  --output text \
  --region $REGION)

for VOLUME_ID in $GP2_VOLUMES; do
  echo "Migrating volume: $VOLUME_ID"
  aws ec2 modify-volume \
    --volume-id $VOLUME_ID \
    --volume-type gp3 \
    --iops 3000 \
    --throughput 125 \
    --region $REGION
  echo "  ✓ Modification request submitted"
done

echo ""
echo "Note: a volume type change can take minutes to hours to complete"
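Using the list prices quoted earlier ($0.10/GB-month for gp2, $0.08/GB-month for gp3), the per-volume saving of this migration is straightforward to compute. A minimal sketch, ignoring any extra gp3 IOPS/throughput provisioned above the free baseline:

```shell
#!/bin/sh
# gp3_savings.sh -- monthly storage cost, gp2 vs gp3, at the list prices
# quoted above; gp3's baseline 3000 IOPS / 125 MiB/s carries no extra charge.
gp3_savings() {
  size_gb=$1
  awk -v s="$size_gb" 'BEGIN {
    gp2 = s * 0.10; gp3 = s * 0.08
    printf "gp2=%.2f gp3=%.2f saved=%.2f\n", gp2, gp3, gp2 - gp3
  }'
}

# A 500 GB volume
gp3_savings 500
```

Multiplied across hundreds of PersistentVolumes in a cluster, the flat 20% adds up with no application change required.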
S3 Lifecycle Policies
Intelligent tiering policy:
#!/bin/bash
# configure-s3-lifecycle.sh
BUCKET="production-app-assets-123456789012"

echo "Configuring the S3 lifecycle policy..."

cat > lifecycle-policy.json << 'EOF'
{
  "Rules": [
    {
      "Id": "IntelligentTiering",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "assets/"
      },
      "Transitions": [
        {
          "Days": 0,
          "StorageClass": "INTELLIGENT_TIERING"
        }
      ]
    },
    {
      "Id": "ArchiveOldLogs",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 90,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    },
    {
      "Id": "DeleteOldVersions",
      "Status": "Enabled",
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 30,
          "StorageClass": "GLACIER"
        }
      ],
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 90
      }
    },
    {
      "Id": "DeleteIncompleteMultipart",
      "Status": "Enabled",
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 7
      }
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --bucket $BUCKET \
  --lifecycle-configuration file://lifecycle-policy.json

echo "✓ Lifecycle policy configured"
rm -f lifecycle-policy.json
Network Cost Optimization
NAT Gateway Optimization
Analyzing NAT Gateway cost:
#!/bin/bash
# analyze-nat-gateway-cost.sh
REGION="us-east-1"

echo "================================================"
echo "Analyzing NAT Gateway cost"
echo "================================================"

# 1. NAT Gateway data processing volume
echo ""
echo "1. Data processed over the past 30 days:"
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name BytesOutToDestination \
  --dimensions Name=NatGatewayId,Value=nat-xxxxx \
  --start-time $(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 86400 \
  --statistics Sum \
  --region $REGION

# 2. Cost estimate (note: dollar signs are escaped so the shell
# does not expand $0 inside double quotes)
echo ""
echo "2. Cost estimate:"
echo "  NAT Gateway hourly charge: \$0.045/hour = \$32.4/month"
echo "  Data processing charge: \$0.045/GB"
echo ""
echo "Example: processing 1 TB of data"
echo "  Fixed charge: \$32.4"
echo "  Data charge: 1000 GB × \$0.045 = \$45"
echo "  Total: \$77.4/month"
echo ""
echo "================================================"
echo "Optimization recommendations:"
echo "  1. Use VPC Endpoints for traffic to AWS services"
echo "  2. Cache responses from external APIs"
echo "  3. Use the S3 Gateway Endpoint"
echo "  4. Consider a single NAT Gateway (non-production only)"
echo "================================================"
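The cost estimate echoed above generalizes into a small helper: monthly NAT Gateway cost is the fixed hourly charge over 720 hours plus the per-GB processing fee, at the $0.045 rates quoted above.

```shell
#!/bin/sh
# nat_cost.sh -- monthly cost of one NAT Gateway from the rates above:
# $0.045/hour fixed (720 hours/month) + $0.045/GB of data processed.
nat_monthly_cost() {
  gb=$1
  awk -v g="$gb" 'BEGIN { printf "%.2f\n", 0.045*720 + 0.045*g }'
}

# 1 TB (1000 GB) processed: 32.40 fixed + 45.00 data = 77.40
nat_monthly_cost 1000
```

Running this against the CloudWatch BytesOutToDestination sum (converted to GB) gives a quick per-gateway cost figure without waiting for the bill.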
Creating VPC Endpoints:
#!/bin/bash
# create-vpc-endpoints.sh
source vpc-config.sh
source sg-config.sh
REGION="us-east-1"

echo "Creating VPC Endpoints (reduces NAT Gateway cost)..."

# 1. S3 Gateway Endpoint (free)
echo ""
echo "1. Creating the S3 Gateway Endpoint..."
aws ec2 create-vpc-endpoint \
  --vpc-id $VPC_ID \
  --service-name com.amazonaws.$REGION.s3 \
  --route-table-ids $PRIVATE_ROUTE_TABLE_ID \
  --region $REGION

# 2. ECR API Endpoint
echo ""
echo "2. Creating the ECR API Endpoint..."
aws ec2 create-vpc-endpoint \
  --vpc-id $VPC_ID \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.$REGION.ecr.api \
  --subnet-ids $PRIVATE_APP_SUBNET_1A $PRIVATE_APP_SUBNET_1B $PRIVATE_APP_SUBNET_1C \
  --security-group-ids $VPCE_SG_ID \
  --region $REGION

# 3. ECR DKR Endpoint
echo ""
echo "3. Creating the ECR DKR Endpoint..."
aws ec2 create-vpc-endpoint \
  --vpc-id $VPC_ID \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.$REGION.ecr.dkr \
  --subnet-ids $PRIVATE_APP_SUBNET_1A $PRIVATE_APP_SUBNET_1B $PRIVATE_APP_SUBNET_1C \
  --security-group-ids $VPCE_SG_ID \
  --region $REGION

# 4. CloudWatch Logs Endpoint
echo ""
echo "4. Creating the CloudWatch Logs Endpoint..."
aws ec2 create-vpc-endpoint \
  --vpc-id $VPC_ID \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.$REGION.logs \
  --subnet-ids $PRIVATE_APP_SUBNET_1A $PRIVATE_APP_SUBNET_1B $PRIVATE_APP_SUBNET_1C \
  --security-group-ids $VPCE_SG_ID \
  --region $REGION

echo ""
echo "✓ VPC Endpoints created"
echo ""
echo "Estimated monthly impact:"
echo "  - NAT Gateway data processing charges drop for AWS service traffic"
echo "  - Interface Endpoint charge: \$0.01/hour per AZ ≈ \$7.2/month per AZ"
echo "  - Gateway Endpoints (S3/DynamoDB) are free; Interface Endpoints also"
echo "    bill about \$0.01/GB processed, still well below NAT's \$0.045/GB"
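An Interface Endpoint only pays for itself once enough AWS-service traffic is diverted away from the NAT Gateway. A break-even sketch under assumed prices (endpoint at $0.01/hour per AZ, roughly $7.20/month, plus $0.01/GB processed, versus $0.045/GB of NAT data processing; confirm against current PrivateLink and NAT Gateway pricing):

```shell
#!/bin/sh
# vpce_breakeven.sh -- monthly GB of AWS-service traffic at which one
# Interface Endpoint in one AZ becomes cheaper than sending the same
# traffic through NAT Gateway data processing.
vpce_breakeven_gb() {
  # fixed monthly cost / (NAT per-GB rate - endpoint per-GB rate)
  awk 'BEGIN { printf "%.0f\n", 7.20 / (0.045 - 0.010) }'
}

vpce_breakeven_gb
```

Below that volume, the endpoint's fixed charge exceeds the per-GB saving, so low-traffic dev VPCs may be better off without per-service Interface Endpoints.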
Cross-AZ Data Transfer Optimization
Topology-aware routing (reduces cross-AZ traffic):
# service-topology-aware.yaml
apiVersion: v1
kind: Service
metadata:
  name: user-service
  namespace: production
  annotations:
    # Topology Aware Hints keep traffic within the client's zone when
    # capacity allows, cutting cross-AZ data transfer (Kubernetes 1.23+;
    # this replaces the removed topologyKeys field)
    service.kubernetes.io/topology-aware-hints: auto
spec:
  selector:
    app: user-service
  ports:
    - port: 9001
  type: ClusterIP
Database Cost Optimization
RDS Cost Optimization
RDS instance right-sizing:
#!/bin/bash
# analyze-rds-utilization.sh
DB_INSTANCE="production-postgres-users"
REGION="us-east-1"

echo "Analyzing RDS instance utilization..."

# CPU utilization over the past 30 days
echo ""
echo "CPU utilization (30-day daily averages):"
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=$DB_INSTANCE \
  --start-time $(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 86400 \
  --statistics Average \
  --region $REGION

# Connection count
echo ""
echo "Database connections:"
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DatabaseConnections \
  --dimensions Name=DBInstanceIdentifier,Value=$DB_INSTANCE \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 \
  --statistics Average,Maximum \
  --region $REGION

echo ""
echo "Recommendations:"
echo "  - CPU < 40%: consider downsizing the instance class"
echo "  - CPU 40-70%: current sizing is reasonable"
echo "  - CPU > 70%: consider upsizing or optimizing queries"
Aurora Serverless v2 migration:
#!/bin/bash
# migrate-to-aurora-serverless.sh
echo "================================================"
echo "Aurora Serverless v2 cost comparison"
echo "================================================"
echo ""
echo "Traditional RDS (db.r6g.xlarge):"
echo "  Fixed cost: \$0.42/hour = \$302.4/month"
echo ""
echo "Aurora Serverless v2:"
echo "  Pay per use: \$0.12/ACU-hour"
echo "  Minimum capacity: 0.5 ACU"
echo "  Maximum capacity: scales automatically with load"
echo ""
echo "Example cost calculation:"
echo "  Average 2 ACU × 720 hours × \$0.12 = \$172.8/month"
echo "  Savings: \$302.4 - \$172.8 = \$129.6/month (43%)"
echo ""
echo "Good fits:"
echo "  ✓ Intermittent workloads"
echo "  ✓ Development/test environments"
echo "  ✓ Unpredictable traffic patterns"
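The example calculation echoed above can be parameterized so you can plug in your own measured average ACU consumption; the $0.12/ACU-hour rate is the same assumed price used above and varies by region.

```shell
#!/bin/sh
# aurora_sv2_cost.sh -- monthly Aurora Serverless v2 compute cost at the
# assumed \$0.12/ACU-hour rate, over 720 hours/month.
aurora_sv2_monthly() {
  avg_acu=$1
  awk -v a="$avg_acu" 'BEGIN { printf "%.2f\n", a * 720 * 0.12 }'
}

# Average 2 ACU sustained for a month
aurora_sv2_monthly 2
```

Compare the result against the fixed monthly price of the provisioned instance class you would otherwise run; Serverless v2 wins when average utilization is well below the provisioned instance's capacity.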
DynamoDB Cost Optimization
On-demand vs provisioned capacity:
#!/bin/bash
# analyze-dynamodb-cost.sh
TABLE_NAME="production-sessions"

echo "Analyzing DynamoDB cost..."

# Read requests over the past 7 days
aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name ConsumedReadCapacityUnits \
  --dimensions Name=TableName,Value=$TABLE_NAME \
  --start-time $(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 3600 \
  --statistics Sum \
  --region us-east-1

echo ""
echo "Cost comparison:"
echo ""
echo "On-demand mode:"
echo "  Reads: \$1.25 per million requests"
echo "  Writes: \$6.25 per million requests"
echo ""
echo "Provisioned capacity:"
echo "  Reads: \$0.00013/RCU-hour = \$0.0936/RCU-month"
echo "  Writes: \$0.00065/WCU-hour = \$0.468/WCU-month"
echo ""
echo "Which to choose:"
echo "  - Unpredictable traffic → on-demand"
echo "  - Steady traffic → provisioned capacity + Auto Scaling"
echo "  - Forecasts overshooting actual usage by more than 50% → on-demand"
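The two pricing models above can be compared directly for a given traffic profile. A simplified sketch using the prices quoted above: it ignores item size and eventually-consistent reads, and assumes provisioned capacity is sized for peak requests per second.

```shell
#!/bin/sh
# ddb_read_cost.sh -- monthly read cost, on-demand vs provisioned,
# at the prices quoted above (\$1.25/million on-demand read requests,
# \$0.0936 per RCU-month). Simplifications noted in the lead-in.
ddb_read_cost() {
  avg_rps=$1; peak_rps=$2
  awk -v a="$avg_rps" -v p="$peak_rps" 'BEGIN {
    reads = a * 2592000                   # seconds in a 30-day month
    ondemand = reads / 1000000 * 1.25     # per-million request pricing
    provisioned = p * 0.0936              # one RCU provisioned per peak read/s
    printf "ondemand=%.2f provisioned=%.2f\n", ondemand, provisioned
  }'
}

# 100 reads/s average, 300 reads/s peak
ddb_read_cost 100 300
```

With a steady profile like this, provisioned capacity is far cheaper; on-demand only wins when traffic is spiky enough that the peak-sized provisioned table would sit mostly idle.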
Monitoring and Logging Cost Optimization
CloudWatch Logs Optimization
Configure log retention policies:
#!/bin/bash
# optimize-cloudwatch-logs.sh
REGION="us-east-1"

echo "Optimizing CloudWatch Logs cost..."

# Set log retention periods
LOG_GROUPS=$(aws logs describe-log-groups \
  --query 'logGroups[].logGroupName' \
  --output text \
  --region $REGION)

for LOG_GROUP in $LOG_GROUPS; do
  echo "Configuring log group: $LOG_GROUP"
  # Retention varies by log type
  if [[ $LOG_GROUP == *"/aws/eks/"* ]]; then
    RETENTION=7    # EKS logs: keep 7 days
  elif [[ $LOG_GROUP == *"/aws/rds/"* ]]; then
    RETENTION=30   # RDS logs: keep 30 days
  else
    RETENTION=14   # everything else: keep 14 days
  fi
  aws logs put-retention-policy \
    --log-group-name $LOG_GROUP \
    --retention-in-days $RETENTION \
    --region $REGION
  echo "  ✓ Retention: $RETENTION days"
done

echo ""
echo "✓ Log retention policies configured"
Exporting to S3 (long-term storage):
#!/bin/bash
# export-logs-to-s3.sh
LOG_GROUP="/aws/eks/production-eks-cluster/cluster"
BUCKET="production-logs-123456789012"
PREFIX="eks-logs"
FROM=$(date -u -d '7 days ago' +%s)000   # epoch milliseconds
TO=$(date -u +%s)000

echo "Exporting logs to S3..."
TASK_ID=$(aws logs create-export-task \
  --log-group-name $LOG_GROUP \
  --from $FROM \
  --to $TO \
  --destination $BUCKET \
  --destination-prefix $PREFIX \
  --query 'taskId' \
  --output text)

echo "Export task ID: $TASK_ID"
echo ""
echo "Cost comparison:"
echo "  CloudWatch Logs: \$0.50/GB ingested, then \$0.03/GB-month stored"
echo "  S3 Standard: \$0.023/GB-month (Glacier tiers cut this further)"
echo "  Long retention is much cheaper in S3 than in CloudWatch"
Cost Governance and FinOps
AWS Budgets Configuration
#!/bin/bash
# create-cost-budgets.sh
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
EMAIL="finance-team@company.com"
echo "Creating cost budgets..."

# 1. Monthly total cost budget
cat > monthly-budget.json << EOF
{
  "BudgetName": "monthly-total-cost",
  "BudgetType": "COST",
  "TimeUnit": "MONTHLY",
  "BudgetLimit": {
    "Amount": "10000",
    "Unit": "USD"
  },
  "CostFilters": {},
  "CostTypes": {
    "IncludeTax": true,
    "IncludeSubscription": true,
    "UseBlended": false,
    "IncludeRefund": false,
    "IncludeCredit": false,
    "IncludeUpfront": true,
    "IncludeRecurring": true,
    "IncludeOtherSubscription": true,
    "IncludeSupport": true,
    "IncludeDiscount": true,
    "UseAmortized": false
  },
  "TimePeriod": {
    "Start": "2024-01-01T00:00:00Z",
    "End": "2087-06-15T00:00:00Z"
  }
}
EOF

cat > budget-notifications.json << EOF
[
  {
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [
      {
        "SubscriptionType": "EMAIL",
        "Address": "$EMAIL"
      }
    ]
  },
  {
    "Notification": {
      "NotificationType": "FORECASTED",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 100,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [
      {
        "SubscriptionType": "EMAIL",
        "Address": "$EMAIL"
      }
    ]
  }
]
EOF

aws budgets create-budget \
  --account-id $ACCOUNT_ID \
  --budget file://monthly-budget.json \
  --notifications-with-subscribers file://budget-notifications.json

echo "✓ Budget created"
rm -f monthly-budget.json budget-notifications.json
Cost Anomaly Detection
#!/bin/bash
# setup-cost-anomaly-detection.sh

echo "Configuring cost anomaly detection..."

# Create a cost anomaly monitor
aws ce create-anomaly-monitor \
  --anomaly-monitor '{
    "MonitorName": "Production Cost Monitor",
    "MonitorType": "DIMENSIONAL",
    "MonitorDimension": "SERVICE"
  }'

# Create a subscription for alert delivery
aws ce create-anomaly-subscription \
  --anomaly-subscription '{
    "SubscriptionName": "Daily Cost Anomaly Alert",
    "Threshold": 100,
    "Frequency": "DAILY",
    "MonitorArnList": ["arn:aws:ce::123456789012:anomalymonitor/xxx"],
    "Subscribers": [
      {
        "Type": "EMAIL",
        "Address": "finance-team@company.com"
      }
    ]
  }'

echo "✓ Anomaly detection configured"
Automated Cost Reporting
#!/bin/bash
# generate-cost-report.sh
REGION="us-east-1"
START_DATE=$(date -u -d '1 month ago' +%Y-%m-01)
END_DATE=$(date -u +%Y-%m-01)

echo "================================================"
echo "Generating cost report"
echo "Period: $START_DATE to $END_DATE"
echo "================================================"

# 1. Cost grouped by service
echo ""
echo "1. Cost by service:"
aws ce get-cost-and-usage \
  --time-period Start=$START_DATE,End=$END_DATE \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --region $REGION \
  --output table

# 2. Cost grouped by tag (team)
echo ""
echo "2. Cost by team:"
aws ce get-cost-and-usage \
  --time-period Start=$START_DATE,End=$END_DATE \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=TAG,Key=Team \
  --region $REGION \
  --output table

# 3. Trend analysis
echo ""
echo "3. Cost trend (daily):"
aws ce get-cost-and-usage \
  --time-period Start=$START_DATE,End=$END_DATE \
  --granularity DAILY \
  --metrics UnblendedCost \
  --region $REGION

echo ""
echo "================================================"
echo "Report complete"
echo "================================================"
Best Practices Summary
1. Cost visibility
✓ Implement a complete tagging strategy
✓ Enable cost allocation tags
✓ Review Cost Explorer regularly
✓ Configure cost anomaly detection
✓ Set budgets and alerts
2. Compute optimization
✓ Mix On-Demand, RI, and Spot capacity
✓ Right-size instances
✓ Use Savings Plans
✓ Delete unused resources
✓ Use Graviton (ARM) instances
3. Storage optimization
✓ Use S3 lifecycle policies
✓ Migrate gp2 to gp3
✓ Delete unused snapshots and volumes
✓ Enable S3 Intelligent-Tiering
✓ Compress and deduplicate data
4. Network optimization
✓ Use VPC Endpoints
✓ Reduce cross-AZ traffic
✓ Use CloudFront CDN
✓ Optimize NAT Gateway usage
✓ Enable topology-aware routing
5. FinOps culture
✓ Assign cost ownership to teams
✓ Hold regular cost review meetings
✓ Track cost optimization KPIs
✓ Build cost awareness among engineers
✓ Automate cost reporting
6. Continuous optimization
✓ Monthly cost reviews
✓ Regular resource cleanup
✓ Monitor RI/SP utilization
✓ Update optimization strategies
✓ Track optimization results
Summary
This high-availability architecture tutorial has covered:
- Project planning: requirements analysis, technology selection
- Network architecture: Multi-AZ VPC, subnet planning, routing strategy
- Security configuration: security groups, SG-to-SG references, least privilege
- EKS cluster: control plane, node groups, core add-ons
- Application deployment: microservices, load balancing, release strategies
- Database layer: RDS, Redis, DynamoDB, S3
- Monitoring and logging: Prometheus, Grafana, ELK, Jaeger
- Auto scaling: HPA, VPA, Cluster Autoscaler, backup and recovery
- Cost optimization: tagging strategy, RI/Spot, resource optimization, FinOps
By working through this tutorial and its hands-on exercises, you will have the complete skill set needed to build enterprise-grade, highly available architectures on AWS.