EKS Cluster Creation and Configuration
This chapter walks through creating and configuring a production Amazon EKS cluster: IAM role setup, cluster creation, node group management, add-on installation, and monitoring configuration.
EKS Architecture Design
Cluster Planning
Cluster configuration overview:
Cluster name: production-eks-cluster
Kubernetes version: 1.28
Region: us-east-1
Availability Zones: us-east-1a, us-east-1b, us-east-1c
Network configuration:
├─ VPC: 10.0.0.0/16
├─ Control plane subnets: private application subnets (3 AZs)
├─ Worker node subnets: private application subnets (3 AZs)
└─ Pod networking: VPC CNI (secondary CIDR optional)
Endpoint access:
├─ Public endpoint: enabled (IP-restricted)
├─ Private endpoint: enabled
└─ Hybrid access mode (recommended)
Logs enabled:
├─ API Server
├─ Audit
├─ Authenticator
├─ Controller Manager
└─ Scheduler
Control Plane and Data Plane
Architecture diagram:
┌─────────────────────────────────────────────────────┐
│              AWS-Managed Control Plane              │
│  ┌──────────────────────────────────────────────┐   │
│  │  API Server (Multi-AZ)                       │   │
│  │  etcd (Multi-AZ, automatic backups)          │   │
│  │  Controller Manager                          │   │
│  │  Scheduler                                   │   │
│  └──────────────────────────────────────────────┘   │
└─────────────────────┬───────────────────────────────┘
                      │ (security group: sg-eks-cp)
                      │ endpoints: public + private
                      ▼
┌─────────────────────────────────────────────────────┐
│              Worker Nodes (Data Plane)              │
│                                                     │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐     │
│  │ us-east-1a │  │ us-east-1b │  │ us-east-1c │     │
│  │            │  │            │  │            │     │
│  │  Node 1    │  │  Node 3    │  │  Node 5    │     │
│  │  Node 2    │  │  Node 4    │  │  Node 6    │     │
│  │            │  │            │  │            │     │
│  │  Pods...   │  │  Pods...   │  │  Pods...   │     │
│  └────────────┘  └────────────┘  └────────────┘     │
│                                                     │
│  (security group: sg-eks-nodes)                     │
│  (IAM role: eks-node-role)                          │
└─────────────────────────────────────────────────────┘
Network Model
VPC CNI plugin:
How it works:
├─ Every Pod receives a routable IP address from the VPC
├─ Pods communicate directly with other VPC resources
├─ No extra NAT or overlay network required
└─ Best possible network performance
IP address management:
├─ Primary ENI: node primary IP
├─ Secondary ENIs: Pod IP pool
├─ ENI and per-ENI IP limits vary by instance type
└─ Subnet sizing must be planned carefully
Example (m5.xlarge):
├─ Maximum ENIs: 4
├─ IPs per ENI: 15
├─ Pod-assignable IPs: 4 × (15 − 1) = 56
│   (each ENI's primary IP is reserved for the ENI itself)
├─ Maximum Pods: 56 + 2 = 58
│   (+2 for host-network Pods such as aws-node and kube-proxy)
└─ In practice: keep 10-20% headroom
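The ENI math above can be reproduced with a small helper (hypothetical, not an AWS tool) that implements the formula used by the EKS-optimized AMI: max_pods = ENIs × (IPs per ENI − 1) + 2.

```shell
# Hypothetical helper: max-Pods ceiling from an instance type's ENI limits.
# EKS-optimized AMI formula: max_pods = ENIs * (IPs per ENI - 1) + 2
eks_max_pods() {
  local enis=$1 ips_per_eni=$2
  # each ENI's primary IP is reserved for the ENI itself;
  # +2 covers host-network Pods (aws-node, kube-proxy)
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

eks_max_pods 4 15   # m5.xlarge -> 58
eks_max_pods 3 12   # t3.large  -> 35
```

Plugging in other instance types' ENI limits (from the EC2 documentation) gives their default max-pods values the same way.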
IAM Role Preparation
EKS Cluster Role
Purpose: lets the EKS control plane call AWS APIs on your behalf
Trust policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "eks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Creation script:
#!/bin/bash
# create-eks-cluster-role.sh
REGION="us-east-1"
ROLE_NAME="eks-cluster-role"
echo "Creating the EKS cluster IAM role..."
# Write the trust policy file
cat > eks-cluster-trust-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "eks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
# Create the role (IAM is a global service, so --region is not needed)
aws iam create-role \
  --role-name $ROLE_NAME \
  --assume-role-policy-document file://eks-cluster-trust-policy.json \
  --description "IAM role for EKS cluster"
# Attach the required managed policy
echo "Attaching AmazonEKSClusterPolicy..."
aws iam attach-role-policy \
  --role-name $ROLE_NAME \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
# Attach the VPC resource controller policy
echo "Attaching AmazonEKSVPCResourceController..."
aws iam attach-role-policy \
  --role-name $ROLE_NAME \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSVPCResourceController
# Capture the role ARN
CLUSTER_ROLE_ARN=$(aws iam get-role \
  --role-name $ROLE_NAME \
  --query 'Role.Arn' \
  --output text)
echo "EKS cluster role ARN: $CLUSTER_ROLE_ARN"
echo "export EKS_CLUSTER_ROLE_ARN=$CLUSTER_ROLE_ARN" >> eks-config.sh
# Clean up the temporary file
rm -f eks-cluster-trust-policy.json
echo "✓ EKS cluster role created"
EKS Node Role
Purpose: lets worker nodes (EC2 instances) call AWS APIs
Trust policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Creation script:
#!/bin/bash
# create-eks-node-role.sh
REGION="us-east-1"
ROLE_NAME="eks-node-role"
echo "Creating the EKS node IAM role..."
# Write the trust policy file
cat > eks-node-trust-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
# Create the role (IAM is a global service, so --region is not needed)
aws iam create-role \
  --role-name $ROLE_NAME \
  --assume-role-policy-document file://eks-node-trust-policy.json \
  --description "IAM role for EKS worker nodes"
# Attach the required managed policies
# 1. Core worker node policy
echo "Attaching AmazonEKSWorkerNodePolicy..."
aws iam attach-role-policy \
  --role-name $ROLE_NAME \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
# 2. CNI policy (Pod networking)
echo "Attaching AmazonEKS_CNI_Policy..."
aws iam attach-role-policy \
  --role-name $ROLE_NAME \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
# 3. ECR read-only policy (image pulls)
echo "Attaching AmazonEC2ContainerRegistryReadOnly..."
aws iam attach-role-policy \
  --role-name $ROLE_NAME \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
# 4. SSM policy (remote management)
echo "Attaching AmazonSSMManagedInstanceCore..."
aws iam attach-role-policy \
  --role-name $ROLE_NAME \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
# Create a custom inline policy (CloudWatch Logs)
echo "Creating the CloudWatch Logs policy..."
cat > eks-node-cloudwatch-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:DescribeLogStreams"
],
"Resource": "arn:aws:logs:*:*:*"
}
]
}
EOF
aws iam put-role-policy \
  --role-name $ROLE_NAME \
  --policy-name EKSNodeCloudWatchPolicy \
  --policy-document file://eks-node-cloudwatch-policy.json
# Capture the role ARN
NODE_ROLE_ARN=$(aws iam get-role \
  --role-name $ROLE_NAME \
  --query 'Role.Arn' \
  --output text)
echo "EKS node role ARN: $NODE_ROLE_ARN"
echo "export EKS_NODE_ROLE_ARN=$NODE_ROLE_ARN" >> eks-config.sh
# Clean up temporary files
rm -f eks-node-trust-policy.json eks-node-cloudwatch-policy.json
echo "✓ EKS node role created"
EKS Cluster Creation
Cluster Creation Script
#!/bin/bash
# create-eks-cluster.sh
set -e
# Load configuration exported by earlier scripts
source vpc-config.sh
source sg-config.sh
source eks-config.sh
REGION="us-east-1"
CLUSTER_NAME="production-eks-cluster"
K8S_VERSION="1.28"
OFFICE_IP="203.0.113.0/24"  # replace with your actual office IP range
echo "================================================"
echo "Creating EKS cluster: $CLUSTER_NAME"
echo "Version: $K8S_VERSION"
echo "Region: $REGION"
echo "================================================"
# 1. Create the cluster
# Note: the --resources-vpc-config shorthand is passed as one quoted
# argument; splitting it across continuation lines is error-prone.
echo ""
echo "1. Creating the EKS cluster (takes 10-15 minutes)..."
aws eks create-cluster \
  --name $CLUSTER_NAME \
  --region $REGION \
  --kubernetes-version $K8S_VERSION \
  --role-arn $EKS_CLUSTER_ROLE_ARN \
  --resources-vpc-config "subnetIds=$PRIVATE_APP_SUBNET_1A,$PRIVATE_APP_SUBNET_1B,$PRIVATE_APP_SUBNET_1C,securityGroupIds=$EKS_CP_SG_ID,endpointPublicAccess=true,endpointPrivateAccess=true,publicAccessCidrs=$OFFICE_IP" \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}' \
  --tags Environment=production,ManagedBy=script,Team=platform
echo "   Cluster creation request submitted"
# 2. Wait for the cluster to become ACTIVE
echo ""
echo "2. Waiting for the cluster to become ACTIVE..."
aws eks wait cluster-active \
  --name $CLUSTER_NAME \
  --region $REGION
echo "   ✓ Cluster is active"
# 3. Update kubeconfig
echo ""
echo "3. Updating kubeconfig..."
aws eks update-kubeconfig \
  --name $CLUSTER_NAME \
  --region $REGION
echo "   ✓ kubeconfig updated"
# 4. Verify connectivity
echo ""
echo "4. Verifying cluster connectivity..."
kubectl cluster-info
kubectl get svc
# 5. Create the OIDC provider (for IRSA)
echo ""
echo "5. Creating the OIDC identity provider..."
# Get the OIDC issuer URL
OIDC_ISSUER=$(aws eks describe-cluster \
  --name $CLUSTER_NAME \
  --region $REGION \
  --query 'cluster.identity.oidc.issuer' \
  --output text)
echo "   OIDC issuer: $OIDC_ISSUER"
# Extract the OIDC ID (last path segment of the issuer URL)
OIDC_ID=$(echo $OIDC_ISSUER | cut -d '/' -f 5)
echo "   OIDC ID: $OIDC_ID"
# Skip creation if the provider already exists
EXISTING_PROVIDER=$(aws iam list-open-id-connect-providers \
  --query "OpenIDConnectProviderList[?contains(Arn, '$OIDC_ID')].Arn" \
  --output text)
if [ -z "$EXISTING_PROVIDER" ]; then
  # Fetch a certificate thumbprint for the OIDC endpoint.
  # Note: this fingerprints the server certificate; the simpler, fully
  # supported alternative is:
  #   eksctl utils associate-iam-oidc-provider --cluster $CLUSTER_NAME --approve
  THUMBPRINT=$(echo | openssl s_client -servername oidc.eks.$REGION.amazonaws.com \
    -connect oidc.eks.$REGION.amazonaws.com:443 2>/dev/null | \
    openssl x509 -fingerprint -sha1 -noout | \
    sed 's/://g' | \
    awk -F= '{print tolower($2)}')
  # Create the OIDC provider (IAM is global, no --region needed)
  aws iam create-open-id-connect-provider \
    --url $OIDC_ISSUER \
    --client-id-list sts.amazonaws.com \
    --thumbprint-list $THUMBPRINT
  echo "   ✓ OIDC provider created"
else
  echo "   ✓ OIDC provider already exists: $EXISTING_PROVIDER"
fi
# 6. Print cluster information
echo ""
echo "================================================"
echo "EKS cluster created!"
echo "================================================"
echo ""
echo "Cluster information:"
aws eks describe-cluster \
  --name $CLUSTER_NAME \
  --region $REGION \
  --query 'cluster.{Name:name,Status:status,Version:version,Endpoint:endpoint,CreatedAt:createdAt}'
echo ""
echo "Control plane logs enabled:"
echo "  ✓ API Server"
echo "  ✓ Audit"
echo "  ✓ Authenticator"
echo "  ✓ Controller Manager"
echo "  ✓ Scheduler"
echo ""
echo "Endpoint access:"
echo "  ✓ Public access: enabled (IP-restricted)"
echo "  ✓ Private access: enabled"
echo ""
echo "Next step: create the node groups"
echo "  Run: ./create-node-groups.sh"
echo "================================================"
# Persist cluster details for later scripts
echo "export CLUSTER_NAME=$CLUSTER_NAME" >> eks-config.sh
echo "export OIDC_ISSUER=$OIDC_ISSUER" >> eks-config.sh
echo "export OIDC_ID=$OIDC_ID" >> eks-config.sh
Managed Node Groups
Node Group Planning
Node group configuration:
1. General-purpose node group (On-Demand):
├─ Name: general-nodes
├─ Instance type: m5.xlarge
├─ Count: 6 (2 per AZ)
├─ Min: 6, max: 12
├─ Disk: 50 GB gp3
└─ Purpose: core business services
2. Spot instance node group:
├─ Name: spot-nodes
├─ Instance types: m5.xlarge, c5.xlarge
├─ Count: 3 (1 per AZ)
├─ Min: 3, max: 9
├─ Disk: 50 GB gp3
└─ Purpose: stateless services, batch jobs
3. Compute-optimized node group (optional):
├─ Name: compute-nodes
├─ Instance type: c5.2xlarge
├─ Count: 3
├─ Disk: 100 GB gp3
└─ Purpose: CPU-intensive workloads
Creating the General-Purpose Node Group
#!/bin/bash
# create-general-node-group.sh
source vpc-config.sh
source sg-config.sh
source eks-config.sh
REGION="us-east-1"
NODE_GROUP_NAME="general-nodes"
echo "================================================"
echo "Creating general-purpose node group: $NODE_GROUP_NAME"
echo "================================================"
# Create the node group
aws eks create-nodegroup \
  --cluster-name $CLUSTER_NAME \
  --nodegroup-name $NODE_GROUP_NAME \
  --region $REGION \
  --node-role $EKS_NODE_ROLE_ARN \
  --subnets $PRIVATE_APP_SUBNET_1A $PRIVATE_APP_SUBNET_1B $PRIVATE_APP_SUBNET_1C \
  --instance-types m5.xlarge \
  --scaling-config minSize=6,maxSize=12,desiredSize=6 \
  --disk-size 50 \
  --remote-access ec2SshKey=my-keypair,sourceSecurityGroups=$BASTION_SG_ID \
  --labels role=general,environment=production \
  --tags Environment=production,NodeGroup=general,ManagedBy=eks
echo "   Node group creation request submitted"
# Wait for the node group to become active
echo ""
echo "Waiting for the node group to become active (takes 5-10 minutes)..."
aws eks wait nodegroup-active \
  --cluster-name $CLUSTER_NAME \
  --nodegroup-name $NODE_GROUP_NAME \
  --region $REGION
echo "   ✓ Node group is active"
# Verify the nodes
echo ""
echo "Verifying node status..."
kubectl get nodes \
  --label-columns=role,environment,node.kubernetes.io/instance-type
echo ""
echo "================================================"
echo "General-purpose node group created!"
echo "================================================"
Creating the Spot Node Group
#!/bin/bash
# create-spot-node-group.sh
source vpc-config.sh
source eks-config.sh
REGION="us-east-1"
NODE_GROUP_NAME="spot-nodes"
echo "================================================"
echo "Creating Spot instance node group: $NODE_GROUP_NAME"
echo "================================================"
# Create the node group
# Note: the EKS API expects the taint effect as NO_SCHEDULE;
# on the node it appears as the Kubernetes effect NoSchedule.
aws eks create-nodegroup \
  --cluster-name $CLUSTER_NAME \
  --nodegroup-name $NODE_GROUP_NAME \
  --region $REGION \
  --node-role $EKS_NODE_ROLE_ARN \
  --subnets $PRIVATE_APP_SUBNET_1A $PRIVATE_APP_SUBNET_1B $PRIVATE_APP_SUBNET_1C \
  --instance-types m5.xlarge c5.xlarge \
  --capacity-type SPOT \
  --scaling-config minSize=3,maxSize=9,desiredSize=3 \
  --disk-size 50 \
  --labels role=spot,environment=production,workload=stateless \
  --taints key=spot,value=true,effect=NO_SCHEDULE \
  --tags Environment=production,NodeGroup=spot,CapacityType=SPOT
echo "   Spot node group creation request submitted"
# Wait for the node group to become active
echo ""
echo "Waiting for the node group to become active..."
aws eks wait nodegroup-active \
  --cluster-name $CLUSTER_NAME \
  --nodegroup-name $NODE_GROUP_NAME \
  --region $REGION
echo "   ✓ Spot node group is active"
# Verify the nodes
echo ""
echo "Verifying Spot nodes..."
kubectl get nodes -l role=spot
echo ""
echo "================================================"
echo "Spot node group created!"
echo ""
echo "⚠️  Note:"
echo "  Spot nodes carry the taint spot=true:NoSchedule"
echo "  Pods need a matching toleration to schedule onto them"
echo ""
echo "Example toleration:"
echo "  tolerations:"
echo "  - key: spot"
echo "    value: \"true\""
echo "    effect: NoSchedule"
echo "================================================"
Installing Core Add-ons
VPC CNI Plugin
Purpose: Pod network management
#!/bin/bash
# install-vpc-cni.sh
CLUSTER_NAME="production-eks-cluster"
REGION="us-east-1"
echo "Configuring the VPC CNI plugin..."
# Check the currently running version
CURRENT_VERSION=$(kubectl get daemonset aws-node -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[0].image}' | \
  cut -d: -f2)
echo "   Current version: $CURRENT_VERSION"
# Recommended version (verify compatibility with your EKS version first)
RECOMMENDED_VERSION="v1.15.1-eksbuild.1"
# Update the add-on
aws eks update-addon \
  --cluster-name $CLUSTER_NAME \
  --addon-name vpc-cni \
  --addon-version $RECOMMENDED_VERSION \
  --region $REGION \
  --resolve-conflicts OVERWRITE
echo "   ✓ VPC CNI add-on updated"
# Tune environment variables (optional)
# Note: when both are set, WARM_IP_TARGET takes precedence over WARM_ENI_TARGET
kubectl set env daemonset aws-node \
  -n kube-system \
  ENABLE_PREFIX_DELEGATION=true \
  ENABLE_POD_ENI=true \
  POD_SECURITY_GROUP_ENFORCING_MODE=standard \
  WARM_ENI_TARGET=1 \
  WARM_IP_TARGET=5 \
echo "   ✓ VPC CNI configuration tuned"
CoreDNS
#!/bin/bash
# install-coredns.sh
CLUSTER_NAME="production-eks-cluster"
REGION="us-east-1"
echo "Configuring CoreDNS..."
# Update the CoreDNS add-on
aws eks update-addon \
  --cluster-name $CLUSTER_NAME \
  --addon-name coredns \
  --region $REGION \
  --resolve-conflicts OVERWRITE
echo "   ✓ CoreDNS updated"
# Check the CoreDNS Pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
EBS CSI Driver
#!/bin/bash
# install-ebs-csi-driver.sh
source eks-config.sh   # provides OIDC_ID, exported by create-eks-cluster.sh
CLUSTER_NAME="production-eks-cluster"
REGION="us-east-1"
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
echo "Installing the EBS CSI driver..."
# 1. Create the IRSA role
ROLE_NAME="AmazonEKS_EBS_CSI_DriverRole"
cat > ebs-csi-trust-policy.json << EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/oidc.eks.${REGION}.amazonaws.com/id/${OIDC_ID}"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.${REGION}.amazonaws.com/id/${OIDC_ID}:aud": "sts.amazonaws.com",
"oidc.eks.${REGION}.amazonaws.com/id/${OIDC_ID}:sub": "system:serviceaccount:kube-system:ebs-csi-controller-sa"
}
}
}
]
}
EOF
aws iam create-role \
  --role-name $ROLE_NAME \
  --assume-role-policy-document file://ebs-csi-trust-policy.json
aws iam attach-role-policy \
  --role-name $ROLE_NAME \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy
EBS_CSI_ROLE_ARN=$(aws iam get-role --role-name $ROLE_NAME --query 'Role.Arn' --output text)
# 2. Install the add-on
aws eks create-addon \
  --cluster-name $CLUSTER_NAME \
  --addon-name aws-ebs-csi-driver \
  --service-account-role-arn $EBS_CSI_ROLE_ARN \
  --region $REGION
echo "   ✓ EBS CSI driver installed"
# 3. Create a default gp3 StorageClass
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: gp3
annotations:
storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
type: gp3
iops: "3000"
throughput: "125"
encrypted: "true"
allowVolumeExpansion: true
EOF
echo " ✓ gp3 StorageClass 已创建(默认)"
rm -f ebs-csi-trust-policy.json
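With gp3 set as the default StorageClass, requesting an encrypted volume is as simple as the following sketch (the claim name is a placeholder). Because the class uses WaitForFirstConsumer, the EBS volume is created in the Availability Zone of the node that first schedules the consuming Pod:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-demo            # placeholder name
spec:
  accessModes:
  - ReadWriteOnce            # EBS volumes attach to a single node
  storageClassName: gp3      # optional: gp3 is already the default class
  resources:
    requests:
      storage: 20Gi
```

Thanks to allowVolumeExpansion: true in the StorageClass, the request can later be grown in place by editing `spec.resources.requests.storage`.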
AWS Load Balancer Controller
Installation Script
#!/bin/bash
# install-aws-load-balancer-controller.sh
source vpc-config.sh   # provides VPC_ID
CLUSTER_NAME="production-eks-cluster"
REGION="us-east-1"
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
echo "================================================"
echo "Installing the AWS Load Balancer Controller"
echo "================================================"
# 1. Create the IAM policy
echo ""
echo "1. Creating the IAM policy..."
curl -o iam-policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.6.2/docs/install/iam_policy.json
aws iam create-policy \
  --policy-name AWSLoadBalancerControllerIAMPolicy \
  --policy-document file://iam-policy.json
POLICY_ARN="arn:aws:iam::${ACCOUNT_ID}:policy/AWSLoadBalancerControllerIAMPolicy"
echo "   Policy ARN: $POLICY_ARN"
# 2. Create the IRSA role and service account
echo ""
echo "2. Creating the service account and IAM role..."
eksctl create iamserviceaccount \
  --cluster=$CLUSTER_NAME \
  --namespace=kube-system \
  --name=aws-load-balancer-controller \
  --attach-policy-arn=$POLICY_ARN \
  --approve \
  --region=$REGION
# 3. Add the Helm repository
echo ""
echo "3. Adding the Helm repository..."
helm repo add eks https://aws.github.io/eks-charts
helm repo update
# 4. Install the controller
echo ""
echo "4. Installing the AWS Load Balancer Controller..."
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=$CLUSTER_NAME \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller \
  --set region=$REGION \
  --set vpcId=$VPC_ID
echo ""
echo "5. Verifying the installation..."
kubectl get deployment -n kube-system aws-load-balancer-controller
echo ""
echo "================================================"
echo "AWS Load Balancer Controller installed!"
echo "================================================"
rm -f iam-policy.json
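Once the controller is running, an Ingress with `ingressClassName: alb` provisions an Application Load Balancer. A minimal sketch (the Ingress and Service names are placeholders); `target-type: ip` registers Pod IPs directly as targets, which works because the VPC CNI makes Pod IPs routable in the VPC:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ingress                                  # placeholder name
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip         # route straight to Pod IPs
spec:
  ingressClassName: alb
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: demo-service                        # placeholder Service
            port:
              number: 80
```

For internal-only services, change the scheme annotation to `internal` so the ALB is placed in private subnets.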
CloudWatch Container Insights
Installing Fluent Bit
#!/bin/bash
# install-fluent-bit.sh
CLUSTER_NAME="production-eks-cluster"
REGION="us-east-1"
echo "Installing Fluent Bit (log collection)..."
# Download the manifest, substitute the placeholders, and apply it
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml | \
  sed "s/{{cluster_name}}/$CLUSTER_NAME/;s/{{region_name}}/$REGION/" | \
  kubectl apply -f -
echo "   ✓ Fluent Bit installed"
# Verify
kubectl get daemonset fluent-bit -n amazon-cloudwatch
Enabling Container Insights
#!/bin/bash
# enable-container-insights.sh
CLUSTER_NAME="production-eks-cluster"
REGION="us-east-1"
echo "Enabling Container Insights..."
# Ensure control plane logging is on (already enabled at cluster creation;
# this call is a no-op / fails harmlessly if nothing changes)
aws eks update-cluster-config \
  --name $CLUSTER_NAME \
  --region $REGION \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
# Install the CloudWatch agent
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cloudwatch-namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-serviceaccount.yaml
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-configmap.yaml | \
  sed "s/{{cluster_name}}/$CLUSTER_NAME/" | \
  kubectl apply -f -
kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cwagent/cwagent-daemonset.yaml
echo "   ✓ Container Insights enabled"
echo ""
echo "View the data: AWS Console → CloudWatch → Container Insights"
Cluster Verification
Full Verification Script
#!/bin/bash
# verify-eks-cluster.sh
CLUSTER_NAME="production-eks-cluster"
echo "================================================"
echo "EKS Cluster Verification"
echo "================================================"
# 1. Cluster status
echo ""
echo "1. Cluster status"
aws eks describe-cluster \
  --name $CLUSTER_NAME \
  --query 'cluster.{Name:name,Status:status,Version:version,PlatformVersion:platformVersion}'
# 2. Node status
echo ""
echo "2. Node status"
kubectl get nodes -o wide
# 3. System Pods
echo ""
echo "3. System component status"
kubectl get pods -n kube-system
# 4. Storage classes
echo ""
echo "4. Storage classes"
kubectl get storageclass
# 5. Ingress classes
echo ""
echo "5. Ingress classes"
kubectl get ingressclass
# 6. Node resource usage
# (requires metrics-server, which EKS does not install by default)
echo ""
echo "6. Node resource usage"
kubectl top nodes
# 7. Deploy a test application
echo ""
echo "7. Deploying a test application"
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=LoadBalancer
echo "   Waiting for the LoadBalancer (provisioning can take a few minutes)..."
sleep 30
kubectl get svc nginx
echo ""
echo "================================================"
echo "Verification complete!"
echo ""
echo "Clean up the test resources:"
echo "  kubectl delete svc nginx"
echo "  kubectl delete deployment nginx"
echo "================================================"
Best Practices Summary
1. Cluster configuration
✓ Enable all control plane log types
✓ Use hybrid endpoint access (public + private)
✓ Restrict public endpoint access by IP
✓ Enable the OIDC provider (IRSA)
✓ Upgrade Kubernetes versions regularly
2. Node configuration
✓ Deploy across multiple Availability Zones
✓ Use Managed Node Groups
✓ Mix On-Demand and Spot capacity
✓ Size disks and choose volume types appropriately
✓ Configure node labels and taints
3. Network configuration
✓ Run nodes in private subnets
✓ One NAT Gateway per AZ
✓ Use VPC endpoints to reduce cost
✓ Plan IP address space carefully
✓ Use security group references
4. Security configuration
✓ Prefer IRSA over node IAM roles
✓ Enable encryption (etcd secrets, EBS volumes)
✓ Rotate credentials regularly
✓ Apply Pod Security Standards
✓ Enable audit logging
5. Monitoring configuration
✓ Enable Container Insights
✓ Set up log collection
✓ Define alerting rules
✓ Monitor cost
✓ Review metrics regularly
Next step: continue with the Application Deployment and Load Balancing chapter.