EC2 Instance Management

Amazon Elastic Compute Cloud (EC2) provides scalable compute capacity in the AWS cloud.

EC2 Instance Types

Instance Families

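Family  Purpose            Characteristics                           Examples
T3/T4g  General purpose    Burstable performance, low cost           t3.micro, t4g.small
M5/M6i  General purpose    Balanced compute, memory, and networking  m5.large, m6i.xlarge
C5/C6i  Compute optimized  High-performance processors               c5.2xlarge
R5/R6i  Memory optimized   Large-memory applications                 r5.xlarge
I3/I4i  Storage optimized  High IOPS                                 i3.xlarge
P3/P4   GPU                Machine learning                          p3.2xlarge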

Instance Sizes

t3.nano    → 2 vCPU,  0.5 GiB RAM
t3.micro   → 2 vCPU,  1 GiB RAM
t3.small   → 2 vCPU,  2 GiB RAM
t3.medium  → 2 vCPU,  4 GiB RAM
t3.large   → 2 vCPU,  8 GiB RAM
t3.xlarge  → 4 vCPU, 16 GiB RAM
t3.2xlarge → 8 vCPU, 32 GiB RAM
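
The names in the tables above follow a common pattern: family letter(s), a generation number, optional attribute letters (for example g for Graviton or i for Intel), then a dot and the size. As a rough illustration, the sketch below splits a name into those parts. The function name parse_instance_type and the regular expression are illustrative assumptions, and the pattern only covers the common naming shape, not every special-purpose type.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str       # e.g. "t", "m", "c"
    generation: int   # e.g. 3, 4, 6
    attributes: str   # e.g. "g" or "i"; empty string if none
    size: str         # e.g. "micro", "2xlarge"


# Hypothetical pattern for the common "<family><generation><attrs>.<size>" shape.
_PATTERN = re.compile(r"^([a-z]+?)(\d+)([a-z\-]*)\.([a-z0-9\-]+)$")


def parse_instance_type(name: str) -> InstanceType:
    """Split an instance type name such as 't4g.small' into its parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type name: {name!r}")
    family, generation, attributes, size = match.groups()
    return InstanceType(family, int(generation), attributes, size)


for example in ("t3.micro", "t4g.small", "m6i.xlarge", "c5.2xlarge"):
    print(example, "->", parse_instance_type(example))
```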

Launching EC2 Instances

Using the AWS CLI

# Launch an instance
aws ec2 run-instances \
  --image-id ami-0c55b159cbfafe1f0 \
  --instance-type t3.micro \
  --key-name my-key \
  --security-group-ids sg-0123456789abcdef0 \
  --subnet-id subnet-0bb1c79de3EXAMPLE \
  --count 1 \
  --user-data file://user-data.sh \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=WebServer},{Key=Environment,Value=Production}]' \
  --block-device-mappings '[
    {
      "DeviceName": "/dev/xvda",
      "Ebs": {
        "VolumeSize": 30,
        "VolumeType": "gp3",
        "DeleteOnTermination": true,
        "Encrypted": true
      }
    }
  ]'

Using Python (Boto3)

import boto3

ec2 = boto3.resource('ec2', region_name='us-east-1')

# Read the bootstrap script; Boto3 base64-encodes UserData automatically
with open('user-data.sh') as f:
    user_data = f.read()

# create_instances returns a list of Instance objects
instances = ec2.create_instances(
    ImageId='ami-0c55b159cbfafe1f0',
    InstanceType='t3.micro',
    KeyName='my-key',
    MinCount=1,
    MaxCount=1,
    SecurityGroupIds=['sg-0123456789abcdef0'],
    SubnetId='subnet-0bb1c79de3EXAMPLE',
    UserData=user_data,
    TagSpecifications=[{
        'ResourceType': 'instance',
        'Tags': [
            {'Key': 'Name', 'Value': 'WebServer'},
            {'Key': 'Environment', 'Value': 'Production'}
        ]
    }],
    # 30 GiB encrypted gp3 root volume, deleted when the instance terminates
    BlockDeviceMappings=[{
        'DeviceName': '/dev/xvda',
        'Ebs': {
            'VolumeSize': 30,
            'VolumeType': 'gp3',
            'DeleteOnTermination': True,
            'Encrypted': True
        }
    }]
)

print(f"Instance ID: {instances[0].id}")

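Instance Lifecycle Management

Instance States

pending → running → stopping → stopped (starting it again returns it to pending)
running → rebooting → running
running → shutting-down → terminated (a stopped instance can also be terminated)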
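
To make the state diagram concrete, here is a minimal sketch of a lookup that answers "does this action make sense in this state?". It uses the state names the EC2 API reports (the transitional state before terminated is reported as shutting-down). The table ALLOWED_FROM and the function can_apply are illustrative assumptions and deliberately simplified; the EC2 API itself is the authority on which calls succeed in which state.

```python
# Simplified assumption: the states from which each action is normally issued.
ALLOWED_FROM = {
    "start": {"stopped"},
    "stop": {"running"},
    "reboot": {"running"},
    "terminate": {"pending", "running", "stopping", "stopped"},
}


def can_apply(action: str, state: str) -> bool:
    """Return True if the action is normally valid for an instance in this state."""
    if action not in ALLOWED_FROM:
        raise ValueError(f"unknown action: {action!r}")
    return state in ALLOWED_FROM[action]


print(can_apply("start", "stopped"))
print(can_apply("start", "running"))
print(can_apply("terminate", "terminated"))
```

A check like this could guard an automation script so that it skips instances that are already in the target state instead of relying on API errors.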

Common Operations

# Describe an instance
aws ec2 describe-instances \
  --instance-ids i-1234567890abcdef0

# Start an instance
aws ec2 start-instances --instance-ids i-1234567890abcdef0

# Stop an instance
aws ec2 stop-instances --instance-ids i-1234567890abcdef0

# Reboot an instance
aws ec2 reboot-instances --instance-ids i-1234567890abcdef0

# Terminate an instance
aws ec2 terminate-instances --instance-ids i-1234567890abcdef0

# Change the instance type (the instance must be stopped first)
aws ec2 modify-instance-attribute \
  --instance-id i-1234567890abcdef0 \
  --instance-type '{"Value": "t3.small"}'

EBS Volume Management

Creating and Attaching Volumes

# Create an EBS volume
VOLUME_ID=$(aws ec2 create-volume \
  --availability-zone us-east-1a \
  --size 100 \
  --volume-type gp3 \
  --iops 3000 \
  --throughput 125 \
  --encrypted \
  --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=data-volume}]' \
  --query 'VolumeId' \
  --output text)

# Wait until the volume is available
aws ec2 wait volume-available --volume-ids "$VOLUME_ID"

# Attach the volume to an instance
aws ec2 attach-volume \
  --volume-id "$VOLUME_ID" \
  --instance-id i-1234567890abcdef0 \
  --device /dev/sdf

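# SSH to the instance and mount the volume
# Note: on Nitro-based instances (including T3), a volume attached as /dev/sdf
# typically appears as an NVMe device such as /dev/nvme1n1. Run lsblk to confirm
# the name and substitute it for /dev/xvdf below.
ssh ec2-user@instance-ip << 'EOF'
  # Create a filesystem (first use only; this erases any existing data)
  sudo mkfs -t ext4 /dev/xvdf

  # Mount the volume
  sudo mkdir -p /data
  sudo mount /dev/xvdf /data

  # Mount persistently across reboots
  echo '/dev/xvdf /data ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab
EOF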

Snapshot Management

# Create a snapshot
SNAPSHOT_ID=$(aws ec2 create-snapshot \
  --volume-id "$VOLUME_ID" \
  --description "Backup of data volume" \
  --tag-specifications 'ResourceType=snapshot,Tags=[{Key=Name,Value=data-backup}]' \
  --query 'SnapshotId' \
  --output text)

# Wait for the snapshot to complete
aws ec2 wait snapshot-completed --snapshot-ids "$SNAPSHOT_ID"

# Restore a volume from the snapshot
NEW_VOLUME_ID=$(aws ec2 create-volume \
  --snapshot-id "$SNAPSHOT_ID" \
  --availability-zone us-east-1a \
  --query 'VolumeId' \
  --output text)

# Automate snapshots with Data Lifecycle Manager
aws dlm create-lifecycle-policy \
  --execution-role-arn arn:aws:iam::123456789012:role/DLMRole \
  --description "Daily EBS snapshots" \
  --state ENABLED \
  --policy-details '{
    "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
    "ResourceTypes": ["VOLUME"],
    "TargetTags": [{"Key": "Backup", "Value": "true"}],
    "Schedules": [{
      "Name": "DailySnapshots",
      "CreateRule": {
        "Interval": 24,
        "IntervalUnit": "HOURS",
        "Times": ["03:00"]
      },
      "RetainRule": {
        "Count": 7
      }
    }]
  }'
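
The policy above takes one snapshot per day and sets RetainRule Count to 7, so only the seven most recent snapshots are kept. The sketch below simulates that count-based retention locally to show which snapshots would age out. The function name snapshots_to_delete and the generated timestamps are illustrative assumptions; it does not call AWS and is not how Data Lifecycle Manager is implemented internally.

```python
from datetime import datetime, timedelta
from typing import List


def snapshots_to_delete(snapshot_times: List[datetime], retain_count: int) -> List[datetime]:
    """Return the snapshot timestamps that fall outside the newest retain_count."""
    newest_first = sorted(snapshot_times, reverse=True)
    return newest_first[retain_count:]


# Hypothetical data: ten daily snapshots taken at 03:00 UTC.
first = datetime(2024, 1, 1, 3, 0)
daily = [first + timedelta(days=n) for n in range(10)]

expired = snapshots_to_delete(daily, retain_count=7)
for ts in expired:
    print("would be deleted:", ts.isoformat())
```

With ten daily snapshots and a retention count of 7, the three oldest are the ones selected for deletion.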

Auto Scaling

Creating a Launch Template

# Create a launch template
aws ec2 create-launch-template \
  --launch-template-name my-template \
  --version-description "v1.0" \
  --launch-template-data '{
    "ImageId": "ami-0c55b159cbfafe1f0",
    "InstanceType": "t3.micro",
    "KeyName": "my-key",
    "SecurityGroupIds": ["sg-0123456789abcdef0"],
    "UserData": "IyEvYmluL2Jhc2gKZWNobyAiSGVsbG8gV29ybGQi",
    "TagSpecifications": [{
      "ResourceType": "instance",
      "Tags": [{"Key": "Name", "Value": "AutoScaled"}]
    }]
  }'
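
Unlike run-instances with file://, the UserData field inside launch template data must already be base64-encoded. The sketch below shows one way to produce the value used above from a plain script; the variable names are illustrative.

```python
import base64

# The bootstrap script in plain text.
script = '#!/bin/bash\necho "Hello World"'

# Launch template data expects the script as base64-encoded text.
encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
print(encoded)

# Decoding recovers the original script, which is handy when auditing a template.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```

Decoding the UserData string from the template above yields exactly this two-line script.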

# Create an Auto Scaling group
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --launch-template LaunchTemplateName=my-template,Version='$Latest' \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 2 \
  --vpc-zone-identifier "subnet-1,subnet-2" \
  --health-check-type ELB \
  --health-check-grace-period 300 \
  --tags "Key=Environment,Value=Production,PropagateAtLaunch=true"

# Create a scaling policy (target tracking on average CPU; it scales both out and in)
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name scale-out \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 70.0
  }'
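
For intuition about what a 70% CPU target means, the sketch below uses a simple proportional model: if the average CPU is above the target, capacity grows roughly in proportion, and the result is clamped to the group's minimum and maximum size. This is an assumption made for illustration only. The real Auto Scaling behavior also involves CloudWatch alarms, warm-up, and cooldown handling, and the function name estimate_desired_capacity is hypothetical.

```python
import math


def estimate_desired_capacity(
    current_capacity: int,
    current_metric: float,
    target: float,
    min_size: int,
    max_size: int,
) -> int:
    """Rough proportional estimate of the capacity needed to reach the target metric."""
    raw = math.ceil(current_capacity * current_metric / target)
    # Never go below min_size or above max_size, mirroring the group limits.
    return max(min_size, min(max_size, raw))


# Values taken from the example group: min 2, max 10, target 70% CPU.
print(estimate_desired_capacity(2, 90.0, 70.0, 2, 10))
print(estimate_desired_capacity(4, 35.0, 70.0, 2, 10))
print(estimate_desired_capacity(10, 95.0, 70.0, 2, 10))
```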

Networking

Elastic IP Addresses

# Allocate an Elastic IP address
ALLOCATION_ID=$(aws ec2 allocate-address \
  --domain vpc \
  --query 'AllocationId' \
  --output text)

# Associate it with an instance
aws ec2 associate-address \
  --instance-id i-1234567890abcdef0 \
  --allocation-id "$ALLOCATION_ID"

# Disassociate it (use the association ID returned by associate-address)
aws ec2 disassociate-address --association-id eipassoc-12345678

# Release the address
aws ec2 release-address --allocation-id "$ALLOCATION_ID"

Elastic Network Interfaces (ENI)

# Create a network interface
ENI_ID=$(aws ec2 create-network-interface \
  --subnet-id subnet-0bb1c79de3EXAMPLE \
  --description "Secondary network interface" \
  --groups sg-0123456789abcdef0 \
  --query 'NetworkInterface.NetworkInterfaceId' \
  --output text)

# Attach it to an instance
aws ec2 attach-network-interface \
  --network-interface-id "$ENI_ID" \
  --instance-id i-1234567890abcdef0 \
  --device-index 1

Instance Metadata

Accessing the Metadata Service

# IMDSv2 (recommended): request a session token first
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Get the instance ID
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id

# Get the Availability Zone
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/availability-zone

# List the attached IAM role name (append it to this path to fetch its temporary credentials)
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/
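
The same two-step IMDSv2 flow (PUT for a token, then GET with the token header) can be sketched in Python using only the standard library. The helper names build_token_request, build_metadata_request, and fetch_metadata are illustrative assumptions. The request builders can be inspected anywhere, but fetch_metadata only succeeds when run on an EC2 instance.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Build the PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """Build the GET request for a metadata path, authenticated with the token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path: str, timeout: float = 2.0) -> str:
    """Fetch one metadata value; only works from inside an EC2 instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode("utf-8")
    with urllib.request.urlopen(build_metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

On an instance, fetch_metadata("instance-id") or fetch_metadata("placement/availability-zone") would correspond to the curl calls above.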

Cost Optimization

Spot Instances

# Request Spot Instances
aws ec2 request-spot-instances \
  --spot-price "0.05" \
  --instance-count 2 \
  --type "one-time" \
  --launch-specification '{
    "ImageId": "ami-0c55b159cbfafe1f0",
    "InstanceType": "t3.medium",
    "KeyName": "my-key",
    "SecurityGroupIds": ["sg-0123456789abcdef0"],
    "SubnetId": "subnet-0bb1c79de3EXAMPLE"
  }'

# View Spot price history (date -d is GNU date syntax)
aws ec2 describe-spot-price-history \
  --instance-types t3.medium \
  --start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%S)" \
  --product-descriptions "Linux/UNIX"
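
The price history comes back as JSON with a SpotPriceHistory list, where each entry carries an AvailabilityZone and a SpotPrice string. As a small sketch, the code below summarizes such output per Availability Zone to help pick a bid ceiling. The sample prices and timestamps are made up for illustration, and the function name summarize_by_az is an assumption.

```python
import json
from collections import defaultdict

# Hypothetical sample shaped like the describe-spot-price-history output.
SAMPLE = """
{
  "SpotPriceHistory": [
    {"AvailabilityZone": "us-east-1a", "InstanceType": "t3.medium",
     "ProductDescription": "Linux/UNIX", "SpotPrice": "0.0125",
     "Timestamp": "2024-01-01T00:00:00+00:00"},
    {"AvailabilityZone": "us-east-1a", "InstanceType": "t3.medium",
     "ProductDescription": "Linux/UNIX", "SpotPrice": "0.0135",
     "Timestamp": "2024-01-02T00:00:00+00:00"},
    {"AvailabilityZone": "us-east-1b", "InstanceType": "t3.medium",
     "ProductDescription": "Linux/UNIX", "SpotPrice": "0.0150",
     "Timestamp": "2024-01-01T00:00:00+00:00"}
  ]
}
"""


def summarize_by_az(history: dict) -> dict:
    """Return min, max, and average Spot price per Availability Zone."""
    prices = defaultdict(list)
    for entry in history["SpotPriceHistory"]:
        # SpotPrice is returned as a string, so convert it before doing math.
        prices[entry["AvailabilityZone"]].append(float(entry["SpotPrice"]))
    return {
        az: {"min": min(values), "max": max(values), "avg": sum(values) / len(values)}
        for az, values in prices.items()
    }


summary = summarize_by_az(json.loads(SAMPLE))
for az, stats in sorted(summary.items()):
    print(az, stats)
```

In practice the JSON could be piped in from the CLI command above (for example by saving its output to a file and loading it with json.load).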

Reserved Instances

# List available Reserved Instance offerings
aws ec2 describe-reserved-instances-offerings \
  --instance-type t3.medium \
  --product-description "Linux/UNIX" \
  --offering-class standard

# Purchase a Reserved Instance
aws ec2 purchase-reserved-instances-offering \
  --reserved-instances-offering-id 12345678-1234-1234-1234-123456789012 \
  --instance-count 1

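Best Practices

  1. Use IMDSv2 to mitigate SSRF attacks
  2. Enable termination protection to prevent accidental termination
  3. Take snapshots regularly to back up critical data
  4. Use Systems Manager to manage instances
  5. Tag resources consistently to simplify cost allocation and resource management
  6. Monitor instances with CloudWatch
  7. Automate deployment with user data or Systems Manager
  8. Keep security groups minimal: open only the ports you need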

Troubleshooting

Instance Fails to Start

# Check the instance status checks
aws ec2 describe-instance-status \
  --instance-ids i-1234567890abcdef0

# View the console output (system log)
aws ec2 get-console-output \
  --instance-id i-1234567890abcdef0

Connectivity Issues

# Inspect the security group rules
aws ec2 describe-security-groups \
  --group-ids sg-0123456789abcdef0

# Use Session Manager (no SSH required)
aws ssm start-session --target i-1234567890abcdef0
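
When a connection times out, the usual question is whether any inbound rule in the describe-security-groups output covers your source address and port. The sketch below checks one group entry (an element of the SecurityGroups list) for a TCP port and an IPv4 source. The function name allows_tcp and the sample group are illustrative assumptions, and the check ignores IPv6 ranges, prefix lists, and rules that reference other security groups.

```python
import ipaddress


def allows_tcp(group: dict, port: int, source_ip: str) -> bool:
    """Return True if an inbound IPv4 CIDR rule permits TCP traffic to the port."""
    source = ipaddress.ip_address(source_ip)
    for perm in group.get("IpPermissions", []):
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif protocol == "tcp":
            port_ok = perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)
        else:
            continue
        if not port_ok:
            continue
        for ip_range in perm.get("IpRanges", []):
            if source in ipaddress.ip_network(ip_range["CidrIp"]):
                return True
    return False


# Hypothetical group: SSH from one office range, HTTPS from anywhere.
sample_group = {
    "GroupId": "sg-0123456789abcdef0",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "203.0.113.0/24"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
}

print(allows_tcp(sample_group, 22, "203.0.113.10"))
print(allows_tcp(sample_group, 22, "198.51.100.7"))
print(allows_tcp(sample_group, 443, "198.51.100.7"))
```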

EC2 is a core AWS service, and a solid command of it is fundamental to operations work.