S3 Object Storage

Amazon S3 (Simple Storage Service) is a highly scalable object storage service.

S3 Fundamentals

Core Concepts

  • Bucket: the container that holds objects; bucket names are globally unique
  • Object: the stored data, up to 5 TB per object
  • Key: the unique identifier of an object within a bucket
  • Metadata: key-value pairs describing the object

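One point worth internalizing: the S3 namespace is flat. There are no real directories; a "folder" is just a shared key prefix. A small illustration (the `split_key` helper is hypothetical, not part of any SDK):

```python
# A key like "photos/2024/cat.jpg" is one flat string; the console merely
# renders the "/" segments as folders. Illustrative helper:

def split_key(key: str) -> tuple[str, str]:
    """Split an S3 key into its prefix ("folder" part) and object name."""
    prefix, _, name = key.rpartition("/")
    return (prefix + "/" if prefix else "", name)

print(split_key("photos/2024/cat.jpg"))  # ('photos/2024/', 'cat.jpg')
print(split_key("file.txt"))             # ('', 'file.txt')
```

Listing with `Prefix='photos/2024/'` is how "browsing a folder" actually works.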
Storage Classes

| Class | Use case | Availability | Cost |
|---|---|---|---|
| Standard | Frequent access | 99.99% | Highest |
| Intelligent-Tiering | Automatic tiering | 99.9% | Varies by tier |
| Standard-IA | Infrequent access | 99.9% | Lower |
| One Zone-IA | Single AZ | 99.5% | Lower still |
| Glacier Instant Retrieval | Archive (millisecond retrieval) | 99.9% | Very low |
| Glacier Flexible Retrieval | Archive (minutes to hours) | 99.99% | Extremely low |
| Glacier Deep Archive | Long-term archive (within 12 hours) | 99.99% | Lowest |

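To make the cost column concrete, here is a rough monthly comparison for 1 TB of data. The per-GB prices below are illustrative assumptions, not current AWS pricing; check the pricing page for your region:

```python
# Illustrative monthly storage cost for 1 TB across classes. Prices are
# rough assumptions for comparison only -- not authoritative AWS pricing.
PRICE_PER_GB = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "ONEZONE_IA": 0.01,
    "GLACIER_FLEXIBLE": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}

def monthly_cost(gb: float, storage_class: str) -> float:
    """Monthly storage cost in USD (storage only; ignores requests/retrieval)."""
    return round(gb * PRICE_PER_GB[storage_class], 2)

for cls in PRICE_PER_GB:
    print(f"{cls}: ${monthly_cost(1024, cls)}/month")
```

Note that IA and Glacier classes add retrieval and early-deletion charges, so the cheapest storage class is not always the cheapest overall.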
Creating and Managing Buckets

Using the AWS CLI

# Create a bucket
aws s3 mb s3://my-unique-bucket-name-12345 --region us-east-1

# List all buckets
aws s3 ls

# Upload a file
aws s3 cp file.txt s3://my-bucket/

# Upload a directory
aws s3 cp /local/path s3://my-bucket/path/ --recursive

# Download a file
aws s3 cp s3://my-bucket/file.txt ./

# Sync a directory
aws s3 sync /local/path s3://my-bucket/path/

# Delete a file
aws s3 rm s3://my-bucket/file.txt

# Delete a bucket (must be empty unless --force, which deletes its contents first)
aws s3 rb s3://my-bucket --force

Using Python (Boto3)

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

# Create a bucket (CreateBucketConfiguration is required outside us-east-1)
try:
    s3.create_bucket(
        Bucket='my-bucket',
        CreateBucketConfiguration={'LocationConstraint': 'us-west-2'}
    )
except ClientError as e:
    print(f"Error: {e}")

# Upload a file
s3.upload_file('local.txt', 'my-bucket', 'remote.txt')

# Upload from a file object
with open('file.txt', 'rb') as f:
    s3.put_object(Bucket='my-bucket', Key='file.txt', Body=f)

# Download a file
s3.download_file('my-bucket', 'remote.txt', 'local.txt')

# List objects (returns at most 1,000 keys per call)
response = s3.list_objects_v2(Bucket='my-bucket', Prefix='path/')
for obj in response.get('Contents', []):
    print(obj['Key'])

# Delete an object
s3.delete_object(Bucket='my-bucket', Key='file.txt')

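A caveat on the listing call above: `list_objects_v2` returns at most 1,000 keys per request, so a complete listing must follow `NextContinuationToken` (or use boto3's `get_paginator('list_objects_v2')`). A sketch of the loop against a stub client that mimics the S3 response shape, so it runs without an AWS connection:

```python
class StubS3:
    """Stand-in for boto3's S3 client; pages a fixed key list like S3 does."""
    def __init__(self, keys, page_size=2):
        self.keys, self.page_size = keys, page_size

    def list_objects_v2(self, Bucket, Prefix="", ContinuationToken=None):
        matching = [k for k in self.keys if k.startswith(Prefix)]
        start = int(ContinuationToken or 0)
        page = matching[start:start + self.page_size]
        resp = {"Contents": [{"Key": k} for k in page],
                "IsTruncated": start + self.page_size < len(matching)}
        if resp["IsTruncated"]:
            resp["NextContinuationToken"] = str(start + self.page_size)
        return resp

def list_all_keys(client, bucket, prefix=""):
    """Collect every key under a prefix by following continuation tokens."""
    keys, token = [], None
    while True:
        kwargs = {"Bucket": bucket, "Prefix": prefix}
        if token:
            kwargs["ContinuationToken"] = token
        resp = client.list_objects_v2(**kwargs)
        keys += [o["Key"] for o in resp.get("Contents", [])]
        if not resp.get("IsTruncated"):
            return keys
        token = resp["NextContinuationToken"]

s3 = StubS3(["a/1.txt", "a/2.txt", "a/3.txt", "b/1.txt"])
print(list_all_keys(s3, "my-bucket", prefix="a/"))  # ['a/1.txt', 'a/2.txt', 'a/3.txt']
```

The same `list_all_keys` loop works unchanged against a real `boto3.client('s3')`.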
Access Control

Bucket Policies

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*"
    },
    {
      "Sid": "RestrictByIP",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Condition": {
        "NotIpAddress": {
          "aws:SourceIp": [
            "203.0.113.0/24",
            "192.0.2.0/24"
          ]
        }
      }
    }
  ]
}

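The two statements above interact through IAM's evaluation rule: an explicit Deny always overrides an Allow, so requests from outside the listed CIDRs are blocked even though GetObject is publicly allowed. A much-simplified model of that evaluation (only the NotIpAddress condition is modeled; real IAM evaluation is far richer):

```python
import ipaddress

# Toy evaluator for the two statements above: Allow s3:GetObject to "*",
# plus Deny s3:* when the caller's IP is outside every listed CIDR.
# A sketch of deny-overrides-allow, not real IAM logic.
ALLOWED_CIDRS = ["203.0.113.0/24", "192.0.2.0/24"]

def is_get_object_allowed(source_ip: str) -> bool:
    ip = ipaddress.ip_address(source_ip)
    in_allowed_range = any(ip in ipaddress.ip_network(c) for c in ALLOWED_CIDRS)
    deny_matches = not in_allowed_range   # NotIpAddress: outside every CIDR
    allow_matches = True                  # PublicReadGetObject matches everyone
    return allow_matches and not deny_matches

print(is_get_object_allowed("203.0.113.7"))   # True
print(is_get_object_allowed("198.51.100.1"))  # False
```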
Applying a Policy

# Attach the bucket policy
aws s3api put-bucket-policy \
  --bucket my-bucket \
  --policy file://bucket-policy.json

# View the policy
aws s3api get-bucket-policy \
  --bucket my-bucket \
  --query Policy \
  --output text | jq .

# Delete the policy
aws s3api delete-bucket-policy --bucket my-bucket

ACLs (Access Control Lists)

Note: newly created buckets disable ACLs by default (Object Ownership set to "Bucket owner enforced"); prefer bucket policies for new designs.

# Make an object publicly readable
aws s3api put-object-acl \
  --bucket my-bucket \
  --key file.txt \
  --acl public-read

# Make an object private
aws s3api put-object-acl \
  --bucket my-bucket \
  --key file.txt \
  --acl private

# Grant read access to a specific AWS account (by email address)
aws s3api put-object-acl \
  --bucket my-bucket \
  --key file.txt \
  --grant-read emailaddress=user@example.com

Versioning

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled

# List object versions
aws s3api list-object-versions \
  --bucket my-bucket \
  --prefix file.txt

# Download a specific version
aws s3api get-object \
  --bucket my-bucket \
  --key file.txt \
  --version-id "version-id" \
  output.txt

# Delete a specific version
aws s3api delete-object \
  --bucket my-bucket \
  --key file.txt \
  --version-id "version-id"

# Suspend versioning
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Suspended

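Keep in mind what a versioned delete means: a `delete-object` without `--version-id` does not remove data, it only adds a delete marker on top, and older versions stay retrievable by version ID. A minimal in-memory model of that behavior (illustrative only, not an S3 client):

```python
import itertools

# With versioning enabled, overwrites stack new versions and a plain delete
# only adds a delete marker. A tiny in-memory model of that behavior:
class VersionedBucket:
    def __init__(self):
        self._versions = {}          # key -> list of (version_id, body or None)
        self._ids = itertools.count(1)

    def put(self, key, body):
        vid = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((vid, body))
        return vid

    def delete(self, key):
        # A delete without a version ID just appends a delete marker (None body).
        self._versions.setdefault(key, []).append((f"v{next(self._ids)}", None))

    def get(self, key, version_id=None):
        # Newest first; None means the matched version is a delete marker.
        for vid, body in reversed(self._versions.get(key, [])):
            if version_id in (None, vid):
                return body
        return None

b = VersionedBucket()
v1 = b.put("file.txt", b"hello")
b.put("file.txt", b"world")
b.delete("file.txt")
print(b.get("file.txt"))                 # None: latest is a delete marker
print(b.get("file.txt", version_id=v1))  # b'hello': old version still exists
```

Deleting the delete marker itself (by version ID) is how you "undelete" an object in real S3.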
Lifecycle Policies

{
  "Rules": [
    {
      "Id": "Archive old logs",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    },
    {
      "Id": "Delete old versions",
      "Status": "Enabled",
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 30,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 90
      }
    }
  ]
}

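Under the first rule above, an object with the `logs/` prefix moves through storage classes as it ages. The schedule can be sketched as a function (S3 applies transitions asynchronously around these day marks, so treat this as the intended schedule, not an exact guarantee):

```python
# Storage class implied by the "Archive old logs" rule, by object age:
# Standard until day 30, Standard-IA until day 90, Glacier until day 365,
# then expired (deleted).
def storage_class_for_age(age_days: int) -> str:
    if age_days >= 365:
        return "EXPIRED"
    if age_days >= 90:
        return "GLACIER"
    if age_days >= 30:
        return "STANDARD_IA"
    return "STANDARD"

print(storage_class_for_age(10))   # STANDARD
print(storage_class_for_age(45))   # STANDARD_IA
print(storage_class_for_age(120))  # GLACIER
print(storage_class_for_age(400))  # EXPIRED
```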
Applying a Lifecycle Policy

# Apply the lifecycle configuration
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json

# View the lifecycle configuration
aws s3api get-bucket-lifecycle-configuration \
  --bucket my-bucket

Static Website Hosting

# Enable static website hosting
aws s3 website s3://my-bucket/ \
  --index-document index.html \
  --error-document error.html

# Or via the s3api
aws s3api put-bucket-website \
  --bucket my-bucket \
  --website-configuration '{
    "IndexDocument": {"Suffix": "index.html"},
    "ErrorDocument": {"Key": "error.html"}
  }'

# Upload the website files
aws s3 sync ./website s3://my-bucket/ \
  --acl public-read \
  --cache-control "max-age=3600"

# Website endpoint (the exact format varies by region)
echo "http://my-bucket.s3-website-us-east-1.amazonaws.com"

S3 Encryption

Server-Side Encryption

# SSE-S3 (S3-managed keys)
aws s3 cp file.txt s3://my-bucket/ \
  --server-side-encryption AES256

# SSE-KMS (KMS-managed keys)
aws s3 cp file.txt s3://my-bucket/ \
  --server-side-encryption aws:kms \
  --ssekms-key-id arn:aws:kms:us-east-1:123456789012:key/12345678

# Enable default encryption for the bucket
aws s3api put-bucket-encryption \
  --bucket my-bucket \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "AES256"
      },
      "BucketKeyEnabled": true
    }]
  }'

Server-Side Encryption with Boto3 (SSE-KMS)

import boto3

s3 = boto3.client('s3')

# Upload a file encrypted with SSE-KMS
s3.upload_file(
    'file.txt',
    'my-bucket',
    'file.txt',
    ExtraArgs={
        'ServerSideEncryption': 'aws:kms',
        'SSEKMSKeyId': 'arn:aws:kms:us-east-1:123456789012:key/12345678'
    }
)

Cross-Region Replication (CRR)

{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [
    {
      "Status": "Enabled",
      "Priority": 1,
      "DeleteMarkerReplication": {
        "Status": "Enabled"
      },
      "Filter": {
        "Prefix": "documents/"
      },
      "Destination": {
        "Bucket": "arn:aws:s3:::destination-bucket",
        "ReplicationTime": {
          "Status": "Enabled",
          "Time": {
            "Minutes": 15
          }
        },
        "Metrics": {
          "Status": "Enabled",
          "EventThreshold": {
            "Minutes": 15
          }
        }
      }
    }
  ]
}

# Enable replication
aws s3api put-bucket-replication \
  --bucket source-bucket \
  --replication-configuration file://replication.json

S3 Performance Optimization

Multipart Upload

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')

# Configure multipart transfers (sizes are in bytes)
config = TransferConfig(
    multipart_threshold=25 * 1024 * 1024,  # 25 MB
    max_concurrency=10,
    multipart_chunksize=25 * 1024 * 1024,  # 25 MB
    use_threads=True
)

# Upload a large file
s3.upload_file(
    'large-file.zip',
    'my-bucket',
    'large-file.zip',
    Config=config
)

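Two limits worth checking when tuning these sizes: a multipart upload allows at most 10,000 parts, and every part except the last must be at least 5 MB. A quick validation helper (the helper name is illustrative):

```python
import math

# S3 multipart limits: at most 10,000 parts per upload, and each part
# except the last must be at least 5 MB.
MAX_PARTS = 10_000
MIN_PART_SIZE = 5 * 1024 * 1024

def part_count(file_size: int, chunk_size: int) -> int:
    """Number of parts a file needs, validating against S3's limits."""
    if chunk_size < MIN_PART_SIZE:
        raise ValueError("part size must be at least 5 MB")
    parts = math.ceil(file_size / chunk_size)
    if parts > MAX_PARTS:
        raise ValueError("too many parts: increase multipart_chunksize")
    return parts

# A 1 GB file with 25 MB parts:
print(part_count(1024 ** 3, 25 * 1024 * 1024))  # 41
```

With 25 MB parts the 10,000-part cap is reached around 244 GB, so very large objects need a bigger `multipart_chunksize`.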
Transfer Acceleration

# Enable transfer acceleration
aws s3api put-bucket-accelerate-configuration \
  --bucket my-bucket \
  --accelerate-configuration Status=Enabled

# Upload through the accelerated endpoint
aws s3 cp large-file.zip \
  s3://my-bucket/ \
  --endpoint-url https://s3-accelerate.amazonaws.com

S3 Event Notifications

{
  "LambdaFunctionConfigurations": [
    {
      "Id": "ProcessImage",
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:ProcessImage",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            {
              "Name": "prefix",
              "Value": "images/"
            },
            {
              "Name": "suffix",
              "Value": ".jpg"
            }
          ]
        }
      }
    }
  ],
  "QueueConfigurations": [
    {
      "QueueArn": "arn:aws:sqs:us-east-1:123456789012:my-queue",
      "Events": ["s3:ObjectRemoved:*"]
    }
  ]
}

# Configure event notifications
aws s3api put-bucket-notification-configuration \
  --bucket my-bucket \
  --notification-configuration file://notification.json

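On the Lambda side, the notification arrives as an event whose `Records` entries carry the bucket name and a URL-encoded object key. A sketch of a handler that extracts them (the record shape follows the S3 notification event format; the handler itself is illustrative):

```python
from urllib.parse import unquote_plus

def handler(event, context=None):
    """Extract (bucket, key) pairs from an S3 notification event."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become "+", etc.)
        key = unquote_plus(record["s3"]["object"]["key"])
        processed.append((bucket, key))
    return processed

# Minimal sample event in the S3 notification shape:
event = {"Records": [{"s3": {"bucket": {"name": "my-bucket"},
                             "object": {"key": "images/my+photo.jpg"}}}]}
print(handler(event))  # [('my-bucket', 'images/my photo.jpg')]
```

Forgetting to decode the key is a classic bug: a download of `my+photo.jpg` fails because the real object is `my photo.jpg`.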
Cost Optimization

Analyzing Storage Classes

# Enable S3 Storage Lens
aws s3control put-storage-lens-configuration \
  --account-id 123456789012 \
  --config-id default-config \
  --storage-lens-configuration file://storage-lens.json

# View a storage analytics configuration
aws s3api get-bucket-analytics-configuration \
  --bucket my-bucket \
  --id analytics-id

Optimizing Request Costs

# Use S3 Select to filter server-side and reduce data transfer
aws s3api select-object-content \
  --bucket my-bucket \
  --key data.json \
  --expression "SELECT * FROM S3Object[*] s WHERE s.age > 30" \
  --expression-type SQL \
  --input-serialization '{"JSON": {"Type": "DOCUMENT"}}' \
  --output-serialization '{"JSON": {}}' \
  output.json

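What the query above saves: S3 evaluates the filter server-side, so only matching rows cross the network instead of the whole object. The same filter expressed locally over sample records (the data is made up for illustration):

```python
import json

# Equivalent of: SELECT * FROM S3Object[*] s WHERE s.age > 30
records = [
    {"name": "alice", "age": 34},
    {"name": "bob", "age": 28},
    {"name": "carol", "age": 41},
]

selected = [r for r in records if r["age"] > 30]
print(json.dumps(selected))
```

On a multi-GB object where only a small fraction of rows match, pushing this predicate into S3 Select cuts both transfer cost and latency.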
Best Practices

  1. Use the right storage class to lower cost
  2. Enable versioning to protect against accidental deletion
  3. Set lifecycle policies to archive and delete data automatically
  4. Enable server-side encryption to protect data
  5. Use CloudFront to accelerate content delivery
  6. Monitor access logs for auditing and analysis
  7. Use S3 Batch Operations to process objects in bulk
  8. Use bucket policies deliberately to control access

S3 is one of the most fundamental and most important services in AWS!