# S3 Object Storage

Amazon S3 (Simple Storage Service) is a highly scalable object storage service.

## S3 Basics

### Core Concepts

- Bucket: a container for objects; bucket names are globally unique
- Object: the stored data, up to 5 TB per object
- Key: the unique identifier of an object within a bucket
- Metadata: key-value pairs describing an object
### Storage Classes

| Class | Use case | Availability | Cost |
|---|---|---|---|
| Standard | Frequent access | 99.99% | High |
| Intelligent-Tiering | Automatic cost optimization | 99.9% | Medium |
| Standard-IA | Infrequent access | 99.9% | Low |
| One Zone-IA | Infrequent access, single AZ | 99.5% | Lower |
| Glacier Instant Retrieval | Archive (millisecond retrieval) | 99.9% | Very low |
| Glacier Flexible Retrieval | Archive (minutes to hours) | 99.99% | Extremely low |
| Glacier Deep Archive | Long-term archive (retrieval within 12 hours) | 99.99% | Lowest |
## Creating and Managing Buckets

### Using the AWS CLI

```bash
# Create a bucket
aws s3 mb s3://my-unique-bucket-name-12345 --region us-east-1

# List all buckets
aws s3 ls

# Upload a file
aws s3 cp file.txt s3://my-bucket/

# Upload a directory
aws s3 cp /local/path s3://my-bucket/path/ --recursive

# Download a file
aws s3 cp s3://my-bucket/file.txt ./

# Sync a directory
aws s3 sync /local/path s3://my-bucket/path/

# Delete a file
aws s3 rm s3://my-bucket/file.txt

# Delete a bucket (--force empties it first; without it the bucket must be empty)
aws s3 rb s3://my-bucket --force
```
### Using Python (Boto3)

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

# Create a bucket (outside us-east-1, a LocationConstraint is required)
try:
    s3.create_bucket(
        Bucket='my-bucket',
        CreateBucketConfiguration={'LocationConstraint': 'us-west-2'}
    )
except ClientError as e:
    print(f"Error: {e}")

# Upload a file
s3.upload_file('local.txt', 'my-bucket', 'remote.txt')

# Upload a file object
with open('file.txt', 'rb') as f:
    s3.put_object(Bucket='my-bucket', Key='file.txt', Body=f)

# Download a file
s3.download_file('my-bucket', 'remote.txt', 'local.txt')

# List objects
response = s3.list_objects_v2(Bucket='my-bucket', Prefix='path/')
for obj in response.get('Contents', []):
    print(obj['Key'])

# Delete an object
s3.delete_object(Bucket='my-bucket', Key='file.txt')
```
## Access Control

### Bucket Policies

The first statement below allows anonymous reads of all objects; the second denies all S3 actions from outside the listed CIDR ranges (an explicit Deny always overrides an Allow).

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*"
    },
    {
      "Sid": "RestrictByIP",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Condition": {
        "NotIpAddress": {
          "aws:SourceIp": [
            "203.0.113.0/24",
            "192.0.2.0/24"
          ]
        }
      }
    }
  ]
}
```
### Applying a Policy

```bash
# Set the bucket policy
aws s3api put-bucket-policy \
  --bucket my-bucket \
  --policy file://bucket-policy.json

# View the policy
aws s3api get-bucket-policy \
  --bucket my-bucket \
  --query Policy \
  --output text | jq .

# Delete the policy
aws s3api delete-bucket-policy --bucket my-bucket
```
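The same policy document can be generated and applied from Python instead of a JSON file; a sketch (the helper name is illustrative):

```python
import json

def public_read_policy(bucket):
    """Build a public-read bucket policy document (illustrative helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }

# Apply it (requires AWS credentials):
# import boto3
# boto3.client("s3").put_bucket_policy(
#     Bucket="my-bucket",
#     Policy=json.dumps(public_read_policy("my-bucket")))
```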
### ACLs (Access Control Lists)

Note: buckets created since April 2023 have ACLs disabled by default (Object Ownership set to "Bucket owner enforced"); prefer bucket policies, and use ACLs only on buckets where they have been explicitly re-enabled.

```bash
# Make an object publicly readable
aws s3api put-object-acl \
  --bucket my-bucket \
  --key file.txt \
  --acl public-read

# Make an object private
aws s3api put-object-acl \
  --bucket my-bucket \
  --key file.txt \
  --acl private

# Grant read access to a specific user (email grantees work only in some regions)
aws s3api put-object-acl \
  --bucket my-bucket \
  --key file.txt \
  --grant-read emailaddress=user@example.com
```
## Versioning

```bash
# Enable versioning
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled

# List object versions
aws s3api list-object-versions \
  --bucket my-bucket \
  --prefix file.txt

# Download a specific version
aws s3api get-object \
  --bucket my-bucket \
  --key file.txt \
  --version-id "version-id" \
  output.txt

# Delete a specific version (permanent; no delete marker is created)
aws s3api delete-object \
  --bucket my-bucket \
  --key file.txt \
  --version-id "version-id"

# Suspend versioning (existing versions are retained)
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Suspended
```
## Lifecycle Policies

```json
{
  "Rules": [
    {
      "Id": "Archive old logs",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    },
    {
      "Id": "Delete old versions",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 30,
          "StorageClass": "STANDARD_IA"
        }
      ],
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 90
      }
    }
  ]
}
```
### Applying a Lifecycle Policy

```bash
# Set the lifecycle configuration
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json

# View the lifecycle configuration
aws s3api get-bucket-lifecycle-configuration \
  --bucket my-bucket
```
## Static Website Hosting

```bash
# Enable static website hosting
aws s3 website s3://my-bucket/ \
  --index-document index.html \
  --error-document error.html

# Or use the API
aws s3api put-bucket-website \
  --bucket my-bucket \
  --website-configuration '{
    "IndexDocument": {"Suffix": "index.html"},
    "ErrorDocument": {"Key": "error.html"}
  }'

# Upload the site files (public-read requires ACLs enabled and
# Block Public Access turned off for this bucket)
aws s3 sync ./website s3://my-bucket/ \
  --acl public-read \
  --cache-control "max-age=3600"

# Website URL (the hostname format varies by region)
echo "http://my-bucket.s3-website-us-east-1.amazonaws.com"
```
## S3 Encryption

### Server-Side Encryption

```bash
# SSE-S3 (S3-managed keys; the default for all new objects since January 2023)
aws s3 cp file.txt s3://my-bucket/ \
  --server-side-encryption AES256

# SSE-KMS (KMS-managed keys)
aws s3 cp file.txt s3://my-bucket/ \
  --server-side-encryption aws:kms \
  --ssekms-key-id arn:aws:kms:us-east-1:123456789012:key/12345678

# Enable default encryption
aws s3api put-bucket-encryption \
  --bucket my-bucket \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "AES256"
      },
      "BucketKeyEnabled": true
    }]
  }'
```
### Per-Object Encryption in Boto3

Note: despite often being labeled "client-side", the example below still uses *server-side* encryption (SSE-KMS), merely specified per upload. True client-side encryption means encrypting the data before it leaves your machine, for example with the AWS Encryption SDK.

```python
import boto3

s3 = boto3.client('s3')

# Upload a file encrypted server-side with a specific KMS key
s3.upload_file(
    'file.txt',
    'my-bucket',
    'file.txt',
    ExtraArgs={
        'ServerSideEncryption': 'aws:kms',
        'SSEKMSKeyId': 'arn:aws:kms:us-east-1:123456789012:key/12345678'
    }
)
```
## Cross-Region Replication (CRR)

Replication requires versioning to be enabled on both the source and destination buckets, plus an IAM role that S3 can assume.

```json
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [
    {
      "Status": "Enabled",
      "Priority": 1,
      "DeleteMarkerReplication": {
        "Status": "Enabled"
      },
      "Filter": {
        "Prefix": "documents/"
      },
      "Destination": {
        "Bucket": "arn:aws:s3:::destination-bucket",
        "ReplicationTime": {
          "Status": "Enabled",
          "Time": {
            "Minutes": 15
          }
        },
        "Metrics": {
          "Status": "Enabled",
          "EventThreshold": {
            "Minutes": 15
          }
        }
      }
    }
  ]
}
```

```bash
# Enable replication
aws s3api put-bucket-replication \
  --bucket source-bucket \
  --replication-configuration file://replication.json
```
## S3 Performance Optimization

### Multipart Upload

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')

# Configure multipart uploads
config = TransferConfig(
    multipart_threshold=25 * 1024 * 1024,  # switch to multipart above 25 MB
    max_concurrency=10,
    multipart_chunksize=25 * 1024 * 1024,  # 25 MB per part
    use_threads=True
)

# Upload a large file
s3.upload_file(
    'large-file.zip',
    'my-bucket',
    'large-file.zip',
    Config=config
)
```
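With settings like the above, an object is split into ceil(size / chunk size) parts, and S3 caps a multipart upload at 10,000 parts, so the chunk size bounds the maximum object size. A quick sanity check in plain Python (the helper name is illustrative):

```python
import math

MAX_PARTS = 10_000  # S3 limit on parts per multipart upload

def part_count(size_bytes, chunk_bytes=25 * 1024 * 1024):
    """Number of parts a file of size_bytes produces at the given chunk size."""
    parts = math.ceil(size_bytes / chunk_bytes)
    if parts > MAX_PARTS:
        raise ValueError("chunk size too small for this object size")
    return parts

# A 1 GiB file with 25 MiB chunks:
print(part_count(1024**3))  # 41 parts
```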
### Transfer Acceleration

```bash
# Enable transfer acceleration
aws s3api put-bucket-accelerate-configuration \
  --bucket my-bucket \
  --accelerate-configuration Status=Enabled

# Use the accelerate endpoint
aws s3 cp large-file.zip \
  s3://my-bucket/ \
  --endpoint-url https://s3-accelerate.amazonaws.com
```
## S3 Event Notifications

```json
{
  "LambdaFunctionConfigurations": [
    {
      "Id": "ProcessImage",
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:ProcessImage",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            {
              "Name": "prefix",
              "Value": "images/"
            },
            {
              "Name": "suffix",
              "Value": ".jpg"
            }
          ]
        }
      }
    }
  ],
  "QueueConfigurations": [
    {
      "QueueArn": "arn:aws:sqs:us-east-1:123456789012:my-queue",
      "Events": ["s3:ObjectRemoved:*"]
    }
  ]
}
```

```bash
# Configure event notifications
aws s3api put-bucket-notification-configuration \
  --bucket my-bucket \
  --notification-configuration file://notification.json
```
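On the receiving side, the Lambda function reads the bucket and key out of the event payload; keys arrive URL-encoded (spaces become `+`), so decode them first. A minimal handler sketch (the actual image processing is left as a comment):

```python
import urllib.parse

def lambda_handler(event, context):
    """Extract bucket/key pairs from an S3 event notification."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 events are URL-encoded; decode before use
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # ... process the object at s3://bucket/key here ...
        processed.append(f"{bucket}/{key}")
    return {"processed": processed}
```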
## Cost Optimization

### Analyzing Storage Classes

```bash
# Enable S3 Storage Lens
aws s3control put-storage-lens-configuration \
  --account-id 123456789012 \
  --config-id default-config \
  --storage-lens-configuration file://storage-lens.json

# View a storage analytics configuration
aws s3api get-bucket-analytics-configuration \
  --bucket my-bucket \
  --id analytics-id
```

### Optimizing Request Costs

```bash
# Use S3 Select to filter server-side and cut data transfer
aws s3api select-object-content \
  --bucket my-bucket \
  --key data.json \
  --expression "SELECT * FROM S3Object[*] s WHERE s.age > 30" \
  --expression-type SQL \
  --input-serialization '{"JSON": {"Type": "DOCUMENT"}}' \
  --output-serialization '{"JSON": {}}' \
  output.json
```
## Best Practices

- Choose the appropriate storage class to reduce cost
- Enable versioning to protect against accidental deletion
- Set lifecycle policies to archive and delete data automatically
- Enable server-side encryption to protect data
- Use CloudFront to accelerate content delivery
- Monitor access logs for auditing and analysis
- Use S3 Batch Operations for bulk object processing
- Use bucket policies judiciously to control access

S3 is one of the most fundamental and important AWS services!