Things you wish you didn't need to know about S3
A time travel paradox in the title is a good place to start a blog post, don’t you think? You don’t yet know the things you need to know, so you can’t wish you didn’t need to know them. There is a solution though – read this blog post.
This all started because Plerion is trying to build a comprehensive risk model for the most severe data breaches occurring in AWS environments. At the top of the list is unauthorised access to S3 buckets. Here are some examples in the media.
As with many things AWS security, the more one digs into the details, the more oddities one discovers. None of these oddities are new (or even “vulnerabilities”), but they all get regularly rediscovered with great surprise. I thought I’d make a list to maybe lessen the surprise for people in the future, and probably future me as well.
S3 buckets are the S3 API
Amazon S3 was one of the first services released by AWS, so it’s incredibly robust and well tested. It also means it has a history predating standardised design patterns, resulting in a quirky API relative to other services.
One of those quirks is that a relatively small part of the API requires HTTP requests to be sent to generic S3 endpoints (such as s3.us-east-2.amazonaws.com), while the vast majority of requests must be sent to the URL of a target bucket. Under the hood it might all be the same, but at least to the caller, this is how it works.
For example, to list the contents of a bucket, the HTTP request looks something like:
GET / HTTP/1.1
Host: [bucketname].s3.amazonaws.com
To get the tags associated with a bucket:
GET /?tagging HTTP/1.1
Host: [bucketname].s3.amazonaws.com
...auth headers...
S3 isn’t the only service that works this way; hosted Cognito UI endpoints do something similar (https://[your-user-pool-domain]/login).
The majority of services use a generic endpoint. To get the attributes of an EC2 instance, the API endpoint is something like https://ec2.eu-west-2.amazonaws.com/, and to query a DynamoDB table, it is https://dynamodb.us-east-1.amazonaws.com/. The target resources are typically passed as HTTP headers or parameters rather than through the host.
Because S3 buckets can be both public (accessed without authentication) as well as private (requiring authentication), it’s not always clear which API operations can be used in which way. Sometimes, you have to use the authenticated CLI (or make artisanal custom authenticated requests), and sometimes, you can just make a simple cURL request.
This doesn’t sound sketchy until you start playing stupid games with it. The whole idea broke my personal mental model that all AWS API requests must be authenticated.
This is all a bit abstract, so imagine you were a naughty little admin and put the policy below on a bucket. It’s obviously a bad idea, so don’t actually do it.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": "*",
"Action": "s3:*",
"Resource": "arn:aws:s3:::{bucket_name}"
}
]
}
This policy says everyone can perform all S3 operations against a given bucket. I personally hadn’t internalized what this means in practice. And what it means is that the below request actually does delete a bucket. No authentication is required.
% curl -X DELETE \
https://[bucketname].s3-ap-southeast-2.amazonaws.com
Can you think of a genuine use case for this?
In recent years, AWS has done a stellar job confiscating foot guns they previously distributed. Today I just expect this kind of thing not to be possible. It’s even weirder because some S3 operations appear to be footgunless. Try the same anonymous trick against GetBucketOwnershipControls, for example, and S3 refuses outright:
<Error>
<Code>AccessDenied</Code>
<Message>s3:GetBucketOwnershipControls does not support Anonymous requests!</Message>
<RequestId>...</RequestId>
<HostId>...</HostId>
</Error>
The obvious consequence of anonymous API requests is anonymous CloudTrail entries. If the request was not authenticated, it’s not possible to identify who deleted a bucket, checked its encryption configuration, checked its logging status, and so on.
"userIdentity": {
"type": "AWSAccount",
"principalId": "",
"accountId": "anonymous"
},
If you’re bored one day, make a test bucket and navigate to some of these in your browser (there’s a scripted version after the list):
- https://[bucketname].s3.amazonaws.com/?logging
- https://[bucketname].s3.amazonaws.com/?tagging
- https://[bucketname].s3.amazonaws.com/?encryption
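If clicking links feels like too much effort, a minimal boto3 sketch of the same anonymous pokes might look like this. The bucket name is a placeholder, and it assumes a dangerously permissive bucket policy like the one earlier in this post:
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# An anonymous client: no credentials are attached to any request it sends.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

bucket = "bucketname"  # placeholder

# Each call maps to one of the ?logging / ?tagging / ?encryption requests above.
print(s3.get_bucket_logging(Bucket=bucket))
print(s3.get_bucket_tagging(Bucket=bucket))
print(s3.get_bucket_encryption(Bucket=bucket))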
The documentation for some operations is still online, but it’s no longer possible to execute them (e.g. GetObjectTorrent).
ListObjects is not the only way to get object keys
There are some people in this world who enjoy crimes. Sometimes, those people stumble upon a juicy-looking S3 bucket. They want to download all of its contents but can’t, because downloading an object requires supplying its key, which is something like a file path.
Some crimes are simple. A GET request to the root of a bucket will return a full list of its contents. Therefore, it might appear that denying the s3:ListBucket operation is a good way to prevent crimes. It works in the sense that a GET request to the root of a bucket will no longer return a full list of its contents.
To make this more real, consider a bucket with a public-read ACL but with a deny s3:ListBucket policy. It’s a bit of an edge case since ACLs are off by default these days, and AWS encourages the use of policies only, but stay with me. In this scenario, there are still at least two ways to get object keys (there’s a sketch of both after the list):
- GET /?versions – AKA s3:ListBucketVersions – Returns metadata about all versions of the objects in a bucket.
- GET /?uploads – AKA s3:ListMultipartUploads – Lists in-progress multipart uploads in a bucket. An in-progress multipart upload is an object upload that has been initiated by a CreateMultipartUpload request but has not yet been completed or aborted. In-progress uploads eventually get finished and become regular objects.
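A minimal boto3 sketch of both, assuming the public-read-ACL-plus-ListBucket-deny setup described above. The bucket name is a placeholder:
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))  # anonymous
bucket = "bucketname"  # placeholder

# GET /?versions - object keys show up even though a plain listing is denied
versions = s3.list_object_versions(Bucket=bucket)
for version in versions.get("Versions", []):
    print("version:", version["Key"])

# GET /?uploads - keys of in-progress multipart uploads
uploads = s3.list_multipart_uploads(Bucket=bucket)
for upload in uploads.get("Uploads", []):
    print("upload:", upload["Key"])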
Some documentation around this is a little bit sketchy. For example, the HeadBucket reference suggests the operation is for determining whether a bucket exists and whether you have permission to access it.
This is incorrect, or at least incomplete. The HeadBucket operation just checks whether you have access to perform the ListBucket operation.
It’s clarified a bit later in the document:
- General purpose bucket permissions – To use this operation, you must have permissions to perform the s3:ListBucket action. The bucket owner has this permission by default and can grant this permission to others. For more information about permissions, see Managing access permissions to your Amazon S3 resources in the Amazon S3 User Guide.
The point is, don’t rely on validating only that ListBucket is denied. We made that mistake in the first versions of our product.
Incomplete multipart uploads are Schrodinger’s objects
Schrodinger’s cat is the one that’s both alive and de-lifed at the same time, right? I’m clearly an expert at physics metaphors, so I’m confident in saying that incomplete multipart uploads both exist and don’t exist: they’re stored and billed, but good luck actually seeing them.
Let me explain. Starting a multipart upload is easy.
% aws s3api create-multipart-upload --bucket [bucket-name] --key [key]
{
...
"UploadId": "tzBxqN.33hNvywz2xTxVKMf6Bndv3P42NCnDpP_cSvw.biblp7mTHNYQhbN398kGc_80m.NnwCnft8gcD5z.nl5D818lchDMWFmJo_aXzwktHTLBV9_Ev8XuMKC_8_Yh1yK3.ylkJC.LbTqXKdI3L_9YeOu2g6n4NcrCZg1pK.o-"
}
So is uploading the parts.
aws s3api upload-part \
--bucket [bucket-name] \
--key [key] \
--upload-id tzBxqN.33hNvywz2xTxVKMf6Bndv3P42NCnDpP_cSvw.biblp7mTHNYQhbN398kGc_80m.NnwCnft8gcD5z.nl5D818lchDMWFmJo_aXzwktHTLBV9_Ev8XuMKC_8_Yh1yK3.ylkJC.LbTqXKdI3L_9YeOu2g6n4NcrCZg1pK.o- \
--part-number 1 \
--body somelocalfile.txt
But trying to list those unfinished uploads in the web console appears impossible. At least I couldn’t find a way, and the AWS AI refused to help because I was asking a security question.
If you really desperately want the list, you’ll have to navigate to /?uploads or use the CLI.
aws s3api list-multipart-uploads --bucket [bucket-name]
The title of this AWS article is a clue as to why that might be annoying.
If the complete multipart upload request isn’t sent successfully, Amazon S3 will not assemble the parts and will not create any object. The parts remain in your Amazon S3 account until the multipart upload completes or is aborted, and you pay for the parts that are stored in Amazon S3.
I couldn’t figure out a way to download parts of not-yet-completed objects, but it’s certainly possible to delete them. AWS recommends applying a lifecycle rule to delete unfinished uploads after a number of days.
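That recommendation translates to something like the following boto3 sketch. The bucket name and the seven-day window are placeholders:
import boto3

s3 = boto3.client("s3")

# Abort (and stop paying for) multipart uploads that never complete.
s3.put_bucket_lifecycle_configuration(
    Bucket="bucketname",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-stale-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)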
Multipart upload listings leak principal ARNs
You can skip this section if you are of the view that account IDs, ARNs, and other identifiers are not sensitive. I don’t think they are state secrets, but I know they are very useful to attackers, and therefore, I prefer they aren’t made public.
If you care like me, you’ll be sad to know that listing unfinished multipart uploads via “/?uploads” returns the ARN of the principal who initiated the upload.
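A rough boto3 illustration of what comes back, assuming the caller is allowed to list uploads on the bucket (the bucket name is a placeholder):
import boto3

s3 = boto3.client("s3")

# The Initiator element is returned with each in-progress upload.
response = s3.list_multipart_uploads(Bucket="bucketname")  # placeholder
for upload in response.get("Uploads", []):
    print(upload["Key"], "->", upload["Initiator"]["ID"])  # e.g. an IAM user/role ARN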
Access control lists can grant access based on email
Once upon a time in AWS history, AWS accounts were identified not just by their ID but also by the email registered as the root user. Archaeologists have discovered references to emails scattered all over S3 ACL documentation. ACLs are a wild ride.
One of those ancient references is in the PutBucketACL operation, which allows the grantee to be specified by “the person’s ID”, otherwise known as their email address. A side effect of this is that it’s possible to determine if a given email address has a registered AWS account.
import boto3

s3_client = boto3.client('s3')
bucket_name = 'your-bucket-name'  # a bucket you own

# Try to grant READ on the bucket to a grantee identified only by email address.
s3_client.put_bucket_acl(
    Bucket=bucket_name,
    AccessControlPolicy={
        'Grants': [
            {
                'Grantee': {
                    'EmailAddress': 'some@emailtotest.com',
                    'Type': 'AmazonCustomerByEmail',
                },
                'Permission': 'READ'
            },
        ],
        'Owner': {
            'DisplayName': 'Whatever',
            'ID': 'c3d78ab5093a9ab8a5184de715d409c2ab5a0e2da66f08c2f6cc5c0bdeadbeef'
        }
    }
)
The above call will fail with the following error if the email address does not have an AWS account associated with it.
botocore.exceptions.ClientError: An error occurred (UnresolvableGrantByEmailAddress) when calling the PutBucketAcl operation: The e-mail address you provided does not match any account on record.
Storage class is uploader’s choice
Each object in Amazon S3 has a storage class associated with it. A storage class defines the underlying performance characteristics of object operations. It’s pretty awesome. You can pay Amazon more for frequently accessed objects that need to travel at warp speed, or pay less for infrequently accessed objects that you might only need in the event of disaster recovery.
When reading the above it’s easy to miss that the storage class applies to objects rather than buckets. There’s no way to configure a bucket with the storage class you want. By extension that means the principal doing the uploading gets to choose the storage class of the object being uploaded.
aws s3 cp "my.txt" "s3://mybucket/myobject.txt" --storage-class [CLASS]
The security implication of this is that the uploader gets to choose (from a pre-defined list) how much the bucket owner gets charged per GB of storage and access. It’s not a huge deal in most scenarios, but it is annoying, kind of like getting charged for access errors (now fixed, kudos Amazon).
There is a simple™️ solution – a bucket policy with a special condition key. I wonder what percentage of buckets have a policy with this condition key set?
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowStandardOnly",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::your-bucket-name/*",
            "Condition": {
                "StringEquals": {
                    "s3:x-amz-storage-class": ["STANDARD"]
                }
            }
        }
    ]
}
It’s also possible to set a lifecycle policy that transitions all objects to a given storage class after a period of time.
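As a sketch, a transition rule along those lines might look like this with boto3. The bucket name, day count, and target class are placeholders, and note that this call replaces any existing lifecycle configuration on the bucket:
import boto3

s3 = boto3.client("s3")

# Move every object to STANDARD_IA 30 days after it was written,
# regardless of which storage class the uploader originally picked.
s3.put_bucket_lifecycle_configuration(
    Bucket="bucketname",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "normalise-storage-class",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            }
        ]
    },
)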
Some more good news is that a common way to allow uploads from somewhat untrusted sources is to use pre-signed URLs. The AWS Signature Version 4 mechanism requires that all headers starting with X-Amz- be signed. Storage class is specified via the x-amz-storage-class header, and so there’s no immediately obvious way to manipulate it unless an application does something truly horrible.
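For what it’s worth, a typical presigned upload looks something like the boto3 sketch below; the bucket, key, and expiry are placeholders. Because the storage class is part of what gets signed, a client that swaps the x-amz-storage-class header should simply end up with a signature mismatch:
import boto3

s3 = boto3.client("s3")

# The StorageClass parameter is included when the URL is signed, so the
# uploader is expected to send a matching x-amz-storage-class header.
url = s3.generate_presigned_url(
    "put_object",
    Params={
        "Bucket": "bucketname",          # placeholder
        "Key": "uploads/example.txt",    # placeholder
        "StorageClass": "STANDARD",
    },
    ExpiresIn=900,
)
print(url)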
Pretty much everything is uploader’s choice
If I weren’t lazy I’d make up a story that links all of these interesting things together, but I am lazy, so I won’t. Most S3 object-related things are controlled by the uploader, like tags, for example:
aws s3api put-object --bucket [target-bucket] \
--key tags.txt \
--body "makeyourtime.txt" \
--tagging "AllYourTags=AreBelong&To=Us"
If you do any sort of automation based on tag values, you may enjoy the results more than most.
My personal favourite is Object Lock, which allows legal and compliance teams to retain objects for reasons comma legal. It only works if a bucket has been configured with object locking enabled, but it’s too fun not to mention.
aws s3api put-object \
--bucket bucket-with-object-lock-enabled \
--key forever.txt \
--body incriminating-evidence.txt \
--object-lock-retain-until-date "2099-01-01T00:00:00+0000" \
--object-lock-legal-hold-status "ON" \
--object-lock-mode "COMPLIANCE"
In COMPLIANCE mode, that object version can’t be overwritten or deleted by anyone, including the account’s root user, until the retention date passes.
This next one is for the phishing connoisseurs. If you need an open redirect, just upload a file with this cool trick. For the redirect to work, the bucket has to have static website hosting enabled.
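The trick in question is the object-level website redirect metadata. A minimal boto3 sketch, where the bucket and target URL are placeholders and the bucket is assumed to be serving its static website endpoint:
import boto3

s3 = boto3.client("s3")

# When fetched via the bucket's static website endpoint, this key redirects
# the visitor to whatever is in x-amz-website-redirect-location.
s3.put_object(
    Bucket="bucketname",                                    # placeholder
    Key="totally-legit-login.html",
    Body=b"",
    WebsiteRedirectLocation="https://example.com/phish",    # placeholder target
)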
Still bored? Try the full list of headers PutObject supports in the API reference.
Restrictions apply to pre-signed URLs, but folks relying on Cognito identities with IAM policies have some thinking to do. Potential attackers will be able to sign requests using their authenticated Cognito context.
S3 will tell you the bucket owner if you ask nicely
There’s been some great research on how to enumerate account IDs from S3 buckets. Firstly by Ben Bridts for accessible buckets, and later by Sam Cox for all buckets, including private ones.
However, if you want to check whether one specific account ID is the owner of a given accessible bucket, there’s a much simpler way: the ListBucket operation accepts the x-amz-expected-bucket-owner header.
If an incorrect account ID is supplied in this header, an access denied error is returned.
% curl -X GET "https://[bucketname].s3.amazonaws.com/" \
-H "x-amz-expected-bucket-owner: 123456789012"
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>NVVHCDK2122RCN6Q</RequestId><HostId>2wOa4z6aWXDhDnctk7//HxHt1edlu9V0R6+9dA6F0YveHLzFwTjUe+buefu5YDz3dSSLSz7hL5E=</HostId></Error>
If the correct account ID is supplied, the ListBucket operation returns as normal, assuming the caller has ListBucket privileges.
% curl -X GET "https://[bucketname].s3.amazonaws.com/" \
-H "x-amz-expected-bucket-owner: [correct-account-id]"
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">...</ListBucketResult>
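Scripted with boto3, and assuming the caller can already list the bucket, checking a shortlist of candidate account IDs might look like this (the account IDs and bucket name are made up):
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def owned_by(bucket: str, account_id: str) -> bool:
    """Return True if S3 accepts account_id as the expected bucket owner."""
    try:
        s3.list_objects_v2(Bucket=bucket, MaxKeys=1, ExpectedBucketOwner=account_id)
        return True
    except ClientError as error:
        if error.response["Error"]["Code"] == "AccessDenied":
            return False
        raise

for candidate in ["111111111111", "123456789012"]:  # made-up account IDs
    print(candidate, owned_by("bucketname", candidate))  # bucket name is a placeholder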
The problem with the methods mentioned is that they just don’t ask nicely enough. If you want something from S3, you must be extra courteous! Enter the ListBucket API parameter fetch-owner.
Here it is in action.
% curl "https://[bucketname].s3.amazonaws.com/?fetch-owner=true"
<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Name>[bucketname]</Name>
<Prefix/>
<Marker/>
<MaxKeys>1000</MaxKeys>
<IsTruncated>false</IsTruncated>
<Contents>
<Key>a.txt</Key>
<LastModified>2024-05-27T02:33:26.000Z</LastModified>
<ETag>"0cc175b9c0f1b6a831c399e269772661"</ETag>
<Size>1</Size>
<Owner>
<ID>c3d78ab5093a9ab8a5184de715d409c2ab5a0e2da66f08c2f6cc5c0bdeadbeef</ID>
<DisplayName>tasty.aws.account</DisplayName>
</Owner>
<StorageClass>STANDARD</StorageClass>
</Contents>
</ListBucketResult>
Notice the “ID” inside the “Owner” element is returned with each key. The ID is what’s known as a canonical user ID inside the depths of AWS documentation. It’s a 64 character hex string that is an obfuscated form of the AWS account ID. I don’t know how it’s generated but superhero Aidan Steele pointed out AWS will kindly resolve it to an account ID if it’s placed inside an IAM policy, like so.
"Principal": {"CanonicalUser":"c3d78ab5093a9ab8a5184de715d409c2ab5a0e2da66f08c2f6cc5c0bdeadbeef"},
Save the policy. Refresh it, and voila, a perfectly formed AWS account ID appears.
By the way, ListBucketVersions and ListMultipartUploads behave similarly without the need for fetch-owner.
Keys are case sensitive
This might seem minor, but it’s possible to upload multiple objects whose keys differ only by case.
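A quick boto3 sketch to convince yourself (the bucket name is a placeholder):
import boto3

s3 = boto3.client("s3")
bucket = "bucketname"  # placeholder

# Both objects coexist because keys are compared case sensitively.
s3.put_object(Bucket=bucket, Key="users/jeff", Body=b"jeff's file")
s3.put_object(Bucket=bucket, Key="users/JEFF", Body=b"someone else's file")

keys = [obj["Key"] for obj in s3.list_objects_v2(Bucket=bucket, Prefix="users/")["Contents"]]
print(keys)  # ['users/JEFF', 'users/jeff']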
In isolation this is not a problem, but it can quickly become one if an application expects S3 keys to behave case-insensitively. Allow me a moment to present a contrived example.
Imagine an application where user passwords are stored in files in S3. Each user has their own file, named after their username. The sign-up process just checks for the existence of the file in the S3 bucket; if the file exists, the new user can’t sign up with the supplied username. But because S3 keys are case sensitive, if “jeff” already has an account, someone can still register “JEFF”.
The real problem is in the password update function of the pretend application. Here, the application converts the user’s username to lower case before writing to their file in the S3 bucket. A bad hacker person can log in as “JEFF” because that’s what they signed up as, update their password (that’s not really theirs), and then log in as “jeff” because they just set the other Jeff’s password.
It’s contrived, but bug bounties pay out many dollars for these kinds of issues every year.
There’s even more to worry about. Although object keys look and feel like filenames, they actually don’t behave like files in many ways. Here’s one:
You can use any UTF-8 character in an object key name. However, using certain characters in key names can cause problems with some applications and protocols.
Try it out for yourself (there’s a sketch below). You’ll see that spaces, slashes, percentage characters, and many others are valid in object keys, and the documentation even has dedicated advice for people who try to be clever with dots in key names.
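A boto3 sketch of some keys that S3 happily accepts (the bucket name is a placeholder):
import boto3

s3 = boto3.client("s3")
bucket = "bucketname"  # placeholder

# All of these are legal object keys, even if downstream tooling hates them.
awkward_keys = [
    "spaces in keys.txt",
    "percent%20encoded%3F.txt",
    "nested//double//slashes.txt",
    "emoji-🪣.txt",
    "trailing.dot.",
]
for key in awkward_keys:
    s3.put_object(Bucket=bucket, Key=key, Body=b"still a valid key")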
More ways to make a bucket public
Back to the origin story of this post – sometimes a bucket is public even if it isn’t public. Know what I mean? Even if your ACLs are turned off, resource policy is tightly scoped, and block public access is enabled, there are ways to make a bucket publicly accessible.
The most common way is with an Amazon CloudFront distribution. Generally if you’re putting a content delivery network in front of an S3 bucket, you probably intend to deliver the content to the internet. However, if there’s one thing we’ve learned from endless S3 data breaches it’s that intent does not always match reality. This is especially true when most security tools will proclaim with confidence that such a bucket is not public.
If a bucket is exposed via CloudFront, the resource policy will typically be restricted to just CloudFront and will therefore be flagged as not public.
% aws s3api get-bucket-policy-status --bucket testdgcloudfronts3
{
"PolicyStatus": {
"IsPublic": false
}
}
Sending a request directly to the bucket will result in an access denied error.
% curl https://testdgcloudfronts3.s3.amazonaws.com/
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>MEMW5FV1FZM4RGEH</RequestId>
<HostId>CqPSxMuM1yGhQsQ5T+S6ErzFQsMfAk7sglaF+liLEVNTu8EdYv82E0TDJUoDrGPIGFUwsntYOi/EEh+SZAREOg==</HostId>
</Error>
However, sending the same request to the CloudFront distribution will naturally return object contents.
% curl https://d1v6ltrik088xf.cloudfront.net/
<html>
<head>
<title>Plerion</title>
<body>Simplify cloud security</body>
</html>
Another fun way to expose a bucket with a restricted resource policy is with Cognito identity pools.
AWS Cognito is a user directory and authentication service. It allows developers to quickly create an authentication experience for an app. Cognito is no different to your favourite AWS services – it is also wildly complicated. However, the thing that matters here is that at the end of a successful login sequence, the user gets temporary AWS credentials for a pre-configured role.
That role has its own permissions and can be used inside resource policies. For example, you could allow the role to s3:ListBucket and s3:GetObject on a target bucket. Because the user gets temporary AWS credentials, they can then call those APIs successfully. Again, it doesn’t matter if public access is disabled, because the specific identity within the account is allowed.
So how do you successfully login? Since I’m claiming this is public access, there are two Cognito configuration options that matter:
- Self-registration – This is exactly what it sounds like. It allows a user to sign up to an app through a typical registration process. If anyone on the internet can sign up and then sign in, that’s effectively public access.
- Guest access – This provides a unique identifier and AWS credentials for unauthenticated users. Everything else is the same except the user doesn’t need to complete authentication.
Configuring Cognito is a blog post all of its own, but as an example, guest access comes down to this series of API calls.
% aws cognito-identity get-id --identity-pool-id [your-pool-id]
{
"IdentityId": "us-east-1:0d90d259-58ce-ce5f-f6be-69e6fba3a7ec"
}
% aws cognito-identity get-credentials-for-identity \
--identity-id us-east-1:0d90d259-58ce-ce5f-f6be-69e6fba3a7ec
{
"IdentityId": "us-east-1:0d90d259-58ce-ce5f-f6be-69e6fba3a7ec",
"Credentials": {
"AccessKeyId": "ASIA...",
"SecretKey": "...",
"SessionToken": "...",
"Expiration": "2024-05-24T16:38:30+10:00"
}
}
% aws s3 ls testdgcognitos3 --profile guest
2024-05-24 13:16:58 63 index.html
Odds are there are other, even more exotic ways to make a bucket public, but these are the two options I know for sure are used regularly on the interwebs, yet are rarely flagged by security tools.
If you made it this far, the time paradox has been resolved. Congratulations, you now know the things you wish you didn’t need to know about S3. There are only losers in this game, but at least we’ve all got a participation ribbon to comfort us in moments of angst.