How to Better Structure AWS S3 Security

If the new IT intern suggests installing a publicly accessible web server on your core file server, you might suggest they be fired.

  

If they give up on that, but instead decide to dump the reports produced by your highly sensitive data warehouse jobs onto that web server, they'd definitely be fired.

  

But things aren't always so clear in the brave new world of the cloud, where a service like Amazon's Simple Storage Service (S3), which performs multiple, often overlapping roles in an application stack, is always one click away from exposing your sensitive files online.

  

Cloud storage services are now more than merely “a place to keep a file” – they often serve as both inputs and outputs to elaborate chains of processes. The end result is the recent spate of high-profile data breaches stemming from misconfigured S3 buckets.

 
 An S3 Bucket Primer
 
S3 is one of the core services within AWS. Conceptually, it's similar to an infinitely large file server at a remote site, or an FTP server that you connect to across the Internet.
 
 However, S3 differs in a few fundamental ways that are important to understand: failing to do so will trip you up and may result in insecure configurations.
 
 S3 is organized around the concepts of Buckets and Objects, instead of servers with files.
 

Buckets are the top-level organizational resource within S3 and are always assigned a DNS-addressable name, e.g. http://mycompanybucket.s3.amazonaws.com (bucket names must be globally unique and lowercase).

  

This might trick you into thinking of a bucket as a server, where you might create multiple hierarchies within a shared folder for each group in your organization that needs access.

  

Here’s the thing:

• There's no cost difference between creating one bucket and a dozen.

• By default you're limited to 100 buckets per account, but getting more is as simple as making a support request.

• There is no performance difference between accessing 100 files in one bucket or one file in each of 100 different buckets.

  

With these facts in mind, we need to steal a concept from computer science class: the Single Responsibility Principle.

 
 Within a network, a file server is a general resource typically used by lots of different departments for all kinds of work.
 
S3 allows you to devote a bucket to each individual application, group, or even individual user. For security (and your sanity as a sysadmin) you want the usage of each bucket to be as narrow as possible and devoted to a single task.
 
A significant number of the unintentional data exposure incidents on S3 appear to have been caused by public-facing S3 buckets (for websites) that were also (likely accidentally) used to store sensitive information.

Sidebar: A warning sign is often found in the bucket naming. Generic names like 'mycompany' or 'data-store' are asking for trouble. Ideally you should establish a naming convention like companyname-environment-applicationname, where environment is production, staging or development.
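Following that convention, a minimal sketch of creating one narrowly scoped bucket per application with the AWS CLI (the company and application names here are hypothetical, and since S3 bucket names are globally unique, yours will differ):

  # One bucket per application and environment, never a shared general-purpose bucket
  aws s3 mb s3://acmecorp-production-billing
  aws s3 mb s3://acmecorp-staging-billing
  aws s3 mb s3://acmecorp-production-analytics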

 
 Bucket Policies
 
Policies are the top-level permission structures for buckets.
 

They define:

• Who can access a bucket (which users/principals)

• How they can access it (HTTPS only, requiring MFA)

• Where they can access it from (a Virtual Private Cloud, a specific IP range)

Policies are defined as blocks of JSON that you can write by hand or create with AWS's Policy Generator – https://awspolicygen.s3.amazonaws.com/policygen.html.
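For instance, the "how" restrictions above map to condition keys. A minimal sketch of a policy that denies any request not made over HTTPS (the bucket name examplebucket is a placeholder):

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
          "arn:aws:s3:::examplebucket",
          "arn:aws:s3:::examplebucket/*"
        ],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}}
      }
    ]
  }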

Benefit #1 of organizing your buckets into narrowly defined roles: your bucket policies will be an order of magnitude simpler, since you won't have to puzzle out conflicting policy statements or read through up to 20 KB of JSON to reason out the implications of a change.

  

Example Bucket Policy

{
  "Version": "2012-10-17",
  "Id": "S3PolicyId1",
  "Statement": [
    {
      "Sid": "IPAllow",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::examplebucket/*",
      "Condition": {
        "IpAddress": {"aws:SourceIp": "54.240.143.0/24"},
        "NotIpAddress": {"aws:SourceIp": "54.240.143.188/32"}
      }
    }
  ]
}
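Once written, a policy like this is attached to the bucket with a single call. A minimal sketch using the AWS CLI, assuming the JSON above has been saved as policy.json:

  # Attach the policy document to the bucket
  aws s3api put-bucket-policy --bucket examplebucket --policy file://policy.json

  # Confirm what is currently attached
  aws s3api get-bucket-policy --bucket examplebucket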

  

Narrow buckets mean simpler policies, which in turn mean less likelihood of accidentally over-permissioning users – and unintentionally creating a data breach.

  

Think of bucket policies as defining how the data in a bucket should be treated.

  

IAM Policies in S3

 
Identity and Access Management (IAM) policies, on the other hand, are all about what rights a user or group has to a resource in AWS (not just S3).
 
You can apply both IAM and bucket policies simultaneously: AWS evaluates the union of the two, and a request succeeds only if at least one policy allows it and neither policy explicitly denies it.
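As an illustration, a minimal read-only IAM policy scoped to a single application bucket might look like the following (the bucket name is hypothetical):

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "ReadOnlyBillingBucket",
        "Effect": "Allow",
        "Action": ["s3:ListBucket", "s3:GetObject"],
        "Resource": [
          "arn:aws:s3:::acmecorp-production-billing",
          "arn:aws:s3:::acmecorp-production-billing/*"
        ]
      }
    ]
  }

Attached to a group of report readers, this grants listing and downloads on that one bucket and nothing else.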
 
 Further Reading: IAM Policies and Bucket Policies and ACLs! Oh, My!
 
 VPC Endpoints in S3
 
A very powerful but often underutilized tool for securing AWS services is dividing applications into logically separated groups inside a Virtual Private Cloud (VPC).
 
On a grander scale than simply designating a bucket for a particular purpose, a VPC is a logically separated set of AWS resources (including S3) that can be cordoned off for greater security.
 
Most of the large data breaches that have surfaced involving S3 have NOT been website related. Organizations are using a variety of AWS tools like Redshift and QuickSight to analyze massive amounts of (potentially) sensitive data: analysis, reports and raw data that should never be placed on a public network.
 
The tool of choice for this separation is AWS's Virtual Private Cloud. With a VPC you can define a set of services that cannot connect to the general Internet and are only accessible via a VPN (IPsec) connection into the VPC.
 
Think of a VPN-connected VPC as a separate section of your internal network, where resources like S3 aren't publicly addressable (see the sketch after this list):

• A bot scanning for open buckets won't be able to see them.

• Your new data scientist can't accidentally leave a bucket publicly accessible because they were trying to download a report.

• Day-to-day users of the services don't have to figure out whether their actions will cause chaos and destruction.
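A minimal sketch of wiring this up with the AWS CLI (the VPC, route table and endpoint IDs are placeholders, and the region is assumed to be us-east-1):

  # Create a gateway VPC endpoint so instances in the VPC can reach S3 privately
  aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0abc1234 \
    --service-name com.amazonaws.us-east-1.s3 \
    --route-table-ids rtb-0abc1234

You can then lock a bucket to that endpoint with a Deny statement conditioned on aws:SourceVpce, so requests arriving from anywhere other than the endpoint are refused:

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "DenyAllExceptVPCEndpoint",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
          "arn:aws:s3:::examplebucket",
          "arn:aws:s3:::examplebucket/*"
        ],
        "Condition": {"StringNotEquals": {"aws:SourceVpce": "vpce-0abc1234"}}
      }
    ]
  }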
 
 
 Enable S3 Logging
 
By default, S3 doesn't maintain access logs for the objects (files) in a bucket. On a per-bucket basis you can enable access logging, with the logs written to another S3 bucket.
 
 http://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html
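A minimal sketch of enabling this with the AWS CLI (the bucket names are placeholders; the target bucket must already grant S3's log delivery system permission to write to it):

  # logging.json – deliver access logs for the source bucket to a dedicated log bucket
  {
    "LoggingEnabled": {
      "TargetBucket": "acmecorp-production-logs",
      "TargetPrefix": "billing-bucket-logs/"
    }
  }

  aws s3api put-bucket-logging \
    --bucket acmecorp-production-billing \
    --bucket-logging-status file://logging.json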

Reviewing access periodically can give you great insight into whether your data is being accessed from an unknown location or, in the case of a data breach, how and when exfiltration occurred.

 
S3 stores raw logs in the logging bucket, where you can parse them with a number of different open source tools, like:

• https://github.com/adamculp/s3-log-analyzer

• https://github.com/cboettig/s3-log-parse

• https://github.com/ogdch/s3-logs-analyzer

More recently, AWS launched Athena, a service that lets you run SQL queries directly against structured data stored in S3, such as JSON, CSV and log files.
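For example, once the access logs have been mapped to an Athena table (the table name s3_access_logs and its columns are assumptions here; you define them when creating the table), spotting unfamiliar clients becomes a one-line query:

  # Count object downloads per client IP; results land in the named output bucket
  aws athena start-query-execution \
    --query-string "SELECT remoteip, COUNT(*) AS requests FROM s3_access_logs WHERE operation = 'REST.GET.OBJECT' GROUP BY remoteip ORDER BY requests DESC" \
    --result-configuration OutputLocation=s3://acmecorp-production-logs/athena-results/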

 
 In Conclusion
 
 AWS S3 is a powerful and extremely useful service that increases the capabilities of IT and application groups. Properly administered, it can be a safe and powerful tool for data storage and as the base of more complex applications.
 
Steps to keep your data secure on AWS S3:

1. Review which of your S3 buckets are open to the public Internet (a starting point is sketched below).

2. Split S3 buckets to one per application or module.

3. Separate concerns with VPC S3 endpoints.

4. Log everything.
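For step 1, a rough first pass with the AWS CLI might look like the following (it checks each bucket's policy status; ACLs and the newer Block Public Access settings deserve a separate review):

  # Flag buckets whose bucket policy leaves them public
  for b in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
    status=$(aws s3api get-bucket-policy-status --bucket "$b" \
      --query 'PolicyStatus.IsPublic' --output text 2>/dev/null)
    [ "$status" = "True" ] && echo "PUBLIC: $b"
  done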