Gonzalo Fernandez Plaza

Summary

The provided content is a comprehensive guide on Amazon S3, detailing its features, storage classes, security policies, versioning, encryption methods, website hosting capabilities, CORS, lifecycle rules, event notifications, access logs, access points, MFA-Delete, Requester Pays, and integration with Athena, tailored for the AWS Solutions Architect Associate certification exam.

Abstract

The text offers an in-depth exploration of Amazon S3, an object storage service by AWS, emphasizing its scalability, availability, security, and performance. It outlines S3's use of buckets and objects, various storage classes for different access needs, and security mechanisms including policies and encryption. The guide discusses S3 versioning for data protection, the hosting of static websites, CORS for cross-origin resource sharing, lifecycle rules for automated data management, event notifications for triggering other AWS services, access logs for request tracking, access points for controlled data sharing, MFA-Delete for enhanced security, and the Requester Pays feature to transfer data cost to the requester. It concludes with an introduction to Amazon Athena for SQL querying of S3 data. The content serves as a study resource for individuals preparing for the AWS Solutions Architect Associate certification.

Opinions

  • The author suggests that understanding S3 storage classes is crucial for the AWS certification exam, indicating their significance in the exam's content.
  • The text implies that choosing the appropriate S3 storage class depends on the frequency of data access and the need for rapid retrieval, highlighting the importance of aligning storage solutions with business requirements.
  • The guide recommends enabling S3 versioning as a best practice for added security and data protection, underscoring the feature's utility in preserving object versions.
  • The author expresses a preference for using AWS KMS for server-side encryption, citing its robust key management system, and points out the necessity of HTTPS when using client-side encryption.
  • The content encourages the use of S3 for static website hosting, noting the ease of setup and the need for a public access policy for website availability.
  • The author emphasizes the necessity of configuring CORS for S3 buckets to enable controlled cross-origin data sharing, providing examples to illustrate its operation.
  • The guide highlights the benefits of S3 lifecycle rules for cost savings through automated data tiering and expiration actions for data cleanup.
  • The text underscores the practicality of S3 event notifications in automating workflows by triggering other AWS services like Lambda, SNS, and SQS in response to S3 bucket events.
  • The author advises against storing S3 access logs in the same bucket to prevent infinite logging loops, suggesting a dedicated bucket for logs instead.
  • The content promotes S3 access points as a means to simplify access management to shared datasets, allowing for granular access control within a bucket.
  • The author advocates for the S3 MFA-Delete feature as an additional security layer for critical data, acknowledging the need for AWS CLI or API for its activation.
  • The guide introduces the S3 Requester Pays feature as a cost-effective solution for data sharing, shifting the financial burden of data retrieval from the bucket owner to the requester.
  • The text concludes with an endorsement of Amazon Athena for its serverless nature and ability to run SQL queries directly on S3 data, which is beneficial for business intelligence and analytics.

Amazon S3 Fundamentals — AWS Solutions Architect Associate Course

Chapter 7: Amazon S3 Fundamentals for the AWS Solutions Architect Associate Certification

Amazon Simple Storage Service (Amazon S3) is an Object Storage Service that offers industry-leading scalability, data availability, security, and performance. Let’s dive into it!

Amazon S3 Fundamentals for the AWS Solutions Architect Associate Certification.
  1. S3 Intro
  2. Object Storage Classes
  3. Security & Policies
  4. Versioning
  5. Encryption
  6. S3 Websites
  7. CORS
  8. S3 Lifecycle Rules
  9. S3 Event Notification
  10. S3 Access Logs
  11. S3 Access Points
  12. S3 MFA-Delete
  13. S3 Requester Pays
  14. Athena

Remember that all the chapters from the course can be found in the following link:

Amazon S3 Introduction

In S3, we store objects in buckets, and each object can be up to 5 TB in size. You can think of objects as the files of a regular file system and buckets as the directories. Buckets have a globally unique name and are defined at the region level.

The main characteristics of the objects are:

  • Key → The name that you assign to an object. You use the object key to retrieve the object.
  • Value → The content that you are storing.
  • Metadata → A set of name-value pairs with which you can store information regarding the object.
  • Access control information → Controls access to the object. You can prevent specific users from accessing it.
  • Version ID → Within a bucket, a key and version ID uniquely identify an object. It is generated by S3 when you add an object to a bucket.
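To make these attributes concrete, here is a minimal boto3 sketch (the bucket name and key are hypothetical) that uploads an object with metadata and reads it back:

import boto3

s3 = boto3.client("s3")

# Upload an object: the key identifies it, the body is its value,
# and metadata is a set of name-value pairs stored with it.
s3.put_object(
    Bucket="my-example-bucket",         # hypothetical bucket name
    Key="reports/2024/summary.txt",     # object key
    Body=b"hello from S3",              # object value (content)
    Metadata={"department": "finance"}  # user-defined metadata
)

# Retrieve the object by its key; the response includes the metadata
# and, if versioning is enabled, the VersionId.
response = s3.get_object(Bucket="my-example-bucket", Key="reports/2024/summary.txt")
print(response["Metadata"], response.get("VersionId"))
print(response["Body"].read())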

OBJECT STORAGE CLASSES

Amazon S3 offers a range of storage classes designed for different use cases. We must know all of them for the exam, as this is a common question. The storage class is set at the object level, not the bucket level. Types of classes:

1. General-Purpose Storage:

  • S3 Standard General purpose → It offers high durability, availability, and performance object storage for frequently accessed data.
  • S3 Standard-Infrequent Access (IA) → For data that is accessed less frequently but requires rapid access when needed. Storing these objects is cheaper than in S3 Standard.
  • S3 One Zone-Infrequent Access (IA) → For data that is accessed less frequently but requires rapid access when needed, stored in a single AZ instead of being replicated across at least three AZs. Ideal for customers who want a lower-cost option for infrequently accessed data and do not need the availability of the previous options.
  • S3 Intelligent-Tiering → Automatically moves objects between access tiers so that you pay less. It delivers automatic cost savings by moving objects between access tiers when access patterns change.

2. Glacier: Low cost. Amazon S3 Glacier is a secure cloud storage service for data archiving and long-term backup. The main difference from S3 general-purpose storage is that restoring files takes time.

  • S3 Glacier → Objects must be stored here for at least 90 days. Retrieval options are expedited (1–5 minutes to access data), standard (3–5 hours), and bulk (5–12 hours).
  • S3 Glacier Deep Archive → Objects must be stored here for at least 180 days. Retrieval options are standard (12 hours) and bulk (48 hours).

Which class is the best? It depends on the files you want to store and how quickly you need to access them. There is no single answer; each situation calls for a different class. This is a typical exam question: choose the class that best suits the scenario.

Amazon S3 Storage Classes Comparison Table.
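In practice, the storage class is just a per-object attribute that you set at upload time or change later with a copy. A hedged boto3 sketch (bucket and keys are hypothetical):

import boto3

s3 = boto3.client("s3")

# Upload directly into an infrequent-access class.
s3.put_object(
    Bucket="my-example-bucket",
    Key="backups/db-dump.gz",
    Body=open("db-dump.gz", "rb"),
    StorageClass="STANDARD_IA",  # or ONEZONE_IA, INTELLIGENT_TIERING, GLACIER, DEEP_ARCHIVE
)

# Change the class of an existing object by copying it onto itself.
s3.copy_object(
    Bucket="my-example-bucket",
    Key="backups/db-dump.gz",
    CopySource={"Bucket": "my-example-bucket", "Key": "backups/db-dump.gz"},
    StorageClass="GLACIER",
)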

SECURITY & POLICIES

As we saw in the introduction, a policy is an AWS object that defines permissions when associated with an identity or a resource. We have these types:

1. User-based → The ones we already know. They are attached to an IAM user, group, or role. If the policy doesn’t allow an action, the user will not be able to see the bucket or object.

2. Resource-based → JSON bucket policies. They apply to both buckets and objects in S3. We can use the Policy Generator to create them. They have the following attributes:

  • Effect → Allow/Deny
  • Principal → Who can perform an action on the bucket/object.
  • Action → What the principal can do with the bucket/object.
  • Resource → The bucket/object affected.

In the following example, anyone can get any object in the bucket.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Principal": "*",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:GetObjectVersion"],
      "Resource": "arn:aws:s3:::DOC-EXAMPLE-BUCKET/*",
    }
  ]
}

By default, S3 buckets block all public access to prevent company data leaks, but you can change that if necessary for your application.
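Both steps can be scripted. A hedged boto3 sketch (reusing the placeholder bucket name from the policy above) that relaxes the bucket's public access block and attaches the resource-based policy:

import json
import boto3

s3 = boto3.client("s3")
bucket = "DOC-EXAMPLE-BUCKET"  # placeholder name from the policy above

# Relax the bucket's public access block so a public bucket policy is accepted.
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": False,   # allow attaching a public policy
        "RestrictPublicBuckets": False,
    },
)

# Attach the resource-based (bucket) policy shown above.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicRead",
        "Principal": "*",
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:GetObjectVersion"],
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))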

VERSIONING

You can activate versioning at the bucket level for more security. Uploading an object with the same key will not overwrite it but will create a new version. Enabling it is good practice. If we want to see the versions in a bucket, we need to click “List Versions”; otherwise, we only see the latest one.

Amazon S3 versioning.

Some considerations:

  • If you suspend versioning on a bucket, the previous versions are kept.
  • If you enable versioning on a bucket that already contains objects, the existing objects get the version ID “null”.
  • If we delete an object when versioning is enabled, it is not actually deleted; instead, a Delete Marker is created that tells S3 not to show it, so we can restore it if necessary. To restore the object, we delete the Delete Marker; to delete the object permanently, we must delete its versions as well.
Delete Marker in S3 Versioning.
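A hedged boto3 sketch of this behavior (bucket and key are hypothetical): enable versioning, overwrite an object, delete it, and inspect the versions and the delete marker.

import boto3

s3 = boto3.client("s3")
bucket = "my-versioned-bucket"  # hypothetical

# Enable versioning at the bucket level.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Uploading the same key twice creates two versions, not an overwrite.
s3.put_object(Bucket=bucket, Key="notes.txt", Body=b"v1")
s3.put_object(Bucket=bucket, Key="notes.txt", Body=b"v2")

# Deleting the object only adds a delete marker.
s3.delete_object(Bucket=bucket, Key="notes.txt")

versions = s3.list_object_versions(Bucket=bucket, Prefix="notes.txt")
print(versions.get("Versions", []))       # both versions are still there
print(versions.get("DeleteMarkers", []))  # the delete marker hides them

# Removing the delete marker (by its VersionId) restores the object.
marker = versions["DeleteMarkers"][0]
s3.delete_object(Bucket=bucket, Key="notes.txt", VersionId=marker["VersionId"])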

ENCRYPTION

Data protection refers to protecting data in transit (as it travels to and from Amazon S3) and at rest (while it is stored on disks in Amazon S3 data centers). We have the following ways to apply encryption:

1. Server-side encryption → Server-side encryption is the data encryption at its destination by the application or service that receives it. Therefore, you send the object without encryption, and the server will encrypt it. Types:

  • SSE-S3 (Server Side Encryption) → Encrypt S3 objects using keys managed by AWS. To request server-side encryption using the object creation REST APIs, provide the “x-amz-server-side-encryption” request header.
SSE-S3 encryption on Amazon S3.
  • SSE-KMS (Key Management Service) → AWS Key Management Service (AWS KMS) is a service that combines secure, highly available hardware and software to provide a key management system scaled for the cloud. You can use AWS KMS to encrypt your Amazon S3 objects. The only difference from SSE-S3 is that the encryption keys are managed in AWS KMS.
SSE-KMS encryption on Amazon S3.
  • SSE-C → In this case, the customer provides the encryption keys. The encryption key you provide is part of your request, so AWS doesn’t store it. HTTPS is mandatory. It is not available from the console; you must use the CLI, an SDK, or the API. We can find more information at the following link.
SSE-C encryption on Amazon S3.
  • DSSE-KMS (NEW) → Dual-layer server-side encryption with AWS KMS applies two layers of encryption to objects when they are uploaded to Amazon S3.
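To illustrate the server-side options, a minimal boto3 sketch (bucket, keys, and the KMS key alias are hypothetical) that requests SSE-S3 and SSE-KMS on upload:

import boto3

s3 = boto3.client("s3")

# SSE-S3: S3 encrypts the object with keys it manages (AES-256).
s3.put_object(
    Bucket="my-example-bucket",
    Key="private/report-sse-s3.pdf",
    Body=open("report.pdf", "rb"),
    ServerSideEncryption="AES256",  # sent as the x-amz-server-side-encryption header
)

# SSE-KMS: the object is encrypted with a key managed in AWS KMS.
s3.put_object(
    Bucket="my-example-bucket",
    Key="private/report-sse-kms.pdf",
    Body=open("report.pdf", "rb"),
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-app-key",  # hypothetical KMS key alias
)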

2. Client-side encryption → Client-side encryption means encrypting data before sending it, in this case to S3. One way to do it is with the AWS Encryption SDK. The client is also responsible for decrypting the data after downloading it.

Client-side encryption on Amazon S3.

S3 WEBSITES

We can host static websites in S3 and make them accessible online. If you get a 403 Access Denied error when accessing the site, it is because the bucket policy does not allow public reads. To set up the website, follow these steps:

  • Upload the static files to an S3 bucket.
  • Activate “Static Website Hosting” in the bucket properties.
  • Uncheck “Block all public Access”.
  • Write a public access policy; the previous step alone is not enough.
  • It will generate an endpoint that you can access to see your website.

You can read more about this process at the following link.

Where to configure S3 Static Website Hosting on Amazon S3.
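The same steps can be scripted. A hedged boto3 sketch (bucket name and file names are hypothetical):

import boto3

s3 = boto3.client("s3")
bucket = "my-static-site-bucket"  # hypothetical

# Upload the static files.
s3.upload_file("index.html", bucket, "index.html", ExtraArgs={"ContentType": "text/html"})
s3.upload_file("error.html", bucket, "error.html", ExtraArgs={"ContentType": "text/html"})

# Activate static website hosting on the bucket.
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
# A public-read bucket policy (see the Security & Policies section) is still
# required, and "Block all public access" must be disabled, before the
# website endpoint works.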

CROSS-ORIGIN RESOURCE SHARING (CORS)

When you visit a website, the browser does not allow it to fetch data from third-party servers by default, as this can be malicious. But there can be exceptions if both website owners agree to cooperate. Cross-Origin Resource Sharing (CORS) regulates this cooperation. CORS is an HTTP-header-based mechanism that allows a server to indicate any origins (domain, scheme, or port) other than its own from which a browser should permit the loading of resources.

Let’s see how it works with this example from https://lenguajejs.com. We will call “domain.com” Domain A and “otherdomain.com” Domain B.

  • First example → Domain A makes an AJAX request to itself. As it is the same origin, it works.
  • Second example → Domain A makes a request to Domain B. If CORS is not enabled on Domain B, the request fails.
  • Third example → Domain A makes a request to Domain B, which has CORS enabled. In this case, it works.
CORS Explanation.

The same thing happens with S3 buckets. If a client makes a cross-origin request to our S3 bucket, we need to enable the correct CORS headers on the S3 bucket. To allow all origins, you can use the wildcard ‘*’.

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "PUT",
            "POST",
            "DELETE"
        ],
        "AllowedOrigins": [
            "http://www.example.com"
        ],
        "ExposeHeaders": []
    }
]
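That JSON is what the console stores; the same rules can be applied programmatically. A hedged boto3 sketch (the bucket name is hypothetical):

import boto3

s3 = boto3.client("s3")

# Apply the CORS rules shown above to the bucket.
s3.put_bucket_cors(
    Bucket="my-example-bucket",  # hypothetical
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedHeaders": ["*"],
                "AllowedMethods": ["PUT", "POST", "DELETE"],
                "AllowedOrigins": ["http://www.example.com"],
                "ExposeHeaders": [],
            }
        ]
    },
)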

S3 LIFECYCLE RULES

You can move objects between storage classes, either manually or automatically with lifecycle rules. There are two types of actions:

  • Transition actions → They define when objects move from one storage class to another. For example, move an object from S3 Standard to S3 Glacier after 90 days.
  • Expiration actions → You set the objects to be deleted after a certain period.
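A hedged boto3 sketch combining both action types (bucket name and prefix are hypothetical): transition objects under logs/ to Glacier after 90 days and delete them after one year.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",  # hypothetical
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                # Transition action: move to Glacier after 90 days.
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                # Expiration action: delete after 365 days.
                "Expiration": {"Days": 365},
            }
        ]
    },
)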

S3 EVENT NOTIFICATIONS

You can use the Amazon S3 Event Notifications feature to receive notifications when certain events happen in your S3 bucket (object created, deleted, replicated, etc.). These events can trigger other services (Lambda, SNS, and SQS). For example, when an object is created in a bucket, a Lambda function (which we will study later) could process the file and create a thumbnail.

S3 Event Notifications behavior.
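A hedged boto3 sketch of the thumbnail example (bucket name and Lambda ARN are hypothetical): notify a Lambda function whenever a .jpg object is created.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-images-bucket",  # hypothetical
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "Id": "thumbnail-on-upload",
                # Hypothetical ARN of the Lambda that creates the thumbnail.
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:create-thumbnail",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}
                },
            }
        ]
    },
)
# Note: the Lambda function's resource policy must also allow S3 to invoke it,
# which is omitted here.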

S3 ACCESS LOGS

Server access logging provides detailed records of the requests that are made to a bucket. By enabling these logs, we can save all the requests made to one bucket into another bucket. Never store the logs in the same bucket as the application data, as it will create an infinite loop: if a user puts something in the bucket, it is logged in the same bucket, which creates another log, and so on ad infinitum.
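A hedged boto3 sketch (bucket names are hypothetical) that sends access logs for an application bucket to a separate, dedicated log bucket:

import boto3

s3 = boto3.client("s3")

# Enable server access logging on the application bucket,
# writing the log records to a *different* bucket.
s3.put_bucket_logging(
    Bucket="my-app-bucket",  # hypothetical source bucket
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-app-access-logs",  # hypothetical, dedicated log bucket
            "TargetPrefix": "my-app-bucket/",      # keeps logs grouped per source bucket
        }
    },
)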

S3 ACCESS POINTS

With S3 Access Points, customers can create a unique access control policy for each access point to easily control access to shared datasets. Each access point has its own security, so users connect through an access point to the part of the bucket they are allowed to access. Each access point has:

  • Its DNS name
  • Access Point Policy

For example, you can create an access point for your S3 bucket that grants “Marketing users” access to the “Marketing” folder of this bucket.
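A hedged boto3 sketch of that example using the s3control API (account ID, bucket, access point name, role, and region are hypothetical):

import json
import boto3

s3control = boto3.client("s3control")
account_id = "123456789012"  # hypothetical AWS account ID

# Create an access point scoped to the shared bucket.
s3control.create_access_point(
    AccountId=account_id,
    Name="marketing-ap",        # hypothetical access point name
    Bucket="my-shared-bucket",  # hypothetical bucket
)

# Attach a policy that only grants access to the Marketing/ prefix.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{account_id}:role/MarketingUsers"},
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:us-east-1:{account_id}:accesspoint/marketing-ap/object/Marketing/*",
    }],
}
s3control.put_access_point_policy(
    AccountId=account_id, Name="marketing-ap", Policy=json.dumps(policy)
)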

S3 MFA-DELETE

You can add another layer of security by configuring a bucket to enable MFA (multi-factor authentication) deletion. When you do this, the bucket owner must include two forms of authentication in any request to delete a version or change the versioning state of the bucket.

You cannot enable MFA Delete using the AWS Management Console. You must use the AWS Command Line Interface (AWS CLI) or the API.
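For example (a sketch via the API; the bucket, MFA device ARN, and code are hypothetical), the call adds an MFA argument that combines the device serial number and the current token:

import boto3

s3 = boto3.client("s3")

# Enabling MFA Delete requires the bucket owner's credentials and an MFA token;
# the MFA argument is "<device-arn> <current-code>".
s3.put_bucket_versioning(
    Bucket="my-critical-bucket",  # hypothetical
    MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",  # hypothetical device + code
    VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
)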

S3 REQUESTER PAYS (NEW)

With S3 Requester Pays enabled on a bucket, the requester, instead of the bucket owner, pays the cost of the requests and of downloading data from the bucket (the owner still pays for storage). This is useful when you want to share data without incurring the charges associated with others accessing it.
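A hedged boto3 sketch (bucket and key are hypothetical): the owner turns on Requester Pays, and the requester must then acknowledge the charges on every request.

import boto3

s3 = boto3.client("s3")

# Bucket owner: enable Requester Pays on the bucket.
s3.put_bucket_request_payment(
    Bucket="my-shared-dataset",  # hypothetical
    RequestPaymentConfiguration={"Payer": "Requester"},
)

# Requester: must explicitly accept the charges with RequestPayer,
# otherwise the request is rejected.
obj = s3.get_object(
    Bucket="my-shared-dataset",
    Key="data/2024/part-0001.csv",  # hypothetical key
    RequestPayer="requester",
)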

ATHENA

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, and it lets you run SQL queries directly against files in S3. It is commonly used for business intelligence, business analytics, and reporting.

AWS Athena Example.
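A hedged boto3 sketch (database, table, and output location are hypothetical) that runs a SQL query against data in S3 and polls for the result:

import time
import boto3

athena = boto3.client("athena")

# Start a query against a table whose data lives in S3.
execution = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",  # hypothetical table
    QueryExecutionContext={"Database": "weblogs"},                           # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},       # hypothetical results bucket
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then print the rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])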

Thanks for Reading!

And that’s it for the S3 chapter. This is perhaps one of the longest chapters of the course. If you like my work and want to support me…

  1. The BEST way is to follow me on Medium here.
  2. Feel free to clap if this post is helpful for you! :)