Cloud Security Architecture — Batch Jobs

ACM.49 Security and Application Architecture for running batch jobs (A work in progress…)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

⚙️ Part of my series on Automating Cybersecurity Metrics. The Code.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the last post we manually created a Lambda function to get a feel for its moving parts and looked at how we can keep credentials out of GitHub when implementing Lambda functions.

Keeping Credentials Out of GitHub

ACM.48: Manually creating a Lambda function to retrieve secrets from secrets manager

medium.com

Once again we’re going to step back and look at the bigger picture — the overall architecture of our solution — in this post.

This is a place to track the components we create as we work through our cybersecurity metrics automation architecture in this series. These architecture diagrams are not complete and will be updated as I add new resources and components. Follow along by signing up for the email list to get the quickest updates, or follow me on Twitter, LinkedIn or the GitHub repo in this post. Also check out this post for an explanation and the evolution of what we are building:

Automating Cybersecurity Metrics (ACM)

A series of blog posts on cybersecurity metrics and security automation

medium.com

This post reinforces a point I made before: security architecture is not a checklist.

Security Architecture is Not A Checklist

ACM.14 Think like an attacker and architect accordingly

medium.com

Although I’m creating a specific architecture in this series, the concepts apply to cloud security architecture in general. The same controls and approach could be used for other types of applications deployed in the cloud. Some of the controls used in this architecture are universal: IAM, encryption, and networking.

A generic batch job architecture

Notice that I’m not calling this an AWS architecture because, as I mentioned earlier, each of the three major cloud providers has a Batch service. I’ve also used batch jobs outside of cloud environments, so you could use containers to achieve a similar architecture on-premises. You’d just have to do a lot of extra work to secure the containers and an orchestration environment like Kubernetes. Cloud providers manage some of that behind the scenes when using services that offer batch jobs and functions.

What are the components of our security and application architecture?

IAM roles, users, groups and policies
Resource Policies
A KMS key and a key policy to encrypt batch job credentials
An AWS access key and secret key with MFA associated
A Secret to store our credentials, encrypted with our encryption key
Lambda functions to handle batch job triggers and authentication
The actual Batch jobs and related components
Data storage (such as S3 buckets) and related encryption keys
Network security controls

Test script to build all the resources below

There is a test script in the root of the GitHub repo that should create all the components below. I recommend running the test script in a test account with no naming conflicts or restrictions.

You’ll need to follow the instructions for adding MFA to users when required and configuring the correct AWS CLI profiles as explained in the series. The test script in the GitHub repo pauses and refers you to the information to set this up when required.

Identity and Access Management

Identity and Access Management (IAM) is the management of users allowed to access the cloud and the permissions that define the actions they are allowed to take in the cloud. Identities generally represent a single person, and security best practice ensures that each identity has its own credentials and can be independently identified in logs. In other words, you don’t create a user name and password and give those credentials to six people, because then if you need to investigate a security incident you can’t tell which person took the action.

As explained in the series, we are not using AWS SSO for our batch job credentials since it doesn’t currently support what we are going to do. However, you could have an AWS SSO user who assigns their MFA device to a set of batch job automation credentials just for the purpose of kicking off batch jobs.

IAM Users

The first thing to do in any cloud account is to log in as the admin user and create other users. The code in the GitHub repository has separate scripts for creating the initial IAM user. After that, commands can be executed using the permissions assigned to the IAM user.

Next the IAM user can create users in the account. In our case, we are creating users directly in the account. Some organizations have users that exist in a separate directory and authenticate through a third-party identity provider (IdP), but to keep it simple here we’re creating users in the cloud directory. IdPs and federation are outside the scope of this example.

IAM Groups and Policies

Best practice is to create and apply policies to groups, not individual users, so we created some groups that will have separate permissions. A new group is created when you need to apply a different set of permissions to a group of people. You also might create different groups if you need to give different people permission to manage the group.

After creating the groups we did two things:

Add a policy to each group that allows the group’s members to assume a group role. Group members can also be allowed to assume other roles, such as batch job roles.
Add users to the groups.
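
A group policy of this kind might look like the sketch below. It assumes the policy grants sts:AssumeRole on specific role ARNs; the account ID and role names are hypothetical placeholders and do not come from the GitHub repo. The code only builds the JSON policy document, and attaching it to a group would happen elsewhere (for example in a CloudFormation template).

```python
import json

# Hypothetical account ID, for illustration only.
ACCOUNT_ID = "111122223333"


def group_assume_role_policy(role_names):
    """Build an IAM policy document that lets group members assume specific roles."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AssumeGroupRoles",
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                # List exact role ARNs rather than using a wildcard.
                "Resource": [
                    f"arn:aws:iam::{ACCOUNT_ID}:role/{name}" for name in role_names
                ],
            }
        ],
    }


# Hypothetical role names: one group role plus one batch job role.
group_policy = group_assume_role_policy(["DeveloperGroupRole", "BatchJobRole"])
print(json.dumps(group_policy, indent=2))
```

Listing exact role ARNs keeps the policy close to zero trust: members of the group can assume only the roles named, nothing else.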

IAM Roles and Policies

Next we created IAM roles. IAM roles define a set of permissions. Users and services on AWS can assume roles and then take actions allowed in that permission set.

Why do we need groups and roles? First, AWS services can only assume roles. They can’t be a part of groups. We just looked at how we could allow a Lambda function to assume a role. We are also going to use roles from within our Batch jobs in a way that we cannot with the policies assigned to a group.
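
To make the difference concrete, here is a minimal sketch of a role trust policy that lets the Lambda service assume a role. The helper name is made up for illustration; the service principal lambda.amazonaws.com and the sts:AssumeRole action are standard AWS values.

```python
import json


def service_trust_policy(service_principal):
    """Trust policy that lets one AWS service assume the role it is attached to."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # A service principal can appear here; a group cannot.
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


lambda_trust = service_trust_policy("lambda.amazonaws.com")
print(json.dumps(lambda_trust, indent=2))
```

A trust policy names who may assume the role. The permissions the role grants once assumed live in separate policies attached to the role.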

Encryption keys, key policies, automation credentials, and secrets

Encryption keys should be created before the things that need to be encrypted. On AWS, a key policy can limit who can take which actions (such as encrypt or decrypt) with that encryption key.

We created a KMS key and a key policy to protect the batch job credentials, which we use because we want to require MFA to start a batch job. The policy specifies which IAM identities can encrypt and decrypt the credentials. We then created the credentials with the IAM Admin role and stored them in Secrets Manager (a secrets vault). Then we tested decrypting secrets in Secrets Manager using a Lambda role that is authorized to decrypt the credentials. We created a Lambda function to trigger our batch job and tested that its role can access the batch job credentials.
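
As a rough illustration of a key policy that names separate principals for encrypt and decrypt, consider the sketch below. The account ID and role names are hypothetical, and the statement for key administration that a real key policy needs is omitted to keep the example short. This is not the policy from the repo.

```python
import json

ACCOUNT_ID = "111122223333"  # hypothetical


def role_arn(name):
    """Build a role ARN in the hypothetical account."""
    return f"arn:aws:iam::{ACCOUNT_ID}:role/{name}"


def kms_key_policy(encrypt_roles, decrypt_roles):
    """Key policy with one statement for encrypting and one for decrypting."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowEncrypt",
                "Effect": "Allow",
                "Principal": {"AWS": [role_arn(r) for r in encrypt_roles]},
                "Action": ["kms:Encrypt", "kms:GenerateDataKey"],
                # In a key policy, "*" refers to the key the policy is attached to.
                "Resource": "*",
            },
            {
                "Sid": "AllowDecrypt",
                "Effect": "Allow",
                "Principal": {"AWS": [role_arn(r) for r in decrypt_roles]},
                "Action": "kms:Decrypt",
                "Resource": "*",
            },
        ],
    }


# Hypothetical role names for the IAM admins and the trigger Lambda function.
key_policy = kms_key_policy(["IAMAdminRole"], ["TriggerLambdaRole"])
print(json.dumps(key_policy, indent=2))
```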

Encrypt has an asterisk* next to it because, as explained earlier in the series, in order to encrypt the credentials a principal needs decrypt permissions, not only encrypt permissions. That means you cannot segregate your encrypt and decrypt permissions cleanly in an AWS policy. However, we restricted the ability to put secrets into Secrets Manager to the IAM role and the permission to retrieve them to the Lambda role. That means that even though the IAM Admins have decrypt permissions, they can’t retrieve the secret to decrypt it.

I understand how AWS justifies this implementation. However, this seems like a flaw in the design of KMS policies because the customer is not trying to define a policy to encrypt or decrypt the data key. The customer is trying to define a policy around who can encrypt or decrypt their own data that the encryption key protects. Functionality related to envelope encryption should be part of the behind-the-scenes implementation. It will be difficult to fix now due to backward compatibility, but AWS could offer two versions like they did with EC2-Classic and try to get people to move over to the corrected implementation over time.
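
The split between who can store a secret and who can retrieve it can be modeled with a toy example. The role names are hypothetical, and the is_allowed helper is a deliberately simplified stand-in: real AWS policy evaluation also considers identity policies, explicit denies, and conditions.

```python
ACCOUNT_ID = "111122223333"  # hypothetical
IAM_ADMIN_ROLE = f"arn:aws:iam::{ACCOUNT_ID}:role/IAMAdminRole"    # hypothetical
LAMBDA_ROLE = f"arn:aws:iam::{ACCOUNT_ID}:role/TriggerLambdaRole"  # hypothetical

# Sketch of a secret resource policy: admins may write, the Lambda role may read.
secret_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AdminsStoreSecret",
            "Effect": "Allow",
            "Principal": {"AWS": IAM_ADMIN_ROLE},
            "Action": "secretsmanager:PutSecretValue",
            "Resource": "*",
        },
        {
            "Sid": "LambdaRetrievesSecret",
            "Effect": "Allow",
            "Principal": {"AWS": LAMBDA_ROLE},
            "Action": "secretsmanager:GetSecretValue",
            "Resource": "*",
        },
    ],
}


def is_allowed(policy, principal_arn, action):
    """Toy check: True if an Allow statement names this exact principal and action."""
    for stmt in policy["Statement"]:
        principals = stmt["Principal"]["AWS"]
        if isinstance(principals, str):
            principals = [principals]
        actions = stmt["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if stmt["Effect"] == "Allow" and principal_arn in principals and action in actions:
            return True
    return False


print(is_allowed(secret_policy, IAM_ADMIN_ROLE, "secretsmanager:GetSecretValue"))
print(is_allowed(secret_policy, LAMBDA_ROLE, "secretsmanager:GetSecretValue"))
```

In this model the admin role holds KMS decrypt permission elsewhere but never gets a path to the ciphertext, which is the point of the design.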

The code to create batch admin credentials, secrets, and keys above was refactored to allow creation of multiple sets of credentials because you might have multiple people managing different types of batch jobs.

Zero Trust Policies

Different types of policies exist on AWS: IAM Policies, Trust Policies, and Resource Policies. The posts in this series dive into some of the details of what these different types of policies are and how to create them. The series covers topics such as restricting access to certain CloudFormation stacks and batch jobs, limiting who can access specific resources such as KMS keys, and limiting what resources a principal can access (two sides of the equation). In the last section we looked at how policies can help with segregation of duties. In addition, we covered the confused deputy attack and how it applies to trust policies.
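
One common mitigation for the confused deputy problem is to add source conditions to a service trust policy. The sketch below assumes a hypothetical account ID and source ARN; aws:SourceAccount and aws:SourceArn are real global condition keys, but whether a given service populates them should be checked in that service's documentation.

```python
import json


def guarded_service_trust_policy(service_principal, source_account, source_arn):
    """Trust policy that only honors the service when it acts for a known source."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
                "Condition": {
                    # The request must originate from our own account...
                    "StringEquals": {"aws:SourceAccount": source_account},
                    # ...and from the specific resource we expect.
                    "ArnLike": {"aws:SourceArn": source_arn},
                },
            }
        ],
    }


# Hypothetical values for illustration only.
guarded_trust = guarded_service_trust_policy(
    "events.amazonaws.com",
    "111122223333",
    "arn:aws:events:us-east-1:111122223333:rule/StartBatchJob",
)
print(json.dumps(guarded_trust, indent=2))
```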

In general, every policy we build will be as close to a zero trust policy as possible as explained in this post. That is easier to do on a cloud platform than in an on-premises environment and too few people take advantage of this capability. I’m trying to show you in this series that it can be done, and where the platforms could improve to make it easier.

Creating Zero Trust AWS Policies

ACM.36: Tools and techniques to create zero trust resource, IAM, and Trust policies on AWS (Zero Trust Policies ~ Part…

medium.com

I explained why you might want a separate IAM team due to all this complexity. Separate the people who give the permissions (create the policies) from those who use the permissions.

Why You Should Consider Separate IAM Administrators

AC.25 Refactoring IAM for centralized management to reduce the cost of a data breach

medium.com

Batch Job Trigger

To trigger our batch job we need a mechanism to get the MFA required to assume a role. The following outlines the steps in the process.
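
The MFA requirement itself is typically expressed in the role's trust policy. A minimal sketch, assuming a hypothetical user ARN and a one-hour limit on the age of the MFA authentication (both are illustrative choices, not values from the series):

```python
import json


def mfa_trust_policy(user_arn, max_mfa_age_seconds=3600):
    """Trust policy that lets a user assume the role only with recent MFA."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": "sts:AssumeRole",
                "Condition": {
                    # The caller's credentials must have been obtained using MFA...
                    "Bool": {"aws:MultiFactorAuthPresent": "true"},
                    # ...and the MFA authentication must be recent.
                    "NumericLessThan": {
                        "aws:MultiFactorAuthAge": str(max_mfa_age_seconds)
                    },
                },
            }
        ],
    }


# Hypothetical batch job automation user.
mfa_trust = mfa_trust_policy("arn:aws:iam::111122223333:user/BatchJobAdmin")
print(json.dumps(mfa_trust, indent=2))
```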

Networking

As I started to develop the Lambda functions I realized I needed to dive a bit more into networking and think through that part of the architecture. I have a detailed post explaining the thought process that went into the diagrams below coming soon.

Developer Networking (No VPN — single developer, GitHub Enterprise Cloud account, VPC Endpoint for CloudFormation)

If we add more developers we’ll probably want to deploy a VPN so we have a CIDR block that we can use in our GitHub network restriction rules.
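
The value of a fixed CIDR block is that an allow list can test any source address against it. A small sketch using Python's standard ipaddress module; the range is a documentation-only block (TEST-NET-3) standing in for whatever public range a VPN would actually egress from.

```python
import ipaddress

# Hypothetical VPN egress range; 203.0.113.0/24 is reserved for documentation.
VPN_CIDR = ipaddress.ip_network("203.0.113.0/28")


def is_from_vpn(source_ip, allowed_network=VPN_CIDR):
    """Return True if the source address falls inside the allowed CIDR block."""
    return ipaddress.ip_address(source_ip) in allowed_network


print(is_from_vpn("203.0.113.5"))   # inside the /28 (addresses .0 through .15)
print(is_from_vpn("203.0.113.77"))  # outside the /28
```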

As mentioned, it would be nice if we could set up a Private Link to GitHub, but it looks like that option does not yet exist. It looks like GitLab is exploring an option that offers AWS Private Link.

GitLab Dedicated | GitLab

GitLab Dedicated is a fully isolated, single-tenant SaaS service that is: Hosted and managed by GitLab, Inc. Deployed…

docs.gitlab.com

Organizations might also opt to deploy source control locally within their own private network.

We may have a service that we want to serve up web pages to the Internet. We can use a load balancer for that purpose. We can put the load balancer in a public subnet and the service behind it in a private subnet.

If we have any resources that need outbound Internet access through a NAT we can configure subnets in our Batch VPC like this (not sure if we need this yet):

We can use a completely private subnet for anything that does not require the above options and use AWS Private Link to access AWS services on the AWS network, so traffic does not need to traverse the Internet.
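
The three kinds of subnet described in this section can be laid out as a simple plan. This sketch assumes a hypothetical 10.0.0.0/16 VPC split into /24 subnets. The tier names and CIDR ranges are illustrative and say nothing about the actual Batch VPC.

```python
import ipaddress

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")  # hypothetical

# (tier name, how traffic leaves the subnet)
TIERS = [
    ("public", "internet gateway (load balancer only)"),
    ("private-nat", "NAT gateway (outbound only)"),
    ("private-isolated", "VPC endpoints only (no Internet route)"),
]


def plan_subnets(vpc_cidr, tiers, new_prefix=24):
    """Assign consecutive, non-overlapping subnets of the VPC to each tier."""
    subnets = vpc_cidr.subnets(new_prefix=new_prefix)
    return [
        {"tier": name, "cidr": str(cidr), "egress": egress}
        for (name, egress), cidr in zip(tiers, subnets)
    ]


subnet_plan = plan_subnets(VPC_CIDR, TIERS)
for entry in subnet_plan:
    print(f"{entry['tier']:<17} {entry['cidr']:<12} {entry['egress']}")
```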

I’ve already explained how to create some of the above network architecture.

Another topic covered in my posts is segregation of duties for secrets use and secrets management. I cover a design that would require a three-party collusion to access secrets inappropriately. The IAM policy architecture looks something like this:

As we create and change policies to add additional functionality, it is important to maintain the integrity of the above design. Someone may come along later and see a need to give the IAM Admin permission to “get” a secret in a secret policy. If so, that permission must not extend to the user-specific secrets this design protects, or the architecture is flawed.
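
One way to guard against that kind of drift is an automated check on the secret policies. The sketch below is a hypothetical lint function, not something from the repo: it flags any Allow statement in a secret resource policy that would let the IAM admin role retrieve the secret, including through wildcard actions. The role ARN is a placeholder.

```python
from fnmatch import fnmatchcase

IAM_ADMIN_ROLE = "arn:aws:iam::111122223333:role/IAMAdminRole"  # hypothetical


def _as_list(value):
    """Policy fields may hold a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def admin_can_get_secret(policy, admin_arn=IAM_ADMIN_ROLE):
    """Return True if any Allow statement lets the admin role read the secret."""
    target = "secretsmanager:getsecretvalue"
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            principals = ["*"]
        else:
            principals = _as_list(principal.get("AWS", []))
        if admin_arn not in principals and "*" not in principals:
            continue
        # IAM action names are case-insensitive and may contain wildcards.
        for action in _as_list(stmt.get("Action", [])):
            if fnmatchcase(target, action.lower()):
                return True
    return False


safe_policy = {"Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": IAM_ADMIN_ROLE},
    "Action": "secretsmanager:PutSecretValue",
    "Resource": "*",
}]}
drifted_policy = {"Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": [IAM_ADMIN_ROLE]},
    "Action": ["secretsmanager:*"],
    "Resource": "*",
}]}

print(admin_can_get_secret(safe_policy))
print(admin_can_get_secret(drifted_policy))
```

A check like this could run before deployment so that a well-intentioned policy change does not silently undo the segregation of duties.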

Follow for updates.

About Teri Radichel:
~~~~~~~~~~~~~~~~~~~~
⭐️ Author: Cybersecurity Books
⭐️ Presentations: Presentations by Teri Radichel
⭐️ Recognition: SANS Award, AWS Security Hero, IANS Faculty
⭐️ Certifications: SANS ~ GSE 240
⭐️ Education: BA Business, Master of Software Engineering, Master of Infosec
⭐️ Company: Penetration Tests, Assessments, Phone Consulting ~ 2nd Sight Lab
