How to Make CloudFormation Faster

ACM.284 Optimizing for speed within given constraints


Image credit: Ron Cogswell, https://www.flickr.com/photos/22711505@N05/52094740356 (CC BY 2.0, https://creativecommons.org/licenses/by/2.0/)

In the last post I wrote about a question someone asked me: How do I get my security team to let me stop using CloudFormation?

How do I convince my security team to let me stop using CloudFormation

ACM.283 What problem are you really trying to solve, what problems does your solution cause, and are there any…

medium.com

In this post, I’ll share some of the things I do to speed up CloudFormation deployments within the constraints we have as developers.

Dependency management

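One of the things CloudFormation does for you is manage dependencies. It works out the order from the references between resources (for example Ref, Fn::GetAtt, or an explicit DependsOn) and deploys resources in the correct order regardless of the order in which you add them to your CloudFormation template.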

Parallel resource deployment

When a stack contains multiple resources, CloudFormation also tries to deploy the resources that do not depend on each other at the same time.

Taking control of your deployments

Although the above features are very nice, if you wish to have more control over the speed of your deployments, you can take matters into your own hands a bit more. That is one of the reasons I wrote about why I put single resources in each template. Doing so gives me control over the order of the deployments and I can skip over resources I don’t need to deploy. This is especially important during the development phase where you’re working through errors.

For example, I deploy an entirely new account, new AMIs, new networking, new EC2 instances, S3 buckets, KMS keys, users, and roles for every penetration test. I also deploy new infrastructure for the resources I use during the penetration test itself. For example, I might check whether I can store an attack in your web application that executes at a later point in time and sends me your data. I need to run resources for that and I want to keep each customer environment separate.

When testing, I perform different types of tasks. Some may be more CPU intensive or memory intensive. I may decide to redeploy my instances with a different configuration. Let’s say I’ve deployed my role, my networking, a KMS key, and an S3 bucket. If I have all that in a single CloudFormation stack, I have to wait for CloudFormation to check each of those resources before it gets to the last step, which is the deployment of my EC2 instances. I also may not want to redeploy all the instances. If I have a Windows and a Linux instance, I might only want to redeploy the Windows instance.

To improve the speed of deployments both while troubleshooting and while redeploying something with a different configuration, I create a script which lets me choose things I want to redeploy. That way I can skip over all the things that are already ready to go and focus on the things that I am trying to fix or adjust.

Here’s an example of what I am talking about. Note that I’m on a plane and this is pseudocode given that I’m not logged into my AWS or GitHub account at the moment.

Remember that I have a single function to deploy all my stacks that I’ve been deploying throughout this entire series. Let’s say it looks like this for simplicity and because I can’t even seem to access my public GitHub from the plane at the moment.

#securitymetricsautomation/Functions/shared_functions.sh

#deploy a cloudformation stack
deploy_cfn_stack(){

  resourcetype="$1"
  name="$2"
  template="$3"

  #code to deploy the thing using CloudFormation goes here
  : #no-op placeholder so this skeleton is valid bash

  #check to see if the stack is ready. 
  #- If it is, continue. 
  #- If it is not, then wait. << WAIT TIME IS CONFIGURABLE

}

Controlling Wait Times

About that last comment (wait time is configurable): have you ever seen a stack complete deployment in the console but you are still waiting for the stack to complete? I’ve also noticed cases where the stack is complete and the console’s refresh button does not show it, but reloading the whole page does.

My function, which you can get from my GitHub repo, periodically checks the stack status and continues as soon as the stack is ready. In my experience, this is faster than waiting and watching the CloudFormation console. That wait time is also configurable. You can determine how long, on average, a deployment takes and decide whether you want to wait 3 seconds or one minute between checks. I think I wait 5 seconds in the current iteration of my function.

If you want to get really fancy you could configure different wait times for different types of resources but I have not found that I was in that much of a hurry.
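To make the idea concrete, here is a minimal sketch of a polling loop with a configurable interval. It is not the function from my repository. The names get_stack_status and wait_for_stack are made up for illustration, and it assumes the deployment was started with a call that returns immediately, such as aws cloudformation create-stack.

```shell
# Ask CloudFormation for the current status of one stack.
get_stack_status(){
  aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].StackStatus' --output text
}

# Poll until the stack finishes. Hypothetical helper for illustration only.
# $1 = stack name, $2 = seconds between checks, $3 = maximum number of checks
wait_for_stack(){
  local stack="$1" interval="${2:-5}" max_checks="${3:-120}"
  local status i=0
  while [ "$i" -lt "$max_checks" ]; do
    status="$(get_stack_status "$stack")" || return 1
    case "$status" in
      CREATE_COMPLETE|UPDATE_COMPLETE)
        echo "$stack: $status"; return 0 ;;
      *ROLLBACK*|*FAILED)
        echo "$stack: $status" >&2; return 1 ;;
    esac
    # Anything else (for example CREATE_IN_PROGRESS) means keep waiting
    sleep "$interval"
    i=$((i + 1))
  done
  echo "$stack: gave up after $max_checks checks" >&2
  return 1
}
```

Calling wait_for_stack PentestVPC 3 would check every 3 seconds. Passing a different interval per resource type is how you could get the "really fancy" version.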

Dependencies

The reason you have to wait for a stack to deploy is that the resources in one stack are dependent on the resources in another stack. Now this is where my recent visit to the AWS Heroes conference got me thinking. You have to wait for a CloudFormation stack to completely deploy if you need to reference the outputs of that stack using Fn::ImportValue:

Fn::ImportValue

Return a value shared across multiple AWS CloudFormation stacks using the Fn::ImportValue intrinsic function.

docs.aws.amazon.com

What if, instead of using Fn::ImportValue, you passed in the value as a parameter? What if the ID you need from a resource in a stack is available even though the CloudFormation stack has not finished deploying? Perhaps you could query for the ID or ARN and continue when it is available instead of waiting for the CloudFormation stack?

I haven’t gone this route because that would require a bunch of extra code and I do not find the wait times so extreme at the moment that I need to go to those lengths. That would likely involve a bunch of querying and error handling when the resource doesn’t exist. I’m not sure how that would work out. Again, I feel like a better approach would be to try to get the word out to AWS to fix this problem and make CloudFormation tell you immediately when you can proceed somehow. But, it’s an option.
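For what it’s worth, here is a rough, untested-in-production sketch of what that querying might look like. It polls a single resource inside a stack and prints its physical ID once that resource reports that it is created. The helper names are hypothetical, and whether a resource is truly usable at that point is something you would have to verify per resource type.

```shell
# Look up one resource inside a stack; prints "<status> <physical id>".
get_resource_detail(){
  aws cloudformation describe-stack-resource \
    --stack-name "$1" --logical-resource-id "$2" \
    --query 'StackResourceDetail.[ResourceStatus,PhysicalResourceId]' \
    --output text
}

# Poll until the resource itself is created, then print its physical ID.
# $1 = stack, $2 = logical ID, $3 = seconds between checks, $4 = max checks
wait_for_resource_id(){
  local stack="$1" logical_id="$2" interval="${3:-5}" max_checks="${4:-60}"
  local detail i=0
  while [ "$i" -lt "$max_checks" ]; do
    # An error here usually means the stack or resource does not exist yet
    if detail="$(get_resource_detail "$stack" "$logical_id" 2>/dev/null)"; then
      set -- $detail   # split "<status> <id>" into $1 and $2
      case "$1" in
        CREATE_COMPLETE|UPDATE_COMPLETE) echo "$2"; return 0 ;;
        *FAILED) return 1 ;;
      esac
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}

# Possible usage (VpcId is a hypothetical template parameter name):
#   vpc_id="$(wait_for_resource_id PentestVPC VPC)"
#   aws cloudformation deploy --stack-name PentestSubnet \
#     --template-file Subnet.yaml --parameter-overrides VpcId="$vpc_id"
```

This is exactly the "bunch of querying and error handling" mentioned above, and it only covers one resource. You would still need to decide what to do when the first stack rolls back after you have already moved on.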

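The AWS CLI also has a wait command for CloudFormation (for example, aws cloudformation wait stack-create-complete) which you can use as an alternative to the status checking I’m doing manually. Note that this is different from the WaitCondition resource type, which is used inside a template. You could see if the wait command meets your needs.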

wait — AWS CLI 2.13.9 Command Reference
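As a small sketch of the CLI waiter referenced above, the wrapper below blocks until stack creation finishes. The function name wait_with_cli is made up for illustration.

```shell
# Block until stack creation finishes, using the CLI's built-in waiter.
# Returns non-zero if the stack fails or the waiter gives up.
wait_with_cli(){
  aws cloudformation wait stack-create-complete --stack-name "$1"
}
```

According to the CLI reference, this waiter polls every 30 seconds and gives up after 120 failed checks, so a custom loop with a shorter interval may notice completion sooner.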


Parallel Processing

If you have resources with no dependencies, you could get even fancier and skip the wait altogether for certain resources. Deploy them at the top of your script and continue without waiting by adding a “no wait” option to the parameters of the function. I may do that, so don’t be surprised if you see that later when I get back to testing full deployment of all resources in a new account.

Parallel processing would be especially useful if you have multi-account and multi-region deployments and resources only need to wait for the dependencies in their own region. But in that case, you could also simply execute the same single script multiple times in multiple regions.
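One way to sketch the "no wait" idea in a shell script is to start independent deployments as background jobs and only wait once, at the point where something depends on them. The helper below is a hypothetical illustration. It takes the name of whatever deploy function you use and reads one "type name template" line per stack from standard input.

```shell
# Start every listed deployment in the background, then wait for all of them.
# $1 = name of the deploy function or command to call for each line
# Returns non-zero if any deployment failed.
deploy_in_parallel(){
  local deploy_fn="$1" pids="" failed=0
  local type name template pid
  while read -r type name template; do
    # </dev/null keeps the background job from eating the remaining lines
    "$deploy_fn" "$type" "$name" "$template" < /dev/null &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || failed=1
  done
  return "$failed"
}

# Possible usage, assuming a role and a VPC that do not depend on each other:
#   deploy_in_parallel deploy_cfn_stack <<'EOF'
#   Role PentestRole IamRole.yaml
#   VPC PentestVPC VPC.yaml
#   EOF
```

The trade-off is that you now own the dependency ordering. Anything that needs the role or the VPC has to come after the call returns.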

Optional Resource Deployment for Large or Complex Stacks

If that function I created can deploy any stack and I put each resource in its own template file, then I can use an if-then construct to control which templates I deploy when I run a script to deploy a stack.

In other words, I can ask if the person running the script wants to deploy all the templates and resources. Alternatively, I can step through the resources and ask the user which ones they want to deploy during the deployment process. If the user knows all but the last resource is good to go, they could simply skip over all those resources and only redeploy the last one.

My if-then script looks something like this. Mind you, I consider the bash scripts in my GitHub repository a kind of prototype, and you could migrate this concept to whatever language you use to deploy your resources. The trick is that your resources each need to be in their own templates.

#Deploy a stack of resources including the following:

# - IamRole.yaml
# - IamRolePolicy.yaml
# - KMSKey.yaml
# - SecretsManagerSecret.yaml
# - S3Bucket.yaml
# - VPC.yaml
# - Subnet.yaml
# - SecurityGroup.yaml
# - LinuxEC2Instance.yaml
# - WindowsEC2Instance.yaml

#assumes deploy_cfn_stack from shared_functions.sh has already been sourced

echo "Deploy all resources? (y for yes, Enter to choose, CTRL-C to exit)"
read -r all

#succeeds if everything is being deployed or the user answers y for this item
should_deploy(){
  if [ "$all" = "y" ]; then return 0; fi
  echo "Deploy $1? (y for yes, Enter to skip)"
  read -r answer
  [ "$answer" = "y" ]
}

if should_deploy "IAM role and policy"; then
  #arguments: resourcetype name template
  deploy_cfn_stack "Role" "PentestRole" "IamRole.yaml"
  deploy_cfn_stack "IAMPolicy" "PentestRolePolicy" "IamRolePolicy.yaml"
fi

if should_deploy "VPC and subnet"; then
  deploy_cfn_stack "VPC" "PentestVPC" "VPC.yaml"
  deploy_cfn_stack "Subnet" "PentestSubnet" "Subnet.yaml"
fi

#etc. etc. etc.

if should_deploy "Linux EC2 instance"; then
  : #deploy the Linux EC2 instance
fi

if should_deploy "Windows EC2 instance"; then
  : #deploy the Windows EC2 instance
fi

Faster troubleshooting

If I use a script like the above for a new penetration test or application or whatever I am trying to deploy, I start with the “deploy everything” option.

Then, when I hit an error along the way because, let’s say, something changed in AWS since I last ran the script, I can step through the script and simply hit the enter key for the resources I don’t want to redeploy until I get to the one I modified and want to try again.

This method makes it faster for me to pinpoint and troubleshoot a problem, compared to a single monolithic script.

Automatically deleting when redeploying

In my code, I have a function that automatically deletes failed resources when redeploying them. Amazon added some new functionality which may mimic that behavior. I haven’t tried it yet. Automatic deletion will speed up your CloudFormation testing and redeployments dramatically in my experience. Before I had it, I was always trying to remember the script or modify some stored script to delete the failed resource. Definitely find a way to automate that process, but protect the things you don’t want to automatically delete with termination protection.
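As an illustration only (this is not the function from my repository), a delete-before-redeploy step could look something like the sketch below. A stack that ends up in ROLLBACK_COMPLETE cannot be updated and has to be deleted before you can create it again. The function names and the exact list of statuses are assumptions you should adjust.

```shell
# Ask CloudFormation for the current status of one stack.
get_stack_status(){
  aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].StackStatus' --output text
}

# Delete a stack only if its last deployment left it in a failed state.
# Hypothetical helper; the delete call fails if termination protection is on.
delete_if_failed(){
  local stack="$1" status
  # If the lookup fails (for example, no such stack), there is nothing to delete
  status="$(get_stack_status "$stack" 2>/dev/null)" || return 0
  case "$status" in
    ROLLBACK_COMPLETE|ROLLBACK_FAILED|CREATE_FAILED|DELETE_FAILED)
      echo "Deleting $stack ($status)"
      aws cloudformation delete-stack --stack-name "$stack" &&
        aws cloudformation wait stack-delete-complete --stack-name "$stack"
      ;;
  esac
}
```

Calling delete_if_failed right before each deployment means a healthy stack is left alone, while a stack stuck in a failed state is cleaned up without you having to remember the command.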

Dumping out error messages

When a CloudFormation command fails on the command line, AWS gives you a command to run to see the stack events. Why they don’t just show you the error right there, I don’t know. I added some functionality to spit out the error message so I don’t have to navigate to the console and dig through the events to find the problem. That will also save a bunch of time when troubleshooting problems.
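A minimal sketch of that idea, assuming you only care about events whose status contains FAILED, is shown below. The function name show_stack_errors is made up for illustration.

```shell
# Print the time, logical ID, and reason for every failed event in a stack.
show_stack_errors(){
  aws cloudformation describe-stack-events --stack-name "$1" \
    --query "StackEvents[?contains(ResourceStatus, 'FAILED')].[Timestamp,LogicalResourceId,ResourceStatusReason]" \
    --output text
}
```

You could call this from your deploy function whenever a deployment returns a non-zero exit code, so the reason shows up in the same terminal where the failure happened.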

Things we can’t control

There are certain things we simply can’t control. When you deploy a route for a subnet, it may be that the route doesn’t stabilize. Sometimes there’s an issue and you just have to wait. Using an alternate deployment mechanism is not going to speed that up. That’s up to AWS to find ways to prevent or alert on that problem faster, and it is something that I have seen take a long time.

If you try to bypass a resource before it has completely deployed and move on, and something depends on it, you might have a messy situation with multiple stack failures if you don’t wait long enough. In some cases, patience is a virtue and waiting to be sure a resource is ready to go may be in your best interest. You’ll figure these things out by testing different types of resources if you are managing your own dependencies.

Here’s an example. If you deploy a particular resource, it may need to propagate to a number of regions. You can probably start using that resource as soon as it is available in your own region. However, if you somehow get a handle to a resource you need and then you deploy something dependent on that resource and it fails, then you’ll have to clean up multiple stacks and roll back.

At the moment, my function simply waits for each resource to deploy before moving on to the next, but I do use the optional deployment mechanism above for the static website deployment I’ve been working on here:

Components of a Static Web Site on AWS

ACM.227 Route 53, TLS, S3, API Gateway, CloudFront, WAF, and triggering Lambda Functions

medium.com

I haven’t checked in that code yet as it’s a bit messy, but it’s coming soon.

These are things I’ve been doing for years but never really thought about sharing until I got the question in my last post. Maybe this will help someone speed up their deployments with CloudFormation in its current iteration.

And perhaps if you say “Pretty Please” AWS will roll out some changes to speed up CloudFormation as well.

😊

Follow for updates.

About Teri Radichel:
~~~~~~~~~~~~~~~~~~~~
⭐️ Author: Cybersecurity Books
⭐️ Presentations: Presentations by Teri Radichel
⭐️ Recognition: SANS Award, AWS Security Hero, IANS Faculty
⭐️ Certifications: SANS ~ GSE 240
⭐️ Education: BA Business, Master of Software Engineering, Master of Infosec
⭐️ Company: Penetration Tests, Assessments, Phone Consulting ~ 2nd Sight Lab
