Summary

The web content outlines a method for minimizing AWS EC2 instance costs using Terraform to automate the starting and stopping of instances based on tags and schedules.

Abstract

The article titled "How to Minimize the Costs of Running AWS EC2 Instances Using Terraform" discusses strategies for reducing expenses associated with AWS infrastructure. It emphasizes the importance of optimizing usage, particularly for testing environments that are often over-provisioned and underutilized outside of regular working hours. The proposed solution involves tagging EC2 instances with Auto-Start and using Terraform to automate their lifecycle, starting them at 6:30 AM and stopping them at 7:00 PM on weekdays. The implementation process includes creating IAM policies and roles, writing a Python script for the Lambda function to manage instance states, and setting up CloudWatch rules to trigger these functions at the specified times. The article argues that automating these tasks with Terraform not only saves money but also makes the infrastructure more manageable, portable, and easy to recreate.

Opinions

The author suggests that manually managing EC2 instances is inefficient and proposes automation as a more effective approach.
It is noted that even after cleaning up unused resources, many EC2 instances are still necessary for development processes, but their usage can be optimized.
The author provides a critique of the manual process required by AWS documentation and advocates for a Terraform-based solution for better scalability and ease of management.
The article conveys that the proposed Terraform solution is not only cost-effective but also aligns with best practices for infrastructure as code (IaC), enhancing the overall efficiency of AWS resource management.

How to Minimize the Costs of Running AWS EC2 Instances Using Terraform

Let’s optimize AWS infrastructure usage and save some money

Introduction

Reducing a business's running costs is always worthwhile, especially for variable costs that depend on the actual usage of services.

Running AWS EC2 instances 24/7 to host software applications in testing environments is relatively expensive, and the cost of keeping these instances up all the time may exceed the budget allocated to the project.

Cleaning up an AWS account and terminating all unneeded EC2 nodes is the first step in reducing the costs of running EC2 instances. However, it is not the only thing that can be done. In most cases, even after the cleanup, many EC2 instances are still needed for the software development process. The exact number depends on the project size and the number of contributors or teams on the project.

Most of these instances serve as testing environments during development. Engineers use them heavily during working hours to deploy new versions of the source code and test new features. However, these environments are far less likely to be used on weekends or after working hours. It therefore makes a lot of sense to start and stop these instances automatically so that they are available only during working hours. One proposal for achieving this is presented below:

Automatic start and stop is applied to individual EC2 nodes based on tags. To opt a given EC2 node in, it needs to be tagged with a specific tag (let us call it Auto-Start) with a specific value.
All EC2 instances tagged with Auto-Start will be started early in the morning of each working day, at 6:30 AM.
All EC2 instances tagged with Auto-Start will be stopped in the evening of each working day, at 7:00 PM.

Stopping and starting EC2 instances automatically is well documented by AWS and can be implemented by following the instructions provided on this page. However, I have some reservations about those instructions:

The instructions need to be performed manually from the AWS console 😢.
The affected EC2 nodes need to be listed in the start and stop scripts. That means extending the solution to new EC2 nodes requires modifying these scripts with the IDs of the new nodes.

In this post, I will go through the steps needed to implement the proposed solution above and apply it using Terraform.

Implementation

The first step is to create an IAM policy that allows the following actions: starting an EC2 instance, stopping an EC2 instance, and listing EC2 instances. This policy can be created with the Terraform resource definition below.
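As a rough illustration of what such a policy contains, the following Python sketch builds the JSON policy document for the three actions listed above. The helper name build_ec2_scheduler_policy and the wildcard resource are assumptions made for this example, not taken from the original Terraform definition.

```python
import json


def build_ec2_scheduler_policy() -> dict:
    """Hypothetical helper: IAM policy document allowing start, stop and list of EC2 instances."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:StartInstances",
                    "ec2:StopInstances",
                    "ec2:DescribeInstances",
                ],
                # Assumption: "*" is used because ec2:DescribeInstances
                # does not support resource-level restrictions.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the document in the JSON form an IAM policy expects.
    print(json.dumps(build_ec2_scheduler_policy(), indent=2))
```

A document of this shape is what a Terraform policy resource would carry in its policy argument.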

The next step is to define an IAM role and attach the policy created in the previous step to it. The Terraform resource aws_iam_role creates the role and, through assume_role_policy, declares which services may use it. However, it cannot be used to attach IAM policies to the role; for that we need another Terraform resource, aws_iam_role_policy_attachment. The snippet below illustrates how to define these resources in Terraform.
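The value passed to assume_role_policy is itself a JSON trust policy. A minimal Python sketch of that document, assuming the role is meant to be assumed by the Lambda service only (the helper name is hypothetical), might look like this:

```python
import json


def build_lambda_trust_policy() -> dict:
    """Hypothetical helper: trust policy letting the Lambda service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: only AWS Lambda needs to use this role.
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_lambda_trust_policy(), indent=2))
```

The trust policy only says who may use the role; what the role is allowed to do comes from the policy attached to it separately.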

The next step is to define the Lambda function that will handle stopping and starting the EC2 instances. But before defining the Lambda function in AWS using Terraform, let us take a minute to look at the Python script that such a function can run. boto3 is the AWS Python client library and can be used to perform actions on AWS. The script should provide a simple interface for stopping and starting EC2 nodes based on their tags. Below is an implementation of a script that defines two functions: one stops all EC2 instances tagged with Auto-Start set to true, and the other starts the same set of instances.
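A minimal sketch of such a script is shown here; it is not the author's original code. It assumes the tag value is the string "true", that the EC2 client is passed in as an argument so the helpers can be exercised without AWS access, and that the triggering event carries a hypothetical "action" field set to "start" or "stop".

```python
TAG_KEY = "Auto-Start"  # tag name used in this article
TAG_VALUE = "true"      # assumption: the tag value is the string "true"


def find_tagged_instance_ids(ec2, state):
    """Return IDs of instances tagged Auto-Start=true that are in the given state."""
    filters = [
        {"Name": f"tag:{TAG_KEY}", "Values": [TAG_VALUE]},
        {"Name": "instance-state-name", "Values": [state]},
    ]
    ids = []
    # Paginate so that accounts with many instances are fully covered.
    for page in ec2.get_paginator("describe_instances").paginate(Filters=filters):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                ids.append(instance["InstanceId"])
    return ids


def stop_instances(ec2):
    """Stop every running instance that carries the tag."""
    ids = find_tagged_instance_ids(ec2, "running")
    if ids:  # the API rejects an empty ID list
        ec2.stop_instances(InstanceIds=ids)
    return ids


def start_instances(ec2):
    """Start every stopped instance that carries the tag."""
    ids = find_tagged_instance_ids(ec2, "stopped")
    if ids:
        ec2.start_instances(InstanceIds=ids)
    return ids


def lambda_handler(event, context):
    """Lambda entry point; dispatches on the hypothetical event field "action"."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    ec2 = boto3.client("ec2")
    action = event.get("action")
    if action == "start":
        return {"started": start_instances(ec2)}
    if action == "stop":
        return {"stopped": stop_instances(ec2)}
    raise ValueError(f"unsupported action: {action!r}")
```

Because instances are selected by tag rather than by ID, adding a new node to the schedule only requires tagging it; the script itself does not change.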

Now that the Python script for stopping and starting the EC2 instances is ready, we can create the Lambda function using the Terraform resource definition below (filename is the path to the compressed archive of the Python function).

The next step is to define the CloudWatch rules that trigger the Lambda function defined in the previous step. We need one rule per case we want to support: one for stop and one for start. Each rule fires at a specific time of day based on a cron expression. The snippet below defines the rule for the stop case, including the exact time at which the Lambda function should be triggered. The same snippet can be adapted to trigger the start of the EC2 nodes.
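To make the schedule concrete, here is a small Python sketch that builds the two cron expressions for the 6:30 AM start and the 7:00 PM stop on working days. The helper name weekday_cron is hypothetical. Note that these schedule expressions are evaluated in UTC, so the hours may need shifting to match local working hours.

```python
def weekday_cron(hour: int, minute: int) -> str:
    """Build a CloudWatch Events cron expression that fires Monday to Friday."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Field order: minutes hours day-of-month month day-of-week year.
    # "?" is required in day-of-month because day-of-week is specified.
    return f"cron({minute} {hour} ? * MON-FRI *)"


START_SCHEDULE = weekday_cron(6, 30)  # cron(30 6 ? * MON-FRI *)
STOP_SCHEDULE = weekday_cron(19, 0)   # cron(0 19 ? * MON-FRI *)

if __name__ == "__main__":
    print(START_SCHEDULE)
    print(STOP_SCHEDULE)
```

Each resulting string is what the schedule expression of the corresponding rule would contain.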

The last step is to grant CloudWatch the permission to execute the Lambda function. This step is necessary; without it, CloudWatch will fail to trigger the Lambda function.
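Conceptually, this permission is a statement in the Lambda function's resource-based policy. The Python sketch below shows the shape of such a statement, assuming one statement per rule; the helper name and the ARNs are hypothetical placeholders, not values from the original setup.

```python
def build_invoke_permission(statement_id: str, function_arn: str, rule_arn: str) -> dict:
    """Hypothetical helper: statement allowing one CloudWatch Events rule to invoke the function."""
    return {
        "Sid": statement_id,
        "Effect": "Allow",
        # CloudWatch Events (EventBridge) service principal.
        "Principal": {"Service": "events.amazonaws.com"},
        "Action": "lambda:InvokeFunction",
        "Resource": function_arn,
        # Restrict invocation to the specific rule rather than any rule.
        "Condition": {"ArnLike": {"AWS:SourceArn": rule_arn}},
    }


if __name__ == "__main__":
    # Placeholder ARNs for illustration only.
    print(build_invoke_permission(
        "AllowStopRule",
        "arn:aws:lambda:eu-west-1:123456789012:function:ec2-scheduler",
        "arn:aws:events:eu-west-1:123456789012:rule/stop-ec2-instances",
    ))
```

Scoping the statement to the rule's ARN keeps other rules in the account from invoking the function.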

Below is a complete implementation of the solution proposed by this post:

Conclusion

It is always a good idea to automate as many of our routine actions as possible. Terraform is a great tool for automating infrastructure tasks.

Implementing such solutions with Terraform makes the AWS infrastructure more portable and easy to nuke and recreate if needed. At the same time, optimizing the usage of cloud infrastructure is necessary to keep its running costs down.