Fabrizio De Cicco

Summary

The article discusses the implementation of Terraform and Terragrunt to manage cloud infrastructure efficiently, detailing the transition from manual deployment to an automated CI/CD pipeline using Azure DevOps, which addresses challenges such as secret management and module dependency.

Abstract

The article outlines the journey of transitioning from a manual and unsustainable cloud infrastructure deployment process to a streamlined, efficient system using Terraform and Terragrunt. It highlights the adoption of a mono-repository strategy for Infrastructure as Code (IaC), the integration of Azure Pipelines for automation, and the centralization of secret management. The implementation includes a detailed breakdown of the CI/CD pipeline stages, the use of Terragrunt for managing Terraform modules, and the enforcement of code quality and validation checks. The author emphasizes the benefits of this approach, such as improved code maintainability, automated deployment processes, and enhanced security through centralized secret management. The article also provides insights into the technical details of the pipeline configuration, including the use of templates and scripts for Terragrunt commands, and the incorporation of manual validation steps for non-development environments to ensure infrastructure consistency.

Opinions

  • The author conveys a positive opinion on the use of Terragrunt as a wrapper for Terraform, highlighting its ability to simplify IaC management and provide additional features like remote state management and dependency handling.
  • There is a clear endorsement of the mono-repository approach for IaC, as it simplifies the management of shared configurations and propagates changes across various components.
  • The author values the principle of simplicity in pipeline design, which is achieved through the use of concurrent jobs and streamlined stages.
  • The article expresses the importance of maintaining code quality and adhering to standards, as evidenced by the inclusion of format validation and automated testing in the pipeline.
  • The author emphasizes the significance of manual validation steps in the deployment process, particularly for production environments, to ensure that the infrastructure remains consistent and secure.
  • The use of Azure DevOps environments and exclusive locks is seen as a critical feature for preventing concurrent deployments and maintaining infrastructure stability.
  • The author suggests a forward-looking perspective by mentioning potential improvements, such as incorporating Terratest for testing and Terrascan for security and compliance scanning.
  • The article concludes with an open invitation for feedback and shared experiences, indicating the author's commitment to community engagement and continuous improvement in IaC practices.

How Terraform and Terragrunt Simplified Our Cloud Infrastructure

Find out how we solved the problems of manual deployment, secret management, and module dependency with Terraform and Terragrunt

Introduction

Our infrastructure is Azure-based, and we use Terragrunt, a powerful wrapper for Terraform, to manage our cloud resources efficiently. Previously, deployments were run manually from local machines, a process that proved unsustainable at scale. Secret management was another hurdle: without a centralized system such as a dedicated vault, secrets were scattered across many secrets.tf files.

Introducing Azure Pipelines into our workflow was a turning point. It allowed us to scale our deployment process and to centralize secret management in Azure DevOps, giving us a single source of truth.

In this article, I will share the technical details of our implementation and how we solved the challenges that we faced, enhancing our cloud infrastructure management.

Mono-Repository

With a multitude of products, we opted for a single Infrastructure-as-Code (IaC) repository. This mono-repository strategy simplifies the management of shared configuration values, making it significantly easier to propagate changes across components. The following is a sample of how our repository is structured.

.
├── README.md
├── azure-pipelines.yml
├── pipelines
│   ├── scripts
│   │   ├── initialize.sh
│   │   ├── terragrunt_exec.sh
│   │   └── terragrunt_init.sh
│   └── templates
│       └── terragrunt-command.yml
├── terraform
│   └── modules
│       ├── product-x
│       │   ├── README.md
│       │   ├── main.tf
│       │   ├── outputs.tf
│       │   └── variables.tf
│       ├── product-y
│       │   ├── README.md
│       │   ├── main.tf
│       │   ├── outputs.tf
│       │   └── variables.tf
│       └── product-z
│           ├── README.md
│           ├── main.tf
│           ├── outputs.tf
│           └── variables.tf
└── terragrunt
    ├── config.yml
    ├── dev
    │   ├── product-x
    │   │   └── terragrunt.hcl
    │   ├── product-y
    │   │   └── terragrunt.hcl
    │   └── product-z
    │       └── terragrunt.hcl
    ├── production
    │   ├── product-x
    │   │   └── terragrunt.hcl
    │   ├── product-y
    │   │   └── terragrunt.hcl
    │   └── product-z
    │       └── terragrunt.hcl
    ├── terragrunt.hcl
    └── test
        ├── product-x
        │   └── terragrunt.hcl
        ├── product-y
        │   └── terragrunt.hcl
        └── product-z
            └── terragrunt.hcl

Here is a brief explanation of each file and directory:

  • README.md: it contains the documentation for our repo, such as the purpose, usage, and requirements of our infra-as-code project.
  • azure-pipelines.yml: it defines the CI/CD pipeline for our infra-as-code project, such as the stages, jobs, and tasks that are executed to deploy our infrastructure (we’ll deep dive into it later in the article).
  • pipelines: this directory contains the scripts and templates used by our pipeline: the initialize.sh script that initializes the remote state for each environment, the terragrunt_init.sh script that installs Terragrunt, the terragrunt_exec.sh script that runs the Terragrunt commands, and the terragrunt-command.yml template that defines the parameters and inputs for the Terragrunt commands.
  • terraform: this directory contains the Terraform modules that define the resources and configurations for our infrastructure, such as the product-x, product-y, and product-z modules that create the resources for each product. Each module has a README.md file that describes the module, a main.tf file that contains the Terraform code, an outputs.tf file that defines the outputs of the module, and a variables.tf file that defines the variables of the module (there are more .tf files in our repo, but we only show the typical ones for illustration).
  • terragrunt: this directory contains the Terragrunt configurations that manage the Terraform modules, such as the config.yml file that defines the common variables for all the environments, and the dev, production, and test directories that contain the environment-specific configuration for each product. Each configuration has a terragrunt.hcl file that specifies the Terraform module source, the remote state settings, the dependencies, the inputs, and the hooks for the Terragrunt commands.
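Because every product has to appear once per environment, a missing terragrunt.hcl is an easy mistake in this layout. Below is a minimal sketch of a consistency check, assuming the directory names shown in the tree above; check_layout is a hypothetical helper, not part of the original repository.

```shell
#!/usr/bin/env bash
# Hypothetical consistency check (not part of the original repository):
# every module under terraform/modules should have a terragrunt.hcl
# in each environment directory under terragrunt/.
set -euo pipefail

# Usage: check_layout [repo_root] [comma-separated environments]
# Prints one line per missing file and returns 1 if anything is missing.
check_layout() {
  local root="${1:-.}"
  local envs="${2:-dev,test,production}"
  local missing=0 module_dir module env
  local -a env_list
  IFS=',' read -r -a env_list <<< "$envs"

  for module_dir in "$root"/terraform/modules/*/; do
    [ -d "$module_dir" ] || continue # the glob did not match any module
    module="$(basename "$module_dir")"
    for env in "${env_list[@]}"; do
      if [ ! -f "$root/terragrunt/$env/$module/terragrunt.hcl" ]; then
        echo "missing: terragrunt/$env/$module/terragrunt.hcl"
        missing=1
      fi
    done
  done
  return "$missing"
}
```

Running check_layout . "dev,test,production" from the repository root would report any product that was added to one environment but not to the others.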

Azure Pipelines Integration

Adhering to the principle of simplicity, our pipeline is streamlined into two fundamental stages: Integration and Deployment. At the top of the pipeline, we define a few variables that are available throughout it:

  • terraform_cloud: a variable group that contains shared variables for Terraform and Terragrunt configurations.
  • environments: a comma-separated list of environments (dev, test, production) where the pipeline will run.
  • tf_vars: a comma-separated list of Terraform variable names that must match the names in the variable groups and in the variables.tf files. Each variable can be either a secret (set as a sensitive value in the Azure DevOps variable group) or plain text.
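The environments value is expanded at compile time with split(variables.environments, ','), and each entry is then embedded in several names used later in the pipeline: the variable group, the service connection, the Azure DevOps environment, and the remote state storage account. The sketch below prints those derived names for a given list, assuming the value is written without spaces (e.g. dev,test,production); environment_names is a hypothetical helper for illustration only.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): expands a comma-separated environments value the
# way ${{ each env in split(variables.environments, ',') }} does, and prints the
# names the pipeline YAML derives from each environment.
set -euo pipefail

environment_names() {
  local environments="$1" env
  local -a env_list
  IFS=',' read -r -a env_list <<< "$environments"
  for env in "${env_list[@]}"; do
    # variable group, service connection, Azure DevOps environment, storage account
    echo "$env: terraform_cloud_$env tfcloud_$env terraform-cloud-$env tfcloudsa$env"
  done
}
```

For example, environment_names "dev,test,production" prints one line per environment, starting with dev: terraform_cloud_dev tfcloud_dev terraform-cloud-dev tfcloudsadev.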

Integration

This stage is crucial for maintaining code quality and ensuring that the IaC adheres to the required standards before proceeding to the deployment stage. The YAML code below defines our Azure Pipeline integration stage.

- stage: integration
  displayName: Integration
  jobs:
    - job: format_validation
      displayName: Format validation of all configurations
      dependsOn: []
      steps:
        - script: |
            export BOLD_GREEN="\033[1;32m"
            terraform fmt -check -recursive |& tee /tmp/terraform_format_check.log
            terraform_format_check_log=$(cat /tmp/terraform_format_check.log)
            if [[ $terraform_format_check_log == *"terraform"* ]]; then
              echo "##vso[task.logissue type=warning]There were one or more errors when checking the format of your Terraform configuration files. Please check the logs."
              echo "##vso[task.complete result=SucceededWithIssues;]"
            else
              echo -e "${BOLD_GREEN}All Terraform configurations are correctly formatted."        
            fi
          displayName: Check Terraform format
        - template: pipelines/templates/terragrunt-command.yml
          parameters:
            command: hclfmt
            commandParameters: --terragrunt-check
            terragruntVersion: $(terragrunt_version)
            workingDirectory: terragrunt
    - job: terraform_validation
      displayName: Terraform validation of all configurations
      dependsOn: []
      steps:
        - script: |
            set -euo pipefail
            for module in $(ls)
            do
              echo "##[group]$module"
              (cd $module && terraform init -backend=false && terraform validate)
              echo "##[endgroup]"
            done
          workingDirectory: terraform/modules
          displayName: Validate Terraform configurations
    - ${{ each env in split(variables.environments, ',') }}:
        - job: terragrunt_validation_${{ env }}
          displayName: Terragrunt input validation of all ${{ env }} configurations
          dependsOn: []
          variables:
            - group: terraform_cloud_${{ env }}
            - name: az_service_connection
              value: tfcloud_${{ env }}
          steps:
            - task: AzureCLI@2
              displayName: Initialize remote state # if not already initialized
              env:
                env: ${{ env }}
                location: $(az_remote_state_location)
                storageAccountName: tfcloudsa${{ env }}
                containerName: $(az_remote_state_container_name)
              inputs:
                azureSubscription: ${{ variables.az_service_connection }}
                scriptType: bash
                scriptLocation: scriptPath
                scriptPath: pipelines/scripts/initialize.sh
            - template: pipelines/templates/terragrunt-command.yml
              parameters:
                azServiceConnection: ${{ variables.az_service_connection }}
                command: validate-inputs
                commandParameters: --terragrunt-log-level error
                terragruntVersion: $(terragrunt_version)
                tfVars: ${{ variables.tf_vars }}
                workingDirectory: terragrunt/${{ env }}
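The format checks are meant to downgrade problems to warnings with Azure Pipelines logging commands rather than failing the job. Here is a small sketch of reusable helpers for that pattern, assuming a Bash step; log_warning, complete_with_issues, and warn_on_failure are hypothetical names, while the ##vso strings are the same logging commands used in the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the Azure Pipelines logging commands that the
# format checks print inline.
set -euo pipefail

# Adds a warning to the run summary.
log_warning() {
  echo "##vso[task.logissue type=warning]$1"
}

# Marks the current task as SucceededWithIssues instead of failing it.
complete_with_issues() {
  echo "##vso[task.complete result=SucceededWithIssues;]"
}

# Runs a check command; on a non-zero exit status, reports a warning instead of failing.
warn_on_failure() {
  local message="$1"
  shift
  if ! "$@"; then
    log_warning "$message"
    complete_with_issues
  fi
}
```

For example, warn_on_failure "Terraform files need formatting" terraform fmt -check -recursive relies on the non-zero exit status of terraform fmt -check instead of matching text in its output.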

The dependsOn: [] attribute signifies that the jobs are self-contained, with no interdependencies, allowing them to execute concurrently and thus enhance the overall efficiency of the pipeline. Here’s a breakdown of its components:

  • Format validation of all configurations: it checks the format of the Terraform and Terragrunt configuration files with terraform fmt -check -recursive and terragrunt hclfmt --terragrunt-check, respectively, using an inline script for the former and the terragrunt-command.yml template for the latter. Both checks are intended to report formatting issues as warnings, marking the task as SucceededWithIssues rather than failing it.
  • Terraform validation of all configurations: it validates each module in the terraform/modules directory by running terraform init -backend=false (no remote state is needed for validation) followed by terraform validate. The script loops through the modules and groups the output by module name. The set -euo pipefail line makes the script exit as soon as a command fails, an unset variable is referenced, or any command in a pipeline fails, which is good practice for error handling in shell scripts.
  • Terragrunt input validation of all configurations: for each environment in the variables.environments list, it runs the validate-inputs command through the terragrunt-command.yml template, passing parameters such as terragruntVersion and tfVars. Beforehand, an AzureCLI@2 task authenticates with the environment's service connection (az_service_connection) and runs the initialize.sh script, which creates the remote state resource group, storage account, and container if they do not exist yet. The script is shown below as a reference.
#!/usr/bin/env bash

# Stop on errors, unset variables, and failures inside pipelines
set -euo pipefail

# location, env, storageAccountName and containerName are set by the AzureCLI@2 task
resourceGroup="terraform-$location-$env"

echo "Initializing remote state storage..."

if [ "$(az group exists --name "$resourceGroup")" = false ]; then
  echo "Creating resource group for remote state..."
  az group create -n "$resourceGroup" -l "$location"
  echo "Resource group \"$resourceGroup\" created."
else
  echo "Resource group \"$resourceGroup\" already exists."
fi

# Storage account names are global: nameAvailable is also false when another subscription owns the name
isAvailable=$(az storage account check-name -n "$storageAccountName" --query "nameAvailable" -o tsv)
if [ "$isAvailable" = true ]; then
  echo "Creating storage account for remote state..."
  az storage account create -n "$storageAccountName" -g "$resourceGroup" --sku Standard_RAGRS -l "$location"
  echo "Storage account \"$storageAccountName\" created."
else
  echo "Storage account \"$storageAccountName\" already exists."
fi

storageAccountKey=$(az storage account keys list -n "$storageAccountName" --query "[0].value" -o tsv)
isContainerExist=$(az storage container exists -n "$containerName" --account-key "$storageAccountKey" --account-name "$storageAccountName" --query "exists" -o tsv)

if [ "$isContainerExist" = false ]; then
  echo "Creating storage container for remote state..."
  az storage container create -n "$containerName" --account-name "$storageAccountName" --account-key "$storageAccountKey" --public-access off
  echo "Storage container \"$containerName\" created."
else
  echo "Storage container \"$containerName\" already exists."
fi

echo "Remote state initialization completed."
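initialize.sh builds the resource group name from $location and $env, while the pipeline passes tfcloudsa${{ env }} as the storage account name. Azure only accepts storage account names of 3 to 24 lowercase letters and digits, so a longer or hyphenated environment name would make this step fail. A minimal pre-flight sketch, with hypothetical function names and the naming patterns taken from the script and the YAML above:

```shell
#!/usr/bin/env bash
# Pre-flight sketch (hypothetical helpers): derive the remote-state names that
# initialize.sh uses for one environment and reject storage account names Azure
# would refuse (3-24 characters, lowercase letters and digits only).
set -euo pipefail

is_valid_storage_account_name() {
  [[ "$1" =~ ^[a-z0-9]{3,24}$ ]]
}

# Usage: remote_state_names <env> <location>
remote_state_names() {
  local env="$1" location="$2"
  local resource_group="terraform-$location-$env" # same pattern as initialize.sh
  local storage_account="tfcloudsa$env"          # same pattern as the pipeline YAML
  if ! is_valid_storage_account_name "$storage_account"; then
    echo "invalid storage account name: $storage_account" >&2
    return 1
  fi
  echo "$resource_group $storage_account"
}
```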

Deployment

The YAML code below defines our deployment stage for each environment specified in the variables.environments list, except when the Build.Reason is PullRequest (where only the integration stage is performed as a validation of the build).

- ${{ if ne(variables['Build.Reason'], 'PullRequest') }}:
      - ${{ each env in split(variables.environments, ',') }}:
          - stage: deploy_${{ env }}
            displayName: Deployment of ${{ env }}
            dependsOn: integration
            variables:
              - group: terraform_cloud_${{ env }}
              - name: az_service_connection
                value: tfcloud_${{ env }}
            jobs:
              - job: terragrunt_plan_${{ env }}
                displayName: Terragrunt plan of all ${{ env }} configurations
                dependsOn: []
                steps:
                  - template: pipelines/templates/terragrunt-command.yml
                    parameters:
                      azServiceConnection: ${{ variables.az_service_connection }}
                      command: run-all plan # run-all to ensure module dependencies are met
                      terragruntVersion: $(terragrunt_version)
                      tfVars: ${{ variables.tf_vars }}
                      workingDirectory: terragrunt/${{ env }}
              - ${{ if ne(env, 'dev') }}:
                - job: plan_validation_${{ env }}
                  displayName: Manual validation of all ${{ env }} plans
                  dependsOn: terragrunt_plan_${{ env }}
                  pool: server # reserved keyword which indicates this is an agentless job, required for ManualValidation@0 task
                  steps:
                  - task: ManualValidation@0
                    displayName: Request manual validation of all ${{ env }} plans
                    timeoutInMinutes: 30
                    inputs:
                      notifyUsers: |
                        [email protected]
                        [email protected]
                      instructions: >-
                        To validate all the plans before applying,
                        check the job “Terragrunt plan of all ${{ env }} configurations”
                        in the stage “Deployment of ${{ env }}”
              - deployment: terragrunt_apply_${{ env }}
                displayName: Terragrunt apply of all ${{ env }} configurations
                ${{ if ne(env, 'dev') }}:
                  dependsOn: plan_validation_${{ env }}
                ${{ else }}:
                  dependsOn: terragrunt_plan_${{ env }}
                environment: terraform-cloud-${{ env }}
                strategy:
                  runOnce:
                    deploy:
                      steps:
                        - checkout: self
                        - template: pipelines/templates/terragrunt-command.yml
                          parameters:
                            azServiceConnection: ${{ variables.az_service_connection }}
                            command: run-all apply # run-all to ensure module dependencies are met
                            commandParameters: --terragrunt-non-interactive
                            terragruntVersion: $(terragrunt_version)
                            tfVars: ${{ variables.tf_vars }}
                            workingDirectory: terragrunt/${{ env }}

The deployment stage depends on the integration stage and uses the variables from the terraform_cloud_${{ env }} group. It consists of up to three jobs (the manual validation job is only generated for non-dev environments):

  • Terragrunt plan of all configurations: it runs the run-all plan command for all the product-specific Terragrunt configurations in the environment's directory, using the terragrunt-command.yml template. This generates a plan for each module, showing the changes that run-all apply will make. We use run-all so that Terragrunt automatically follows the dependency blocks, instead of managing each module and its dependencies separately.
  • Manual validation of all plans: it runs only for non-dev environments and depends on the plan job. It is an agentless job that uses the ManualValidation@0 task to request approval from the users listed in the notifyUsers input, who have 30 minutes to review the plans before the request times out. Together with the exclusive lock on the Azure DevOps environment, this step keeps the infrastructure consistent between the plan and the apply job, because other deployments to the same environment are blocked until the current one is approved or rejected.
  • Terragrunt apply of all configurations: it is a deployment job that applies the changes to the target environment, specified by the environment property. It depends on the manual validation job, or directly on the plan job for dev. It uses the same template as the plan job, but runs run-all apply with the --terragrunt-non-interactive parameter to avoid any prompts. Since deployment jobs do not check out the repository automatically, it starts with an explicit checkout: self step.
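To make the conditional logic easier to follow, here is a plain Bash sketch that mirrors the template expressions of the deployment stage and lists, for one environment and Build.Reason, which jobs are generated and what each waits for. It only models the YAML conditions; deployment_jobs is a hypothetical helper and nothing here talks to Azure DevOps.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): mirrors the template expressions of the
# deployment stage for one environment. Prints each generated job, followed
# by "<- <job it depends on>" where the YAML sets dependsOn.
set -euo pipefail

deployment_jobs() {
  local env="$1" build_reason="$2"

  # ${{ if ne(variables['Build.Reason'], 'PullRequest') }}: no deployment for PRs
  if [ "$build_reason" = "PullRequest" ]; then
    return 0
  fi

  echo "terragrunt_plan_$env"
  if [ "$env" != "dev" ]; then
    # ${{ if ne(env, 'dev') }}: agentless ManualValidation@0 job before apply
    echo "plan_validation_$env <- terragrunt_plan_$env"
    echo "terragrunt_apply_$env <- plan_validation_$env"
  else
    echo "terragrunt_apply_$env <- terragrunt_plan_$env"
  fi
}
```

For example, deployment_jobs production IndividualCI lists the three production jobs in order, while deployment_jobs production PullRequest prints nothing, matching the integration-only validation of pull requests.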

Terragrunt Command Template

Since Terragrunt is not preinstalled on the Azure Pipelines agents, it has to be installed before use. The YAML code below shows the terragrunt-command.yml template, which takes care of the installation and enables flexible, “Don’t Repeat Yourself” (DRY) executions across the pipeline.

parameters:
- name: azServiceConnection
  type: string
  default: ''
- name: command
  type: string
- name: commandParameters
  type: string
  default: ''
- name: terragruntVersion
  type: string
- name: tfVars
  type: string
  default: ''
- name: workingDirectory
  type: string

steps:
- task: Bash@3
  displayName: Install Terragrunt
  env:
    terragruntversion: ${{ parameters.terragruntVersion }}
  inputs:
    targetType: filePath
    filePath: pipelines/scripts/terragrunt_init.sh
- ${{ if ne(parameters.azServiceConnection, '') }}: # remote state required
  - task: AzureCLI@2
    displayName: Execute Terragrunt ${{ parameters.command }} command
    env:
      workingdir: ${{ parameters.workingDirectory }}
      command: ${{ parameters.command }}
      parameters: ${{ parameters.commandParameters }}
      cloudEnabled: true
      SYSTEM_ACCESSTOKEN: $(System.AccessToken)
      ${{ each tf_var in split(replace(parameters.tfVars,' ',''), ',') }}:
        TF_VAR_${{ tf_var }}: ${{ replace('$(tf_var_placeholder)','tf_var_placeholder',tf_var) }} # $(tf_var_placeholder) -> $(tf_var) e.g. $(db_connection_string)
    inputs:
      azureSubscription: ${{ parameters.azServiceConnection }}
      addSpnToEnvironment: true
      scriptType: bash
      scriptLocation: scriptPath
      scriptPath: pipelines/scripts/terragrunt_exec.sh
- ${{ else }}: # local command
  - task: Bash@3
    displayName: Execute Terragrunt ${{ parameters.command }} command
    env:
      workingdir: ${{ parameters.workingDirectory }}
      command: ${{ parameters.command }}
      parameters: ${{ parameters.commandParameters }}
      cloudEnabled: false
    inputs:
      targetType: filePath
      filePath: pipelines/scripts/terragrunt_exec.sh

The template defines three steps; the last two are mutually exclusive, so each invocation runs two of them:

  • The first task uses Bash@3 to install Terragrunt, referencing the terragruntVersion parameter and executing the terragrunt_init.sh script from the pipelines/scripts directory, shown below as a reference.
#!/usr/bin/env bash

set -euo pipefail

# Download into a temporary directory: the default working directory is the
# repository root, which already contains the terragrunt/ configuration directory
download_dir="$(mktemp -d)"
cd "$download_dir"

# Download terragrunt (-f makes curl fail on HTTP errors instead of saving the error page)
curl -fsSL "https://github.com/gruntwork-io/terragrunt/releases/download/v$terragruntversion/terragrunt_linux_amd64" --output terragrunt

# Install the binary as executable into /usr/local/bin so we don't have to specify the full path
sudo install -m 0755 terragrunt /usr/local/bin/terragrunt

# Test if it works by printing out the version of terragrunt
terragrunt --version
  • The second task runs only if azServiceConnection is not empty, indicating that a remote state is required. It uses AzureCLI@2 to run the specified Terragrunt command with the provided parameters and environment variables, and addSpnToEnvironment: true exposes the service connection's credentials to the script. It also sets up Terraform variables (TF_VAR_) dynamically based on the tfVars parameter.
  • The third task runs instead when azServiceConnection is empty, for local commands that need no remote state, such as hclfmt. It uses Bash@3 to execute the same terragrunt_exec.sh script with cloudEnabled set to false. The script, shown below as a reference, exports the Azure and Azure DevOps credentials only when cloudEnabled is true and then runs the requested command.
#!/usr/bin/env bash

set -euo pipefail

export BOLD_GREEN="\033[1;32m"
if [ "$cloudEnabled" = true ]; then
    # Service principal variables exposed by AzureCLI@2 with addSpnToEnvironment: true
    export ARM_CLIENT_ID=$servicePrincipalId
    export ARM_CLIENT_SECRET=$servicePrincipalKey
    # Assign before exporting so that a failing az call stops the script (set -e)
    ARM_SUBSCRIPTION_ID=$(az account show --query 'id' -o tsv)
    export ARM_SUBSCRIPTION_ID
    export ARM_TENANT_ID=$tenantId
    export AZDO_ORG_SERVICE_URL=$SYSTEM_TEAMFOUNDATIONSERVERURI
    export AZDO_PERSONAL_ACCESS_TOKEN=$SYSTEM_ACCESSTOKEN
fi

terragrunt="terragrunt"
terragrunt_working_dir="--terragrunt-working-dir $workingdir"
parameters="${parameters:-}" # empty or unset when no commandParameters are given

# Quote $command: it may contain spaces (e.g. "run-all plan"), which would break [ ]
if [ "$command" = 'hclfmt' ]; then
    echo "##[command]$terragrunt $command $terragrunt_working_dir $parameters"
    # Record the exit status instead of letting set -e abort the script,
    # so that formatting issues are reported as a warning
    format_status=0
    eval ${terragrunt} ${command} ${terragrunt_working_dir} ${parameters} || format_status=$?
    if [ "$format_status" -ne 0 ]; then
        echo "##vso[task.logissue type=warning]There were one or more errors when checking the format of your Terragrunt configurations. Please check the logs."
        echo "##vso[task.complete result=SucceededWithIssues;]"
    else
        echo -e "${BOLD_GREEN}All Terragrunt configurations are correctly formatted."
    fi
elif [ "$command" = 'validate-inputs' ]; then
    for module in $(ls "$workingdir")
    do
        echo "##[group]$module"
        terragrunt_working_dir="--terragrunt-working-dir $workingdir/$module"
        echo "##[command]$terragrunt $command $terragrunt_working_dir $parameters"
        eval ${terragrunt} ${command} ${terragrunt_working_dir} ${parameters}
        echo -e "${BOLD_GREEN}All required inputs are passed in by terragrunt."
        echo "##[endgroup]"
    done
else
    echo "##[command]$terragrunt $command $terragrunt_working_dir $parameters"
    eval ${terragrunt} ${command} ${terragrunt_working_dir} ${parameters}
fi
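terragrunt_init.sh hardcodes the terragrunt_linux_amd64 release asset, which matches x64 Linux agents such as the Microsoft-hosted Ubuntu images. If agents on other architectures were ever added, the URL could be derived instead. Here is a sketch, assuming the other release assets follow the same terragrunt_<os>_<arch> naming (worth verifying against the release page); terragrunt_download_url is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): builds the Terragrunt release URL for a given
# OS and machine architecture instead of hardcoding terragrunt_linux_amd64.
# Assumption: other assets follow the same terragrunt_<os>_<arch> pattern.
set -euo pipefail

# Usage: terragrunt_download_url <version> <os> <machine>
#   e.g. terragrunt_download_url "$terragruntversion" linux "$(uname -m)"
terragrunt_download_url() {
  local version="$1" os="$2" machine="$3" arch
  case "$machine" in
    x86_64 | amd64) arch="amd64" ;;
    aarch64 | arm64) arch="arm64" ;;
    *)
      echo "unsupported architecture: $machine" >&2
      return 1
      ;;
  esac
  echo "https://github.com/gruntwork-io/terragrunt/releases/download/v${version}/terragrunt_${os}_${arch}"
}
```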

Conclusion

In this article, we have explored how Terragrunt enhances Terraform and simplifies its workflow through remote state management, DRY configurations, dependency management, configuration inheritance, and environment-specific configurations, and how we combined it with Azure DevOps for centralized secrets management, environment locking, and CI/CD pipelines.

There are many potential ways to improve or expand our implementation, such as using Terratest for testing our Terraform code, or using Terrascan for security and compliance scanning.

I hope that this article has given you some insights and inspiration for managing your cloud infrastructure with Terraform and Terragrunt. If you have any questions, feedback, or experiences to share, please feel free to leave a comment below.

Photo by Howie R on Unsplash
DevOps
Terraform
Terragrunt
Azure
Infrastructure As Code