How Terraform and Terragrunt Simplified Our Cloud Infrastructure
Find out how we solved the problems of manual deployment, secret management, and module dependency with Terraform and Terragrunt
Introduction
Our infrastructure is Azure-based, and we leverage Terragrunt, a powerful wrapper for Terraform, to manage our cloud resources efficiently. Previously, our deployment process was a manual affair, conducted locally, and it proved unsustainable at scale. Secret management was another hurdle: without a centralized system such as a dedicated vault, we ended up with a scattered array of secrets.tf files.
The introduction of an Azure Pipeline into our workflow marked a transformative chapter in our journey. It promised not only to scale our deployment processes but also to centralize secret management into Azure DevOps, thereby aligning our operations with a single source of truth.
In this article, I will share the technical details of our implementation and how we solved the challenges that we faced, enhancing our cloud infrastructure management.

Mono-Repository
With a multitude of products under our belt, we found solace in a single Infrastructure-as-Code (IaC) repository approach. This mono-repository strategy simplifies the management of shared configuration values, making it significantly easier to propagate changes across various components. The following is a sample of how our repository is structured.
.
├── README.md
├── azure-pipelines.yml
├── pipelines
│   ├── scripts
│   │   ├── initialize.sh
│   │   ├── terragrunt_exec.sh
│   │   └── terragrunt_init.sh
│   └── templates
│       └── terragrunt-command.yml
├── terraform
│   └── modules
│       ├── product-x
│       │   ├── README.md
│       │   ├── main.tf
│       │   ├── outputs.tf
│       │   └── variables.tf
│       ├── product-y
│       │   ├── README.md
│       │   ├── main.tf
│       │   ├── outputs.tf
│       │   └── variables.tf
│       └── product-z
│           ├── README.md
│           ├── main.tf
│           ├── outputs.tf
│           └── variables.tf
└── terragrunt
    ├── config.yml
    ├── dev
    │   ├── product-x
    │   │   └── terragrunt.hcl
    │   ├── product-y
    │   │   └── terragrunt.hcl
    │   └── product-z
    │       └── terragrunt.hcl
    ├── production
    │   ├── product-x
    │   │   └── terragrunt.hcl
    │   ├── product-y
    │   │   └── terragrunt.hcl
    │   └── product-z
    │       └── terragrunt.hcl
    ├── terragrunt.hcl
    └── test
        ├── product-x
        │   └── terragrunt.hcl
        ├── product-y
        │   └── terragrunt.hcl
        └── product-z
            └── terragrunt.hcl
Here is a brief explanation of each file and directory:
- README.md: the documentation for our repo, such as the purpose, usage, and requirements of our infra-as-code project.
- azure-pipelines.yml: the definition of the CI/CD pipeline for our infra-as-code project, such as the stages, jobs, and tasks that are executed to deploy our infrastructure (we’ll dive deeper into it later in the article).
- pipelines: this directory contains the scripts and templates that are used by our pipeline: the initialize.sh script that initializes the remote state for each environment, the terragrunt_init.sh script that installs Terragrunt, the terragrunt_exec.sh script that runs the Terragrunt commands, and the terragrunt-command.yml template that defines the parameters and inputs for the Terragrunt commands.
- terraform: this directory contains the Terraform modules that define the resources and configurations for our infrastructure, such as the product-x, product-y, and product-z modules that create the resources for each product. Each module has a README.md file that describes the module, a main.tf file that contains the Terraform code, an outputs.tf file that defines the outputs of the module, and a variables.tf file that defines the variables of the module (there are more .tf files in our repo, but we only show the typical ones for illustration).
- terragrunt: this directory contains the Terragrunt configurations that manage the Terraform modules: the config.yml file that defines the common variables for all the environments, and the dev, production, and test directories that contain the environment-specific configurations for each product. Each configuration has a terragrunt.hcl file that specifies the Terraform module source, the remote state settings, the dependencies, the inputs, and the hooks for the Terragrunt commands.
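With this layout, every directory under terragrunt/<env> is expected to have a counterpart under terraform/modules. The following is a minimal sketch of a consistency check for that pairing. It assumes that each Terragrunt configuration directory is named after the module it deploys, which the tree above suggests but the repository does not enforce; the function name check_layout and the throwaway demo directories are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical check: every terragrunt/<env>/<product> needs a matching
# terraform/modules/<product>. Prints missing modules; returns 1 if any.
check_layout() {
  local root="$1" missing=0 unit product
  for unit in "$root"/terragrunt/*/*/terragrunt.hcl; do
    [ -e "$unit" ] || continue              # the glob matched nothing
    product="$(basename "$(dirname "$unit")")"
    if [ ! -d "$root/terraform/modules/$product" ]; then
      echo "missing module for ${unit#"$root"/}"
      missing=1
    fi
  done
  return "$missing"
}

# Demo on a throwaway copy of the layout: product-y has no module yet.
root="$(mktemp -d)"
mkdir -p "$root/terraform/modules/product-x" \
         "$root/terragrunt/dev/product-x" "$root/terragrunt/dev/product-y"
touch "$root/terragrunt/dev/product-x/terragrunt.hcl" \
      "$root/terragrunt/dev/product-y/terragrunt.hcl"

check_layout "$root" || echo "layout check failed"
```

The root-level terragrunt/terragrunt.hcl is deliberately skipped by the glob, since it holds shared settings rather than a product configuration.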
Azure Pipelines Integration

Adhering to the principle of simplicity, our pipeline is streamlined into two fundamental stages: Integration and Deployment. At the beginning of the pipeline, we define some variables that can be used in any part of the code.
- terraform_cloud: a variable group that contains shared variables for Terraform and Terragrunt configurations.
- environments: a comma-separated list of environments (dev, test, production) where the pipeline will run.
- tf_vars: a comma-separated list of Terraform variable names that must match the names in the variable groups and variables.tf files. They can be either secrets (set as sensitive values in the Azure DevOps variable group) or plain-text.
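The names in tf_vars have to match because Terraform reads any environment variable called TF_VAR_<name> as the value of the input variable <name>. The sketch below imitates that mapping in plain bash: it strips spaces from a comma-separated list, splits it, and exports one TF_VAR_ variable per name. In the pipeline this expansion is done by a template expression, not by a script; the helper export_tf_vars, the variable instance_count, and both values are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: for each name in a comma-separated list, export
# TF_VAR_<name> with the value of the shell variable <name>.
export_tf_vars() {
  local list="${1// /}"   # drop spaces before splitting
  local name
  local -a names
  IFS=',' read -ra names <<< "$list"
  for name in "${names[@]}"; do
    # ${!name} is indirect expansion: the value of the variable named by $name.
    export "TF_VAR_${name}=${!name}"
  done
}

# Made-up values standing in for variable-group entries.
db_connection_string="Server=example;Database=demo"
instance_count="2"

export_tf_vars "db_connection_string, instance_count"
env | grep '^TF_VAR_' | sort
```

Because the script runs with set -u, a name that appears in the list but has no value stops the script instead of silently exporting an empty string.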
Integration
This stage is crucial for maintaining code quality and ensuring that the IaC adheres to the required standards before proceeding to the deployment stage. The YAML code below defines our Azure Pipeline integration stage.
- stage: integration
  displayName: Integration
  jobs:
    - job: format_validation
      displayName: Format validation of all configurations
      dependsOn: []
      steps:
        - script: |
            export BOLD_GREEN="\033[1;32m"
            terraform fmt -check -recursive |& tee /tmp/terraform_format_check.log
            terraform_format_check_log=$(cat /tmp/terraform_format_check.log)
            if [[ $terraform_format_check_log == *"terraform"* ]]; then
              echo "##vso[task.logissue type=warning]There were one or more errors when checking the format of your Terraform configuration files. Please check the logs." && \
              echo "##vso[task.complete result=SucceededWithIssues;]"
            else
              echo -e "${BOLD_GREEN}All Terraform configurations are correctly formatted."
            fi
          displayName: Check Terraform format
        - template: pipelines/templates/terragrunt-command.yml
          parameters:
            command: hclfmt
            commandParameters: --terragrunt-check
            terragruntVersion: $(terragrunt_version)
            workingDirectory: terragrunt
    - job: terraform_validation
      displayName: Terraform validation of all configurations
      dependsOn: []
      steps:
        - script: |
            set -euo pipefail
            for module in $(ls)
            do
              echo "##[group]$module"
              (cd $module && terraform init -backend=false && terraform validate)
              echo "##[endgroup]"
            done
          workingDirectory: terraform/modules
          displayName: Validate Terraform configurations
    - ${{ each env in split(variables.environments, ',') }}:
      - job: terragrunt_validation_${{ env }}
        displayName: Terragrunt input validation of all ${{ env }} configurations
        dependsOn: []
        variables:
          - group: terraform_cloud_${{ env }}
          - name: az_service_connection
            value: tfcloud_${{ env }}
        steps:
          - task: AzureCLI@2
            displayName: Initialize remote state # if not already initialized
            env:
              env: ${{ env }}
              location: $(az_remote_state_location)
              storageAccountName: tfcloudsa${{ env }}
              containerName: $(az_remote_state_container_name)
            inputs:
              azureSubscription: ${{ variables.az_service_connection }}
              scriptType: bash
              scriptLocation: scriptPath
              scriptPath: pipelines/scripts/initialize.sh
          - template: pipelines/templates/terragrunt-command.yml
            parameters:
              azServiceConnection: ${{ variables.az_service_connection }}
              command: validate-inputs
              commandParameters: --terragrunt-log-level error
              terragruntVersion: $(terragrunt_version)
              tfVars: ${{ variables.tf_vars }}
              workingDirectory: terragrunt/${{ env }}
The dependsOn: [] attribute signifies that the jobs are self-contained, with no interdependencies, allowing them to execute concurrently and thus enhance the overall efficiency of the pipeline. Here’s a breakdown of its components:
- Format validation of all configurations: it checks the format of the Terraform and Terragrunt configuration files using the terraform fmt and hclfmt commands, respectively. It uses an inline script and the template terragrunt-command.yml to execute these commands. Formatting problems are logged as warnings and the task completes as SucceededWithIssues, so they do not block the pipeline.
- Terraform validation of all configurations: it validates the Terraform configuration files in the terraform/modules directory using the terraform init and terraform validate commands. It uses a script to loop through each module and run the commands, and groups the output by module name. The set -euo pipefail line makes the script exit immediately if a command exits with a non-zero status (-e), if an unset variable is referenced (-u), or if any command in a pipeline fails (-o pipefail), which is a good practice for error handling in shell scripts.
- Terragrunt input validation of all configurations: it validates the Terragrunt input variables for each environment specified in the variables.environments list. It uses the template terragrunt-command.yml to execute the validate-inputs command with some parameters, such as terragruntVersion and tfVars. It also sets some variables and inputs for the Azure CLI task, such as az_service_connection and scriptPath. That task initializes the remote state for each environment using the initialize.sh script, shown below as a reference.
#!/usr/bin/env bash
resourceGroup="terraform-$location-$env"
echo "Initializing remote state storage..."
# Create the resource group only if it does not exist yet
if [ "$(az group exists --name "$resourceGroup")" = false ]; then
  echo "Creating resource group for remote state..."
  az group create -n "$resourceGroup" -l "$location"
  echo "Resource group \"$resourceGroup\" created."
else
  echo "Resource group \"$resourceGroup\" already exists."
fi
# Create the storage account only if its name is still available
isAvailable=$(az storage account check-name -n "$storageAccountName" --query "nameAvailable" -o tsv)
if [ "$isAvailable" = true ]; then
  echo "Creating storage account for remote state..."
  az storage account create -n "$storageAccountName" -g "$resourceGroup" --sku Standard_RAGRS -l "$location"
  echo "Storage account \"$storageAccountName\" created."
else
  echo "Storage account \"$storageAccountName\" already exists."
fi
# Create the blob container only if it does not exist yet
storageAccountKey=$(az storage account keys list -n "$storageAccountName" --query "[0].value" -o tsv)
isContainerExist=$(az storage container exists -n "$containerName" --account-key "$storageAccountKey" --account-name "$storageAccountName" --query "exists" -o tsv)
if [ "$isContainerExist" = false ]; then
  echo "Creating storage container for remote state..."
  az storage container create -n "$containerName" --account-name "$storageAccountName" --account-key "$storageAccountKey" --public-access off
  echo "Storage container \"$containerName\" created."
else
  echo "Storage container \"$containerName\" already exists."
fi
echo "Remote state initialization completed."
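The script applies the same check-then-create step three times, which is what makes it safe to run on every pipeline execution. As a minimal sketch of that pattern in isolation, the helper below runs a create command only when a check command fails. The function name ensure is hypothetical, and the demo uses a local directory as a stand-in for an Azure resource so that it can run without any cloud access.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run the create command only when the check command fails.
# Usage: ensure "<label>" "<check command>" "<create command>"
ensure() {
  local label="$1" check="$2" create="$3"
  if eval "$check" >/dev/null 2>&1; then
    echo "$label already exists."
  else
    echo "Creating $label..."
    eval "$create"
    echo "$label created."
  fi
}

# Demo with a local directory standing in for a cloud resource.
demo_dir="$(mktemp -d)/remote-state"
ensure "directory" "test -d '$demo_dir'" "mkdir -p '$demo_dir'"   # first run creates it
ensure "directory" "test -d '$demo_dir'" "mkdir -p '$demo_dir'"   # second run changes nothing
```

Running the same call twice shows the idempotence: the second invocation only reports that the resource already exists.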
Deployment
The YAML code below defines our deployment stage for each environment specified in the variables.environments list, except when the Build.Reason is PullRequest (where only the integration stage is performed as a validation of the build).
- ${{ if ne(variables['Build.Reason'], 'PullRequest') }}:
  - ${{ each env in split(variables.environments, ',') }}:
    - stage: deploy_${{ env }}
      displayName: Deployment of ${{ env }}
      dependsOn: integration
      variables:
        - group: terraform_cloud_${{ env }}
        - name: az_service_connection
          value: tfcloud_${{ env }}
      jobs:
        - job: terragrunt_plan_${{ env }}
          displayName: Terragrunt plan of all ${{ env }} configurations
          dependsOn: []
          steps:
            - template: pipelines/templates/terragrunt-command.yml
              parameters:
                azServiceConnection: ${{ variables.az_service_connection }}
                command: run-all plan # run-all to ensure module dependencies are met
                terragruntVersion: $(terragrunt_version)
                tfVars: ${{ variables.tf_vars }}
                workingDirectory: terragrunt/${{ env }}
        - ${{ if ne(env, 'dev') }}:
          - job: plan_validation_${{ env }}
            displayName: Manual validation of all ${{ env }} plans
            dependsOn: terragrunt_plan_${{ env }}
            pool: server # reserved keyword which indicates this is an agentless job, required for ManualValidation@0 task
            steps:
              - task: ManualValidation@0
                displayName: Request manual validation of all ${{ env }} plans
                timeoutInMinutes: 30
                inputs:
                  notifyUsers: |
                    [email protected]
                    [email protected]
                  instructions: >-
                    To validate all the plans before applying,
                    check the job “Terragrunt plan of all ${{ env }} configurations”
                    in the stage “Deployment of ${{ env }}”
        - deployment: terragrunt_apply_${{ env }}
          displayName: Terragrunt apply of all ${{ env }} configurations
          ${{ if ne(env, 'dev') }}:
            dependsOn: plan_validation_${{ env }}
          ${{ else }}:
            dependsOn: terragrunt_plan_${{ env }}
          environment: terraform-cloud-${{ env }}
          strategy:
            runOnce:
              deploy:
                steps:
                  - checkout: self
                  - template: pipelines/templates/terragrunt-command.yml
                    parameters:
                      azServiceConnection: ${{ variables.az_service_connection }}
                      command: run-all apply # run-all to ensure module dependencies are met
                      commandParameters: --terragrunt-non-interactive
                      terragruntVersion: $(terragrunt_version)
                      tfVars: ${{ variables.tf_vars }}
                      workingDirectory: terragrunt/${{ env }}
The deployment stage depends on the integration stage, and uses the variables from the terraform_cloud_${{ env }} group. The deployment stage consists of up to three jobs:
- Terragrunt plan of all configurations: it runs the run-all plan command for all the Terragrunt product-specific configurations in the environment-related directory, using the template terragrunt-command.yml. This command generates a plan for each module and shows the changes that will be applied by the run-all apply command. We use the run-all command to automatically follow all Terragrunt dependencies blocks, instead of managing each module and its dependencies separately.
- Manual validation of all plans: it is only executed for non-dev environments, and it depends on the previous job. It is an agentless job that uses the ManualValidation@0 task to request a manual approval from the users specified in the notifyUsers input. The users have 30 minutes to validate the plans before applying them. This step, along with the exclusive lock on the Azure DevOps environment, keeps the infrastructure consistent from the plan to the apply job, by blocking any other deployments until the approval or rejection of the current one.
- Terragrunt apply of all configurations: it is a deployment job that applies the changes to the target environment, which is specified by the environment property. It depends on the previous job, or the plan job if the environment is dev. It uses the same template as the plan job, with the run-all apply command and the --terragrunt-non-interactive parameter to avoid any prompts. It also checks out the source repo before running the command.
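The conditional dependsOn wiring is easy to misread in the YAML, so here is a small sketch that prints the resulting job order per environment. The job names come from the stage definition above; the helper job_chain is hypothetical and only mirrors the conditions, it does not talk to Azure DevOps.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the order in which the jobs of one deployment
# stage run, mirroring the dependsOn conditions of the stage definition.
job_chain() {
  local env="$1"
  if [ "$env" != "dev" ]; then
    # Non-dev environments wait for a manual approval between plan and apply.
    echo "terragrunt_plan_${env} -> plan_validation_${env} -> terragrunt_apply_${env}"
  else
    # dev applies directly after a successful plan.
    echo "terragrunt_plan_${env} -> terragrunt_apply_${env}"
  fi
}

for env in dev test production; do
  job_chain "$env"
done
```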
Terragrunt Command Template
Since the Azure Pipelines agent does not include Terragrunt, you need to install it before using it. The YAML code below shows the template terragrunt-command.yml that enables flexible and “Don’t Repeat Yourself” (DRY) executions across the pipeline.
parameters:
  - name: azServiceConnection
    type: string
    default: ''
  - name: command
    type: string
  - name: commandParameters
    type: string
    default: ''
  - name: terragruntVersion
    type: string
  - name: tfVars
    type: string
    default: ''
  - name: workingDirectory
    type: string
steps:
  - task: Bash@3
    displayName: Install Terragrunt
    env:
      terragruntversion: ${{ parameters.terragruntVersion }}
    inputs:
      targetType: filePath
      filePath: pipelines/scripts/terragrunt_init.sh
  - ${{ if ne(parameters.azServiceConnection, '') }}: # remote state required
    - task: AzureCLI@2
      displayName: Execute Terragrunt ${{ parameters.command }} command
      env:
        workingdir: ${{ parameters.workingDirectory }}
        command: ${{ parameters.command }}
        parameters: ${{ parameters.commandParameters }}
        cloudEnabled: true
        SYSTEM_ACCESSTOKEN: $(System.AccessToken)
        ${{ each tf_var in split(replace(parameters.tfVars,' ',''), ',') }}:
          TF_VAR_${{ tf_var }}: ${{ replace('$(tf_var_placeholder)','tf_var_placeholder',tf_var) }} # $(tf_var_placeholder) -> $(tf_var) e.g. $(db_connection_string)
      inputs:
        azureSubscription: ${{ parameters.azServiceConnection }}
        addSpnToEnvironment: true
        scriptType: bash
        scriptLocation: scriptPath
        scriptPath: pipelines/scripts/terragrunt_exec.sh
  - ${{ else }}: # local command
    - task: Bash@3
      displayName: Execute Terragrunt ${{ parameters.command }} command
      env:
        workingdir: ${{ parameters.workingDirectory }}
        command: ${{ parameters.command }}
        parameters: ${{ parameters.commandParameters }}
        cloudEnabled: false
      inputs:
        targetType: filePath
        filePath: pipelines/scripts/terragrunt_exec.sh
The template consists of three steps:
- The first task uses Bash@3 to install Terragrunt, referencing the terragruntVersion parameter and executing the terragrunt_init.sh script from the pipelines/scripts directory, shown below as a reference.
#!/usr/bin/env bash
set -euo pipefail
mkdir -p terragrunt
cd terragrunt
# Download terragrunt
curl -SL "https://github.com/gruntwork-io/terragrunt/releases/download/v$terragruntversion/terragrunt_linux_amd64" --output terragrunt
# Copy the file to /usr/local/bin so we don't have to specify the full path
sudo cp -a terragrunt /usr/local/bin
# Make the file executable
chmod +x /usr/local/bin/terragrunt
# Test if it works by printing out the version of terragrunt
terragrunt --version
- The second task conditionally executes if azServiceConnection is not empty, indicating that a remote state is required. It uses AzureCLI@2 to run the specified Terragrunt command with the provided parameters and environment variables. It also sets up Terraform variables (prefixed with TF_VAR_) dynamically based on the tfVars parameter.
- The third task is for local command execution if azServiceConnection is empty. It uses Bash@3 to execute the terragrunt_exec.sh script with the specified command and parameters, shown below as a reference.
#!/usr/bin/env bash
set -euo pipefail
export BOLD_GREEN="\033[1;32m"
if [ $cloudEnabled = true ]; then
  export ARM_CLIENT_ID=$servicePrincipalId
  export ARM_CLIENT_SECRET=$servicePrincipalKey
  export ARM_SUBSCRIPTION_ID=$(az account show --query 'id' -o tsv)
  export ARM_TENANT_ID=$tenantId
  export AZDO_ORG_SERVICE_URL=$SYSTEM_TEAMFOUNDATIONSERVERURI
  export AZDO_PERSONAL_ACCESS_TOKEN=$SYSTEM_ACCESSTOKEN
fi
terragrunt="terragrunt"
command="$command"
terragrunt_working_dir="--terragrunt-working-dir $workingdir"
parameters="$parameters"
if [ $command = 'hclfmt' ]; then
  echo "##[command]$terragrunt $command $terragrunt_working_dir $parameters"
  eval ${terragrunt} ${command} ${terragrunt_working_dir} ${parameters} |& tee /tmp/terragrunt_format_check.log
  terragrunt_format_check_log=$(cat /tmp/terragrunt_format_check.log)
  if [[ $terragrunt_format_check_log == *"error occurred"* ]]; then
    echo "##vso[task.logissue type=warning]There were one or more errors when checking the format of your Terragrunt configurations. Please check the logs." && \
    echo "##vso[task.complete result=SucceededWithIssues;]"
  else
    echo -e "${BOLD_GREEN}All Terragrunt configurations are correctly formatted."
  fi
elif [ $command = 'validate-inputs' ]; then
  for module in $(ls $workingdir)
  do
    echo "##[group]$module"
    terragrunt_working_dir="--terragrunt-working-dir $workingdir/$module"
    echo "##[command]$terragrunt $command $terragrunt_working_dir $parameters"
    eval ${terragrunt} ${command} ${terragrunt_working_dir} ${parameters}
    echo -e "${BOLD_GREEN}All required inputs are passed in by terragrunt."
    echo "##[endgroup]"
  done
else
  echo "##[command]$terragrunt $command $terragrunt_working_dir $parameters"
  eval ${terragrunt} ${command} ${terragrunt_working_dir} ${parameters}
fi
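One interaction in this script is worth checking in your own setup. With set -euo pipefail active, a command that exits non-zero while piped into tee makes the whole pipeline fail, and the script stops at that line before the log is inspected. If the intent is to downgrade a failed check to a warning, the pipeline status has to be captured explicitly. The sketch below shows one way to do that, assuming the checked command exits non-zero when it finds a problem; failing_check is a made-up stand-in for the real command, not part of Terragrunt.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for a checker that reports a problem and exits non-zero.
failing_check() {
  echo "error occurred: file.hcl is not formatted"
  return 1
}

log="$(mktemp)"

# '|| status=$?' records the pipeline's exit status and keeps 'set -e'
# from aborting the script when the check fails.
status=0
failing_check 2>&1 | tee "$log" || status=$?

if [ "$status" -ne 0 ]; then
  echo "warning: check failed with status $status, see $log"
else
  echo "all checks passed"
fi
```

Testing the exit status also avoids depending on the exact wording of the tool's log output, which can change between versions.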
Conclusion
In this article, we have explored how to use Terragrunt to enhance Terraform’s functionality and simplify its workflow, by providing features such as remote state management, DRY configurations, dependency management, configuration inheritance, environment-specific configurations, locking mechanism, secrets management, and integration with CI/CD pipelines, on Azure DevOps.
There are many potential ways to improve or expand our implementation, such as using Terratest for testing our Terraform code, or using Terrascan for security and compliance scanning.
I hope that this article has given you some insights and inspiration for managing your cloud infrastructure with Terraform and Terragrunt. If you have any questions, feedback, or experiences to share, please feel free to leave a comment below.