Introduction

Someone clicks the wrong button in the AWS console and a production database disappears. The team spends four hours reconstructing it from memory. That's Tuesday without infrastructure as code.

Terraform is HashiCorp's tool for defining cloud infrastructure in configuration files. You describe what you want, Terraform figures out what to create and in what order, and it tracks everything it builds. Tear it all down? One command. But the real value isn't the automation -- it's the state file. Terraform knows what exists. It knows what you asked for. It computes the difference. That's a fundamentally different relationship with infrastructure than clicking around in a console.

The provider model is what made Terraform win. Same language, same workflow for AWS, Google Cloud, Azure, Cloudflare, Datadog, GitHub repos. Hundreds of services. Not locked to one cloud. That flexibility matters more than it seems early on; by the time teams realize it, they're glad they picked something portable.

Infrastructure as code is not about replacing the cloud console. It is about making your infrastructure reviewable, repeatable, and recoverable.

HCL Quick Reference

HCL is the configuration language. More readable than JSON, less ambiguous than YAML. A Terraform project is one or more .tf files in a directory -- Terraform merges all of them together, so split them however makes sense. Most teams: main.tf, variables.tf, outputs.tf, providers.tf.

Basic EC2 instance:

main.tf
# Configure the AWS provider
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  required_version = ">= 1.7.0"
}

provider "aws" {
  region = "us-east-1"
}

# Create an EC2 instance
resource "aws_instance" "web_server" {
  ami           = "ami-0c02fb55956c7d316"
  instance_type = "t3.micro"

  tags = {
    Name        = "web-server"
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

Pin provider versions. Pin the Terraform version. Skip this and a new provider release will break your config silently on someone else's machine. Every resource follows resource "TYPE" "NAME" -- type from the provider, name is your local label. Reference it elsewhere as aws_instance.web_server.

Workflow: terraform init (downloads providers), terraform plan (previews changes), terraform apply (creates resources), terraform destroy (tears them down). The plan step is the most valuable part. Shows you exactly what will happen before anything happens.

Providers and Resources

Providers are plugins. AWS, Google Cloud, Azure, Kubernetes, Datadog, GitHub -- thousands of them in the Terraform Registry. Each exposes resources (things Terraform creates) and data sources (things Terraform reads but doesn't own).
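Data sources are worth a concrete example, since the AMI lookups later in this article rely on one. A sketch of that lookup -- the filter values are illustrative and should be adjusted to the AMI family you actually want:

```hcl
# Look up the most recent Amazon Linux 2023 AMI published by Amazon.
# Terraform reads this at plan time; it never creates or owns the AMI.
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]  # illustrative name pattern
  }
}

# Referenced from resources as: data.aws_ami.amazon_linux.id
```

Note the `data.` prefix when referencing it -- that's how you tell a read-only lookup apart from a resource Terraform manages.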

network.tf
# Create a VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "main-vpc"
  }
}

# Create a public subnet
resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true

  tags = {
    Name = "public-subnet"
  }
}

# Internet gateway for outbound traffic
resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "main-igw"
  }
}

# Route table for public subnet
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.gw.id
  }
}

resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

The dependency graph is implicit. The subnet references aws_vpc.main.id, the route table references both the VPC and the internet gateway. Terraform reads these references and creates resources in the right order. VPC before subnet, internet gateway before route table entry. You never specify ordering manually, and you shouldn't try to -- explicit depends_on is a code smell that usually means your references aren't structured correctly.
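For contrast, here is the same route table with a redundant explicit dependency bolted on -- the `gateway_id` reference already gives Terraform the ordering, so the `depends_on` adds nothing and just obscures the graph:

```hcl
# Code smell: manual ordering where a reference already exists
resource "aws_route_table" "public" {
  vpc_id     = aws_vpc.main.id
  depends_on = [aws_internet_gateway.gw]  # redundant -- see below

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.gw.id  # this reference already orders creation
  }
}
```

The legitimate uses of `depends_on` are rare: hidden dependencies the provider can't see, such as an IAM policy that must exist before a service that uses it implicitly.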

Variables, Outputs and Locals

Hardcoded values work for experiments. They fall apart the moment you need the same config in a different region or with a different instance size.

variables.tf
variable "aws_region" {
  description = "AWS region to deploy resources"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Deployment environment name"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.micro"
}

variable "allowed_cidr_blocks" {
  description = "CIDR blocks allowed to access the instance"
  type        = list(string)
  default     = ["0.0.0.0/0"]
}

variable "extra_tags" {
  description = "Additional tags to apply to all resources"
  type        = map(string)
  default     = {}
}

# Local values for computed or combined values
locals {
  common_tags = merge({
    Environment = var.environment
    ManagedBy   = "terraform"
    Project     = "web-platform"
  }, var.extra_tags)

  name_prefix = "${var.environment}-web"
}

# Outputs expose values after apply
output "instance_public_ip" {
  description = "Public IP of the web server"
  value       = aws_instance.web_server.public_ip
}

output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

The validation block on environment -- if someone passes "prod" instead of "production", Terraform catches it before creating anything. Cheap safety net.

Locals are computed values. local.common_tags merges standard tags with extra tags from a variable. Every resource references one tag map instead of duplicating the same block everywhere. Outputs expose values after apply: the IP of a new server, the ID of a VPC. They also pass data between modules.
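Used together, the pattern looks like this -- the instance arguments repeat the earlier example, with the hardcoded values swapped for the variable and locals defined above:

```hcl
# Every resource pulls from the shared tag map instead of
# repeating Environment/ManagedBy/Project inline.
resource "aws_instance" "web_server" {
  ami           = "ami-0c02fb55956c7d316"
  instance_type = var.instance_type

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-server"
  })
}
```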

In practice, one .tfvars file per environment. production.tfvars sets instance type to t3.large, dev.tfvars keeps it at t3.micro. Pass the file with terraform apply -var-file="production.tfvars". Simple, but it works.
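A production.tfvars along those lines -- the CIDR value here is illustrative, tighten it to your actual network:

```hcl
# production.tfvars
aws_region    = "us-east-1"
environment   = "production"
instance_type = "t3.large"

# Illustrative: restrict access to an internal range
allowed_cidr_blocks = ["10.0.0.0/8"]
```

Anything not set in the file falls back to the variable's default, and the validation block still runs against whatever the file provides.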

Modules for Reusable Infrastructure

Every application needs a VPC. Every application needs a load balancer and an auto-scaling group. Copy-pasting those resource blocks across projects is how you end up with five slightly different VPC configurations and no idea which one is correct.

A module is a directory of .tf files with inputs and outputs. Your root config is already a module. Child modules encapsulate reusable infrastructure.

A standardized web application stack makes a good first module: security group, launch template, and auto-scaling group, all parameterized by app name, VPC, subnets, and sizing. One module, callable with different parameters for different services.

modules/web-app/main.tf
# modules/web-app/main.tf
# A reusable module for web application infrastructure
resource "aws_security_group" "web" {
  name_prefix = "${var.app_name}-web-"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = var.tags
}

resource "aws_launch_template" "web" {
  name_prefix   = "${var.app_name}-"
  image_id      = var.ami_id
  instance_type = var.instance_type

  vpc_security_group_ids = [aws_security_group.web.id]

  tag_specifications {
    resource_type = "instance"

    tags = merge(var.tags, {
      Name = "${var.app_name}-web"
    })
  }
}

resource "aws_autoscaling_group" "web" {
  name                = "${var.app_name}-asg"
  desired_capacity    = var.desired_capacity
  max_size            = var.max_size
  min_size            = var.min_size
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }
}

Calling it from root config:

main.tf (root module)
module "api_service" {
  source = "./modules/web-app"

  app_name         = "api"
  vpc_id           = aws_vpc.main.id
  subnet_ids       = aws_subnet.public[*].id
  ami_id           = data.aws_ami.amazon_linux.id
  instance_type    = "t3.small"
  desired_capacity = 2
  min_size         = 1
  max_size         = 4
  tags             = local.common_tags
}

module "frontend_service" {
  source = "./modules/web-app"

  app_name         = "frontend"
  vpc_id           = aws_vpc.main.id
  subnet_ids       = aws_subnet.public[*].id
  ami_id           = data.aws_ami.amazon_linux.id
  instance_type    = "t3.micro"
  desired_capacity = 3
  min_size         = 2
  max_size         = 6
  tags             = local.common_tags
}

One module. Two separate application stacks. Change how security groups work in the module and every caller picks up the fix.
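The module also needs its inputs declared. A declarations file matching the variables the module body references -- the defaults here are illustrative, not prescriptive:

```hcl
# modules/web-app/variables.tf
variable "app_name" {
  description = "Name of the application"
  type        = string
}

variable "vpc_id" {
  description = "VPC to deploy into"
  type        = string
}

variable "subnet_ids" {
  description = "Subnets for the auto-scaling group"
  type        = list(string)
}

variable "ami_id" {
  description = "AMI for the launch template"
  type        = string
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.micro"  # illustrative default
}

variable "desired_capacity" {
  description = "Desired number of instances"
  type        = number
  default     = 2
}

variable "min_size" {
  description = "Minimum number of instances"
  type        = number
  default     = 1
}

variable "max_size" {
  description = "Maximum number of instances"
  type        = number
  default     = 4
}

variable "tags" {
  description = "Tags applied to all resources"
  type        = map(string)
  default     = {}
}

# modules/web-app/outputs.tf
output "asg_name" {
  description = "Name of the auto-scaling group"
  value       = aws_autoscaling_group.web.name
}
```

Outputs are how callers get data back out: the root config can reference `module.api_service.asg_name` after apply.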

The Terraform Registry has community modules for common patterns -- VPCs, EKS clusters, RDS databases. The official AWS VPC module handles public and private subnets, NAT gateways, and route tables in a single call. Use those for infrastructure primitives. Write custom modules for your application-specific patterns.
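Calling the registry VPC module looks much like calling a local one -- only the source changes. A sketch with illustrative values (check the module's own documentation for the full input list):

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "main-vpc"
  cidr = "10.0.0.0/16"

  # Illustrative layout: two AZs, one public and one private subnet each
  azs             = ["us-east-1a", "us-east-1b"]
  public_subnets  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnets = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = true
}
```

That one block replaces the hand-written VPC, subnet, gateway, and route table resources from earlier.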

Rule of thumb: extract a module once you've built the same resources more than twice. But keep each module focused. A VPC module should not also deploy an application. That's not a module, that's a monolith wearing a module's clothes.

State Management and Remote Backends

This is the hardest concept in Terraform. Not the syntax. Not HCL. State.

Terraform's state file is a JSON mapping between what's in your config and what actually exists in the cloud. aws_instance.web_server maps to instance i-0abc123def456. Corrupt that file and Terraform no longer knows what it created. You're untangling things by hand, and "by hand" in this context means reading API responses and cross-referencing them against your config while production is potentially broken.

By default, state lives in a local file called terraform.tfstate. Solo work, fine. On a team, two people running terraform apply with different local state files will overwrite each other and potentially corrupt live infrastructure.

The fix is a remote backend with locking. Get the configuration wrong and you risk corrupting your state file, so read it carefully.

backend.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "web-platform/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

Every plan or apply now fetches state from S3, acquires a DynamoDB lock, runs, writes state back, releases the lock. Someone else tries to run at the same time? "State locked." They wait.
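The lock table itself has to exist before the backend can use it -- a chicken-and-egg problem usually solved by creating the bucket and table once, by hand or from a small bootstrap config. A sketch of the table (the name is illustrative, but the `LockID` string hash key is what the S3 backend expects):

```hcl
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"       # must match dynamodb_table in the backend block
  billing_mode = "PAY_PER_REQUEST"       # locking traffic is tiny; no need for provisioned capacity
  hash_key     = "LockID"                # the attribute name the S3 backend requires

  attribute {
    name = "LockID"
    type = "S"
  }
}
```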

Never edit the state file by hand. Ever. Use terraform state mv and terraform state rm if you need to move or remove resources. The state file contains sensitive data -- database passwords, API keys, connection strings -- so encrypt the S3 bucket and restrict access with IAM policies. Enable versioning on the bucket. You will, at some point, need to recover from accidental corruption. When that day comes, you'll be glad the previous state file is still there.

Terraform Cloud handles all of this automatically. For larger teams, probably worth it. For smaller teams, S3 plus DynamoDB works fine indefinitely.

Workspaces for Multiple Environments

Dev, staging, production. Same infrastructure, different sizes. Workspaces are Terraform's answer: same configuration, separate state files, completely independent resources. You reference the current workspace name with terraform.workspace to vary settings per environment.

environments.tf
# Environment-specific configuration using workspaces
locals {
  environment = terraform.workspace

  # Different settings per environment
  env_config = {
    dev = {
      instance_type     = "t3.micro"
      desired_capacity  = 1
      max_size          = 2
      multi_az          = false
      db_instance_class = "db.t3.micro"
    }
    staging = {
      instance_type     = "t3.small"
      desired_capacity  = 2
      max_size          = 4
      multi_az          = false
      db_instance_class = "db.t3.small"
    }
    production = {
      instance_type     = "t3.large"
      desired_capacity  = 3
      max_size          = 10
      multi_az          = true
      db_instance_class = "db.r6g.large"
    }
  }

  # Select config for current workspace
  config = local.env_config[local.environment]
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = local.config.instance_type

  tags = {
    Name        = "${local.environment}-web-server"
    Environment = local.environment
  }
}

resource "aws_db_instance" "main" {
  identifier     = "${local.environment}-database"
  engine         = "postgres"
  engine_version = "16.1"
  instance_class = local.config.db_instance_class
  multi_az       = local.config.multi_az
  # ... other configuration
}

terraform workspace new staging, terraform workspace select staging, apply. Resources get created with staging-sized instances and staging tags. Switch to production and the same code provisions production-grade resources.

Workspaces have a real problem, though. Nothing stops you from running terraform apply in the wrong workspace. Dev-sized database instances landing in production because someone forgot to switch. The database buckles under real traffic. Emergency resize with downtime. This has happened.

Many experienced teams prefer separate directories per environment with shared modules instead. When the directory is literally named environments/production/, ambiguity disappears. But separate directories mean more files to maintain. Pick your tradeoff.
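A sketch of the directory-per-environment alternative, reusing the web-app module from earlier -- the values are illustrative, and each environment directory carries its own backend config and state:

```hcl
# environments/production/main.tf
# Production sizing lives here, in a file whose path says "production".
module "api_service" {
  source = "../../modules/web-app"

  app_name         = "api"
  vpc_id           = aws_vpc.main.id
  subnet_ids       = aws_subnet.public[*].id
  ami_id           = data.aws_ami.amazon_linux.id
  instance_type    = "t3.large"
  desired_capacity = 3
  min_size         = 2
  max_size         = 10
  tags             = { Environment = "production" }
}
```

The dev directory holds the same module call with dev-sized values. Duplication, yes -- but duplication you can see at a glance.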

Terraform in CI/CD Pipelines

Running Terraform from laptops is how teams get into trouble. Version mismatches. Forgotten git pull before apply. Zero audit trail. Moving Terraform into a CI/CD pipeline fixes all of these and, honestly, should happen earlier than most teams do it.

The pattern: branch, PR, CI runs terraform plan and posts results as a comment, reviewers approve, CD runs terraform apply on merge to main.

.github/workflows/terraform.yml
name: Terraform CI/CD

on:
  pull_request:
    branches: [main]
    paths: ['infrastructure/**']
  push:
    branches: [main]
    paths: ['infrastructure/**']

permissions:
  contents: read
  pull-requests: write
  id-token: write

jobs:
  plan:
    name: Terraform Plan
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.5"

      - name: Terraform Init
        run: terraform init
        working-directory: infrastructure/

      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -out=tfplan
        working-directory: infrastructure/

      - name: Comment Plan on PR
        uses: actions/github-script@v7
        with:
          script: |
            // Post plan output as PR comment

  apply:
    name: Terraform Apply
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS & Setup Terraform
        # Same steps as above...

      - name: Terraform Apply
        run: terraform apply -auto-approve
        working-directory: infrastructure/

OIDC authentication via role-to-assume instead of stored AWS keys. Short-lived credentials, nothing to rotate. Pinned Terraform version for consistency. Plan on PRs, apply only on pushes to main.

Save the plan output with -out=tfplan and apply that exact saved plan (terraform apply tfplan). Without this, state can change between plan and apply, and the changes reviewers approved are not necessarily the changes that get applied. Subtle. Dangerous.

For more involved setups, Atlantis (open source, self-hosted) or Spacelift (managed, policy enforcement, drift detection). Both purpose-built for Terraform in PR workflows. But even a basic GitHub Actions setup like the one above is vastly better than running from laptops.

Next Steps

Set up remote state with S3 and DynamoDB locking before writing your first resource. Not after. Before. The number of teams that start with local state intending to migrate later and then spend a painful afternoon doing the migration is embarrassingly high.

Terraform vs Pulumi vs CDK vs OpenTofu -- the tooling is fragmenting. But the practice of codifying infrastructure is only getting more important. Which tool you pick matters less than picking one and using it for everything.

Anurag Sinha

Full Stack Developer & Technical Writer

Anurag is a full stack developer and technical writer. He covers web technologies, backend systems, and developer tools for the Codertronix community.