Testing and Validation: Catching Bugs Before They Cost Money

This chapter covers the validation layers around Terraform: fmt, validate, tflint, security scanners, terraform test, Terratest, and policy-as-code.

The Layers

You don't pick one tool; you stack them. Each catches a different class of problem.

terraform fmt        Formatting (style)
terraform validate   Syntax and internal consistency
tflint               Lint rules, provider-specific best practices
tfsec / Checkov      Security misconfigurations
terraform test       Plan/apply-based tests (native, 1.6+)
Terratest            Real-integration tests in Go
OPA / Conftest       Custom policy over the plan JSON
Sentinel             HashiCorp's policy-as-code (HCP Terraform)

Roughly, each layer is faster and cheaper than the ones below it. Put the cheap ones in pre-commit hooks; the expensive ones in CI.

terraform fmt

Formats your code to Terraform's canonical style.

terraform fmt              # format files in the current directory
terraform fmt -recursive   # include subdirectories
terraform fmt -check       # exit non-zero if any file would be changed (for CI)
terraform fmt -diff        # show the changes it would make

In CI, always:

terraform fmt -check -recursive

Fails fast if anyone committed unformatted code. Zero excuses.

terraform validate

Checks syntax and internal consistency. Doesn't hit any cloud APIs.

terraform init -backend=false
terraform validate

Fails on: typos in resource types, missing required arguments, references to things that don't exist, type mismatches.

Won't catch: runtime errors, missing IAM permissions, values that depend on the real cloud.

Always run in CI. It's fast and catches a surprising amount.
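
As a rough illustration of that boundary, the hypothetical snippet below contains one mistake that validate reports and one value it accepts even though it may fail later. The resource name, variable, and AMI ID are placeholders invented for this sketch.

```hcl
variable "instance_type" {
  type    = string
  default = "t3.micro"
}

resource "aws_instance" "web" {
  # Accepted by validate: the string is well-formed, and whether this AMI
  # exists is only known to the real cloud (placeholder ID).
  ami = "ami-0123456789abcdef0"

  # Rejected by validate: var.instance_typ is not declared anywhere
  # (typo for var.instance_type).
  instance_type = var.instance_typ
}
```

Fixing the typo makes validate pass, but the placeholder AMI would still fail once Terraform talks to AWS, which is the gap the later layers cover.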

tflint

A Terraform linter. Catches things validate doesn't:

  • Deprecated syntax.
  • Unused variables.
  • AWS-specific rules (invalid instance types, missing required tags, etc.).
  • Naming conventions.

Install:

brew install tflint
# or the binary from github.com/terraform-linters/tflint

Configure with a .tflint.hcl:

plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

plugin "aws" {
  enabled = true
  version = "0.30.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

rule "terraform_naming_convention" {
  enabled = true
}

rule "terraform_unused_declarations" {
  enabled = true
}

Run:

tflint --init       # once, to download plugins
tflint --recursive

The AWS plugin catches "you used an instance type that doesn't exist" (a typo such as t2.mircro, say). terraform validate and plan generally accept that value, so without the linter the mistake usually surfaces only when apply calls the AWS API.
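
The missing-tags check in the AWS ruleset is typically off until you enable it and list the tags you require. A minimal sketch of the extra block for .tflint.hcl, where the tag names are assumptions chosen for illustration:

```hcl
# Added to .tflint.hcl alongside the plugin "aws" block.
rule "aws_resource_missing_tags" {
  enabled = true

  # Hypothetical tag policy; replace with the tags your organization requires.
  tags = ["Owner", "Environment"]
}
```

With this in place, tflint should report taggable AWS resources that lack either tag.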

Security Scanners: tfsec and Checkov

Static analysis that knows common security mistakes.

tfsec

brew install tfsec
tfsec .

Example findings:

Check: aws-s3-enable-versioning
  Severity: MEDIUM
  Location: main.tf:15
  S3 bucket does not have versioning enabled.

Check: aws-ec2-no-public-egress-sgr
  Severity: MEDIUM
  Location: sg.tf:23
  Security group rule allows egress to 0.0.0.0/0.

Checkov

pip install checkov
checkov -d .

Similar scope, different rule set. Some teams run both; many pick one.

Use these for:

  • Public resources (S3 buckets, security groups) that should be private.
  • Missing encryption at rest.
  • IAM policies with * actions.
  • Unrestricted ingress rules.

Not a substitute for code review, but a useful safety net.

terraform test (Native)

Since Terraform 1.6, terraform test is built in. You write test cases in HCL.

tests/bucket.tftest.hcl:

variables {
  environment = "test"
}

run "valid_config" {
  command = plan

  assert {
    condition     = length(aws_s3_bucket.notes.bucket) > 0
    error_message = "bucket name must not be empty"
  }
}

run "production_uses_large_instance" {
  command = plan

  variables {
    environment = "prod"
  }

  assert {
    condition     = aws_instance.web.instance_type == "m5.large"
    error_message = "prod must use m5.large"
  }
}

Run:

terraform test

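Each run block is a test. command = plan runs a plan (no cloud changes). command = apply, which is also the default when command is omitted, creates real resources; Terraform tries to destroy them when the test file finishes, so point it at a test environment and check for leftovers if a run is interrupted.

Great for module testing: with command = plan you can verify resource configuration, and any outputs whose values are known at plan time, without deploying.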
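
The difference between the two commands shows up in what an assertion can see. The sketch below assumes the module builds the bucket name from var.environment and exposes a bucket_arn output; both are hypothetical details, not something the examples above define.

```hcl
# tests/outputs.tftest.hcl (hypothetical file name)

variables {
  environment = "test"
}

# Plan-only: the bucket name comes from configuration, so it is known
# before anything is created.
run "bucket_name_has_environment_prefix" {
  command = plan

  assert {
    # Assumes the module prefixes the bucket name with var.environment.
    condition     = startswith(aws_s3_bucket.notes.bucket, "test-")
    error_message = "bucket name must start with the environment name"
  }
}

# Apply: the ARN is computed by the provider, so it only has a value
# after the bucket really exists.
run "bucket_arn_is_exported" {
  command = apply

  assert {
    # output.bucket_arn is a hypothetical output of the module under test.
    condition     = startswith(output.bucket_arn, "arn:aws:s3:::")
    error_message = "bucket_arn output must be an S3 ARN"
  }
}
```

Keeping plan-only runs and apply runs in separate files makes it easier to run the cheap ones on every PR and the expensive ones less often.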

Terratest

Go-based integration testing. Deploys real infrastructure, asserts behavior, destroys it.

Example (test/vpc_test.go):

package test

import (
    "testing"

    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

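func TestVPCModule(t *testing.T) {
    // Allow this test to run alongside other parallel tests.
    t.Parallel()

    opts := &terraform.Options{
        // Path to the module under test, relative to this file.
        TerraformDir: "../modules/vpc",
        // Input variables passed to the module.
        Vars: map[string]interface{}{
            "cidr_block": "10.99.0.0/16",
            "name":       "test-vpc",
        },
    }

    // Register cleanup before applying so it runs even if apply fails partway.
    defer terraform.Destroy(t, opts)

    // Run terraform init and terraform apply; the test fails on any error.
    terraform.InitAndApply(t, opts)

    // Read the vpc_id output and check that the module exported a value.
    vpcID := terraform.Output(t, opts, "vpc_id")
    assert.NotEmpty(t, vpcID)
}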

Run:

cd test
go test -v -timeout 30m

Terratest is more work to set up than terraform test, but:

  • Tests against real cloud resources (catches provider bugs, IAM issues, eventual consistency).
  • Lets you assert HTTP-level behavior (does the deployed ALB actually respond?).
  • Full Go, so you can script anything.

Use Terratest for modules whose correctness only shows up against real APIs (networking, ALBs, Lambda deployments).

Policy as Code

Policy says "plans must satisfy these rules". Stronger than linting because it runs on the actual plan output, not the source code.

OPA / Conftest

OPA (Open Policy Agent) with the Conftest wrapper:

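# policies/s3.rego
package main

# Rego v1 syntax (contains / if), which OPA 1.0 and later require by default.
import rego.v1

# Deny any plan that leaves block_public_acls switched off.
deny contains msg if {
  resource := input.resource_changes[_]
  resource.type == "aws_s3_bucket_public_access_block"
  resource.change.after.block_public_acls == false
  msg := sprintf("S3 bucket %s must block public ACLs", [resource.address])
}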

Run:

terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > plan.json
conftest test --policy policies/ plan.json

Any deny rule that matches fails the check.

Sentinel

HashiCorp's policy-as-code language, available in HCP Terraform.

import "tfplan/v2" as tfplan

s3_buckets = filter tfplan.resource_changes as _, rc {
  rc.type is "aws_s3_bucket_public_access_block"
}

main = rule {
  all s3_buckets as _, bucket {
    bucket.change.after.block_public_acls is true
  }
}

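Policies are grouped into policy sets and attached to workspaces; every plan is checked. Depending on the enforcement level, a violation is either advisory or blocks the apply.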

OPA vs Sentinel

OPA: open source, language (Rego) used beyond Terraform. Works with any CI.

Sentinel: HashiCorp-only, HCP Terraform / Terraform Enterprise. More integrated with the HCP UI.

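Pick based on where your plans run: if applies go through HCP Terraform or Terraform Enterprise, Sentinel is already in the path; if they run in your own CI, OPA with Conftest fits more naturally.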

A Realistic Pipeline

What to run where:

Pre-commit hook     terraform fmt, terraform validate, tflint (fast)
PR CI (always)      fmt check, validate, tflint, tfsec, plan, conftest on plan
PR CI (optional)    terraform test (native)
Nightly             Terratest against a sandbox account
Apply to prod       Sentinel (if using HCP) or conftest gate

Each layer runs when its feedback is valuable without blocking you.

Pre-Commit Hook

Install pre-commit (pre-commit.com) and add .pre-commit-config.yaml:

repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.86.0
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint
      - id: terraform_tfsec
      - id: terraform_docs

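pre-commit install wires these as git hooks. Every commit runs them. The hooks call the underlying binaries (terraform, tflint, tfsec, terraform-docs), so those must already be installed on the machine.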

Common Pitfalls

Skipping terraform fmt. Unformatted diffs make real changes hard to see. Free tool; run it.

Depending only on terraform validate. It catches syntax and internal-consistency errors, nothing that depends on real infrastructure. Layer more.

Running Terratest on every PR. Slow and expensive. Reserve for scheduled runs or release branches.

Policy that's too strict. If 80% of PRs fail policy, engineers route around it. Make policy actionable: say what to fix, not just what's wrong.

No cleanup in Terratest. A test that leaks resources is worse than no test. defer terraform.Destroy always, and register it before InitAndApply so it still runs when apply fails partway.

Testing only the happy path. Testing that terraform plan succeeds for valid input is half a test. Also test that invalid input fails with the right error.
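
One way to cover the unhappy path with the native test framework is expect_failures, which makes a run pass only when the named check fails. The sketch below assumes the module's environment variable carries a validation rule; the allowed values and the file names are hypothetical.

```hcl
# variables.tf (hypothetical validation on the environment variable)
variable "environment" {
  type = string

  validation {
    condition     = contains(["test", "staging", "prod"], var.environment)
    error_message = "environment must be one of: test, staging, prod."
  }
}

# tests/invalid_input.tftest.hcl
run "rejects_unknown_environment" {
  command = plan

  variables {
    # Deliberately invalid value.
    environment = "banana"
  }

  # The run passes only if the validation on var.environment fails.
  expect_failures = [
    var.environment,
  ]
}
```

If someone later loosens or removes the validation rule, this run starts failing, which is the signal you want.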

Next Steps

Continue to 11-ecosystem.md for the tools built on top of Terraform.