Terraform Module Patterns for Production EKS

2026-05-12 10 min

Most Terraform code I see for EKS works for a single cluster and falls apart the moment a second environment shows up. The patterns below are what I have converged on across multiple production clusters at Turna and earlier projects: not best practices in the abstract, but choices that pay rent on real teams.

Module boundaries follow ownership, not AWS services

The temptation is to make a module per AWS service: modules/vpc, modules/eks, modules/rds. This looks tidy and ages poorly. Real ownership boundaries are coarser. A platform team owns "networking" (VPC + subnets + endpoints + NAT + flow logs) as one indivisible unit. They almost never change one piece without considering the others.

I structure modules around that ownership:

modules/
├── platform-network/     # VPC, subnets, endpoints, NAT, flow logs
├── platform-cluster/     # EKS, addons, IRSA, default node groups
├── platform-observability/ # Prometheus stack, Loki, alerting
└── app-database/         # RDS instances, parameter groups, IAM

Each module has 5–15 inputs and a flat output surface. If a module needs more than 20 inputs, it is doing too much.

One directory per environment, not one workspace per environment

Terraform workspaces are tempting because they look like a free per-environment knob. They are a trap when used to separate prod from dev: every workspace shares the same backend configuration and the same code directory, so one fat-fingered terraform workspace select before an apply or destroy hits the wrong environment.

What I use instead: separate directories per environment, each with its own backend config and its own state file.

envs/
├── dev/
│   ├── main.tf          # Calls platform-* modules with dev-sized inputs
│   ├── backend.tf       # S3 key: env/dev/terraform.tfstate
│   └── terraform.tfvars
├── stage/
└── prod/

Workspaces are still useful for short-lived stack variants (a feature branch's preview cluster, a load test stack). For prod vs dev, separate directories with separate backends.
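
A minimal sketch of what envs/dev/backend.tf could look like, assuming an S3 backend. Only the key comes from the layout above; the bucket name and region are placeholders I made up for illustration, and use_lockfile assumes Terraform 1.10 or newer.

```hcl
# envs/dev/backend.tf
# Bucket name and region are hypothetical placeholders.
terraform {
  backend "s3" {
    bucket  = "example-org-terraform-state"
    key     = "env/dev/terraform.tfstate"
    region  = "eu-west-1"
    encrypt = true

    # S3-native state locking, available from Terraform 1.10.
    # Older versions would use a DynamoDB lock table via dynamodb_table.
    use_lockfile = true
  }
}
```

The stage and prod directories would carry the same block with their own key (and, ideally, their own bucket or account), so no command run inside envs/dev can ever touch prod state.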

Variables: required, optional, derived

A pattern that has saved me dozens of hours: separate variables into three buckets by intent.

  • Required inputs have no default. The plan fails loudly if the caller forgot one. Example: cluster_name, vpc_cidr.
  • Optional inputs have a sensible default that works for 80% of callers. Example: node_group_instance_types = ["t3.large"].
  • Derived values are locals, not variables. Example: locals { subnet_count = length(var.availability_zones) }.

I never use optional variables for safety-critical knobs. If turning on cluster logging is important, make it required.
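
The three buckets could look roughly like this inside a module's variables.tf. The variable names come from the examples above, except enabled_cluster_log_types, which is a hypothetical name for the "logging is required" knob; the validation block is an optional extra, not part of the pattern itself.

```hcl
# Required: no default, so a caller that forgets one fails at plan time.
variable "cluster_name" {
  type        = string
  description = "Name of the EKS cluster."
}

variable "vpc_cidr" {
  type        = string
  description = "CIDR block for the VPC."

  validation {
    # cidrhost() errors on a malformed CIDR; can() turns that into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid CIDR block, for example 10.0.0.0/16."
  }
}

variable "availability_zones" {
  type        = list(string)
  description = "Availability zones to spread subnets across."
}

# Safety-critical, so required on purpose (hypothetical variable name).
variable "enabled_cluster_log_types" {
  type        = list(string)
  description = "Control plane log types to ship. The caller must decide explicitly."
}

# Optional: a default that suits most callers.
variable "node_group_instance_types" {
  type        = list(string)
  description = "Instance types for the default node group."
  default     = ["t3.large"]
}

# Derived: computed from inputs, never set by the caller.
locals {
  subnet_count = length(var.availability_zones)
}
```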

Published parameters, not remote state

You will eventually need values from one stack (the VPC ID, a security group) in another (the cluster, an app). Two ways:

  1. The data source way: data "terraform_remote_state" "network". Reaches into another stack's state.
  2. The output-and-pass way: write the value to SSM Parameter Store or AWS Secrets Manager, read it from the consumer.

I default to the second one. Reaching into another stack's state couples your modules tightly and breaks if the producer reorganizes its outputs. It also requires read access to the producer's entire state snapshot, which may contain secrets. SSM is a stable contract you control, and it works for tools other than Terraform.

resource "aws_ssm_parameter" "vpc_id" {
  name  = "/platform/network/${var.env}/vpc_id"
  type  = "String"
  value = aws_vpc.main.id
}

On the consumer side, a small data block fetches it. No state coupling.
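
A sketch of that consumer side, assuming the parameter path used by the producer above. The security group is only a hypothetical example of something that needs the VPC ID.

```hcl
variable "env" {
  type        = string
  description = "Environment name, e.g. dev, stage, prod."
}

# Reads the value the network stack published. No access to its state needed.
data "aws_ssm_parameter" "vpc_id" {
  name = "/platform/network/${var.env}/vpc_id"
}

# Hypothetical consumer resource, shown only to illustrate the lookup.
resource "aws_security_group" "cluster" {
  name_prefix = "cluster-"

  # The provider marks .value as sensitive even for plain String parameters;
  # nonsensitive() unwraps it so the ID shows up in plans. A VPC ID is not a secret.
  vpc_id = nonsensitive(data.aws_ssm_parameter.vpc_id.value)
}
```

If the network stack later restructures its outputs or moves to a different state backend, this keeps working as long as the parameter path stays the same.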

The provider block belongs to the root, not the module

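A module that declares its own provider block (with region, profile, assume-role) seems convenient, until you need to call it twice with different provider configurations, or with count or for_each, and Terraform refuses: a module containing its own provider block cannot be used with count, for_each, or depends_on, and the caller has no way to override the region or role. The fix is to keep required_providers in the module (declaring the version range it tolerates) but put the actual provider block only in the root configuration.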

# inside module
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# in root
provider "aws" {
  region = var.region
}
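
With the provider block in the root, calling the same module against two provider configurations becomes straightforward. A sketch, assuming the modules/ and envs/ directories sit side by side; the regions, the alias name, and the module call names are placeholders.

```hcl
# Root configuration, e.g. envs/prod/main.tf

provider "aws" {
  region = "eu-west-1"
}

# A second, aliased configuration (hypothetical disaster-recovery region).
provider "aws" {
  alias  = "dr"
  region = "eu-central-1"
}

# Inherits the default (unaliased) aws provider.
module "network_primary" {
  source = "../../modules/platform-network"
  # ... module inputs omitted
}

# Same module, explicitly handed the aliased provider.
module "network_dr" {
  source = "../../modules/platform-network"

  providers = {
    aws = aws.dr
  }
  # ... module inputs omitted
}
```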

Lock the module versions

Calling a module by Git reference without a tag means the next fresh terraform init (a clean checkout, a new CI runner, or init -upgrade) may pull a different commit. Always pin to a tag:

module "cluster" {
  source = "git::https://github.com/org/tf-modules.git//platform-cluster?ref=v1.4.2"
  # ...
}

Renovate or Dependabot can open PRs to bump these tags. The PR is the audit trail.

Things I have stopped using

  • Per-environment tfvars files inside a single directory. Easier than separate directories on day one, much worse on day 100. Switch early.
  • Recursive module nesting. A module that calls another module that calls another module produces traces no human can follow. Two levels max.
  • The terraform-aws-modules EKS module's defaults without overrides. Excellent starting point, but reading the source before the first use is mandatory.
  • Using count for environment toggling. count = var.env == "prod" ? 1 : 0 turns the resource into a list, so every reference needs an index and has to handle the empty case, and flipping the variable creates or destroys the resource. Put the resource in a module and call it only from the main.tf of the environments that need it.

What stays the same across all of this

The actual EKS cluster config is mostly boring. Two node groups (system + workload), IRSA enabled, CloudWatch logging on, KMS-encrypted secrets, private API endpoint. The interesting code is everything around the cluster — the IAM policies, the addon orchestration, the per-app IRSA roles. That is where Terraform earns its keep.
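
For reference, the boring part could be sketched like this with the plain aws_eks_cluster resource. This is an illustration under assumptions, not the exact config I run: all variable names are hypothetical, and node groups and the IRSA OIDC provider are left out to keep it short.

```hcl
# All variables below are hypothetical inputs for this sketch.
variable "cluster_name" { type = string }
variable "cluster_role_arn" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "secrets_kms_key_arn" { type = string }

resource "aws_eks_cluster" "this" {
  name     = var.cluster_name
  role_arn = var.cluster_role_arn

  # Control plane logs shipped to CloudWatch.
  enabled_cluster_log_types = ["api", "audit", "authenticator"]

  vpc_config {
    subnet_ids = var.private_subnet_ids

    # Private API endpoint only.
    endpoint_private_access = true
    endpoint_public_access  = false
  }

  # Envelope encryption of Kubernetes secrets with a customer-managed KMS key.
  encryption_config {
    provider {
      key_arn = var.secrets_kms_key_arn
    }
    resources = ["secrets"]
  }
}
```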