# Understanding the Cloud Team’s Terraform factory pattern
When the Cloud Team refers to a Terraform factory, we mean a Terraform root module designed to be deployed multiple times, with each deployment receiving its own set of input variables provided at runtime. Each invocation is treated as a fully independent deployment, with its own Terraform state file and lifecycle.
This approach allows a single, shared Terraform configuration to be reused at scale, supporting dozens or even hundreds of deployments, while ensuring that each Terraform operation affects only the resources associated with that specific deployment.
By isolating state per deployment, this pattern avoids cross‑coupling between products/services, reduces the risk of unintended changes, and provides a well‑defined and predictable blast radius for all Terraform operations.
## When to use the Terraform factory pattern
The Terraform factory pattern is not intended for routine product or service deployments. For those use cases, the DevOps division already has a well‑established deployment approach using standard templates and patterns.
By contrast, the Terraform factory pattern is best suited to platform‑level or foundational infrastructure that must be deployed repeatedly at scale, often across many teams or product groups, while remaining technically owned and controlled by a central team. In these scenarios, the infrastructure has a consistent shape, requires strong guardrails, and benefits from being managed as a single, reusable root module rather than duplicated across repositories.
This pattern is particularly effective when a single team needs to provide a standardised set of resources with centrally enforced practices such as security controls, auditing, compliance requirements, and operational guardrails. The factory allows that team to evolve and improve the implementation over time, while consumers simply provide configuration inputs for their specific deployment.
The Cloud Team’s existing `gitlab-project-factory` and `gcp-product-factory` repositories are good examples of this model in practice. In both cases, the Cloud Team is responsible for defining and maintaining a standard baseline of resources that must be applied consistently across many products and services. The factory pattern allows these resources to be provisioned repeatedly, with strong isolation between deployments, while ensuring that central requirements are always applied.
A more recent example is the Azure Integration Services (AIS) `ais-api-factory` repository. In this case, a single team (AIS) is responsible for managing the deployment of potentially hundreds of similar resource sets, in this case Azure API Management (APIM) API products, on behalf of multiple requesting product and development teams. The Terraform factory pattern provides a scalable mechanism for managing these deployments consistently, without requiring each consuming team to understand or maintain the underlying infrastructure implementation.
In summary, the Terraform factory pattern is most appropriate when:
- A single team owns and maintains the Terraform root module
- The same infrastructure pattern must be deployed many times
- Each deployment requires an independent lifecycle and isolated state
- Strong central standards and guardrails must be enforced
- Consumers interact primarily through configuration, not implementation
For team‑specific, bespoke, or one‑off infrastructure deployments, the standard product deployment patterns should continue to be used instead.
## Implementation
When implementing Terraform using this pattern, the Cloud Team follows a small number of conventions and techniques to ensure consistency, safety, and scalability. These are outlined in the sections below.
### Partial backend configuration
To support separate state for each factory deployment, we use Terraform’s partial backend configuration. This allows some backend settings to be defined directly in the root module, such as the remote state backend type and storage location, while deferring other settings until runtime.
Typically, the root module defines shared backend configuration that is common to all deployments. For example, when using Google Cloud Storage as a backend, the bucket and service account used for state access are fixed for the factory as a whole:
```hcl
# backend.tf
terraform {
  backend "gcs" {
    bucket                      = "my-state-bucket-012345"
    impersonate_service_account = "my-state-service-account@my-project-012345.iam.gserviceaccount.com"
  }
}
```
At runtime, each individual deployment must supply its own backend configuration for values that identify that deployment uniquely, most commonly the backend `prefix`. This is provided during `terraform init` using the `-backend-config` flag:

```shell
terraform init -backend-config="prefix={DEPLOYMENT_PREFIX}"
```
By using a distinct backend prefix for each deployment, every instance of the factory maintains a fully isolated Terraform state file. This is a foundational requirement of the factory pattern and ensures deployments can be created, modified, and destroyed independently.
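For example, with the bucket shown above and two deployments initialised with the prefixes `deployment-a` and `deployment-b`, each deployment’s state lives under its own path (illustrative layout; the GCS backend stores one state object per workspace under each prefix):

```
gs://my-state-bucket-012345/
├── deployment-a/
│   └── default.tfstate
└── deployment-b/
    └── default.tfstate
```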
### Input variables
Terraform factories rely on input variables to parameterise deployments. The root module declares its inputs in `variables.tf`, defining both the expected values and their schema.
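For illustration, a hypothetical `variables.tf` might declare the inputs and enforce basic constraints via `validation` blocks (the names and rules here are examples only, not any factory’s real contract):

```hcl
# variables.tf (illustrative)

variable "deployment_name" {
  description = "Unique name for this factory deployment."
  type        = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{1,30}$", var.deployment_name))
    error_message = "deployment_name must be lowercase letters, digits, and hyphens."
  }
}

variable "environment" {
  description = "Target environment for this deployment."
  type        = string
  default     = "development"
}
```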
Each deployment then supplies its own variable values via a dedicated `.tfvars` file. These files are typically stored in a subdirectory of the root module, commonly under `./vars/`, with one directory per deployment. For example, a deployment named `deployment-a` might define its configuration in:

```
./vars/deployment-a/deployment-a.tfvars
```
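Such a file simply assigns a value to each input declared by the root module, for example (hypothetical variable names and values):

```hcl
# ./vars/deployment-a/deployment-a.tfvars (illustrative)
deployment_name = "deployment-a"
environment     = "production"
```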
When running Terraform, this file is passed explicitly using the `-var-file` parameter:

```shell
terraform init -backend-config="prefix=deployment-a"
terraform plan -var-file=./vars/deployment-a/deployment-a.tfvars
```
This structure keeps deployment‑specific configuration clearly separated from shared infrastructure logic, making it easier to review changes, reason about impact, and manage deployments at scale.
### HCL vs YAML
The Cloud Team currently standardises on defining all factory input variables using native HCL variable definition files (`.tfvars`).
Alternative formats such as JSON or YAML have been considered. Terraform does support JSON natively, but its strictness and lack of support for comments make it poorly suited for human‑authored configuration. YAML, while more flexible, is not supported natively by Terraform and would require additional tooling to convert it into either HCL or JSON at runtime.
Using native HCL provides several advantages out of the box: schema validation via variable blocks, consistent tooling support, and no need for additional conversion or validation layers. For these reasons, HCL remains the preferred format for factory inputs.
### Executing Terraform factories locally
At present, running Terraform factories locally is more manual than desired. Each factory repository typically contains a `run-*.sh` script copied and modified from another factory. These scripts help automate repetitive steps such as supplying backend configuration and variable file arguments.
While functional, this approach does not scale well. Script duplication leads to drift between repositories, and improvements or fixes must be applied repeatedly across multiple factories.
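In outline, such a run script does little more than derive the backend prefix and variable-file path from a deployment name. The sketch below uses an illustrative interface and a hypothetical `TF_CMD` override for dry-running; it is not the Cloud Team’s actual script:

```shell
#!/usr/bin/env bash
# Illustrative run-factory.sh sketch. Given a deployment name, derive the
# backend prefix and matching .tfvars file, then run Terraform.
set -euo pipefail

run_factory() {
  local deployment="$1"
  local action="${2:-plan}"
  local var_file="./vars/${deployment}/${deployment}.tfvars"

  # TF_CMD is a hypothetical override so the wrapper can be dry-run
  # with 'echo' in place of the real terraform binary.
  local tf="${TF_CMD:-terraform}"

  # Re-initialise against this deployment's isolated state, then run
  # the requested action with the deployment's variable file.
  "$tf" init -reconfigure -backend-config="prefix=${deployment}"
  "$tf" "$action" -var-file="${var_file}"
}

# Dry run: print the argument lists that would be passed to terraform.
TF_CMD=echo run_factory deployment-a plan
```

The dry run simply echoes the `init` and `plan` argument lists for `deployment-a`, which makes it easy to confirm that the backend prefix and variable file line up before running against real state.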
To address this, the Cloud Team is actively exploring alternatives. One avenue is the “factory mode” implemented in our internal `logan` tool, which is intended to provide a consistent and reusable way to execute Terraform factories locally. This approach is still being tested and validated, but may become the standard mechanism in future.
Terragrunt is also being evaluated as a potential option. Of particular interest is its stacks feature, which could enable orchestration of multiple factories from a higher‑level parent configuration. With Terragrunt having recently reached a stable 1.0 release, the Cloud Team is keen to assess whether it could be adopted with minimal disruption to the existing factory pattern.
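If Terragrunt were adopted, each deployment might be represented by a small unit configuration pointing back at the shared root module. The sketch below is purely hypothetical and not an agreed design:

```hcl
# vars/deployment-a/terragrunt.hcl (hypothetical)
terraform {
  source = "../.."  # the shared factory root module
}

remote_state {
  backend = "gcs"
  config = {
    bucket = "my-state-bucket-012345"
    prefix = "deployment-a"
  }
}

inputs = {
  deployment_name = "deployment-a"
}
```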
### CI/CD pipelines
Support for Terraform factories in CI/CD has recently been improved through the introduction of a shared `tf-factory-pipeline-helper` tool. This tool centralises the logic required to dynamically generate CI pipelines for factory-based repositories.
Previously, each factory implemented its own custom CI scripting, which proved difficult to maintain and did not scale effectively. By consolidating this logic into a shared tool, we reduce duplication, minimise drift between implementations, and ensure consistent behaviour across all factory repositories.
In addition, the Cloud Team is developing a shared CI template within the `ci-templates` project. This template wraps the `tf-factory-pipeline-helper` tool and provides a central, standardised CI/CD implementation that can be easily adopted by any repository using the Terraform factory pattern.
At present, limitations in the GitLab permission model for the UIS DevOps group on gitlab.developers.cam.ac.uk mean that full factory deployments (that is, executing `terraform apply`) via CI/CD are not yet enabled. As a result, current CI pipelines are focused on supporting merge request reviews and detecting configuration drift, rather than performing deployments.
For merge requests, pipelines execute `terraform plan` for the relevant deployments to aid review and change validation. In addition, scheduled pipelines run a complete set of `terraform plan` jobs across all deployments defined within a factory. Any detected drift or plan failures trigger alerts, allowing the relevant team to investigate and take appropriate action.

Once the permission model limitations are resolved, our intention is to support first-class factory deployments directly from CI pipelines, enabling controlled and auditable `terraform apply` operations alongside the existing drift detection workflows.
## Example Terraform factory repository structure
Below is an example of a typical Terraform factory repository structure as used by the Cloud Team.
```
.
├── modules/
│   └── example-module/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
├── vars/
│   ├── deployment-a/
│   │   └── deployment-a.tfvars
│   ├── deployment-b/
│   │   └── deployment-b.tfvars
│   └── deployment-c/
│       └── deployment-c.tfvars
├── .gitlab-ci.yml
├── README.md
├── backend.tf
├── main.tf
├── outputs.tf
├── providers.tf
├── run-factory.sh
├── variables.tf
└── versions.tf
```
### Root module files
The repository root contains the Terraform factory root module. This is the reusable definition that will be instantiated many times.
- `backend.tf` defines the shared backend configuration and relies on partial backend configuration to supply per-deployment values (such as the state prefix) at runtime.
- `main.tf` contains the main entrypoint logic for the factory. Resources can be defined here or in further `*.tf` files as desired.
- `variables.tf` declares all input variables required by the factory. These inputs define the contract between the factory and each deployment.
- `outputs.tf` exposes any outputs that may be useful to users, tooling, or downstream systems.
### Modules directory
The `modules/` directory contains any internal child modules used by the factory. These modules are not intended to be consumed directly by external repositories, but help structure and reuse logic within the factory itself.
Not all factories will require child modules; simpler factories may define everything directly in the root module.
### Deployment variables
The `vars/` directory contains configuration for each individual factory deployment. Each deployment has its own subdirectory containing a `.tfvars` file with values for all required input variables.
This structure keeps deployment‑specific configuration clearly separated from shared infrastructure logic and makes it easy to review, add, or remove deployments over time.