A guide to service infrastructure and product factories

Each service in DevOps will need some service infrastructure. Our managed service infrastructure thinks in terms of "products" which are names given to groups of resources. Unfortunately what "product" means is overloaded in DevOps, both from a project management and a technical perspective. This document covers a few different technical meanings of "product" in DevOps and how you can decide how to group service infrastructure resources into products.

There are quicker task-focused guides if all you want to do is start a new service or ensure an existing service is up-to-date.

This document is long, dense and necessarily pedantic with terminology. We'd recommend getting your drink of choice and reading the document a few times from top-to-bottom.

Product factories

A lot of our managed infrastructure follows a "product factory" model. In this model, the term "product" means a name given to sets of resources. Creating a new "product" creates a new set of resources associated with that name.

With our current managed infrastructure, we have the following resources which can be created and associated with some "product" name:

  • Google user accounts and Google groups for delivery teams,
  • Google Cloud resources,
  • GitLab projects and groups, and
  • GitLab CI/CD runners.

Each of these is managed by its own infrastructure as code (IaC) project and each IaC project uses an independent set of "product" names for its resource sets.

In theory, "products" in each of these sets do not need to have a 1:1 relationship with each other. You could, for example, have a single GitLab "product" with multiple GitLab CI runner "products" or multiple Google Cloud "products" sharing a single GitLab "product".

The common case, and the preferable one, is to align Google's, GitLab's and GitLab CI's idea of a "product" with one another and to have exactly one "product" for a given service. That situation is covered in a dedicated how-to guide.

As services split and merge, however, we may find that the case becomes more complex. This document explains what we mean by "product" in each of these cases, what the relationships between products are and what we mean by a "product identity".

Service infrastructure

"Service infrastructure" is loosely defined as a set of resources managed by a single team in order to deliver a service. A service will usually need the following bits of managed infrastructure:

  • Google user accounts and Google groups. These are used to grant access to resources in the Google Cloud Console. They are managed by configuration in the Team Data Manager GitLab project which is, in turn, driven by a JSON configuration file. Team members managing resources in Google Cloud are encouraged to do so via a @gcloudadmin.g.apps.cam.ac.uk account and Team Data Manager is responsible for creating these accounts.
  • Google Cloud team folders. These contain Google Cloud Folders and Projects for services. They, along with the associated IAM policies, are managed by the gcp-infra GitLab project which, in turn, sets IAM policies for the groups created by Team Data Manager.
  • Google Cloud resources. These provide the service for users. Commonplace resources for services are managed by configuration in the gcp-product-factory GitLab project which is, in turn, driven by per-product configuration files. Additional Cloud resources are usually provisioned by service-specific terraform configurations.
  • GitLab projects and groups. These facilitate project management and host code implementing the services. They are managed by configuration in the gitlab-project-factory GitLab project which is, in turn, driven by per-product configuration files.
  • GitLab CI runners. These perform actions related to service development. They are managed by configuration in the gitlab-runner-infrastructure GitLab project driven by configuration in a single file within that project.

A service also consists of a large amount of information, expertise, know-how and user support infrastructure but these are not yet things we manage through infrastructure as code.

Each of the IaC projects above has its own independent concept of a "product". These need not align 1:1 with each other but things are a lot simpler for everyone if they do.

Guidance for choosing product sets

Generally we strongly recommend having the concept of "product" be aligned between gcp-product-factory, gitlab-project-factory and gitlab-runner-infrastructure so that a single "Google product" maps to a single "GitLab product" which uses a single "GitLab runner product".

We also recommend that gitlab-project-factory products have parent_group_id = 5 which corresponds to the uis/devops group. Tempting though it is to group products by team, that tends to lead to pain in the long run.

What about managing permissions in GitLab?

One strong motivator for grouping GitLab projects by team is that permissions can be set at the team group level and cascade down to the projects. Unfortunately, for the moment, one needs to take the hit and manage each GitLab product group's permissions manually.

It is likely that GitLab project factory will learn about team_data.json groups at some point and so be able to set permissions automatically.

In advanced cases you may want to vary this arrangement. Some examples:

  • The uis/devops/lib, uis/devops/tools and uis/devops/django GitLab groups contain reusable libraries, tools and Django modules. Each of these could be considered a separate "product". Our common CI pipelines need some Google Cloud resources to work but none of these products need to deploy to Google. As such we have a shared code Google product which is shared between all the projects in those groups.
  • We have a lot of Identity and Access Management (IAM) services. There is a strong ontological argument to be made that it is less confusing to have a uis/devops/iam group containing these products. In that case the GitLab project factory "product" folders would live under uis/devops/iam and have parent_group_id = 938. Note that this grouping is based on the nature of the products and not on the delivery team managing them.
  • A single "product" may be better represented in GitLab via nested groups. For example, it may make sense to have the Raven "product" be split into uis/raven/saml2 and uis/raven/legacy sub-groups which each contain multiple projects. In that case the subgroup facility of gitlab-project-factory can be used; there is still a single GitLab and Google "product" but multiple groups.
  • Continuing with the example of Raven: as a sensitive service, we may want to limit the ability of CI jobs in uis/raven/saml2 to affect projects in uis/raven/legacy. In that case we can configure separate GitLab CI runners for uis/raven/saml2 and uis/raven/legacy. The Raven service then has a single Google "product" and a single GitLab "product" but two GitLab CI/CD "products".

It bears repeating that the common case will be for services to have a single gcp-product-factory product, a single gitlab-project-factory product and a single gitlab-runner-infrastructure runner.
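
The difference between the common case and the Raven variation above can be sketched as a small data model. This is only an illustration: the `Service` class, its field names and the runner product names `raven-saml2` and `raven-legacy` are hypothetical and do not come from any of the factory configurations.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical record of which factory "products" back one service."""

    name: str
    google_products: list[str] = field(default_factory=list)
    gitlab_products: list[str] = field(default_factory=list)
    runner_products: list[str] = field(default_factory=list)

    def is_common_case(self) -> bool:
        """True when the service has exactly one product in each factory."""
        return (
            len(self.google_products) == 1
            and len(self.gitlab_products) == 1
            and len(self.runner_products) == 1
        )


# The common case: one product per factory, all sharing a name.
simple = Service("example", ["example"], ["example"], ["example"])

# The Raven variation: one Google product, one GitLab product,
# but a separate runner product for each sub-group (names are made up).
raven = Service(
    "raven",
    google_products=["raven"],
    gitlab_products=["raven"],
    runner_products=["raven-saml2", "raven-legacy"],
)

print(simple.is_common_case())
print(raven.is_common_case())
```

A check like this could flag services which have drifted from the recommended arrangement, so that the variation is at least a deliberate choice.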

What if my CI/CD jobs need to access resources from other products?

If you need to get GitLab API tokens scoped to other GitLab projects or groups, you can do so via the "additional token mechanism" provided by gitlab-project-factory which is documented in its README.

The identities section below covers how you specify Google Cloud IAM permissions to grant access to additional tokens or Google Cloud resources from other products.

Delivery team user accounts and groups

Although not termed a "product" in the configuration per se, the Team Data Manager service has the concept of mapping a name to a group of user accounts. These names are usually aligned with delivery teams rather than services but, as a name mapping to a set of service infrastructure resources managed by IaC, it matches our working definition of "product".

Team membership is driven by a JSON configuration file owned by teamCloud.

Generally there should be exactly one entry in team_data.json for each delivery team.

While Team Data Manager is responsible for maintaining a set of Google user accounts and role groups, it is gcp-infra and gcp-product-factory which actually contain IAM policies granting permissions on Cloud resources to role groups.

This relationship is summarised in the following diagram:

Remember that there can be multiple teams and multiple products in gcp-product-factory. There does not need to be a 1:1 mapping between them; a single team will often have IAM permissions on multiple products.

Google Cloud Projects

Tip

More detail on exactly what Google Cloud resources are created for a product can be found in a dedicated reference guide. This section focuses on resources which interact with other bits of service infrastructure.

The configuration in gcp-product-factory is responsible for creating:

  • A single product-specific Google Project known as the meta project.
  • A Google Project for each environment, for example "production", "staging" and "development".
  • Within the meta project:
    • A Cloud Storage bucket which holds machine-readable information on the product.
    • A Secret which holds a GitLab API token.
    • A Service Account which can access the token.
    • A Service Account representing CI jobs for the product.
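
As a rough sketch, the list above can be written as a function which enumerates what one would expect to find for a product. The project labels used as dictionary keys (`«product»-meta`, `«product»-«environment»`) are hypothetical placeholders and not the real Google project IDs. The terraform deployment service account in each environment project is taken from the identities section later in this guide.

```python
def expected_resources(product: str, environments: list[str]) -> dict[str, list[str]]:
    """Enumerate the resources described above, keyed by a made-up project label."""
    resources = {
        # The single product-specific meta project.
        f"{product}-meta": [
            "Cloud Storage bucket (machine-readable product information)",
            "Secret (GitLab API token)",
            "Service Account (GitLab API token accessor)",
            "Service Account (CI jobs)",
        ]
    }
    # One Google Project per environment.
    for env in environments:
        resources[f"{product}-{env}"] = ["Service Account (terraform deployment)"]
    return resources


for project, contents in expected_resources(
    "example", ["production", "staging", "development"]
).items():
    print(project, contents)
```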

Note

There is an issue to move the GitLab API token secret to the GitLab project factory.

The CI service account and GitLab token secret are intended to be used by GitLab CI runners configured in gitlab-runner-infrastructure. The token secret is intended to be updated by the gitlab-project-factory configuration.

This diagram summarises the interaction between gcp-product-factory, gitlab-runner-infrastructure and gitlab-project-factory in the case where a product has a "production", "staging" and "development" environment:

Note

Any CI job can, by impersonation, manage resources in all Google Cloud projects. You may want to limit this to CI jobs running in infrastructure projects. This is an example where you may want multiple GitLab Runner products for a single gcp-product-factory product: you can have a general GitLab Runner product using the "shared code" Google project for most GitLab projects but a dedicated deploy GitLab Runner product, scoped to a single GitLab project, using your service's gcp-product-factory product for deployment.

GitLab projects and groups

The gitlab-project-factory configuration manages a single GitLab group representing the "product" in GitLab and projects and sub-groups within that product group.

Danger

Nothing stops someone manually creating GitLab projects or groups within a product group which are not managed by gitlab-project-factory. Once we transition fully to gitlab-project-factory, manual creation of projects will be limited to a small set of users.

The common-case interaction between gitlab-project-factory and gcp-product-factory is fairly minimal:

In some cases it may make sense to group some projects within a gitlab-project-factory product together. There is a subgroup facility for this. Advanced users may need to obtain GitLab API tokens which are scoped to subsets of projects within the product or even to projects from unrelated services. The gitlab-project-factory documentation covers how to create these tokens.

A more complex case is shown in the following diagram:

Advanced users are responsible for specifying the IAM principals which can access additional API token secrets. IAM principals are covered in the identities section below.

GitLab CI runners

The gitlab-runner-infrastructure configuration is responsible for managing a set of GitLab CI runners which live within a single Kubernetes cluster.

A "product" in gitlab-runner-infrastructure names one of these runners. The common case is to have a single GitLab runner "product" for a service. Some of our CI jobs need to access a GitLab API token so that they may, for example, create automatic Merge Requests or create new releases when a repository is tagged. Deployment CI jobs need to impersonate an environment-specific terraform deployment service account in order to deploy services.

The relationship between gitlab-runner-infrastructure, gcp-product-factory and gitlab-project-factory is summarised in the following diagram:

The common case is that there is one GitLab CI/CD Runner for a given GitLab project factory "product" and, similarly, that there is one GitLab CI/CD Runner for a given Google product factory "product".

Identities

Access to Google Cloud Resources and GitLab API tokens is ultimately controlled by Cloud IAM permissions. Access is either granted directly to an IAM principal or, more usually, indirectly by allowing impersonation of a role-specific service account.

For example, all CI/CD runners have an implicit Google Kubernetes Engine Workload Identity and that identity is allowed to impersonate the deployment service accounts in each environment-specific Google Project.

Each CI/CD job itself has a unique identity within Google Cloud. This can allow for very fine-grained IAM policies for CI/CD jobs. This is covered in more detail in dedicated documentation.

This diagram covers how the Google Cloud identities available in our service infrastructure interact. Identities which can appear in IAM policies are marked with "🆔".

The following IAM principals can be used in IAM policies to set permissions. Note that some of these require you to know the gcp-product-factory or gitlab-runner-infrastructure product names.

  • GitLab CI/CD Job principals (documentation):
    • CI jobs running in a specific GitLab project: principalSet://iam.googleapis.com/projects/421963284348/locations/global/workloadIdentityPools/gitlab/attribute.gitlab_project_id/«numeric-gitlab-project-id»
    • CI jobs running in a specific GitLab group: principalSet://iam.googleapis.com/projects/421963284348/locations/global/workloadIdentityPools/gitlab/attribute.gitlab_namespace_id/«numeric-gitlab-namespace-id»
    • CI jobs triggered by a specific GitLab user: principalSet://iam.googleapis.com/projects/421963284348/locations/global/workloadIdentityPools/gitlab/attribute.gitlab_user_id/«numeric-gitlab-user-id»
  • GitLab CI/CD Runner principals (documentation):
    • Any CI job running within a specific GitLab CI runner: serviceAccount:gitlab-runner-prod-22257483.svc.id.goog[«runner-product-name»/gke-ci-run]
  • Product-wide Google Project principals (documentation):
    • Generic CI service account for CI not based in GitLab or if CI/CD jobs prefer to impersonate: serviceAccount:gke-ci-run@«meta-project-id».iam.gserviceaccount.com
    • GitLab API token secret accessor: serviceAccount:gitlab-token-accessor@«meta-project-id».iam.gserviceaccount.com
  • Principals in the production, staging or development Google projects:
    • Terraform deployment service account: serviceAccount:terraform-deploy@«workspace-project-id».iam.gserviceaccount.com
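
Since these principal strings are long and easy to mistype, it can help to generate them. The sketch below simply fills the «placeholders» in the formats listed above. The function names are hypothetical; only the string formats come from this guide.

```python
# Prefix shared by all GitLab CI/CD job principals listed above.
GITLAB_POOL = (
    "principalSet://iam.googleapis.com/projects/421963284348/"
    "locations/global/workloadIdentityPools/gitlab"
)
# Workload identity pool used by the GitLab CI runner principal listed above.
RUNNER_WORKLOAD_POOL = "gitlab-runner-prod-22257483.svc.id.goog"


def ci_jobs_in_project(gitlab_project_id: int) -> str:
    """CI jobs running in a specific GitLab project."""
    return f"{GITLAB_POOL}/attribute.gitlab_project_id/{gitlab_project_id}"


def ci_jobs_in_group(gitlab_namespace_id: int) -> str:
    """CI jobs running in a specific GitLab group."""
    return f"{GITLAB_POOL}/attribute.gitlab_namespace_id/{gitlab_namespace_id}"


def ci_jobs_by_user(gitlab_user_id: int) -> str:
    """CI jobs triggered by a specific GitLab user."""
    return f"{GITLAB_POOL}/attribute.gitlab_user_id/{gitlab_user_id}"


def runner(runner_product_name: str) -> str:
    """Any CI job running within a specific GitLab CI runner."""
    return f"serviceAccount:{RUNNER_WORKLOAD_POOL}[{runner_product_name}/gke-ci-run]"


def generic_ci(meta_project_id: str) -> str:
    """Generic CI service account in the meta project."""
    return f"serviceAccount:gke-ci-run@{meta_project_id}.iam.gserviceaccount.com"


def token_accessor(meta_project_id: str) -> str:
    """GitLab API token secret accessor in the meta project."""
    return f"serviceAccount:gitlab-token-accessor@{meta_project_id}.iam.gserviceaccount.com"


def terraform_deploy(workspace_project_id: str) -> str:
    """Terraform deployment service account in an environment project."""
    return f"serviceAccount:terraform-deploy@{workspace_project_id}.iam.gserviceaccount.com"


# Group ID 5 is the uis/devops group mentioned earlier; "example" is a made-up runner name.
print(ci_jobs_in_group(5))
print(runner("example"))
```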

Advanced use of identities

The default case is that all CI/CD jobs within a product can impersonate the GitLab API token accessor and terraform deployment service accounts. This may be viewed as "overly broad" for some services and some service delivery teams may want to additionally restrict deployments to production.

These use cases are, at the moment, hypothetical and more testing is required. A possible example of advanced use of identities is covered below.

Our service has three GitLab projects: infrastructure contains terraform to deploy the service, webapp contains a web application which is part of the service and sharedlib contains code which is used by the webapp project.

We'll assume that we have disabled the "permissive" IAM policies which allow CI jobs to access all GitLab projects and deploy to all environments.

We want:

  • To restrict deployment to CI jobs running in infrastructure.
  • To allow CI jobs in webapp to open issues in sharedlib if compatibility issues are detected in testing jobs.

We can do this by adding the following:

  • An additional API token in gitlab-project-factory which is scoped to a sub-group containing the webapp and sharedlib projects.
  • An iam_policy for this token which allows access from CI/CD jobs running in the webapp project.
  • An IAM policy allowing CI/CD jobs running in infrastructure to impersonate the terraform deploy service account.
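
The effect of those two policies can be sketched as plain data plus a lookup. Several things here are assumptions: the numeric GitLab project IDs are made up, the resource labels are informal, and although `roles/iam.serviceAccountTokenCreator` and `roles/secretmanager.secretAccessor` are real Google Cloud roles, this guide does not confirm that they are the ones the factories use. The real `iam_policy` format is documented by gitlab-project-factory and is not reproduced here.

```python
POOL = (
    "principalSet://iam.googleapis.com/projects/421963284348/"
    "locations/global/workloadIdentityPools/gitlab"
)


def project_principal(gitlab_project_id: int) -> str:
    """Principal for CI jobs running in a specific GitLab project."""
    return f"{POOL}/attribute.gitlab_project_id/{gitlab_project_id}"


# Hypothetical numeric GitLab project IDs for the three projects.
INFRASTRUCTURE, WEBAPP, SHAREDLIB = 101, 102, 103

# Informal resource label -> IAM bindings on that resource.
policies = {
    "terraform-deploy service account": [
        {
            # Assumed role: allows impersonating the service account.
            "role": "roles/iam.serviceAccountTokenCreator",
            "members": [project_principal(INFRASTRUCTURE)],
        }
    ],
    "additional API token secret": [
        {
            # Assumed role: allows reading the secret's value.
            "role": "roles/secretmanager.secretAccessor",
            "members": [project_principal(WEBAPP)],
        }
    ],
}


def allowed(resource: str, gitlab_project_id: int) -> bool:
    """True if CI jobs in the given project appear in a binding on the resource."""
    principal = project_principal(gitlab_project_id)
    return any(principal in binding["members"] for binding in policies[resource])


print(allowed("terraform-deploy service account", INFRASTRUCTURE))
print(allowed("terraform-deploy service account", WEBAPP))
print(allowed("additional API token secret", WEBAPP))
```

Note that sharedlib appears in no binding at all: it is only the target of the additional token's scope, not a project whose CI jobs are granted anything.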

In diagram form, this looks like the following:

Summary

We can collect all of the infrastructure diagrams so far together to show the full set of inter-relations between our product factory configurations. As above, principals which can be specified in Google Cloud IAM policies are marked with "🆔". Solid lines represent the default relationships. Dotted lines and dotted boxes represent additional resources and relationships which are useful in advanced cases.

Next steps

After reading this guide, the following pages may be of interest: