Secretless resource access with Workload Identity Federation¶

Draft

This page is a work in progress and may be incomplete or change significantly. Treat its contents with caution until this notice is removed.

This explainer guide covers the terminology and technology behind secretless cross-cloud authentication and authorisation. You will need to use this technology if you want to, for example:

impersonate a Google Service Account in a workload running on Azure or AWS,
access a Google Secret Manager secret from GitLab CI,
grant permissions for an Entra ID user to use Power BI to query a BigQuery table, or
publish a Python package to PyPI without using a static API token.

We'll start by explaining the background in general terms and then cover specific case studies: accessing a Google Secret Manager secret from GitLab CI, authenticating as an AWS IAM Role using Google Service Account credentials and authenticating to Google Cloud from Azure.

Background¶

Google Cloud, Azure, AWS, GitLab, and Entra, along with lots of other services, share a common language and set of technologies called OpenID Connect (OIDC). OpenID Connect covers two things: generating short-lived identity tokens for identities managed by a given system, and verifying identity tokens for identities managed by other systems.

We'll cover specifics below but, for the moment, let's take a step back and think about what "authentication" and "authorisation" mean as terms:

Authentication: determining which identity is associated with an incoming request.
Authorisation: determining if the identity making the request is allowed to perform the requested action.

Almost anything can be an identity. Some examples:

the Google user abc123@cam.ac.uk,
the Google service account foo@bar.iam.gserviceaccount.com,
GitLab CI job 123 in pipeline 456 named "foo" running on "bar" in project 789,
the Azure Managed Identity associated with Service X,
the Roger Needham Building,
UIS DevOps,
Sir John Major, or
The Chancellor, Masters and Scholars of the University of Cambridge.

The key thing to note is that an identity generally has two things: a name and some metadata which describes it. In the OIDC ecosystem they are referred to as the "subject" and "claims":

Subject: a "name" for an identity. For example, an email address or a full legal name.
Claims: metadata about an identity. For example a GitLab CI job id or nationality.

Authentication and authorisation in the real world¶

In the real world, how does something or someone prove their identity to you? Usually you look at some documentation issued by a third party which you trust. For example, we trust the UK Passport Office to vouch for the name of an individual (the "subject") and to provide some data about them, e.g. a photo, a name and citizenship status (the "claims").

In the real world, how do we authorise an identity to perform some action? We have some policy which we enforce. E.g. UIS HR has a policy that an identity must have the right to work in the UK. That policy specifies:

trusted identity providers, for example "the UK Passport Office",
requirements for identity documents, for example "must not have expired", and
requirements for the claims within the document, for example "must be a British citizen".

Occasionally identity documents may also specify that they're intended for a specific audience. For example, a concert ticket is a form of identity document: it can contain the name of the individual who should be allowed entry to the concert, but it also names the specific concert that it refers to. One is unlikely to gain admittance to a Taylor Swift concert while holding a ticket to see Katy Perry.

Identity documents containing information on the intended audience are rare in the real world but very common in the digital domain.

Authentication and authorisation in the digital world¶

The real world example above aligns with how we do things in the digital world. In the digital world, we use OIDC and the OIDC ecosystem defines the following terms:

Identity token (also "id token" or just "token"): an identity document with a known format which encodes a subject, a set of claims and information on when the token expires. The information contained in an identity token is collectively called the "payload".
Issuer: the system which issues identity tokens. The "OIDC discovery" protocol can be used to go from the name of an issuer to a set of public keys for that issuer. The issuer is also recorded in the identity token's payload. The identity token is cryptographically signed with the private half of one of the issuer's key pairs, so anyone holding the corresponding public key can verify it.
Audience: a description of who the identity token is intended for. The audience is also recorded in the identity token payload.
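To make the idea of a "payload" concrete, here is a minimal sketch. It relies on the fact that OIDC identity tokens are JSON Web Tokens (JWTs): three base64url-encoded parts separated by dots. The token below is made up for illustration, its signature is a placeholder, and the subject and audience values are assumptions rather than values GitLab would issue.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def decode_payload(token: str) -> dict:
    """Return the payload of a JWT WITHOUT verifying its signature."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))


# A made-up payload: issuer, subject, audience, expiry and one extra claim.
example_payload = {
    "iss": "https://gitlab.developers.cam.ac.uk",
    "sub": "example-subject",
    "aud": "example-audience",
    "exp": 1700000000,
    "project_id": "1234",
}

# Assemble header.payload.signature; the signature here is NOT a real one.
example_token = ".".join([
    b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode()),
    b64url_encode(json.dumps(example_payload).encode()),
    b64url_encode(b"placeholder-signature"),
])

decoded = decode_payload(example_token)
print(decoded["iss"], decoded["sub"], decoded["aud"])
```

Decoding like this is only useful for inspection. A service must verify the signature before believing anything in the payload.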

If a digital identity wants to prove who it is, it does so by having an issuer generate an identity token for it. It then presents the token to the service it wants to use.

The service checks that:

the identity token has not expired,
the audience claim matches what the service expects,
the service trusts the issuer of the token,
the signature of the token verifies against one of the issuer's public keys, and
the remaining claims match an identity which the service is willing to authenticate.
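The checks above can be sketched as a single function over a decoded payload. This is an illustration only: the function name and its parameters are hypothetical, and signature verification is assumed to have been done elsewhere by a proper JWT library, with the result passed in as `signature_ok`.

```python
import time


def rejection_reasons(payload, *, trusted_issuers, expected_audience,
                      required_claims, signature_ok, now=None):
    """Return a list of reasons to reject the token; empty means accept."""
    now = time.time() if now is None else now
    problems = []
    # 1. The identity token has not expired.
    if payload.get("exp", 0) <= now:
        problems.append("token has expired")
    # 2. The audience matches; "aud" may be a string or a list of strings.
    aud = payload.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_audience not in audiences:
        problems.append("unexpected audience")
    # 3. The service trusts the issuer.
    if payload.get("iss") not in trusted_issuers:
        problems.append("untrusted issuer")
    # 4. The signature verified against the issuer's public key (done elsewhere).
    if not signature_ok:
        problems.append("bad signature")
    # 5. The remaining claims match what the service is willing to accept.
    for name, expected in required_claims.items():
        if payload.get(name) != expected:
            problems.append(f"claim {name!r} does not match")
    return problems


# Made-up values for illustration.
payload = {
    "iss": "https://gitlab.developers.cam.ac.uk",
    "aud": "example-audience",
    "exp": 2_000_000_000,
    "project_path": "uis/devops/example",
}
settings = dict(
    trusted_issuers={"https://gitlab.developers.cam.ac.uk"},
    expected_audience="example-audience",
    required_claims={"project_path": "uis/devops/example"},
    signature_ok=True,
    now=1_900_000_000,
)
print(rejection_reasons(payload, **settings))
print(rejection_reasons({**payload, "aud": "someone-else"}, **settings))
```

Returning every failed check, rather than stopping at the first, is a choice made here to make misconfiguration easier to debug.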

If all of those checks pass, the identity is authenticated and the service will return some short-lived "access token" which the identity can use to make requests.

When the identity makes a request, the service:

checks that the access token is still valid,
retrieves information on the identity corresponding to the access token,
uses the identity claims and information on the request to evaluate an access policy, and
performs or denies the request according to the access policy.
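As a rough sketch of those four steps, the following models the service's side of a request. The `Session` and `Rule` types, and the idea of a policy as a flat list of rules, are simplifications invented for this example; real services such as Cloud IAM have much richer policy models.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """What the service remembers about an access token it issued."""
    claims: dict
    expires_at: float


@dataclass(frozen=True)
class Rule:
    """Identities whose `claim` equals `value` may perform `action` on `resource`."""
    claim: str
    value: str
    action: str
    resource: str


def handle_request(access_token, action, resource, sessions, policy, now=None):
    """Return True to perform the request, False to deny it."""
    now = time.time() if now is None else now
    # Check the access token is still valid and look up the identity behind it.
    session = sessions.get(access_token)
    if session is None or session.expires_at <= now:
        return False
    # Evaluate the access policy using the identity's claims and the request.
    return any(
        session.claims.get(rule.claim) == rule.value
        and rule.action == action
        and rule.resource == resource
        for rule in policy
    )


# Made-up data for illustration.
sessions = {"token-abc": Session(claims={"project_id": "1234"}, expires_at=2_000.0)}
policy = [Rule("project_id", "1234", "read", "secret/example")]

print(handle_request("token-abc", "read", "secret/example", sessions, policy, now=1_000.0))
print(handle_request("token-abc", "delete", "secret/example", sessions, policy, now=1_000.0))
```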

Overall, making use of federated identity requires that we, ahead of time:

configure the service with the expected issuer, audience and allowable claims for an identity token, and
specify an access policy which authorises identities to perform actions on resources.

Case studies¶

In this section we'll cover specific examples.

Case study: GitLab CI jobs¶

Let's consider how we might authorise a particular CI job to access a particular secret in Google Cloud.

Recall that the steps we'll need to take are, ahead of time:

Tell Google Cloud how to authenticate CI job identity tokens.
Add Google Cloud IAM policies specifying which secrets a given CI job can access.

When we want to actually access the secret in the CI job, the steps the CI job needs to perform are:

Get an identity token from GitLab which identifies the job.
Pass the token to Google and get an access token in return.
Use the access token to access the secret.

We'll cover all those stages in more detail below.

Telling Google Cloud how to authenticate CI jobs¶

Ahead of time we need to configure Google to understand GitLab-issued identity tokens. Fortunately this has already been done for you!

The gcp-infra project configures a workload identity pool using Terraform. You can see the full configuration in the gcp-infra project but the basic form is:

resource "google_iam_workload_identity_pool" "gitlab" {
  workload_identity_pool_id = "gitlab"
}

resource "google_iam_workload_identity_pool_provider" "gitlab_uis_devops" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.gitlab.workload_identity_pool_id
  workload_identity_pool_provider_id = "uis-devops"

  # A mapping between claims in the identity token and attributes which can be used in Cloud IAM
  # policies.
  attribute_mapping = {
    "google.subject"                         = "assertion.sub"
    "attribute.gitlab_project_id"            = "assertion.project_id"
    # ... others ...
  }

  # Add an additional condition that identities in this pool must correspond to one of the projects
  # under uis/devops.
  attribute_condition = "assertion.project_path.startsWith('uis/devops/')"

  oidc {
    issuer_uri = "https://gitlab.developers.cam.ac.uk"
  }
}

This contains all the required information to specify which identity tokens Google will accept. Google will authenticate an identity token for a CI job if it matches the following requirements:

The token has not expired. (This is not explicitly configured because Google always does this.)
The token's audience matches the globally unique name for the workload identity pool provider. (This can be seen in the GOOGLE_WORKLOAD_IDENTITY_POOL_AUDIENCE CI variable set at uis/devops level in GitLab.)
The token must have an issuer matching https://gitlab.developers.cam.ac.uk.
The token is signed with a key belonging to that issuer. The OIDC standard specifies that the issuer's public keys are published at a well-known location based on the issuer's name.
The project_path claim starts with the string uis/devops/.

If a GitLab CI job identity token passes all of those checks then Google will issue the job an access token.
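To illustrate what the attribute mapping and attribute condition do, here is a small Python model of them. It is not how Google implements them (the real condition is a CEL expression evaluated by Google); it only mirrors the two mappings and the one condition shown in the Terraform above. The sample assertion values are made up.

```python
# Mirrors attribute_mapping: target attribute -> claim name in the identity token.
ATTRIBUTE_MAPPING = {
    "google.subject": "sub",
    "attribute.gitlab_project_id": "project_id",
}


def condition_holds(assertion: dict) -> bool:
    """Mirrors: assertion.project_path.startsWith('uis/devops/')."""
    return str(assertion.get("project_path", "")).startswith("uis/devops/")


def map_attributes(assertion: dict):
    """Return the mapped attributes, or None if the condition rejects the token."""
    if not condition_holds(assertion):
        return None
    return {target: assertion[claim] for target, claim in ATTRIBUTE_MAPPING.items()}


# Made-up token claims ("assertion" is the name Google gives the token payload).
accepted = {"sub": "example-subject", "project_id": "1234",
            "project_path": "uis/devops/example/project"}
rejected = {**accepted, "project_path": "some/other/group/project"}

print(map_attributes(accepted))
print(map_attributes(rejected))
```

The mapped attributes, such as `attribute.gitlab_project_id`, are what Cloud IAM policies can later refer to.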

Configuring the Workload Identity Federation pool does not let CI jobs actually do anything. It only establishes trust for authentication; it does not grant permissions for the CI job to perform any actions. There is no authorisation implied by the Workload Identity Federation pool simply existing.

Authorising a CI job to access a secret¶

To authorise a CI job to perform an action, we use a Cloud IAM policy as usual, but with a specially formatted IAM principal. A list of IAM principals corresponding to GitLab CI jobs can be found in the service infrastructure explainer but, in brief, to allow CI jobs running in GitLab project id "1234" to access a secret you'd use some Terraform like the following:

resource "google_secret_manager_secret_iam_member" "secret_accessor" {
  project   = "..."
  secret_id = "..."

  role = "roles/secretmanager.secretAccessor"

  member = join("", [
    "principalSet://iam.googleapis.com/projects/421963284348/locations/global/",
    "workloadIdentityPools/gitlab/",
    "attribute.gitlab_project_id/1234"
  ])
}

GitLab CI jobs can request an identity token using id_tokens in their job configuration.

Ahead of time:

We create a "Workload Identity Pool" in Google Cloud. This specifies the expected issuer of identity tokens, the expected audience and any further restrictions on allowed payload contents.

At run time, the steps are:

The CI job gets a temporary token signed by the private half of a key known to gitlab.developers.cam.ac.uk. The token contains information on the CI job (the "attributes" or "payload") and information on who the token is intended to be used by (the "audience").
The CI job gives the token to Google and gets an access token in return.

For this to work, the following all need to be in place:

We need to find some authority we trust to provide authentication credentials for a CI job. We can trust the GitLab instance providing gitlab.developers.cam.ac.uk since it is the ultimate authority on CI job metadata.
We need some way for a CI job to request an identity document from gitlab.developers.cam.ac.uk. This can be done by using the id_tokens configuration for the CI job.
We need to configure Google Cloud to trust credentials provided by gitlab.developers.cam.ac.uk. This is done by creating a "Workload Identity Federation pool".
Google specifies how a CI job provides the identity document to Google Cloud to get back an API token the CI job can use to fetch the secret. This is usually implemented in some dedicated Google auth library but one can do it manually, e.g. via curl.
A Cloud IAM policy in Google Cloud specifies which identities can access a secret.

Let's break down the list a little.

How gitlab.developers.cam.ac.uk acts as an identity document issuer¶

If you visit https://gitlab.developers.cam.ac.uk/.well-known/openid-configuration, you'll see a JSON document which, in turn, points to a set of public keys. These are the public halves of the keys which gitlab.developers.cam.ac.uk can use to sign documents.

In OpenID Connect Discovery an "issuer" is simply a URL where, if you append "/.well-known/openid-configuration", you'll get that JSON document. It is called the "discovery document". The discovery document URL has to be served over HTTPS and so we piggy-back on the usual web CA-based trust to make sure that the keys we get from that URL belong to the real gitlab.developers.cam.ac.uk service.

One "trusts" gitlab.developers.cam.ac.uk as an issuer if one accepts identity documents whose signatures verify against one of the public keys referenced in the discovery document.
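The path from issuer name to public keys can be sketched as follows. The `jwks_uri` and `issuer` fields are standard fields of an OpenID Connect discovery document; the sample document below is a trimmed, made-up stand-in (in particular the key set URL is illustrative, not GitLab's real one), and no network request is made.

```python
def discovery_url(issuer: str) -> str:
    """Build the discovery document URL from the issuer name."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def key_set_url(discovery_document: dict, issuer: str) -> str:
    """Return where the issuer publishes its public keys.

    The discovery document must name the same issuer we asked about,
    otherwise we would be trusting keys belonging to somebody else.
    """
    if discovery_document.get("issuer") != issuer:
        raise ValueError("discovery document is for a different issuer")
    return discovery_document["jwks_uri"]


ISSUER = "https://gitlab.developers.cam.ac.uk"

# Made-up stand-in for the JSON served at the discovery URL.
sample_document = {
    "issuer": ISSUER,
    "jwks_uri": ISSUER + "/example/path/to/keys",
}

print(discovery_url(ISSUER))
print(key_set_url(sample_document, ISSUER))
```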

How CI jobs get an identity document¶

There is a little dance CI jobs can do with the id_tokens field in their configuration to ask GitLab to make an identity document for them. The document contains information such as the CI job number, the CI pipeline number, the project id, etc. The GitLab documentation describes the complete set of metadata added to the document.

The identity document is called the "identity token" and the set of metadata added to the document is called the "identity token payload".

One bit of metadata which all OIDC identity tokens should contain is something called the "audience". This is who the identity issuer expects to consume the identity token. (This is how one stops an identity token intended to authenticate to Google Cloud being used to authenticate to AWS as well.)

The audience for the id token is configured in the id_tokens field within .gitlab-ci.yml.

How we configure Google to trust GitLab-issued identity documents¶

Google Cloud uses "Workload Identity Federation pools" to define a trust relationship with an external identity document issuer. The UIS DevOps team configures this pool in a Terraform file within the gcp-infra project. This configuration specifies:

The identity issuer to trust (gitlab.developers.cam.ac.uk).
The expected metadata in the identity documents.

For our pool, we also require that:

the identity token audience must be exactly: //iam.googleapis.com/projects/421963284348/locations/global/workloadIdentityPools/gitlab/providers/uis-devops, and
the full path of the GitLab group which the CI job is running in must start with uis/devops.

This group path requirement ensures that the WIF pool is scoped specifically for CI jobs running under uis/devops.

Note that configuring a Workload Identity Federation pool doesn't, by itself, let CI jobs do anything. It only establishes trust: it tells Google Cloud about gitlab.developers.cam.ac.uk as an identity token issuer, and specifies which identity tokens we accept and what metadata we expect to find in the identity token payload. It does not grant the CI job permission to do anything.

How a CI job authenticates itself to Google¶

When a CI job wants to do the little dance to convert its gitlab.developers.cam.ac.uk-issued identity token into a Google Cloud API token, it needs to provide Google with the Workload Identity Federation pool it wants to use. Also, the audience of the identity token must match that configured in that Workload Identity Federation pool.

We provide this required audience and details on the Workload Identity Federation pool in CI variables set on the uis/devops group.
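For the curious, the exchange itself is a single request to Google's Security Token Service. The sketch below only builds the request fields and does not send anything. The endpoint and field values are given to the best of our understanding of Google's token exchange API and should be checked against Google's documentation before use; the identity token value is a placeholder.

```python
from urllib.parse import urlencode

# Assumed endpoint for Google's Security Token Service.
STS_TOKEN_URL = "https://sts.googleapis.com/v1/token"

# The audience required by our Workload Identity Federation pool provider.
AUDIENCE = (
    "//iam.googleapis.com/projects/421963284348/locations/global/"
    "workloadIdentityPools/gitlab/providers/uis-devops"
)


def build_exchange_request(identity_token: str, audience: str) -> dict:
    """Build the fields for exchanging an identity token for an access token."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "subject_token": identity_token,
    }


# "placeholder-identity-token" stands in for the GitLab-issued identity token.
fields = build_exchange_request("placeholder-identity-token", AUDIENCE)
body = urlencode(fields)
print(f"POST {STS_TOKEN_URL}")
print(body)
```

In practice the Google authentication libraries do this for you, so there is rarely a reason to build the request by hand.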

When it comes to actually doing the authentication dance, Google authentication libraries all implement the same process whereby they look at the value of the GOOGLE_APPLICATION_CREDENTIALS environment variable. If that exists, it should point to a JSON document in the file system. For Workload Identity Federation, that document needs to include the Workload Identity Federation pool id and a URL or file path which can be used to fetch the gitlab.developers.cam.ac.uk-issued identity token. We provide a little CI template fragment which arranges for an appropriately formatted JSON document to be present for CI jobs.

Note that the JSON credentials document is not, by itself, sensitive; it just contains the id of the Workload Identity Federation pool and a pointer to where to find the identity token.
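As an illustration of why the document is not sensitive, here is a sketch which generates one. The overall shape (an "external_account" configuration with a file-based credential source) follows the format Google's libraries accept, to the best of our knowledge. The file produced by our CI template fragment may contain different or additional fields, and the token file path used here is hypothetical.

```python
import json


def wif_credentials_config(audience: str, token_file: str) -> dict:
    """Build a credentials document pointing at a pool provider and a token file."""
    return {
        "type": "external_account",
        # Which Workload Identity Federation pool provider to authenticate against.
        "audience": audience,
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "token_url": "https://sts.googleapis.com/v1/token",
        # Where to find the identity token; the token itself is NOT embedded.
        "credential_source": {"file": token_file},
    }


config = wif_credentials_config(
    audience=(
        "//iam.googleapis.com/projects/421963284348/locations/global/"
        "workloadIdentityPools/gitlab/providers/uis-devops"
    ),
    token_file="/tmp/example-gitlab-id-token",  # hypothetical path
)
config_json = json.dumps(config, indent=2)
print(config_json)
```

Everything in the output is either a public identifier or a pointer to a file. The only secret, the short-lived identity token, stays in the file the document points to.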

Since the use of GOOGLE_APPLICATION_CREDENTIALS is implemented by all Google-provided authentication libraries, this Just Works™ with applications like Terraform without any additional configuration.

So, in summary, by having CI job configuration which creates an identity token and a CI template fragment which creates the appropriate file pointed to by GOOGLE_APPLICATION_CREDENTIALS, anything using one of the Google-provided client libraries will transparently do the authentication dance to Google and, from the point of view of Google, will be acting as an identity associated with the Workload Identity Federation pool.

How we grant permissions to CI job identities in Google¶

This is, in some sense, the simplest piece of the puzzle since we already use Cloud IAM policies everywhere to, e.g., grant access to secrets, allow impersonation of the terraform deployment service accounts, etc, etc.

Authorisation is managed via Cloud IAM policies, similar to how we grant access to service accounts or users. However, rather than using one of the familiar group:..., user:... or serviceAccount:... policy members, we instead have to use a special principalSet://... member. These identities are listed on a guidebook page but if we're going to make greater use of them, we might want to promote the relevant section on that page to its own reference page.

These specific identities are derived from the Workload Identity Federation attribute values. For example, to allow CI jobs running in GitLab project ID 12345 to access a secret, we grant the roles/secretmanager.secretAccessor role on said secret to the following IAM member:

    principalSet://iam.googleapis.com/projects/421963284348/locations/global/workloadIdentityPools/gitlab/attribute.gitlab_project_id/12345

We could similarly construct principalSet://... members which specify "only protected branches" or "only jobs named 'release'", etc.
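The member string has a regular structure, which the following sketch makes explicit. The helper function is hypothetical. Only the `gitlab_project_id` attribute is taken from the pool configuration shown on this page; any other attribute name would need to exist in the pool's attribute mapping.

```python
def principal_set(project_number: str, pool_id: str, attribute: str, value: str) -> str:
    """Build the IAM member matching every identity with the given attribute value."""
    return (
        f"principalSet://iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{pool_id}"
        f"/attribute.{attribute}/{value}"
    )


# All CI jobs running in GitLab project ID 12345.
member = principal_set("421963284348", "gitlab", "gitlab_project_id", "12345")
print(member)
```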

Summary¶

Workload Identity Federation (WIF) is the core mechanism enabling secure, keyless authentication and authorisation for GitLab CI jobs to access Google Cloud resources. This removes the need for storing long-lived, sensitive credentials, making our Continuous Delivery process more secure and auditable.

The overall process is divided into three phases:

1. Central Setup (gcp-infra)¶

The UIS DevOps team configures the WIF pool. This defines the trust relationship with GitLab by specifying gitlab.developers.cam.ac.uk as the trusted identity issuer and imposing constraints on the identity token payload, such as requiring a specific audience and ensuring the CI job is running within a group path starting with uis/devops.

2. Product-Specific Configuration¶

For any product needing Google Cloud access:

Cloud IAM Policy: The product's Terraform deployment adds Cloud IAM policies that grant specific permissions (roles) to the CI job identity, using the principalSet://... member format.
CI Configuration (.gitlab-ci.yml): CI jobs are configured with id_tokens fields specifying the required audience, and use the standard template fragment to create the Google credentials file referenced by GOOGLE_APPLICATION_CREDENTIALS.

3. Runtime Authentication and Authorisation Flow¶

Identity Token Issuance: When a CI job runs, GitLab issues a signed identity token (OIDC ID Token) containing job metadata and makes it available as a CI variable.
Credentials File Generation: The CI template fragment generates the non-sensitive Google credentials JSON file, pointing to the WIF pool and the location of the identity token.
Token Exchange: A Google-provided client library (used by tools like Terraform) detects the GOOGLE_APPLICATION_CREDENTIALS file, fetches the GitLab identity token, and performs a "little dance" with Google. This process validates the identity token against the WIF pool's configuration and exchanges it for a short-lived Google Cloud API token associated with the CI job's unique identity.
Authorisation Check: The tool uses the API token to call a Google Cloud API. Google Cloud's IAM system checks the active Cloud IAM policies to confirm that the identity associated with the API token is authorised to perform the requested action.