Integrating AWS with our infrastructure

This guide discusses in greater depth the rationale and design behind our AWS integration. There is a more how-to focused guide if all you want to do is add AWS integration to your product.

Why we would use AWS

Generally speaking we use Google Cloud for all cloud resources. However there are occasions where Google-provided solutions do not suffice. For example, Google provides no direct analogue of Amazon's Simple Email Service. In other cases we may need to integrate with third-party solutions which require the use of AWS.

Design aims

After a few prototypes of integrating with AWS, and noting the pain points which were exposed, we settled on the following design aims:

  • as our use of AWS is currently limited, accept some manual AWS account setup so long as it does not preclude automation at a later stage,
  • do not require users to manage any additional credentials,
  • to align with current AWS best practice, do not have long-lived AWS access keys or ones which cannot be rotated automatically,
  • to avoid maintaining a parallel set of roles, make use of the existing Google Cloud permissions model when possible, and
  • preserve our existing permissions model:
    • allow existing "editor" users to access the AWS console and aws CLI tool, and
    • allow existing "deploy" users to be able to seamlessly manage AWS and Google Cloud resources within the same terraform configuration.

Differences between AWS and Google Cloud

AWS and Google have many similar concepts but often use different names for them. This table summarises the different concepts you'll need to understand in order to follow this guide:

Google          | AWS                      | Description
----------------|--------------------------|------------
Project         | Account                  | Collection of all the resources corresponding to a single environment such as "production" or "staging"
Folder          | Organizational Unit (OU) | Collection of all of the projects/accounts for a single product
User            | IAM User                 | An identity corresponding to a "real person" using password authentication
Service account | IAM Role                 | An identity corresponding to a machine or process using other means of authentication
IAM Policy      | IAM Policy               | A document granting a set of permissions to one or more identities

In Google Cloud the namespace for Users is global; there can be only one spqr2@cam.ac.uk user across all projects. In AWS the namespace for IAM Users is per-account; each account can have its own spqr2@cam.ac.uk user which is independent of any others.

In addition, AWS has the concept of a "root user" which has no real analogue in Google. There is exactly one root user per AWS account and the root user namespace is global.

Ordinary IAM users in AWS sign in via an account-specific sign-in page whereas root users sign in via a sign-in page shared by all accounts.

Ideally you sign in as the root user only once, in order to configure the account as described in this guide, and then never again. Subsequent sign-ins are done by assuming a per-account IAM Role.

Our existing permissions model

As our use of AWS is rare, we'd like to avoid having to develop an entire parallel permissions model. The model we use with Google Cloud is explained in a separate guide but in brief:

  • Real Person "admin" users sign in to Google with gcloud admin accounts which are managed by the gcp-workspace-management terraform configuration.
  • These accounts appear in "editor" or "deploy" groups for a product. For example, hr-editors@g.apps.cam.ac.uk are "editor" users for HR-related applications.
  • Members of the "editor" group can sign in to the Google Console and manage resources via the gcloud CLI tool.
  • There is a "terraform deploy" service account for each environment such as "production" or "staging" which is given rights to manage Google Cloud resources.
  • Members of the "editor" and "deploy" groups can impersonate the "terraform deploy" service account in order to apply the terraform configuration for a given product.

A design aim was to keep this model without having to add additional manually managed permissions. Specifically:

  • if you are in the "editor" group for a product, you should be able to access the AWS console with full permissions, and
  • if you can impersonate the terraform deploy service account, you can apply a terraform configuration which includes AWS resources.

AWS account creation

It reduces friction when interoperating between cloud providers if we make sure we arrange our deployments in similar ways to one another. As we create one Google project for each environment, we also create an AWS account for each environment.

A design aim was that this be manual for the moment but amenable to automation via terraform at a later date. The how-to guide on creating AWS accounts covers the process in depth.

User and Role management within AWS

Unlike Google, "users" in AWS are per-account rather than global. As such, if we wanted to create a real AWS IAM user for each member of the "editor" group, each member of that group would have to manage separate sign in credentials for each environment. This quickly becomes tiresome and error-prone.

Instead we make use of two AWS IAM Roles:

  • a "TerraformDeploy" role which is used by terraform to manage resources, and
  • an "Admin" role which has all permissions and is assumed by "editor" users when they want to access the console.

In order to meet our design aim we need a way for the Google-side "terraform deploy" service account to be able to assume the TerraformDeploy IAM Role and a way for "editor" group members to assume the "Admin" role and access the AWS console. Ideally neither of these should require additional credentials.

Federated authentication via web identity

AWS provides a mechanism known as OIDC federation which lets us achieve the goal of letting Google-side users and service accounts assume AWS-side IAM Roles.

An OpenID Connect (OIDC) token is a JWT-formatted token signed by an issuer. Crucially the token includes information within it such that the public keys for that issuer can be found via the OIDC discovery document.

You can see an example of this if you run the gcloud auth print-identity-token command and paste the result into https://jwt.ms/.

[Figure: an example Google identity token being parsed and verified by https://jwt.ms.]

Despite the jwt.ms tool being run by Microsoft and the identity token being issued by Google, the token includes enough information to retrieve the public key and verify the token.

Tokens include a number of "claims" within them. For our purposes we care about the sub and aud claims.

  • The sub or "subject" claim describes which IAM principal is being authenticated by the token. Each Google service account has a unique numeric id which is used to populate the sub claim.
  • The aud or "audience" claim describes which audience the token is intended for. This can be set to any value but it is useful to make sure that tokens created for one purpose are not used for another. For example, tokens used to authenticate to a Cloud Run service require that the audience match the URL of the service. This means that one cannot reuse a token created for one service to authenticate to another.

In the AWS how-to guide we cover the process in detail but, in brief, AWS lets us use this sort of token to authenticate as an IAM Role by setting something known as a trust policy. You specify the expected audience, issuer and subject of an identity token and AWS takes care of performing the necessary steps to verify the token. AWS calls this process AssumeRoleWithWebIdentity.

Within the trust policy we require an audience claim which matches the Amazon Resource Name (ARN) of the IAM Role itself. Again, this is a method to ensure that an identity token intended for assuming a particular role can't be reused to assume a different one.
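As a rough illustration of the shape of such a trust policy, here is a minimal sketch written as a terraform policy document. The role ARN and service account id are made-up placeholders, and the policy in the AWS how-to guide remains the authoritative version. The sketch assumes, as we understand the AWS documentation, that the issuer is expressed by the "Federated" principal and that for Google-issued tokens the token's aud claim is matched via the accounts.google.com:oaud condition key. Check this against the current AWS documentation before relying on it.

```hcl
# Placeholder values for illustration only; they do not refer to real resources.
locals {
  example_role_arn     = "arn:aws:iam::123456789012:role/TerraformDeploy"
  example_sa_unique_id = "100000000000000000000"
}

data "aws_iam_policy_document" "example_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    # Issuer: only tokens issued by Google are considered.
    principals {
      type        = "Federated"
      identifiers = ["accounts.google.com"]
    }

    # Subject: the numeric unique id of the Google service account.
    condition {
      test     = "StringEquals"
      variable = "accounts.google.com:sub"
      values   = [local.example_sa_unique_id]
    }

    # Audience: must match the ARN of the role being assumed.
    # Assumption: AWS exposes the aud claim of Google tokens as "oaud".
    condition {
      test     = "StringEquals"
      variable = "accounts.google.com:oaud"
      values   = [local.example_role_arn]
    }
  }
}
```

The rendered JSON is available as data.aws_iam_policy_document.example_trust.json. It can be pasted into the AWS console when creating a role by hand or passed to the assume_role_policy argument of an aws_iam_role resource.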

The use of OIDC identity tokens meets another design aim of not requiring additional credentials: a sufficiently privileged ...@gcloudadmin.g.apps.cam.ac.uk user can obtain AWS credentials through service account impersonation and AWS's support for web identity authentication.

The "TerraformDeploy" IAM Role

When adding AWS to a product, we manually create the "TerraformDeploy" IAM Role and its trust policy. The expected subject claim corresponds to the appropriate terraform deploy Google service account and the expected issuer is set to accounts.google.com.

The TerraformDeploy IAM Role is granted permissions necessary to deploy resources within AWS. It is the analogue of the terraform-deploy@... service account in Google Cloud deployments.

When terraform runs, it acts as the TerraformDeploy role. As such you ideally want to give this role just enough permission to deploy your product but not too much. As you add more AWS resources to the deployment you will likely have to update the TerraformDeploy role's IAM policy to match.

The primary reason to restrict the TerraformDeploy role's permissions is to guard against inadvertent actions and not to guard against malicious actions. The default policy in the getting started how-to guide includes the iam:* permission for brevity. This permission is sufficient to configure the AWS console user role but may also be considered overly broad in some circumstances. For example, this permission would trivially enable the TerraformDeploy role to add additional permissions to itself.

Users comfortable with AWS permissions policies may want to, for example, add a condition to the policy which grants TerraformDeploy iam:* permissions but not for itself.
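One possible way to express "iam:* but not for itself" is sketched below. Note that it uses an explicit Deny statement scoped to the role's own ARN rather than a Condition element, since an explicit deny always overrides an allow. The ARN and the names are placeholders and the list of denied actions is an assumption about what counts as self-modification, so adjust it to your needs.

```hcl
# Placeholder ARN for illustration only.
locals {
  example_deploy_role_arn = "arn:aws:iam::123456789012:role/TerraformDeploy"
}

data "aws_iam_policy_document" "example_deploy_iam" {
  # Broad IAM access, as in the getting started policy.
  statement {
    sid       = "AllowIam"
    effect    = "Allow"
    actions   = ["iam:*"]
    resources = ["*"]
  }

  # Deny changes to the role's own policies, trust policy and boundary.
  statement {
    sid    = "DenySelfModification"
    effect = "Deny"
    actions = [
      "iam:AttachRolePolicy",
      "iam:DetachRolePolicy",
      "iam:PutRolePolicy",
      "iam:DeleteRolePolicy",
      "iam:UpdateAssumeRolePolicy",
      "iam:PutRolePermissionsBoundary",
      "iam:DeleteRolePermissionsBoundary",
    ]
    resources = [local.example_deploy_role_arn]
  }
}
```

This guards against accidents only. If the permissions are attached as a customer managed policy rather than an inline one, the role could still edit that policy via iam:CreatePolicyVersion unless the policy's ARN is covered by a similar deny.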

As the TerraformDeploy role's IAM policy is likely to be service-specific, it is recommended that you keep a copy of it in the infrastructure project for reference.

Having terraform pull itself up by its bootstraps

As noted above, the iam:* permissions from the how-to guide allow terraform to modify its own IAM policy. Advanced users may wish to attempt to manage the TerraformDeploy role's policy in terraform itself although that requires careful use of depends_on or the like in order to ensure that the IAM policy is updated before terraform attempts to manage resources.
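A minimal sketch of that approach follows, assuming the role is named "TerraformDeploy" and that its permissions live in an inline policy. The policy contents, resource names and email address are all hypothetical.

```hcl
# Hypothetical permissions needed by this product's deployment.
data "aws_iam_policy_document" "example_deploy" {
  statement {
    effect    = "Allow"
    actions   = ["iam:*", "ses:*"]
    resources = ["*"]
  }
}

# Inline policy attached to the manually created TerraformDeploy role.
resource "aws_iam_role_policy" "terraform_deploy" {
  name   = "terraform-deploy"
  role   = "TerraformDeploy"
  policy = data.aws_iam_policy_document.example_deploy.json
}

# A resource which needs the permissions granted above. The explicit
# dependency makes terraform update the policy first.
resource "aws_ses_email_identity" "example" {
  email = "noreply@example.com"

  depends_on = [aws_iam_role_policy.terraform_deploy]
}
```

Bear in mind that IAM changes can take a short while to propagate within AWS, so the first apply after widening the policy may still need to be retried.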

Terraform configuration

In order to configure terraform to manage AWS resources we perform a two-step process:

  1. Get a Google-signed identity token for the terraform deploy service account with the appropriate audience set.
  2. Exchange that token for temporary AWS credentials.

The first step is done via a data source:

# Generate an id token for the terraform-deploy service account for the current workspace used to
# authenticate to AWS.
data "google_service_account_id_token" "aws_terraform_deploy" {
  target_service_account = local.workspace_config.terraform_sa_email
  target_audience        = local.aws_deploy_role_arn

  provider = google.impersonation
}

Note

If you're interested in how the various locals used here are defined, look at the full AWS how-to guide.

The token is then used with the AWS terraform provider to assume the TerraformDeploy role:

# AWS is authenticated using a role corresponding to the terraform deploy service account.
provider "aws" {
  region              = local.aws_region
  allowed_account_ids = local.aws_allowed_account_ids

  assume_role_with_web_identity {
    role_arn           = local.aws_deploy_role_arn
    web_identity_token = data.google_service_account_id_token.aws_terraform_deploy.id_token
  }

  default_tags {
    tags = {
      Project     = local.gcp_config.product_display_name
      Environment = terraform.workspace
    }
  }
}

The "Admin" IAM Role and AWS console access

The corresponding "Admin" IAM Role and its associated trust policy are added by terraform itself. The IAM Role is granted full permissions on the AWS account, including access to the AWS console. See the AWS how-to guide for full details.

The Admin IAM Role is configured to trust identity tokens for a matching Google-side "AWS admin" service account. A Google Cloud IAM policy is added to allow members of the "editor" group to impersonate this account.
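The Google-side half of this arrangement might look something like the following sketch. The project id, account id and resource names are hypothetical, the group is the example "editor" group mentioned earlier, and the choice of the roles/iam.serviceAccountTokenCreator role is an assumption. The AWS how-to guide has the configuration we actually use.

```hcl
# Hypothetical "AWS admin" service account for one environment.
resource "google_service_account" "aws_admin" {
  project      = "example-project-id"
  account_id   = "aws-admin"
  display_name = "AWS admin"
}

# Allow members of the product's "editor" group to obtain identity tokens
# for the service account via impersonation.
resource "google_service_account_iam_member" "editors_can_impersonate" {
  service_account_id = google_service_account.aws_admin.name
  role               = "roles/iam.serviceAccountTokenCreator"
  member             = "group:hr-editors@g.apps.cam.ac.uk"
}

# The numeric unique id is the value which appears in the token's sub claim
# and so is the subject the Admin role's trust policy should expect.
output "aws_admin_subject" {
  value = google_service_account.aws_admin.unique_id
}
```

No keys are created for the service account. Access is controlled entirely by membership of the group, which matches the aim of not introducing additional credentials.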

An "editor" user can access the AWS console in the following way:

  • use their ...@gcloudadmin.g.apps.cam.ac.uk account to obtain an OIDC token for the "AWS admin" Google service account via service account impersonation,
  • exchange that OIDC token for temporary AWS credentials via the AssumeRoleWithWebIdentity API,
  • use the AWS credentials to get a time-limited sign in token via the getSigninToken API,
  • construct an AWS Console URL using the sign in token, and
  • visit that URL in a web browser.

This can be done by hand with a combination of the gcloud, aws and curl tools and such a procedure is documented by AWS. However, that is tedious in the extreme and so we have written a tool called aws-helper which automates this little dance. Information on how to use the tool can be found in a dedicated guide.

As a nice side-effect, with a little more effort the aws-helper tool can also act as a credential helper for the aws CLI tool. This is covered in a separate how-to guide.

Summary

In summary, our design aims were:

  • use a manually created AWS account structure which does not preclude automation at a later stage,
  • do not require users to manage any additional credentials,
  • make use of the existing permissions model,
  • do not have long-lived AWS access keys,
  • allow existing "editor" users to access the AWS console and aws CLI tool, and
  • allow existing "deploy" users to be able to seamlessly manage AWS and Google Cloud resources within the same terraform configuration.

The creation of AWS accounts mirrors our Google project layout and the TerraformDeploy IAM Role mirrors the terraform deploy service account. As such later iterations of, e.g., the Google Cloud product factory could be extended to create parallel AWS accounts.

The use of federated OIDC identity tokens allows us to use our existing permissions model to gate actions behind impersonation of Google service accounts. These service accounts then map 1:1 to a corresponding AWS IAM Role which can be assumed using a service account OIDC identity token. Service account impersonation requires no additional credentials to be managed by users and we can set impersonation IAM policies which align with the existing "editor" and "deploy" roles in products.

See also