This page documents how we handle secret or sensitive values in our deployments. We also touch on how we handle secrets more generally within the team and the technologies we use.
Secret handling technologies are subtle and quick to anger. When attending a presentation where someone is describing their latest and greatest deployment methods, ask at the end: "How do you handle secrets?" We generally find that the answer is always some variant of "badly" followed by a desire to do it better.
Robust, simple and secure secret handling is an industry-wide challenge and we will always have ways we'd like to improve. This page documents where we are, not where we want to be.
Types of secret
Very broadly we find that the secrets we have to deal with can be divided into two types:
- Shared secrets are those where a secret value is shared between two ends of a connection. For example, an API key is a shared secret: the client and server must both know the API key to authenticate the request.
- Private secrets are those only one end of a connection needs to know. For example, a private key is needed to make use of a TLS certificate to host a website securely but it is only needed by the web server, not the client.
Both shared and private secrets can further be divided into two sub-types:
- External secrets are those which are provided by some external source. For example we may be given credentials for a database and those credentials are controlled by the database administrator.
- Internal secrets are those which we create. For example, if we are hosting both the database and web server, although database user credentials are shared between the web server and database, we are free to change the credentials at any point so long as we update both ends.
In general it is easy to rotate internal secrets but harder to rotate external ones. In some cases we may not even be able to rotate the external secret. It is also generally easier to regenerate an internal secret if a service needs to be re-deployed from scratch. By contrast, an external secret must be preserved if a service is re-deployed. As such it is more important to provide safeguards against accidental destruction of external secrets.
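The two axes described above can be sketched as a small data model. This is only an illustration: the class, property and secret names below are hypothetical and do not correspond to any tool we use.

```python
from dataclasses import dataclass
from enum import Enum


class Audience(Enum):
    SHARED = "shared"    # both ends of a connection know the value
    PRIVATE = "private"  # only one end of a connection needs the value


class Origin(Enum):
    EXTERNAL = "external"  # provided and controlled by someone else
    INTERNAL = "internal"  # created by us and free for us to change


@dataclass(frozen=True)
class Secret:
    name: str
    audience: Audience
    origin: Origin

    @property
    def must_survive_redeploy(self) -> bool:
        """External secrets cannot be regenerated, so they must be preserved."""
        return self.origin is Origin.EXTERNAL

    @property
    def ends_to_update_on_rotation(self) -> int:
        """A shared secret has to be changed at both ends of the connection."""
        return 2 if self.audience is Audience.SHARED else 1


# Hypothetical examples covering three of the four combinations.
vendor_api_key = Secret("vendor-api-key", Audience.SHARED, Origin.EXTERNAL)
db_password = Secret("db-password", Audience.SHARED, Origin.INTERNAL)
tls_private_key = Secret("tls-private-key", Audience.PRIVATE, Origin.INTERNAL)
```

Under this model the vendor API key from the example that follows is the awkward case: it is shared, so rotation touches both ends, and external, so it cannot simply be regenerated on re-deployment.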
A real-world example of an issue with external secrets arose when we were integrating a legacy system with a vendor's product which provided a similar service over an API.
We inadvertently committed the API secret key to a source control repository and so attempted to reset the secret in the product's API console. We found that the vendor had no self-service mechanism for API key rotation and, worse, that there could be no overlap period during which both the old and new API keys were valid. We therefore had to co-ordinate a simultaneous manual switchover with the vendor.
We would prefer not to repeat this.
No one technology is ideal for dealing with all secrets and so below we'll make it clear which types of secret a particular technology is best at dealing with.
1Password
Every team member should take personal responsibility for maintaining the security of their credentials for 1Password. This includes:
- using a strong master password, and
- enabling two-factor authentication for their account.
We use 1Password as the canonical source for external secrets. These include but are not limited to:
- shared account credentials for websites, API consoles or other applications which need to be accessed without using personal credentials,
- TOTP keys for shared account credentials where two-factor authentication is supported,
- external service access credentials minted for us by the service operators, and
- other secrets which cannot freely be rotated and must be persisted if a service needs re-deployment.
We maintain one vault per product. This allows us to limit access by only giving team members access to a vault when they need it. Additionally we can choose whether a given team member has write or read-only access on a case by case basis.
We have a single "team" vault for secrets which are shared among all products. For example, credentials for the GitLab bot user reside in that vault.
There is a monetary cost associated with adding team members to 1Password. If someone external to the team needs to be granted access to a vault for some reason they should be provisioned with a guest account.
Integration with terraform
While 1Password is considered the canonical source for external secrets, it is usually the case that our deployments cannot directly fetch secrets from 1Password. As such we have to copy an external secret from 1Password into some other secret storage to actually deploy a product.
We have a 1Password secrets integration for terraform which supports loading secrets from 1Password at apply-time so that they can be copied into the deployment.
Unfortunately our terraform deployments usually run within a container configured to use a service account identity. This does not play well with 1Password which thinks in terms of individual team members.
At the moment we square this particular circle by having a dedicated terraform configuration which is intended to be run outside of a container by the individual deploying the software. This configuration only fetches secrets from 1Password and copies them into the deployment. As such it needs only to be run when 1Password secrets change.
An example of this approach can be found in the Raven infrastructure project (DevOps only).
Some internal secrets need never leave the deployment and have their canonical source in a particular environment's terraform state.
For example, our Django webapps need a secret key which is used to sign the ephemeral state cookie. This secret is internal, private and may be rotated with limited adverse effects; in most applications, changing the key has the relatively minor drawback of requiring users to sign in once again.
Since the effect of re-creating this secret is relatively minor, we make use of the terraform state as the canonical source.
We use the same approach for internal shared secrets such as database user credentials. When both the database user and the web application which uses the database are deployed in the same terraform configuration, using a resource to generate a password suffices. Rotating the password can be done by terraform state rm-ing the old password resource and re-running terraform.
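This rotation approach relies on terraform generating a value only when its state has no record of the resource. The Python sketch below imitates that behaviour with a plain dictionary standing in for the state. The functions `apply` and `state_rm` and the resource name `db_password` are hypothetical stand-ins for illustration, not terraform APIs.

```python
import secrets


def apply(state: dict[str, str], wanted: list[str]) -> dict[str, str]:
    """Generate a password for every wanted resource missing from the state.

    Resources already recorded in the state keep their existing value, which
    is why re-running an apply does not normally change a password.
    """
    for name in wanted:
        if name not in state:
            state[name] = secrets.token_urlsafe(32)
    return state


def state_rm(state: dict[str, str], name: str) -> None:
    """Forget a resource so that the next apply generates a fresh value."""
    state.pop(name, None)


state: dict[str, str] = {}
apply(state, ["db_password"])
original = state["db_password"]

# Re-applying is a no-op because the state already holds a value.
apply(state, ["db_password"])
unchanged = state["db_password"] == original

# Rotation: forget the old value, then re-apply to generate a new one.
state_rm(state, "db_password")
apply(state, ["db_password"])
rotated = state["db_password"] != original
```

Because the database user and the web application read the password from the same state, both ends pick up the new value in the same run.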
Google's Secret Manager
Google's Secret Manager is our default choice when secrets need to be accessed at runtime by applications. This is the case for most database and API credentials.
For example, with serverless web applications we store Django settings in a Secret Manager secret as a YAML-formatted document. This document is read at runtime by the web application.
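A minimal sketch of how a Django settings module might read such a document is shown below. It assumes the `google-cloud-secret-manager` and PyYAML packages are installed. The function names, the project name and the secret name are hypothetical, and this is not necessarily how our applications implement it.

```python
from typing import Any


def fetch_secret_document(secret_version: str) -> dict[str, Any]:
    """Fetch a Secret Manager secret version and parse it as YAML.

    The third-party packages are imported lazily so that the rest of this
    module can be used without them.
    """
    import yaml
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(request={"name": secret_version})
    return yaml.safe_load(response.payload.data.decode("utf-8"))


def apply_settings(document: dict[str, Any], namespace: dict[str, Any]) -> None:
    """Copy upper-case keys from the document into a settings namespace.

    Django only treats upper-case names as settings, so other keys are ignored.
    """
    for key, value in document.items():
        if key.isupper():
            namespace[key] = value


# Hypothetical use at the bottom of settings.py:
#
#   apply_settings(
#       fetch_secret_document(
#           "projects/example-project/secrets/webapp-settings/versions/latest"
#       ),
#       globals(),
#   )
```

Keeping the parsing step separate from the fetch makes it possible to test the settings logic without network access.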
Secret Manager secrets are protected by Cloud IAM roles which means we can separately control which applications can read secrets and which can write them.
Example: Raven device statistics
The Raven device statistics API makes use of a JSON document in a Cloud Storage bucket which is updated periodically by a script (DevOps only) run via Cloud Scheduler. This script loads its configuration and any required credentials from a Secret Manager secret.
Internal secrets generated by terraform are usually directly placed into Secret Manager secrets within the terraform configuration. External secrets are usually copied into Secret Manager secrets from 1Password by means of the "bootstrap" terraform configuration described above.
Kubernetes provides its own secret implementation. These are key-value pairs which ordinarily are not viewable in the console. Kubernetes secrets may be encrypted at rest and made subject to role-based access controls.
It is rare for a kubernetes secret to be the canonical source of a secret. Usually we copy either a 1Password-based or a terraform-generated secret into a kubernetes secret for the sole purpose of allowing a kubernetes pod to read it.
Secrets which are internal to a cluster may occasionally be canonical. For example, the implementation of Raven SAML2 makes use of an ephemeral key for encrypting state. This key is stored in a kubernetes secret and is periodically rotated by a kubernetes CronJob (University members only). Since the secret is generated by one kubernetes pod and consumed by another, the kubernetes secret itself is the canonical source.
Ultimately, secrets need to appear in a place where our applications can see them. Historically we used environment variables to configure our applications. This works well for non-sensitive values but exposing sensitive values in environment variables is dangerous for at least two reasons:
- If an attacker can spawn a process on the container host, it is trivial to inspect the environment variables of any process. We shouldn't view a container as a security boundary but neither should we make all our secrets visible to any process which can read /proc/. This is doubly true for serverless computing where we should not assume that our containers have exclusive use of the container host. Cloud Run provides container isolation via gVisor but, for portability, we should not assume that to be the case for all serverless hosting platforms.
- The values of the environment variables are visible in the console. Even if we trust all of the people who can view the service configuration, we don't want to provide an easy route for malicious exfiltration via screen loggers or inadvertent disclosure in screenshots.
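To illustrate the first point, the sketch below shows how little code is needed to read another process's environment on Linux, where the kernel exposes it as NUL-separated `KEY=VALUE` entries in `/proc/<pid>/environ`. The function names are hypothetical. Reading typically succeeds when the caller runs as the same user as the target process or as root.

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated KEY=VALUE entries of an environ file."""
    result: dict[str, str] = {}
    for item in raw.split(b"\0"):
        key, sep, value = item.partition(b"=")
        if sep:  # skip empty or malformed entries with no "="
            result[key.decode(errors="replace")] = value.decode(errors="replace")
    return result


def read_environ(pid: int) -> dict[str, str]:
    """Read the initial environment of another process (Linux only)."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())
```

No special tooling or exploit is involved, which is why a secret placed in the environment should be treated as readable by anything else running alongside the application.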
To avoid sensitive values appearing in the environment, we have re-architected our applications to load some configuration at runtime from secrets.
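One way of loading a secret at runtime is to read it from a file which the platform mounts into the container. The sketch below assumes one file per secret. The function name and the default directory `/run/secrets` are assumptions for illustration, not a description of our deployments.

```python
from pathlib import Path


def read_mounted_secret(name: str, directory: str = "/run/secrets") -> str:
    """Return the contents of a secret mounted as a file.

    A single trailing newline is removed since secret files are often
    written with one by editors and command-line tools.
    """
    text = (Path(directory) / name).read_text(encoding="utf-8")
    return text[:-1] if text.endswith("\n") else text
```

Unlike an environment variable, the file's permissions can restrict which users may read it and the value does not appear in the service configuration shown in the console.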
Summary
- Secrets can be categorised as internal or external and private or shared.
- For external secrets and internal secrets which cannot be easily rotated we use 1Password as the canonical source.
- Internal secrets which can easily be rotated are generated dynamically in terraform and the terraform state is the canonical source.
- We use a single 1Password vault per product.
- Secrets which need to be read at runtime are usually kept in Google Secret Manager secrets or kubernetes secrets depending on how the application is deployed.
- Google Secret Manager secrets and kubernetes secrets are rarely the canonical source of secrets. Exceptions are usually secrets which are dynamically rotated by scheduled jobs.
- We prefer secrets be exposed to applications directly in Secret Manager secrets or by being mounted in the filesystem. Avoid exposing secrets in environment variables where possible.
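The rules above for choosing a canonical source can be written as a small decision function. This is a sketch of the policy as stated on this page. The names are hypothetical and real cases may need judgement which a function cannot capture.

```python
from enum import Enum


class CanonicalSource(Enum):
    ONE_PASSWORD = "1Password"
    TERRAFORM_STATE = "terraform state"
    RUNTIME_STORE = "Secret Manager or kubernetes secret"


def canonical_source(
    *,
    external: bool,
    easily_rotated: bool,
    rotated_by_scheduled_job: bool = False,
) -> CanonicalSource:
    """Pick the canonical source for a secret according to the summary rules."""
    # External secrets, and internal ones which are hard to rotate, must be
    # preserved across re-deployments, so they live in 1Password.
    if external or not easily_rotated:
        return CanonicalSource.ONE_PASSWORD
    # The exception: secrets rotated dynamically by a scheduled job have the
    # runtime store itself as their canonical source.
    if rotated_by_scheduled_job:
        return CanonicalSource.RUNTIME_STORE
    # Everything else is generated by terraform and kept in its state.
    return CanonicalSource.TERRAFORM_STATE
```

For example, a vendor-issued API key maps to 1Password, while a Django secret key generated at deployment maps to the terraform state.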