Kubernetes Clusters

Kubernetes, often shortened to "k8s", is a cluster management and workload orchestration system which can be used to host container-based applications.

This page documents how we configure and make use of kubernetes clusters in our deployments.

When to use kubernetes

Kubernetes excels when your application is made up of multiple containers which need to interact with each other and/or maintain some shared state outside of a database. Kubernetes provides dedicated resources for these use cases which are tedious to replicate via other means.

That being said, we rarely make use of kubernetes in our deployments for the following reasons:

  • We need to dedicate at least one, and typically three, VMs along with associated storage for a minimal cluster. This prevents us from leveraging "scale to zero" optimisations.
  • Even for a fully occupied VM, the per-second cost is unfavourable compared to solutions such as Cloud Run.
  • Leaving aside cluster size optimisation, configuring autoscaling within the cluster per application or per container pod is tricky.

As such we tend to use kubernetes only when:

  • The application we are deploying requires kubernetes, for example by being packaged as a helm chart. This is the case with GitLab which is deployed via terraform configuration in a Dedicated GitLab project (DevOps only).
  • We require specific container affinity for load balancing. This is the case for Raven SAML2. The Shibboleth software requires that conversational state always be maintained within a single container and so requires advanced load balancing configuration.
  • We require use of advanced kubernetes features such as sidecar containers or StatefulSets. This is the case for GitLab.

We prefer the following technologies over kubernetes when possible:

  • Use Cloud Run for single-container hosting where that container listens via HTTP. We use these instead of kubernetes ReplicaSet or DaemonSet resources. (A minimal sketch of such a service follows this list.)
  • Use either Cloud Run's inbuilt HTTP load balancer or explicit Google Cloud Load Balancing resources for ingress. We use these instead of kubernetes Ingress resources.
  • Use Cloud Scheduler for triggering scheduled jobs. We use these instead of kubernetes CronJob resources.
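
As an illustration, a single-container HTTP service can be declared directly in terraform via the google provider's google_cloud_run_service resource. This is a minimal sketch: the service name, region and container image below are hypothetical.

resource "google_cloud_run_service" "webapp" {
  name     = "webapp"
  location = "europe-west2"

  template {
    spec {
      containers {
        # A hypothetical container image; any container listening via HTTP works.
        image = "gcr.io/example-project/webapp:latest"
      }
    }
  }
}

# Allow unauthenticated HTTP access to the service.
resource "google_cloud_run_service_iam_member" "public" {
  service  = google_cloud_run_service.webapp.name
  location = google_cloud_run_service.webapp.location
  role     = "roles/run.invoker"
  member   = "allUsers"
}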

In order to increase isolation and to aid migration of service management from one team to another, when we use kubernetes we create one cluster per environment and per service.

Creating the cluster

In Google Cloud, kubernetes clusters consist of one or more VMs. Clusters with regional high-availability must have at least one VM per availability zone within the region. We use high-availability clusters for production service instances and single-VM clusters for test and development instances.

Cluster creation is usually done via a single gke.tf file which takes some values from the boilerplate's locals.tf file:

module "cluster" {
  source = "git::ssh://git@gitlab.developers.cam.ac.uk/uis/devops/infra/terraform/gke-cluster.git?ref=v2"

  project  = local.project

  # For single VM clusters, we need to use a zone like "europe-west2-a" rather
  # than a region.
  location = local.is_production ? local.region : "${local.region}-a"

  # Usually we find ourselves needing to tweak the VM size to fit a given
  # application. This is simply an example of a 2 vCPU, 16GiB RAM machine.
  machine_type = "e2-custom-2-16384"

  # Google Cloud can associate a Google Cloud IAM identity with each workload
  # in the cluster. Enabling this is harmless and makes it possible to call
  # Google APIs without needing to pass additional credentials.
  enable_workload_identity = true
}

Our module can take other arguments as well. See the full list in the module project itself.
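
For reference, a locals.tf providing the values consumed above might look like the following. This is illustrative: the project name, region and the rule for deciding whether a workspace is production will vary between deployments.

# locals.tf (illustrative values)

locals {
  project       = "example-project"
  region        = "europe-west2"
  is_production = terraform.workspace == "production"
}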

Kubernetes terraform provider

The standard kubernetes provider can be used to create some kubernetes resources as outlined in the provider documentation.

Our cluster module binds roles to the terraform Google Cloud user, allowing it to perform cluster admin tasks. As such we can configure the kubernetes provider to use the same credentials as the Google provider in providers.tf and versions.tf:

# providers.tf

# The google_client_config data source fetches a token from the Google Authorization
# server, which expires in 1 hour by default.
data "google_client_config" "default" {
}

provider "kubernetes" {
  host  = "https://${module.cluster[0].endpoint}"
  token = data.google_client_config.default.access_token
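
  # Depending on the module version, the cluster's CA certificate may also be
  # available as an output (the output name here is an assumption) and can be
  # passed so that TLS verification succeeds:
  #
  #   cluster_ca_certificate = base64decode(module.cluster.ca_certificate)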
}

# versions.tf

terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.7"
    }
  }
}

Examples of using the associated kubernetes_... resources can be found within the Raven SAML2 deployment (DevOps only).
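
For those without access to that project, a minimal sketch of a Deployment managed this way might look like the following. The namespace, names and image are illustrative.

resource "kubernetes_namespace" "webapp" {
  metadata {
    name = "webapp"
  }
}

resource "kubernetes_deployment" "webapp" {
  metadata {
    name      = "webapp"
    namespace = kubernetes_namespace.webapp.metadata[0].name
  }

  spec {
    # Run two replicas behind the Deployment's label selector.
    replicas = 2

    selector {
      match_labels = {
        app = "webapp"
      }
    }

    template {
      metadata {
        labels = {
          app = "webapp"
        }
      }

      spec {
        container {
          name = "webapp"

          # A hypothetical container image.
          image = "registry.example.com/webapp:1.0"
        }
      }
    }
  }
}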

Custom resources

Some kubernetes resources are not exposed directly by the terraform kubernetes provider. For these resources we use the kubernetes_manifest resource.

For example, to create a new Google managed certificate for example.apps.cam.ac.uk:

resource "kubernetes_manifest" "managed_certificates" {
  manifest = {
    apiVersion = "networking.gke.io/v1"
    kind       = "ManagedCertificate"

    metadata = {
      name = "example-cert"
    }

    spec = {
      domains = ["example.apps.cam.ac.uk"]
    }
  }
}
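
Note that kubernetes_manifest validates manifests against the cluster's API at plan time, so the cluster must already exist and be reachable before resources of this kind can be planned.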

Examples of custom resources can be found within the Raven SAML2 deployment (DevOps only).

Monitoring

Applications hosted by kubernetes can be monitored in the usual fashion. In addition, we configure alerts for high memory, disk or CPU usage by individual nodes and pods. See the monitoring configuration for Raven SAML2 as an example (DevOps only).
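
As an illustration, a node CPU alert might be expressed in terraform along the following lines. This is a minimal sketch: the threshold and duration are illustrative, and the notification channel is assumed to be defined elsewhere in the configuration.

resource "google_monitoring_alert_policy" "node_cpu" {
  project      = local.project
  display_name = "Cluster node CPU usage"
  combiner     = "OR"

  conditions {
    display_name = "Node CPU allocatable utilisation is high"

    condition_threshold {
      filter          = "resource.type = \"k8s_node\" AND metric.type = \"kubernetes.io/node/cpu/allocatable_utilization\""
      comparison      = "COMPARISON_GT"
      threshold_value = 0.9

      # Only fire when usage has been high for a sustained period.
      duration = "900s"

      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }

  # A hypothetical notification channel defined elsewhere.
  notification_channels = [google_monitoring_notification_channel.devops.id]
}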

Summary

In summary:

  • We use kubernetes only when we cannot make use of other cloud-hosting technologies.
  • We have a standard terraform module for creating kubernetes clusters.
  • Kubernetes resources are usually managed via terraform directly.
    • If a required resource type is supported by the hashicorp kubernetes provider, use the hashicorp provider's resource.
    • Unsupported resources can use the generic kubernetes_manifest resource.
  • We monitor node CPU, memory and disk usage and alert when any of these become high for a sustained period.
  • We are happy to use helm for third-party applications but have decided that it is one extra layer of indirection we don't need for our own applications.