Monitoring and Alerting

This section of the guidebook discusses how we implement proactive monitoring and alerting for web applications using the gcp-site-monitoring Terraform module. The module provides a standardised baseline for site health checks, alerting, and certificate monitoring.

This module is used directly by various products, as well as indirectly through its inclusion in the gcp-cloud-run-app module.

Overview

The gcp-site-monitoring module provisions the following core monitoring features:

  • Uptime Check: Periodic probes to ensure application endpoints are responsive.
  • TLS Certificate Expiry Check: Alerts if the certificate is approaching expiry.
  • Absent Metric Alerting: Detects if uptime monitoring is misconfigured or silently failing.

These checks form the minimum common monitoring standard for web applications we manage.
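
To illustrate what "absent metric" alerting means in practice, here is a minimal sketch of an alert policy that fires when no uptime check results are reported at all. It is written directly against the google provider's google_monitoring_alert_policy resource and is not taken from the module's source; the resource name, display names, duration and alignment period are assumptions chosen for illustration.

```hcl
# Sketch only: shows the general technique, not the module's actual implementation.
resource "google_monitoring_alert_policy" "uptime_metric_absent" {
  display_name = "Uptime metric absent (example)" # hypothetical name
  combiner     = "OR"

  conditions {
    display_name = "No uptime check results reported"

    # condition_absent fires when the filtered time series stops
    # reporting data for the whole of `duration`.
    condition_absent {
      # A real policy would usually narrow this filter to one check or host.
      filter   = "metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\" AND resource.type=\"uptime_url\""
      duration = "1800s" # assumed value: 30 minutes without data

      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_FRACTION_TRUE"
      }
    }
  }

  notification_channels = ["projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]"]
}
```

A policy of this kind complements the uptime alert itself: a failing check produces data points that report failure, whereas a misconfigured or deleted check produces no data points, which only an absence condition can detect.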

Note

Alert severity customisation is available starting from version 6.1.0. See Advanced Customisation for more information.

Core Configuration

A minimal usage of the module requires only the host to monitor and the notification channels that should receive alerts.

Example:

module "monitoring" {
  source  = "gitlab.developers.cam.ac.uk/uis/gcp-site-monitoring/devops"
  version = "~> 6.0"

  host                        = "www.example.com"
  alert_notification_channels = ["projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]"]
}

Advanced Customisation

The module supports a range of optional parameters to tailor monitoring based on application specifics:

Uptime Checks

You can customise:

  • The request path (e.g. /healthz, /status, etc.)
  • Frequency and timeout of checks
  • Alert severity (valid values are "CRITICAL", "ERROR", or "WARNING")
  • Threshold for failed checks (default: 75% success)
  • Content matchers for verifying response content (e.g., JSON path checks)

Example:

uptime_check = {
  path                      = "/healthz"
  timeout                   = 60
  period                    = 60
  failed_severity           = "CRITICAL"
  missed_severity           = "ERROR"
  success_threshold_percent = 90
}
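
The module's own input for content matchers is not shown on this page, so the following is only a point of reference: a sketch of a JSON path content matcher on the google provider's google_monitoring_uptime_check_config resource, which is the kind of resource an uptime check ultimately maps to. The resource name, display name, project ID, and the `$.status` / `"ok"` response shape are assumptions for illustration.

```hcl
# Sketch only: underlying provider resource, not the module's input syntax.
resource "google_monitoring_uptime_check_config" "example" {
  display_name = "example-healthz" # hypothetical name
  timeout      = "10s"
  period       = "60s"

  http_check {
    path         = "/healthz"
    port         = 443
    use_ssl      = true
    validate_ssl = true
  }

  monitored_resource {
    type = "uptime_url"
    labels = {
      project_id = "my-project" # hypothetical project ID
      host       = "www.example.com"
    }
  }

  # Passes only if the JSON response body has status == "ok".
  # For JSON path matching, a string value is written with its own quotes.
  content_matchers {
    content = "\"ok\""
    matcher = "MATCHES_JSON_PATH"

    json_path_matcher {
      json_path    = "$.status"
      json_matcher = "EXACT_MATCH"
    }
  }
}
```

Matching on response content catches cases where an endpoint returns HTTP 200 but reports an unhealthy state in its body.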

TLS Certificate Alerts

Ensure TLS certificates don’t expire unnoticed. You can customise:

  • Minimum age of certificate
  • Alert severity (valid values are "CRITICAL", "ERROR", or "WARNING")

Example:

tls_check = {
  alert_enabled = true
  minimum_age   = 15
  severity      = "WARNING"
}

Note

When the authentication proxy is in use, TLS checks are disabled automatically because the proxy masks the real certificate.
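
The uptime_check and tls_check fragments above are arguments to the module block. As a sketch of how they fit together, the following combines them with the minimal configuration. All names and values are taken from the examples on this page; the tighter version constraint is an assumption based on the note that severity customisation needs 6.1.0 or later.

```hcl
module "monitoring" {
  source  = "gitlab.developers.cam.ac.uk/uis/gcp-site-monitoring/devops"
  version = "~> 6.1" # assumed constraint: any 6.x release from 6.1 onwards

  host                        = "www.example.com"
  alert_notification_channels = ["projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]"]

  # Probe /healthz every 60 seconds and alert below 90% success.
  uptime_check = {
    path                      = "/healthz"
    timeout                   = 60
    period                    = 60
    failed_severity           = "CRITICAL"
    missed_severity           = "ERROR"
    success_threshold_percent = 90
  }

  # Raise a WARNING alert based on the TLS certificate check.
  tls_check = {
    alert_enabled = true
    minimum_age   = 15
    severity      = "WARNING"
  }
}
```

No authentication proxy is configured in this sketch, so the TLS check applies to the certificate served by the host itself.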

Authentication Proxy for Protected Services

Some internal services require authentication. In those cases, we enable an authentication proxy using a Cloud Function to forward monitoring requests to internal endpoints.

authentication_proxy = {
  enabled                   = true
  cloud_run_project         = google_cloud_run_service.webapp.project
  cloud_run_region          = var.cloud_run_region
  cloud_run_service_name    = google_cloud_run_service.webapp.name
  egress_connector          = var.egress_connector
  egress_connector_settings = "PRIVATE_RANGES_ONLY"
}

Provider Configuration and Cloud Monitoring Workspaces

The project containing the resources to be monitored must belong to a Cloud Monitoring workspace configured by the Google Cloud Product Factory.

Cloud Monitoring distinguishes between workspaces and projects within those workspaces. Each workspace must have a "scoping" project and that project must be the default project of the google provider used by this module.

Most DevOps-managed products use a separate "meta" project as the "scoping" project for all of the product's environments. Therefore, this module usually needs a provider with rights on the "meta" project.

If the scoping project differs from the monitored project, use a provider alias:

provider "google" {
  project = "my-project"

  # ... some credentials for the *project* admin ...
}

provider "google" {
  project = "meta-project-hosting-cloud-monitoring-workspace"
  alias   = "monitoring"

  # ... some credentials for the *product* admin ...
}

module "monitoring" {
  # ... other parameters ...

  providers = {
    google = google.monitoring
  }
}

Notification Channels

Alerts are delivered via GCP Notification Channels, typically set up in the "meta" project. The module references them by full resource ID, in the form projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]. Channel details can usually be found in the product configuration.
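
Where a channel's full resource ID is not readily available from the product configuration, one possible alternative is to look it up with the google provider's google_monitoring_notification_channel data source. This is a sketch under assumptions: the data source label and the display name are hypothetical, the lookup must match exactly one channel, and a google.monitoring provider alias pointing at the "meta" project is assumed to be declared elsewhere.

```hcl
# Look up an existing channel in the "meta" project by its display name.
data "google_monitoring_notification_channel" "on_call" {
  provider     = google.monitoring
  display_name = "Example on-call email" # hypothetical display name
}

module "monitoring" {
  source  = "gitlab.developers.cam.ac.uk/uis/gcp-site-monitoring/devops"
  version = "~> 6.0"

  host = "www.example.com"

  # `name` is the channel's full resource name:
  # projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]
  alert_notification_channels = [
    data.google_monitoring_notification_channel.on_call.name,
  ]

  providers = {
    google = google.monitoring
  }
}
```

Looking the channel up by name avoids hard-coding the channel ID, at the cost of depending on the display name staying unique and unchanged.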

Summary

This page explains how our DevOps team uses the gcp-site-monitoring Terraform module to implement standardised monitoring and alerting for web applications on Google Cloud. The module provides core features like uptime checks, TLS certificate expiry monitoring, and alerting on absent metrics, which together cover basic site health.

The page highlights how to configure the module minimally by specifying the host and alert notification channels, typically sourced from the product configuration. Advanced customisation options allow fine-tuning of uptime checks, TLS alerts, and authentication proxies for protected services.

The documentation also covers essential considerations for Cloud Monitoring workspaces, especially the need for a “scoping” project, usually a dedicated “meta” project. Provider aliasing is used when the monitoring workspace differs from the monitored project.

Overall, this approach ensures consistent, automated monitoring across products with flexible alerting and secure handling of internal services.