Monitoring and Alerting¶
This section of the guidebook discusses how we implement proactive monitoring and alerting for web applications using the gcp-site-monitoring Terraform module. The module provides a standardised baseline for site health checks, alerting, and certificate monitoring.
This module is used directly by various products, and indirectly through its inclusion in the gcp-cloud-run-app module.
Overview¶
The gcp-site-monitoring module provisions the following core monitoring features:
- Uptime Check: Periodic probes to ensure application endpoints are responsive.
- TLS Certificate Expiry Check: Alerts if the certificate is approaching expiry.
- Absent Metric Alerting: Detects if uptime monitoring is misconfigured or silently failing.
These checks form the minimum common monitoring standard for web applications we manage.
Note
Alert severity customisation is available starting from version 6.1.0. See Advanced Customisation for more information.
Core Configuration¶
A minimal usage of the module only requires:
- The host to monitor
- One or more alert_notification_channels, typically using the channel created by Google Cloud Product Factory
Example:
module "monitoring" {
  source  = "gitlab.developers.cam.ac.uk/uis/gcp-site-monitoring/devops"
  version = "~> 6.0"

  host                        = "www.example.com"
  alert_notification_channels = ["projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]"]
}
Advanced Customisation¶
The module supports a range of optional parameters to tailor monitoring based on application specifics:
Uptime Checks¶
You can customise:
- The request path (e.g. /healthz, /status, etc.)
- Frequency and timeout of checks
- Alert severity (valid values are "CRITICAL", "ERROR", or "WARNING")
- Success threshold for checks (default: 75%)
- Content matchers for verifying response content (e.g., JSON path checks)
Example:
uptime_check = {
  path                      = "/healthz"
  timeout                   = 60
  period                    = 60
  failed_severity           = "CRITICAL"
  missed_severity           = "ERROR"
  success_threshold_percent = 90
}
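For context, the following sketch shows how the uptime_check block might sit inside a complete module call. It assumes uptime_check is passed as a top-level module input alongside host and alert_notification_channels, and it pins the version to 6.1 or later because the severity fields are only available from version 6.1.0. The host and channel ID are placeholders.

```hcl
module "monitoring" {
  source = "gitlab.developers.cam.ac.uk/uis/gcp-site-monitoring/devops"

  # failed_severity and missed_severity need module version 6.1.0 or later.
  # "~> 6.1" allows any 6.x release from 6.1 upwards.
  version = "~> 6.1"

  # Placeholder values: replace with the real host and channel ID.
  host                        = "www.example.com"
  alert_notification_channels = ["projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]"]

  # Assumption: uptime_check is a top-level input of the module.
  uptime_check = {
    path                      = "/healthz"
    timeout                   = 60
    period                    = 60
    failed_severity           = "CRITICAL"
    missed_severity           = "ERROR"
    success_threshold_percent = 90
  }
}
```

Content matchers are not shown here because their exact schema should be taken from the module's own documentation.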
TLS Certificate Alerts¶
Ensure TLS certificates don’t expire unnoticed. You can customise:
- Minimum age of certificate
- Alert severity (valid values are "CRITICAL", "ERROR", or "WARNING")
Example:
tls_check = {
  alert_enabled = true
  minimum_age   = 15
  severity      = "WARNING"
}
Note
When the authentication proxy is enabled, TLS checks are disabled automatically because the proxy masks the real certificate.
Authentication Proxy for Protected Services¶
Some internal services require authentication. In those cases, we enable an authentication proxy using a Cloud Function to forward monitoring requests to internal endpoints.
authentication_proxy = {
  enabled                   = true
  cloud_run_project         = google_cloud_run_service.webapp.project
  cloud_run_region          = var.cloud_run_region
  cloud_run_service_name    = google_cloud_run_service.webapp.name
  egress_connector          = var.egress_connector
  egress_connector_settings = "PRIVATE_RANGES_ONLY"
}
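The snippet above refers to var.cloud_run_region and var.egress_connector without declaring them. A minimal sketch of how these might be declared and wired into the module call is shown below. The variable types and descriptions are assumptions, the host is a placeholder, and google_cloud_run_service.webapp is assumed to be defined elsewhere in the same configuration.

```hcl
# Hypothetical declarations for the variables referenced by the proxy block.
variable "cloud_run_region" {
  type        = string
  description = "Region of the Cloud Run service to be monitored."
}

variable "egress_connector" {
  type        = string
  description = "Connector the proxy uses for egress (exact format assumed; check the module documentation)."
}

module "monitoring" {
  source  = "gitlab.developers.cam.ac.uk/uis/gcp-site-monitoring/devops"
  version = "~> 6.0"

  # Placeholder values: replace with the real host and channel ID.
  host                        = "internal.example.com"
  alert_notification_channels = ["projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]"]

  # Assumption: authentication_proxy is a top-level input of the module.
  authentication_proxy = {
    enabled                   = true
    cloud_run_project         = google_cloud_run_service.webapp.project
    cloud_run_region          = var.cloud_run_region
    cloud_run_service_name    = google_cloud_run_service.webapp.name
    egress_connector          = var.egress_connector
    egress_connector_settings = "PRIVATE_RANGES_ONLY"
  }
}
```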
Provider Configuration and Cloud Monitoring Workspaces¶
Note that the project with resources to be monitored must be in a Cloud Monitoring workspace configured by the Google Cloud Product Factory.
Cloud Monitoring distinguishes between workspaces and projects within those workspaces. Each workspace must have a "scoping" project and that project must be the default project of the google provider used by this module.
Most DevOps-managed products use a separate "meta" project as the "scoping" project for all of the product's environments. The module therefore usually needs a provider with rights on the "meta" project.
If the scoping project differs from the monitored project, use a provider alias:
provider "google" {
  project = "my-project"

  # ... some credentials for the *project* admin ...
}

provider "google" {
  project = "meta-project-hosting-cloud-monitoring-workspace"
  alias   = "monitoring"

  # ... some credentials for the *product* admin ...
}

module "monitoring" {
  # ... other parameters ...

  providers = {
    google = google.monitoring
  }
}
Notification Channels¶
Alerts are delivered via GCP Notification Channels, typically set up in the "meta" project. We reference them in the module by full resource ID. Typically, notification channel details can be found in the product configuration.
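As an alternative to copying the channel ID by hand, the full resource ID can be looked up with the Google provider's google_monitoring_notification_channel data source. This is only a sketch and not necessarily how our products obtain the channel: the display name is hypothetical, the lookup must match exactly one channel, and the google.monitoring alias is assumed to be a provider configured for the "meta" project as described in the provider configuration section.

```hcl
# Look up an existing notification channel in the "meta" project.
# The display name is a hypothetical example and must match exactly one channel.
data "google_monitoring_notification_channel" "alerts" {
  provider     = google.monitoring
  display_name = "Example product alerts"
}

module "monitoring" {
  source  = "gitlab.developers.cam.ac.uk/uis/gcp-site-monitoring/devops"
  version = "~> 6.0"

  host = "www.example.com"

  # The data source's name attribute has the form
  # projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID].
  alert_notification_channels = [
    data.google_monitoring_notification_channel.alerts.name,
  ]

  providers = {
    google = google.monitoring
  }
}
```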
Summary¶
Our DevOps team uses the gcp-site-monitoring Terraform module to implement standardised monitoring and alerting for web applications on Google Cloud. The module provides uptime checks, TLS certificate expiry monitoring, and alerting on absent metrics as a baseline for site health monitoring.
The page highlights how to configure the module minimally by specifying the host and alert notification channels, typically sourced from the product configuration. Advanced customisation options allow fine-tuning of uptime checks, TLS alerts, and authentication proxies for protected services.
The documentation also covers essential considerations for Cloud Monitoring workspaces, especially the need for a “scoping” project, usually a dedicated “meta” project. Provider aliasing is used when the monitoring workspace differs from the monitored project.
Overall, this approach ensures consistent, automated monitoring across products with flexible alerting and secure handling of internal services.