
Data Backup Service

[Team | Cloud Team] [Tech Lead | rh841] [Service Owner | amc203] [Service Manager | ad2139] [Product Manager | TBC]

This page gives an overview of the Data Backup Service, describing its current status, where and how it's developed and deployed, and who is responsible for maintaining it.

Service Description

The Data Backup Service performs data exports for products hosted in GCP. It currently supports exporting data from Cloud SQL databases to a GCP Cloud Storage bucket.
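
As a minimal sketch of what such an export involves, the snippet below uses the public Cloud SQL Admin API to export a single database to a Cloud Storage bucket. The project, instance and database names and the object path are placeholders, and this is an illustration only, not the service's own code.

```python
"""Illustrative sketch: export one Cloud SQL database to a Cloud Storage bucket.

Uses the public Cloud SQL Admin API via google-api-python-client
(`pip install google-api-python-client`); all names and the object
path below are placeholders, not the service's real configuration.
"""
from datetime import date

import googleapiclient.discovery

PROJECT = "example-project"            # placeholder GCP project ID
INSTANCE = "example-sql-instance"      # placeholder Cloud SQL instance name
DATABASE = "example-db"                # placeholder database name
BUCKET = "data-backup-prod-63bdb80a"   # production backup bucket (from the Environments section)

sqladmin = googleapiclient.discovery.build("sqladmin", "v1")

# Ask Cloud SQL to write a gzipped SQL dump of the database into the bucket.
export_body = {
    "exportContext": {
        "fileType": "SQL",
        "uri": f"gs://{BUCKET}/{PROJECT}/{INSTANCE}/{DATABASE}/{date.today()}.sql.gz",
        "databases": [DATABASE],
    }
}
operation = (
    sqladmin.instances()
    .export(project=PROJECT, instance=INSTANCE, body=export_body)
    .execute()
)
print("Started export operation:", operation["name"])
```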

Service Status

The Data Backup Service is currently live.

Contact

Technical queries and support should be directed to cloud@uis.cam.ac.uk and will be picked up by a member of the team working on the service. To ensure that you receive a response, always direct requests to cloud@uis.cam.ac.uk rather than reaching out to team members directly.

Issues discovered in the service or new feature requests should be opened as GitLab issues in the application repository.

Environments

The Data Backup Service is currently deployed to the following environments:

| Name | Cloud Scheduler | Backup Bucket |
| ---- | --------------- | ------------- |
| Production | sql-backup | data-backup-prod-63bdb80a |
| Development | sql-backup | data-backup-devel-37d7c4f2 |

Notification channel(s) for environments

| Environment | Display name | Email / Teams Channel |
| ----------- | ------------ | --------------------- |
| Production | Data Backup - DevOps Team email Channel | cloud@uis.cam.ac.uk |
| Production | Data Backup - Default monitoring pubsub channel | Cloud Team - Notifications |
| Development | Data Backup - DevOps Team email Channel | cloud@uis.cam.ac.uk |
| Development | Data Backup - Default monitoring pubsub channel | Cloud Team - Notifications |

Source code

The source code for the Data Backup Service is spread over the following repositories:

| Repository | Description |
| ---------- | ----------- |
| Application Server | The source code for the main application server |
| Infrastructure Deployment | The Terraform infrastructure code for deploying the application server to GCP |

Technologies used

The following gives an overview of the technologies the Data Backup Service is built on.

| Category | Language | Framework(s) |
| -------- | -------- | ------------ |
| Server | Python 3.11 | FastAPI 0.110.3 |
| GCP deployment | Terraform 1.7 | Google Cloud Platform |
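
For orientation only, the sketch below shows the general shape of a FastAPI application that is triggered over HTTP, as the monitoring section describes Cloud Scheduler doing for this service. The endpoint path, payload and handler are hypothetical; consult the Application Server repository for the real API.

```python
"""Hypothetical FastAPI sketch of an HTTP-triggered backup endpoint.

The endpoint path, payload and handler below are illustrative only;
consult the Application Server repository for the real API.
"""
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Data Backup Service (sketch)")


class BackupRequest(BaseModel):
    # Hypothetical payload: GCP folders to scan for Cloud SQL instances.
    folders: list[str]


@app.post("/backup")  # hypothetical path; Cloud Scheduler would POST here
def start_backup(request: BackupRequest) -> dict:
    # In the real service, per-database export work would be fanned out here
    # (e.g. via Cloud Tasks); this sketch just acknowledges the request.
    return {"status": "accepted", "folders": request.folders}
```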

Operational documentation

The following gives an overview of how the Data Backup Service is deployed and maintained.

Which databases are exported

All databases discovered in any child GCP projects of the configured folders are exported. Exclusion patterns can be configured to exclude project databases from being exported, for example those in development projects.
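
As a rough illustration of the idea (the real configuration format lives in the Terraform infrastructure repository and may differ), exclusion patterns of this kind can be applied as regular-expression filters over the discovered projects:

```python
"""Rough illustration of applying exclusion patterns to discovered databases.

The pattern syntax and data shapes here are assumptions for illustration;
the real configuration lives in the Terraform infrastructure repository.
"""
import re

# Hypothetical exclusion patterns, e.g. skip development projects.
EXCLUSION_PATTERNS = [r"^devel-.*", r".*-test$"]

# Hypothetical (project, database) pairs discovered under the configured folders.
discovered = [
    ("devel-website", "wordpress"),
    ("prod-website", "wordpress"),
    ("prod-api-test", "api"),
]


def is_excluded(project: str) -> bool:
    """Return True if the project matches any exclusion pattern."""
    return any(re.fullmatch(pattern, project) for pattern in EXCLUSION_PATTERNS)


to_export = [(project, db) for project, db in discovered if not is_excluded(project)]
print(to_export)  # [('prod-website', 'wordpress')]
```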

The configuration is controlled by the scheduled job in the Data Backup Service Terraform infrastructure and can be found in the following file:

How and where the Data Backup Service is deployed

Deployment is via our standard Terraform deployment CI pipeline.

Deploying a new release

Making a new release of the application is done via release automation. In short: merged commits are collected into a draft "next release" Merge Request; when that is merged, a new release tag is pushed to the repository and a Docker image is pushed to Google Artifact Registry.

Deployment is done by:

  1. Updating the deployment project's repository with any changes, including bumping the deployed web application release.
  2. Using the "play" buttons in the CI pipeline to deploy to production when happy. (Deployments to staging happen automatically on merges to main.)

Monitoring

Monitoring is configured as per our standard Google Cloud Run application module.

The following alerts have been configured:

  • Cloud Scheduler: Cloud SQL Backup job failure (development/production)
  • Cloud Tasks: 1 or more Cloud SQL Export errors occurred in the last 1 day (development/production)
  • Cloud Tasks: Data export not running for (development/production)

Cloud Scheduler: Cloud SQL Backup job failure (development/production)

This alert is triggered if the Cloud Scheduler job, which kicks off the whole export process at 3am every day, fails to run.

Actions:

  • Examine the Cloud Scheduler logs for the sql-backup job to see if it triggered the Cloud Run webapp service successfully.
  • Examine the logs for the Cloud Run webapp service to see if the job was triggered and why it failed.
  • Manually trigger the Cloud Scheduler sql-backup job.
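
The last action can be performed from the Cloud Console; as a sketch, it can also be done with the Cloud Scheduler client library. The project ID and location below are placeholders, not the real deployment values.

```python
"""Sketch: manually run the sql-backup Cloud Scheduler job.

Requires `pip install google-cloud-scheduler`; the project ID and
location below are placeholders, not the real deployment values.
"""
from google.cloud import scheduler_v1

PROJECT = "example-project"  # placeholder: the environment's GCP project ID
LOCATION = "europe-west2"    # placeholder: the region the job runs in

client = scheduler_v1.CloudSchedulerClient()
job_name = client.job_path(PROJECT, LOCATION, "sql-backup")

# Force an immediate run of the job, outside its normal 3am schedule.
job = client.run_job(name=job_name)
print("Triggered job:", job.name)
```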

Cloud Tasks: 1 or more Cloud SQL Export errors occurred in the last 1 day (development/production)

This alert is triggered if an individual Cloud Task database export fails more than 3 times. Even if this happens, the Cloud Task will keep retrying for up to an hour before giving up, so the export may still eventually succeed.

Actions:

  • Examine the data-backup bucket for the project / database instance / database that failed. If there is a database export for the day in question, the job eventually succeeded.
  • Examine the logs for the Cloud Run webapp service to see why the task failed.
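
As a sketch of the first check above, the backup bucket can be listed programmatically. The object path layout used as a prefix here is an assumption for illustration; adjust it to the bucket's actual layout.

```python
"""Sketch: check whether today's export object exists in the backup bucket.

Requires `pip install google-cloud-storage`. The prefix layout
(project/instance/database/...) is an assumption for illustration.
"""
from datetime import date

from google.cloud import storage

BUCKET = "data-backup-prod-63bdb80a"                      # production backup bucket
PREFIX = "example-project/example-instance/example-db/"   # placeholder path

client = storage.Client()
blobs = client.list_blobs(BUCKET, prefix=PREFIX)

today = date.today().isoformat()
todays_exports = [blob.name for blob in blobs if today in blob.name]
print("Exports found for today:", todays_exports or "none")
```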

Cloud Tasks: Data export not running for (development/production)

This alert is triggered if there are no individual Cloud Task database export attempts. This is to cover the case where the Cloud Scheduler job runs, but no Cloud Tasks are created, so no databases are exported. This could be due to a misconfiguration in the Cloud Scheduler job.

Actions:

  • Examine the Cloud Scheduler logs for the sql-backup job to see if it triggered the Cloud Run webapp service successfully.
  • Examine the logs for the Cloud Run webapp service to see why no Cloud Tasks were created.
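
As a sketch of examining those logs programmatically (the same filter string can also be pasted into the Logs Explorer), the snippet below pulls recent warning-or-worse entries for a Cloud Run service. The service name is a placeholder, not necessarily the real Cloud Run service name.

```python
"""Sketch: pull recent Cloud Run logs for the webapp service.

Requires `pip install google-cloud-logging`. The Cloud Run service
name below is a placeholder for illustration.
"""
import itertools

from google.cloud import logging as cloud_logging

SERVICE_NAME = "data-backup-webapp"  # placeholder Cloud Run service name

client = cloud_logging.Client()

# The same filter string can be pasted into the Logs Explorer UI.
log_filter = (
    'resource.type="cloud_run_revision" '
    f'AND resource.labels.service_name="{SERVICE_NAME}" '
    "AND severity>=WARNING"
)

entries = client.list_entries(filter_=log_filter, order_by=cloud_logging.DESCENDING)
for entry in itertools.islice(entries, 20):
    print(entry.timestamp, entry.severity, entry.payload)
```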

Additional documentation

Information about the overall architecture of the new Data Backup System can be found in:

Service Management

The Team responsible for this service is Cloud Team.

The Tech Lead for this service is rh841.

The Service Owner for this service is amc203.

The Service Manager for this service is ad2139.

The Product Manager for this service is TBC.

The following engineers have operational experience with this service and are able to respond to support requests or incidents: