Data Backup Service

This page gives an overview of the Data Backup Service, describing its current status, where and how it's developed and deployed, and who is responsible for maintaining it.

Service Description

Data Backup Service performs data exports of products hosted in GCP. It currently supports the exporting of data from Cloud SQL databases to a GCP bucket.
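To illustrate what a single export involves, the following minimal Python sketch builds the request body for the Cloud SQL Admin API's `instances.export` method. This is a sketch only: the database and bucket names are illustrative, and the real service's implementation may differ.

```python
"""Sketch of preparing a Cloud SQL export to a GCS bucket.

The database and bucket names used here are hypothetical examples,
not the service's real configuration.
"""

def build_export_request(database: str, bucket: str) -> dict:
    """Build the request body for the Cloud SQL Admin API's
    instances.export method (the exportContext structure)."""
    return {
        "exportContext": {
            "fileType": "SQL",
            "uri": f"gs://{bucket}/{database}.sql.gz",
            "databases": [database],
        }
    }

# The body would then be submitted via the Cloud SQL Admin API, e.g.
#   sqladmin.instances().export(project=..., instance=..., body=body).execute()
body = build_export_request("app-db", "data-backup-prod-63bdb80a")
print(body["exportContext"]["uri"])
```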

Service Status

The Data Backup Service is currently in beta.

The service runs in a live-like state, in parallel with the existing Scheduled Database Exports.

Once the service has been fully tested and is ready for production, the existing Scheduled Database Exports will be decommissioned and the Data Backup Service will become the primary method for database exports.

Contact

Technical queries and support requests should be directed to cloud@uis.cam.ac.uk, where they will be picked up by a member of the team working on the service. To ensure that you receive a response, always send requests to cloud@uis.cam.ac.uk rather than contacting team members directly.

Issues discovered in the service or new feature requests should be opened as GitLab issues in the application repository.

Environments

The Data Backup Service is currently deployed to the following environments:

| Name        | Cloud Scheduler | Backup Bucket              |
|-------------|-----------------|----------------------------|
| Production  | sql-backup      | data-backup-prod-63bdb80a  |
| Development | sql-backup      | data-backup-devel-37d7c4f2 |

Source code

The source code for the Data Backup Service is spread over the following repositories:

| Repository                | Description                                                                   |
|---------------------------|-------------------------------------------------------------------------------|
| Application Server        | The source code for the main application server                               |
| Infrastructure Deployment | The Terraform infrastructure code for deploying the application server to GCP |

Technologies used

The following gives an overview of the technologies the Data Backup Service is built on.

| Category       | Language      | Framework(s)          |
|----------------|---------------|-----------------------|
| Server         | Python 3.11   | FastAPI 0.110.3       |
| GCP deployment | Terraform 1.7 | Google Cloud Platform |

Operational documentation

The following gives an overview of how the Data Backup Service is deployed and maintained.

Which databases are exported

All databases discovered in child GCP projects of the configured folders are exported. Exclusion patterns can be configured to exclude databases in particular projects, such as development projects, from being exported.
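As a sketch of how such exclusion patterns might be applied, the following Python fragment drops project IDs matching any configured pattern. The pattern syntax, example patterns, and function name here are assumptions for illustration, not the service's actual implementation.

```python
import re

# Hypothetical exclusion patterns; the real patterns live in the
# service's Terraform configuration and may use a different syntax.
EXCLUSION_PATTERNS = [r"^devel-.*", r".*-sandbox$"]

def filter_projects(project_ids: list[str], patterns: list[str]) -> list[str]:
    """Return the project IDs whose databases should be exported,
    dropping any ID that matches one of the exclusion patterns."""
    compiled = [re.compile(p) for p in patterns]
    return [
        pid for pid in project_ids
        if not any(rx.match(pid) for rx in compiled)
    ]

# Development and sandbox projects are filtered out:
print(filter_projects(["app-prod", "devel-app", "team-sandbox"],
                      EXCLUSION_PATTERNS))
```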

The configuration is controlled by the scheduled job in the Data Backup Service's Terraform infrastructure and can be found in the following file:
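A minimal sketch of what that scheduled-job configuration might look like in Terraform. The resource name, schedule, endpoint path, and variable names are assumptions for illustration, not the actual infrastructure code:

```terraform
# Hypothetical Cloud Scheduler job triggering the backup service.
resource "google_cloud_scheduler_job" "sql_backup" {
  name     = "sql-backup"
  schedule = "0 2 * * *" # daily at 02:00

  http_target {
    http_method = "POST"
    uri         = "${var.backup_service_url}/export"

    # Folders whose child projects are scanned for databases, and
    # patterns for projects that should be excluded from export.
    body = base64encode(jsonencode({
      folders            = var.backup_folders
      exclusion_patterns = ["^devel-.*"]
    }))
  }
}
```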

How and where the Data Backup Service is deployed

Deployment is via our standard Terraform deployment CI pipeline.

Deploying a new release

Making a new release of the application is done via release automation. In short: merged commits are collected into a draft "next release" Merge Request. When that is merged, a new release tag is pushed to the repository and a Docker image is pushed to Google's Artifact Registry.

Deployment is done by:

  1. Updating the deployment project's repository with any changes, including bumping the deployed web application release.
  2. Using the "play" buttons in the CI pipeline to deploy to production when happy. (Deployments to staging happen automatically on merges to main.)

Monitoring

Monitoring is configured as per our standard Google Cloud Run application module.

Additional documentation

Information about the overall architecture of the new Data Backup System can be found in:

Service Management and tech lead

The service owner for the Data Backup Service is Abraham Martin.

The service manager for the Data Backup Service is Adam Deacon.

The tech lead for the Data Backup Service is Roy Harrington.

The following engineers have operational experience with the Data Backup Service and are able to respond to support requests or incidents: