Data Backup Service¶
[Team | Cloud Team] [Tech Lead | rh841] [Service Owner | amc203] [Service Manager | ad2139] [Product Manager | TBC]
This page gives an overview of the Data Backup Service, describing its current status, where and how it's developed and deployed, and who is responsible for maintaining it.
Service Description¶
Data Backup Service performs data exports of products hosted in GCP. It currently supports the exporting of data from Cloud SQL databases to a GCP bucket.
Service Status¶
The Data Backup Service is currently live.
Contact¶
Technical queries and support requests should be directed to cloud@uis.cam.ac.uk and will be picked up by a member of the team working on the service. To ensure that you receive a response, always direct requests to cloud@uis.cam.ac.uk rather than reaching out to team members directly.
Issues discovered in the service or new feature requests should be opened as GitLab issues in the application repository.
Environments¶
The Data Backup Service is currently deployed to the following environments:
Name | Cloud Scheduler | Backup Bucket |
---|---|---|
Production | sql-backup | data-backup-prod-63bdb80a |
Development | sql-backup | data-backup-devel-37d7c4f2 |
Notification channel(s) for environments¶
Environment | Display name | Email / Teams Channel |
---|---|---|
Production | Data Backup - DevOps Team email Channel | cloud@uis.cam.ac.uk |
Production | Data Backup - Default monitoring pubsub channel | Cloud Team - Notifications |
Development | Data Backup - DevOps Team email Channel | cloud@uis.cam.ac.uk |
Development | Data Backup - Default monitoring pubsub channel | Cloud Team - Notifications |
Source code¶
The source code for the Data Backup Service is spread over the following repositories:
Repository | Description |
---|---|
Application Server | The source code for the main application server |
Infrastructure Deployment | The Terraform infrastructure code for deploying the application server to GCP |
Technologies used¶
The following gives an overview of the technologies the Data Backup Service is built on.
Category | Language | Framework(s) |
---|---|---|
Server | Python 3.11 | FastAPI 0.110.3 |
GCP deployment | Terraform 1.7 | Google Cloud Platform |
Operational documentation¶
The following gives an overview of how the Data Backup Service is deployed and maintained.
Which databases are exported¶
All databases discovered in any child GCP projects of the configured folders are exported. Exclusion patterns can be configured to stop the databases of particular projects, such as development projects, from being exported.
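As a rough illustration of how exclusion patterns can narrow the set of exported projects, the sketch below filters a list of project IDs with shell-style globs. The pattern syntax, the project IDs and the function name `projects_to_export` are assumptions made for this example; the service's actual matching rules are defined by its own configuration and may differ.

```python
from fnmatch import fnmatchcase


def projects_to_export(project_ids, exclusion_patterns):
    """Return the project IDs that match none of the exclusion patterns.

    ASSUMPTION: patterns are shell-style globs matched against the project ID.
    """
    return [
        project_id
        for project_id in project_ids
        if not any(fnmatchcase(project_id, pattern) for pattern in exclusion_patterns)
    ]


# Hypothetical project IDs, for illustration only.
discovered = ["example-prod-1a2b", "example-devel-3c4d", "other-prod-5e6f"]
print(projects_to_export(discovered, ["*-devel-*"]))
```

With the hypothetical pattern `*-devel-*`, only the two production-style project IDs remain in the result.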
The configuration is controlled by the scheduled job in the Data Backup Service Terraform infrastructure and can be found in the following file:
How and where the Data Backup Service is deployed¶
Deployment is via our standard Terraform deployment CI pipeline.
Deploying a new release¶
Making a new release of the application is done via release automation. In short: merged commits are collected together into a draft "next release" Merge Request. When it is merged, a new release tag is pushed to the repository and a Docker image is pushed to Google's Artifact Registry.
Deployment is done by:
- Updating the deployment project's repository with any changes, including bumping the deployed web application release.
- Using the "play" buttons in the CI pipeline to deploy to production when happy. (Deployments to staging happen automatically on merges to main.)
Monitoring¶
Monitoring is configured as per our standard Google Cloud Run application module.
The following alerts have been configured:
- Cloud Scheduler: Cloud SQL Backup job failure (development/production)
- Cloud Tasks: 1 or more Cloud SQL Export errors occurred in the last 1 day (development/production)
- Cloud Tasks: Data export not running for (development/production)
Cloud Scheduler: Cloud SQL Backup job failure (development/production)¶
This alert is triggered if the Cloud Scheduler job, which kicks off the whole process at 3am every day, fails to run.
Actions:
- Examine the Cloud Scheduler logs for the sql-backup job to see if it triggered the Cloud Run webapp service successfully.
- Examine the logs for the Cloud Run webapp service to see if the job was triggered and why it failed.
- Manually trigger the Cloud Scheduler sql-backup job.
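The manual trigger in the last action can also be done from code. The following is a minimal sketch, assuming the `google-cloud-scheduler` client library is installed and application default credentials are available. The project ID and location shown are hypothetical placeholders, and the helper names are made up for this example; only the job name sql-backup comes from this page.

```python
def scheduler_job_name(project_id, location, job_id="sql-backup"):
    """Build the full Cloud Scheduler resource name for a job."""
    return f"projects/{project_id}/locations/{location}/jobs/{job_id}"


def run_backup_job(project_id, location, job_id="sql-backup"):
    """Force an immediate run of the scheduled job, outside its normal schedule."""
    # Imported lazily so the name helper above works without the client library.
    from google.cloud import scheduler_v1

    client = scheduler_v1.CloudSchedulerClient()
    return client.run_job(name=scheduler_job_name(project_id, location, job_id))


# Hypothetical project ID and location, for illustration only.
print(scheduler_job_name("example-project", "europe-west2"))
```

Calling `run_backup_job` with the real project ID and location of the relevant environment would start the job straight away; the same effect is available from the Cloud Scheduler page in the Google Cloud console.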
Cloud Tasks: 1 or more Cloud SQL Export errors occurred in the last 1 day (development/production)¶
This alert is triggered if an individual Cloud Task database export fails more than 3 times. Even when this happens, the Cloud Task keeps retrying for up to an hour before giving up, so the export may still eventually succeed.
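To see why an alert does not necessarily mean a lost export, the sketch below models a retry schedule with capped exponential backoff inside a one-hour window. It is a simplified model, not Cloud Tasks' exact algorithm, and the backoff values (10 seconds minimum, 300 seconds maximum) are assumptions chosen for illustration; only the one-hour limit comes from this page.

```python
def retry_offsets(min_backoff, max_backoff, max_retry_duration):
    """Return the times (seconds after the first attempt) at which retries occur.

    Simplified model: the delay doubles after each retry, is capped at
    max_backoff, and retries stop once the retry window is used up.
    """
    if min_backoff <= 0:
        raise ValueError("min_backoff must be positive")
    offsets = []
    delay = min_backoff
    elapsed = 0
    while elapsed + delay <= max_retry_duration:
        elapsed += delay
        offsets.append(elapsed)
        delay = min(delay * 2, max_backoff)
    return offsets


# ASSUMED backoff values; one-hour retry window as described above.
offsets = retry_offsets(min_backoff=10, max_backoff=300, max_retry_duration=3600)
print(len(offsets), offsets[:3], offsets[-1])
```

Under these assumed values the fourth failed attempt, which crosses the "more than 3" threshold, could happen about 70 seconds after the first attempt, while retries continue for most of the hour.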
Actions:
- Examine the data-backup bucket for the project / database instance / database that failed. If there is a database export for the day in question, the job eventually succeeded.
- Examine the logs for the Cloud Run webapp service to see why the task failed.
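The bucket check in the first action can be scripted once the object names are listed. The sketch below is illustrative only: it assumes objects are named `<project>/<instance>/<database>/...` with an ISO date (`YYYY-MM-DD`) somewhere in the rest of the name, which this page does not confirm. The function name and the sample object names are hypothetical.

```python
from datetime import date


def has_export_for_day(object_names, project, instance, database, day):
    """Return True if any object under the database's prefix mentions the given day.

    ASSUMPTION: objects are named "<project>/<instance>/<database>/<rest>",
    where <rest> contains the export date as YYYY-MM-DD.
    """
    prefix = f"{project}/{instance}/{database}/"
    stamp = day.isoformat()
    return any(
        name.startswith(prefix) and stamp in name[len(prefix):]
        for name in object_names
    )


# Hypothetical object names, for illustration only.
names = [
    "example-project/sql-1/webapp/2024-05-01T03-00-00.sql.gz",
    "example-project/sql-1/webapp/2024-05-02T03-00-00.sql.gz",
]
print(has_export_for_day(names, "example-project", "sql-1", "webapp", date(2024, 5, 2)))
print(has_export_for_day(names, "example-project", "sql-1", "webapp", date(2024, 5, 3)))
```

The object names could be gathered with the `google-cloud-storage` client, for example through `Client.list_blobs` with a `prefix` argument, or simply read from the bucket browser in the Google Cloud console.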
Cloud Tasks: Data export not running for (development/production)¶
This alert is triggered if there are no individual Cloud Task database export attempts. This is to cover the case where the Cloud Scheduler job runs, but no Cloud Tasks are created, so no databases are exported. This could be due to a misconfiguration in the Cloud Scheduler job.
Actions:
- Examine the Cloud Scheduler logs for the sql-backup job to see if it triggered the Cloud Run webapp service successfully.
- Examine the logs for the Cloud Run webapp service to see why no Cloud Tasks were created.
Additional documentation¶
Information about the overall architecture of the new Data Backup System can be found in:
Service Management¶
The Team responsible for this service is Cloud Team.
The Tech Lead for this service is rh841.
The Service Owner for this service is amc203.
The Service Manager for this service is ad2139.
The Product Manager for this service is TBC.
The following engineers have operational experience with this service and are able to respond to support requests or incidents: