Data Backup Service¶
[Team : Cloud Team] [Tech Lead : rh841] [Service Owner : amc203] [Service Manager : ad2139] [Product Manager : TBC]
This page gives an overview of the Data Backup Service, describing its current status, where and
how it's developed and deployed, and who is responsible for maintaining it.
Service Description¶
The Data Backup Service exports data from products hosted in Google Cloud Platform (GCP). It currently supports exporting data from Cloud SQL databases to a GCP bucket.
Service Status¶
The Data Backup Service is currently live.
The GCP Bucket S3 Backup feature is currently in beta.
Contact¶
Technical queries and support requests should be directed to cloud@uis.cam.ac.uk and will be picked up by a member of the team working on the service. To ensure that you receive a response, always direct requests to cloud@uis.cam.ac.uk rather than reaching out to team members directly.
Issues discovered in the service or new feature requests should be opened as GitLab issues in the application repository.
Environments¶
The Data Backup Service is currently deployed to the following environments in GCP:
| Name | Cloud Scheduler | Backup Bucket |
|---|---|---|
| Production | sql-backup | data-backup-prod-63bdb80a |
| Development | sql-backup | data-backup-devel-37d7c4f2 |
The GCP Bucket Backup to S3 feature is currently deployed to the following environments in AWS:
| Name | Account ID | S3 Bucket | Region |
|---|---|---|---|
| Production | 183129768585 | ucam-devops-data-backup-prod-1bcbba77 | Europe (London) eu-west-2 |
| Development | 482507208568 | ucam-devops-data-backup-devel-89ad6791 | Europe (London) eu-west-2 |
Notification channel(s) for environments¶
| Environment | Display name | Email / Teams Channel |
|---|---|---|
| Production | Data Backup - DevOps Team email Channel | cloud@uis.cam.ac.uk |
| Production | Data Backup - Default monitoring pubsub channel | Cloud Team - Notifications |
| Development | Data Backup - DevOps Team email Channel | cloud@uis.cam.ac.uk |
| Development | Data Backup - Default monitoring pubsub channel | Cloud Team - Notifications |
Source code¶
The source code for the Data Backup Service is spread over the following repositories:
| Repository | Description |
|---|---|
| Application Server | The source code for the main application server |
| Infrastructure Deployment | The Terraform infrastructure code for deploying the application server to GCP |
Technologies used¶
The following gives an overview of the technologies the Data Backup Service is built on.
| Category | Language | Framework(s) |
|---|---|---|
| Server | Python | FastAPI |
| GCP deployment | Terraform | Google Cloud Platform |
| AWS deployment | Terraform | Amazon Web Services |
Operational documentation¶
The following gives an overview of how the Data Backup Service is deployed and maintained.
Which databases are exported¶
All databases discovered in any child GCP projects of the configured folders are exported. Exclusion patterns can be configured to stop databases in particular projects, such as development projects, from being exported.
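As a rough illustration of how exclusion patterns might filter the discovered projects, the sketch below assumes glob-style patterns matched against GCP project IDs. The pattern syntax, the example patterns and the function names are assumptions made for illustration; the real service may match differently.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns. The real patterns are set in the
# Terraform configuration and may use a different syntax.
EXCLUDE_PATTERNS = ["*-devel-*", "sandbox-*"]


def is_excluded(project_id: str, patterns: list[str]) -> bool:
    """Return True if the project ID matches any exclusion pattern."""
    return any(fnmatchcase(project_id, pattern) for pattern in patterns)


def projects_to_export(project_ids: list[str], patterns: list[str]) -> list[str]:
    """Keep only the projects whose databases should be exported."""
    return [p for p in project_ids if not is_excluded(p, patterns)]


if __name__ == "__main__":
    discovered = ["app-prod-1234", "app-devel-5678", "sandbox-x"]
    print(projects_to_export(discovered, EXCLUDE_PATTERNS))
```

With the hypothetical patterns above, only projects that match neither pattern remain in the export list.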
The configuration is controlled by the scheduled job in the Data Backup Service Terraform
infrastructure and can be found in the following file:
Which GCP Buckets are backed up to S3¶
Any GCS bucket with the label ucam-devops-backup-destination set to production is backed up to
the single S3 backup bucket.
A daily CI/CD job checks all GCS buckets in projects under the DevOps folder in GCP for the label
and enables or disables backups as appropriate. The job outputs the current list of buckets
being backed up in the CI job log.
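The label check performed by the daily job can be sketched as follows. This is a minimal illustration that works on plain dictionaries of bucket labels rather than calling the GCP APIs; the function names and the shape of the input data are assumptions, and only the label key and value come from the description above.

```python
BACKUP_LABEL = "ucam-devops-backup-destination"
BACKUP_LABEL_VALUE = "production"


def should_back_up(labels: dict[str, str]) -> bool:
    """A bucket is backed up only if the label is present and set to 'production'."""
    return labels.get(BACKUP_LABEL) == BACKUP_LABEL_VALUE


def plan_backup_changes(
    bucket_labels: dict[str, dict[str, str]],
    currently_enabled: set[str],
) -> tuple[list[str], list[str]]:
    """Return (buckets to enable, buckets to disable) as sorted lists.

    bucket_labels maps a bucket name to its labels (hypothetical input shape).
    """
    wanted = {name for name, labels in bucket_labels.items() if should_back_up(labels)}
    to_enable = sorted(wanted - currently_enabled)
    to_disable = sorted(currently_enabled - wanted)
    return to_enable, to_disable


if __name__ == "__main__":
    buckets = {
        "bucket-a": {BACKUP_LABEL: "production"},
        "bucket-b": {},
        "bucket-c": {BACKUP_LABEL: "development"},
    }
    print(plan_backup_changes(buckets, currently_enabled={"bucket-b"}))
```

Comparing the wanted set with the currently enabled set keeps the job idempotent: running it twice in a row produces no further changes.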
S3 Bucket Information¶
Production:
- ARN: arn:aws:s3:::ucam-devops-data-backup-prod-1bcbba77
- Owning Account ID: 183129768585
- Region: Europe (London) eu-west-2
- Retention Policy for current data: 10000 days
- Retention Policy for non-current data: 365 days
Development:
- ARN: arn:aws:s3:::ucam-devops-data-backup-devel-89ad6791
- Owning Account ID: 482507208568
- Region: Europe (London) eu-west-2
- Retention Policy for current data: 365 days
- Retention Policy for non-current data: 30 days
Info
GCP Bucket backups are stored in the following folder in the S3 bucket:
<gcp-project-id>/<gcp-bucket-name>/
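To make the folder layout concrete, the sketch below builds the S3 prefix for a given GCP bucket. The helper names are hypothetical, and the assumption that object names are kept unchanged beneath the prefix is not confirmed by this page.

```python
def s3_backup_prefix(gcp_project_id: str, gcp_bucket_name: str) -> str:
    """Build the '<gcp-project-id>/<gcp-bucket-name>/' folder prefix."""
    if not gcp_project_id or not gcp_bucket_name:
        raise ValueError("both the project ID and the bucket name are required")
    return f"{gcp_project_id}/{gcp_bucket_name}/"


def s3_backup_uri(
    s3_bucket: str,
    gcp_project_id: str,
    gcp_bucket_name: str,
    object_name: str = "",
) -> str:
    """Build an s3:// URI for an object, assuming object names are preserved."""
    prefix = s3_backup_prefix(gcp_project_id, gcp_bucket_name)
    return f"s3://{s3_bucket}/{prefix}{object_name}"


if __name__ == "__main__":
    # 'my-project' and 'my-bucket' are placeholder names.
    print(s3_backup_uri("ucam-devops-data-backup-prod-1bcbba77", "my-project", "my-bucket"))
```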
How and where the Data Backup Service is deployed¶
Deployment is via our standard Terraform deployment CI pipeline.
Deploying a new release¶
New releases of the application are made via release automation. In short: merged commits are collected together into a draft "next release" Merge Request. When it is merged, a new release tag is pushed to the repository and a Docker image is pushed to Google's Artifact Registry.
Deployment is done by:
- Updating the deployment project's repository with any changes, including bumping the deployed web application release.
- Using the "play" buttons in the CI pipeline to deploy to production when happy. (Deployments to staging happen automatically on merges to main.)
Monitoring¶
Monitoring is configured as per our standard Google Cloud Run application module.
The following alerts have been configured:
- Cloud Scheduler: Cloud SQL Backup job failure (development/production)
- Cloud Tasks: 1 or more Cloud SQL Export errors occurred in the last 1 day (development/production)
- Cloud Tasks: Data export not running for (development/production)
Cloud Scheduler: Cloud SQL Backup job failure (development/production)¶
This alert is triggered if the Cloud Scheduler job, which kicks off the whole process at 3am every day, fails to run.
Actions:
- Examine the Cloud Scheduler logs for the sql-backup job to see if it triggered the Cloud Run webapp service successfully.
- Examine the logs for the Cloud Run webapp service to see if the job was triggered and why it failed.
- Manually trigger the Cloud Scheduler sql-backup job.
Cloud Tasks: 1 or more Cloud SQL Export errors occurred in the last 1 day (development/production)¶
This alert is triggered if an individual Cloud Task database export fails more than 3 times. Even then, the Cloud Task keeps retrying for up to an hour before giving up, so the export may eventually succeed.
Actions:
- Examine the data-backup bucket for the project / database instance / database that failed. If there is a database export for the day in question, the job eventually succeeded.
- Examine the logs for the Cloud Run webapp service to see why the task failed.
Cloud Tasks: Data export not running for (development/production)¶
This alert is triggered if there are no individual Cloud Task database export attempts. It covers the case where the Cloud Scheduler job runs but no Cloud Tasks are created, so no databases are exported. This could be due to a misconfiguration in the Cloud Scheduler job.
Actions:
- Examine the Cloud Scheduler logs for the sql-backup job to see if it triggered the Cloud Run webapp service successfully.
- Examine the logs for the Cloud Run webapp service to see why no Cloud Tasks were created.
Additional documentation¶
Information about the overall architecture of the new Data Backup System can be found in:
To find out which GCP buckets are being backed up to AWS S3 using the Data Backup Service, see:
Service Management¶
The Team responsible for this service is Cloud Team.
The Tech Lead for this service is rh841.
The Service Owner for this service is amc203.
The Service Manager for this service is ad2139.
The Product Manager for this service is TBC.
The following engineers have operational experience with this service and are able to respond to support requests or incidents: