How to restore a GCS bucket from an AWS S3 backup

This guide describes how to restore a Google Cloud Storage (GCS) bucket from its backup stored in an AWS S3 bucket.

Overview

The Data Backup Service provides an automated way to enroll a GCS bucket for regular backups to an AWS S3 bucket. This is useful for disaster recovery purposes, ensuring that critical data stored in GCS is also available in a different cloud provider.

Once a GCS bucket has been enrolled in the backup service, you can restore its contents from the AWS S3 backup in the event of data loss or corruption.

Prerequisites

  • Your GCS bucket is already enrolled in the Data Backup Service and has existing backups in the AWS S3 bucket.
  • You have permission to copy data into the GCS buckets you wish to restore. The required role is:
    • roles/storage.objectCreator
  • You perform these operations with your gcloudadmin account, because they require deploy or admin permissions.
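
Before starting, it can help to confirm which account gcloud is using and what is bound on the target bucket. The following is a minimal sketch, assuming the gcloud CLI is installed; the bucket name is the example value used in this guide, and the `run` helper is a hypothetical dry-run wrapper that is not part of the Data Backup Service.

```shell
#!/usr/bin/env bash
# Sketch only: check the active account and the bucket's IAM bindings.
BUCKET_NAME="my-gcs-bucket"   # example bucket name from this guide

# Hypothetical helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Show the account gcloud is currently authenticated as.
run gcloud auth list --filter=status:ACTIVE --format="value(account)"

# Show bucket-level IAM bindings. Roles inherited from the project
# are not listed here, so a missing binding is not conclusive.
run gcloud storage buckets get-iam-policy "gs://${BUCKET_NAME}"
```

By default the script only prints the commands; set DRY_RUN=0 to execute them.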

Steps to restore a GCS bucket from AWS S3 backup

For this procedure, we'll assume the following:

  • GCP Project ID: my-gcp-project.
  • GCS Bucket to restore: my-gcs-bucket.
  • gcloudadmin group with admin permissions for the project: myteam-admin@gcloudadmin.g.apps.cam.ac.uk.

Perform the following steps:

  1. Find the GCP project ID.
  2. Find the GCS bucket name you wish to restore.
  3. Find the gcloudadmin group with admin permissions for the project.
  4. Check out the latest main branch of the data-backup/infrastructure repo.
  5. Create a new branch for your changes:

    git checkout -b restore-my-gcs-bucket
    
  6. Open restore.tf.

  7. Add a new object to the production list in the local variable restore_configuration, as shown below:

    locals {
      restore_configuration = {
        production = [
          {
            project_id                  = "my-gcp-project",
            bucket_name                 = "my-gcs-bucket",
            iam_gcloudadmin_group_names = ["myteam-admin"]
          }
        ]
      }
    }
    
  8. Commit and push your changes on the branch, then create a merge request to main.

  9. Check the pipeline for the merge request to ensure it passes all checks.
  10. Once the merge request has been approved and merged, apply the changes to the production environment using the CI pipeline for the merged main branch.
  11. This starts the restore process, which may take some time depending on the size of the GCS bucket being restored. You can check progress on the Storage Transfer Jobs page of the data-backup project.
  12. The restore process copies the contents of the AWS S3 backup bucket to a restore bucket in the data-backup GCP project, into a folder named after the project ID and bucket name, as shown below:
    • data-backup-prod-s3-restore-65434812/my-gcp-project/my-gcs-bucket
  13. The required IAM permissions for the specified gcloudadmin group are added to the restore bucket to allow copying the restored data to the original GCS bucket, either partially or fully, depending on the nature of the data loss or corruption.
  14. It is envisaged that the copying of data from the restore bucket to the original GCS bucket will be done manually by the service team responsible for the GCS bucket, using tools such as:
  15. Once the restore is complete, remove the restore configuration added in step 7: delete the relevant object from the local variable restore_configuration, create a new merge request to main, and apply the changes using the CI pipeline.
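
Step 14 leaves the copy from the restore bucket back to the original bucket to the service team. The following is a minimal sketch of one way to do that copy, assuming the gcloud CLI is installed and authenticated with an account in the gcloudadmin group. The bucket and project names are the example values from this guide, the object path in the partial restore is hypothetical, and the `run` helper is a hypothetical dry-run wrapper. This is an illustration, not the service's prescribed method.

```shell
#!/usr/bin/env bash
# Sketch only: copy restored objects from the restore bucket back to
# the original bucket. All names are the example values from this guide.
RESTORE_BUCKET="data-backup-prod-s3-restore-65434812"
PROJECT_ID="my-gcp-project"
BUCKET_NAME="my-gcs-bucket"

# The restore lands under <restore bucket>/<project id>/<bucket name>.
SRC="gs://${RESTORE_BUCKET}/${PROJECT_ID}/${BUCKET_NAME}"
DST="gs://${BUCKET_NAME}"

# Hypothetical helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Full restore: sync everything under the restore folder into the
# original bucket. Without a delete flag, rsync leaves extra objects
# in the destination untouched.
run gcloud storage rsync --recursive "${SRC}" "${DST}"

# Partial restore: copy a single object (hypothetical path).
run gcloud storage cp "${SRC}/path/to/object.txt" "${DST}/path/to/object.txt"
```

By default the script only prints the commands so they can be reviewed first; set DRY_RUN=0 to run them. For a partial restore, keep only the `cp` line and adjust the path to the objects that were lost or corrupted.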