Automating Google Drive Access

This document describes ways we in the DevOps Division have found to integrate Google Drive storage into automated processes, together with the Best Practice we have developed along the way.

When starting from scratch with a new project we usually take the following steps:

  1. Register a new shared ordinary Google account for the service.
  2. Register a new project with Google using that shared Google account.
  3. Register a service account within the project. The required Google Drive resources will be shared with this account.
  4. Create credentials for the service account.
  5. Write scripts which interact with the Google APIs acting as the service account.

There are, somewhat confusingly, several forms of identity in the Google ecosystem:

An ordinary Google account is the sort of Google account you can register for free at google.com. All that you need to register an ordinary Google account is the ability to receive email at some address. An ordinary Google account is associated with an email address.

A GSuite account is like an ordinary Google account but the associated email address ends in “@cam.ac.uk”. GSuite accounts use Raven to authenticate rather than giving a password to Google. A GSuite account is associated with a single individual.

A service account is an account associated with a particular “project” in Google’s developer console. Service accounts are not associated with individuals. Each has an email address ending in “...gserviceaccount.com”, but there is no mailbox behind that address.

Registering a new project with Google

We strongly recommend that you start by registering an ordinary Google account associated with some role address. For example, if you have an existing “[email protected]” role address you can create an ordinary Google account associated with that address. Since the credentials for that account may be shared within the team we also strongly recommend that the shared account be set up with 2FA.

Once the account is registered, sign in as the shared user, open the Google developer console and create a new project for your service. Navigate to the IAM page of the Google developer console for your project and add some GSuite administrator users to the project. (Recall that users with email addresses of the form “{crsid}@cam.ac.uk” are part of our GSuite account and are authenticated via a Raven login.) Administrator users can be added on the IAM page via the “Add” button at the top and should be given the “Project Owner” role.

At this point, you can log out from the shared role account and proceed using one of the GSuite accounts you just added to the project as an administrator.

We recommend that all interaction now be done as a GSuite user, with users being added to or removed from the set of administrators as appropriate. Sign-in details for the shared account may be kept securely for disaster recovery purposes or for adding admins if all existing admins have left their positions or are otherwise incapacitated.

Creating a service account

New service accounts can be created on the service accounts page of the developer console. Once an account exists, you can create credentials for it in the form of a special JSON document by clicking the vertical ellipsis in the “Actions” column of the service account table and selecting “Create key”. Download the JSON credentials.

Keep them secret. Keep them safe.

Sharing resources with the service account

The service account will have an email address associated with it which ends with “...gserviceaccount.com”. This email address can be used in Google Drive just like any other. For example, to allow the service account to access a particular Google Sheet, share it with the service account’s email address. Similarly you can add the service account’s email address as the owner of a Google Shared Drive to give it access to all the files within it.
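The address to share with is also recorded in the JSON credentials themselves. As a minimal sketch, assuming the standard key file downloaded from the developer console (which stores the address in a “client_email” field and has a “type” of “service_account”), a small helper can read it back out ready for pasting into the Drive sharing dialog. The function name is hypothetical.

```python
import json


def service_account_email(key_path):
    """Return the email address recorded in a service account JSON key file."""
    with open(key_path, encoding="utf-8") as f:
        key = json.load(f)
    # Guard against being handed some other kind of Google credentials file.
    if key.get("type") != "service_account":
        raise ValueError(f"{key_path} does not look like a service account key")
    return key["client_email"]
```

Calling service_account_email with the path to the downloaded key returns the address which ends in “...gserviceaccount.com”.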

Interacting with the API

Google provide very good documentation for the Drive API. They provide “quickstarts” in multiple languages and have an interactive codelab for Python.

When writing Python clients, the library you’ll use to interact with the API will depend on the exact API (Drive, Sheets, Docs, etc) you are using. It’s best to look at the quickstarts in the API documentation to determine which library to use. All APIs will need authentication and we recommend the google-auth package’s support for service account credentials; it provides a single function which can load credentials directly from the JSON key file in a form which can be consumed by most Google API client libraries.
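As a minimal sketch of this approach, assuming the google-auth and google-api-python-client packages are installed and that read-only Drive access is sufficient, the following loads the JSON key and builds a Drive v3 client. The helper names and the choice of scope are illustrative assumptions; check the quickstart for the API you are using before relying on them.

```python
# Read-only Drive access; request the narrowest scope your application needs.
DRIVE_SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]


def build_drive_service(key_path, scopes=None):
    """Return a Drive v3 API client authenticated as the service account."""
    # Imported here so that this module can be loaded (and unit tested)
    # without the Google client libraries installed.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    credentials = service_account.Credentials.from_service_account_file(
        key_path, scopes=scopes or DRIVE_SCOPES
    )
    return build("drive", "v3", credentials=credentials)


def list_visible_files(service, page_size=10):
    """Return the id and name of the first few files the account can see."""
    response = service.files().list(
        pageSize=page_size, fields="files(id, name)"
    ).execute()
    return response.get("files", [])
```

A freshly created service account which has had nothing shared with it should get an empty list back from list_visible_files.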

Examples

This section includes some examples of ad hoc Google Drive automation solutions which make use of the techniques outlined in this document.

2018 Winter Pool Prototype

A prototype automation of the Winter pool for Undergraduate Admissions was trialled in 2018. It used a long-running process which noticed uploads of PDF files to a shared Google Drive, processed the PDFs to produce a summary index and an OCR-ed transcription, and wrote the resulting artefacts back to Google Drive.

We created a service account and set it as the owner of two shared drives: one for uploads and one for processed artefacts. Pool users’ GSuite accounts were added to the upload drive as collaborators with read and write access, and to the artefacts drive with view-only permissions. The artefacts drive also had downloads disabled.

The upshot of this was that users could upload PDFs to a shared drive but the processed artefacts could only be viewed and were accessed by means of a special index document.
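The “notice uploads” step could be sketched along the following lines. This is not the prototype’s code. It assumes a Drive v3 client object authenticated as the service account, a hypothetical drive_id identifying the upload drive, and that the caller keeps its own record of which file IDs have already been processed.

```python
PDF_QUERY = "mimeType = 'application/pdf' and trashed = false"


def list_pdfs_in_shared_drive(service, drive_id):
    """Return metadata for every PDF in the given shared drive."""
    pdfs = []
    page_token = None
    while True:
        response = service.files().list(
            q=PDF_QUERY,
            # Restrict the search to one shared drive.
            corpora="drive",
            driveId=drive_id,
            includeItemsFromAllDrives=True,
            supportsAllDrives=True,
            fields="nextPageToken, files(id, name, modifiedTime)",
            pageToken=page_token,
        ).execute()
        pdfs.extend(response.get("files", []))
        # Results are paginated; keep going until no further token is given.
        page_token = response.get("nextPageToken")
        if page_token is None:
            break
    return pdfs


def unprocessed(pdfs, processed_ids):
    """Filter out PDFs whose ids have already been handled."""
    return [f for f in pdfs if f["id"] not in processed_ids]
```

A long-running process would call these periodically and hand each unprocessed file to the PDF pipeline. Polling is only one option and is used here because it is the simplest to illustrate.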

The prototype code written for this experiment is available on GitHub.

Lecture Capture Prototype

As part of a proof of concept development for Lecture Capture, we developed a simple scheduler process which would consume a list of lectures from a Google Sheet and schedule them automatically in an OpenCast lecture capture server.

We created a service account and shared some specially formatted Google Sheets with it. We also shared the Google Sheets with some GSuite users and had them enter data into the sheet. Behind the scenes, our API client authenticated as the service account and processed all the Google Sheets which it could see. In this case, the act of sharing the Google Sheet with the special service account address indicated that it should be used as a source of lecture capture data.
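A hedged sketch of this pattern follows; it is not the prototype’s code. It assumes one Drive v3 client and one Sheets v4 client, both authenticated as the service account, and a hypothetical sheet layout in which the first row of columns A to D holds the column headings.

```python
SHEET_MIME_TYPE = "application/vnd.google-apps.spreadsheet"


def shared_spreadsheet_ids(drive_service):
    """Yield the id of every Google Sheet visible to the service account."""
    page_token = None
    while True:
        response = drive_service.files().list(
            q=f"mimeType = '{SHEET_MIME_TYPE}' and trashed = false",
            fields="nextPageToken, files(id)",
            pageToken=page_token,
        ).execute()
        for item in response.get("files", []):
            yield item["id"]
        page_token = response.get("nextPageToken")
        if page_token is None:
            break


def rows_as_dicts(sheets_service, spreadsheet_id, cell_range="A:D"):
    """Return each data row of a sheet as a dict keyed by the header row."""
    response = sheets_service.spreadsheets().values().get(
        spreadsheetId=spreadsheet_id, range=cell_range
    ).execute()
    values = response.get("values", [])
    if not values:
        return []
    header, *rows = values
    # The API omits trailing empty cells, so pad short rows before pairing.
    return [
        dict(zip(header, row + [""] * (len(header) - len(row))))
        for row in rows
    ]
```

Looping over shared_spreadsheet_ids and passing each id to rows_as_dicts gives the scheduler one record per row without any sheet ID appearing in the code or its configuration.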

The prototype code written for this experiment is available on the University Developer Hub.

Best Practice

In general we’ve found the following to be Best Practice when automating applications which interact with Google Drive:

  1. Only use service accounts to represent the application in Google Drive. Do not be tempted to drive the API with the credentials of a GSuite user: the user may be deactivated when the corresponding person leaves the University, or it may become inappropriate for that user to retain full access to the Google Drive resources.
  2. Make use of the Google Drive permissions system. Only give the service account the permissions it actually needs in Google Drive. By default a freshly minted service account will not be able to see anything. Think carefully about what resources you share with the service account.
  3. Rotate JSON credentials often. A service account can have multiple JSON-formatted credentials associated with it. A new set of credentials can be created at any time in the developer console without invalidating the previous set. You can then update the credentials used by your application and disable the old set in the developer console.
  4. “Sharing” can be used for resource discovery. A service account can use the API to determine which resources have been shared with it in Google Drive, so “share with the service account” can itself be the method of resource discovery. This avoids having to hard-code Google Drive resource IDs into applications.
  5. “Shared Drives” are a security boundary which can easily be reasoned about. There is a temptation to share a folder in Google Drive with the service account. This is dangerous since users have great freedom to move items in and out of folders within one drive. Shared Drives can be created quickly and easily and provide better control over who can move files in and out of them. It can also be easier to reason about them as security boundaries: “our system can read anything inside the ‘XXX’ Shared Drive” is easily communicated to users. The same Google Drive folder can have different names for different users and so there is no single human-friendly name you can use when describing the security boundary.
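To support the rotation advice in item 3, a small check can flag credentials which have been in service for too long. The sketch below assumes a list of key descriptions shaped like those returned by the IAM API when listing a service account’s keys, each with a “name” and an RFC 3339 “validAfterTime”. The function name and the 90 day threshold are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def keys_due_for_rotation(keys, max_age_days=90, now=None):
    """Return the names of keys created more than max_age_days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    due = []
    for key in keys:
        # Timestamps look like "2019-03-01T12:00:00Z"; make the UTC offset
        # explicit so that fromisoformat() accepts them on older Pythons.
        created = datetime.fromisoformat(
            key["validAfterTime"].replace("Z", "+00:00")
        )
        if created < cutoff:
            due.append(key["name"])
    return due
```

Any key named in the result is a candidate for replacement: create a new set of credentials, update the application, and then disable the old set as described above.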

Finding out more

The DevOps division are always happy to talk to people and share anything we’ve learned from our mistakes. Our contact details are available elsewhere in this guidebook.