
Cambridge Data (CamData)

This page gives an overview of the CamData applications, describing their current status, where and how they are developed and deployed, and who is responsible for maintaining them.

Legacy components/services are marked with the dodo (🦤) symbol.

Note that the service can be referred to as either "CamData" or "CamDATA" (the public CamData website uses both forms).

Service Description

The CamData website provides a publicly accessible archive of programme specifications for University awards, including those which have been discontinued, and of other courses such as Diplomas and Certificates.

CamData on Google Drive is the service for managing the programme specification PDF files that are to be made available on the CamData website. The files themselves are hosted for public access in Google Cloud Platform (GCP) Cloud Storage. Synchronisation scripts run periodically to update Cloud Storage from Google Drive and then update the index that Drupal 7 uses to provide search/sort functionality.

CamData SharePoint is the legacy document repository which previously served programme specification PDF files for the CamData website.

Service Status

The CamData service is currently live.

Environments

Google Drive

Name Drive URL
Production https://drive.google.com/drive/folders/0AAb2tRec0X1VUk9PVA
Staging https://drive.google.com/drive/folders/0ACyX-1h4wx41Uk9PVA
Development https://drive.google.com/drive/folders/0AMcH1XnA2Bu0Uk9PVA

Google Cloud Platform (GCP)

Name
Production GCP front page (production) GCP Cloud Storage public bucket (production)
Staging GCP front page (staging) GCP Cloud Storage public bucket (staging)
Development GCP front page (development) GCP Cloud Storage public bucket (development)
Meta (common to all environments) GCP front page (meta) n/a

Website (Drupal)

Name Service URL
Production Public website https://www.camdata.admin.cam.ac.uk/
Admin https://www.camdata.admin.cam.ac.uk/admin?q=user/login
Staging Public website http://camdata-2.staging.drupal.uis.cam.ac.uk/
Admin http://camdata-2.staging.drupal.uis.cam.ac.uk?q=user/login

🦤 SharePoint (legacy)

The legacy CamData SharePoint service is currently still deployed to the following environments:

Name Service Supporting VMs
🦤 Production (legacy)
Web Server spt-live-web1.internal.admin.cam.ac.uk
Application Server spt-live-app1.internal.admin.cam.ac.uk
Database spt-live-db1.internal.admin.cam.ac.uk
🦤 Staging (legacy)
Web Server spt-test-web1.internal.admin.cam.ac.uk
Application Server spt-test-app1.internal.admin.cam.ac.uk
Database spt-test-db1.internal.admin.cam.ac.uk

Source code

Source code for the CamData service is spread over the following repositories:

Technologies used

The CamData service is built using the following technologies:

Category Language Framework(s)
PDF hosting Terraform HCL 1.7 GCP Cloud Storage
PDF management n/a Google Drive
Synchronisation Python 3.11 and Terraform HCL 1.7 GCP Cloud Run Functions
Website PHP Drupal 7
🦤 PDF Upload (legacy) C# SharePoint Client-side Object Model
🦤 REST API (legacy) C# SharePoint REST API and Server-side Object Model

Operational documentation

The following sections provide an overview of how the CamData service is deployed and maintained.

PDF hosting

GCP Cloud Storage is used to host the programme specification PDFs via a publicly accessible bucket. The bucket also holds the JSON index document that is used by Drupal to provide search/sort functionality over the programme specifications.
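The JSON index is the link between the synchronisation code and Drupal. The sketch below shows one way such an index could be built from PDF metadata. It is illustrative only: the field names (`year`, `title`, `filename`, `url`), the `documents` wrapper, the bucket URL and the `<year>/<filename>` object layout are all hypothetical and are not taken from the real sync code.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class ProgrammeSpec:
    # Hypothetical fields: the real index schema is defined by the sync code
    # in the APSUA Cloud Infrastructure repository.
    year: str      # University year folder name, e.g. "2005-6"
    title: str     # taken from the PDF metadata
    filename: str  # PDF file name within the year folder


def build_index(specs: list[ProgrammeSpec], bucket_url: str) -> str:
    """Serialise specs as a JSON index: newest year first, then by title."""
    entries = [
        # Hypothetical object layout: <bucket_url>/<year>/<filename>
        {**asdict(s), "url": f"{bucket_url}/{s.year}/{s.filename}"}
        for s in sorted(specs, key=lambda s: s.title)
    ]
    # Python's sort is stable (also with reverse=True), so titles stay
    # alphabetical within each year after this second sort by start year.
    entries.sort(key=lambda e: int(e["year"].split("-")[0]), reverse=True)
    return json.dumps({"documents": entries}, indent=2)
```

Sorting on the integer start year rather than the raw folder name avoids relying on string comparison of labels such as "2005-6" and "2010-11".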

GCP Cloud Storage buckets also exist for other purposes connected with the deployment of the CamData service (such as for holding environment settings and caching the code used when performing the synchronisation process).

The Terraform HCL to deploy the Cloud Storage buckets to the cloud is in the APSUA Cloud Infrastructure repository.

PDF management

Google Drive is used for managing the programme specification PDFs that are to be published on the public CamData website. A drive exists for each environment, and within each drive is a folder for each University year (a University year spans two calendar years, so folders are named e.g. 2005-6). Each folder contains the programme specification PDFs applicable to the University year indicated by the folder name.

To publish a programme specification PDF, a suitable PDF should be generated with appropriate PDF metadata describing the specification (see existing PDFs for reference). The PDF should then be placed in the folder for the appropriate year in the CamData Archive of Programme Specifications for University Awards drive (the production drive). The document should be available on the public CamData website the following day.
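The "following day" estimate follows from the two schedules described below: the synchronisation runs hourly and Drupal fetches the index once a day, overnight. The sketch below works through that arithmetic. It assumes the sync runs on the hour and that Drupal's overnight fetch happens at 02:00; both times are hypothetical, as the exact schedules are not documented here.

```python
from datetime import datetime, timedelta


def estimated_publication_time(uploaded: datetime, drupal_fetch_hour: int = 2) -> datetime:
    """Estimate when a PDF added to the production drive appears on the website.

    Assumes an hourly sync on the hour and a daily Drupal index fetch at
    `drupal_fetch_hour` (a hypothetical overnight time).
    """
    # Next hourly sync: the top of the following hour.
    synced = uploaded.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    # Next daily Drupal fetch at or after the sync.
    fetch = synced.replace(hour=drupal_fetch_hour, minute=0, second=0, microsecond=0)
    if fetch < synced:
        fetch += timedelta(days=1)
    return fetch
```

Under these assumptions a PDF uploaded at 14:30 is synced at 15:00 and appears after the 02:00 fetch the next morning.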

Roles should be managed for the drive so that only users who should be uploading programme specifications have permission to do so. Note that the script-gdrive-to-storage-sync service account must always have the Manager role in order for the synchronisation process to function correctly.

Synchronisation

This service is provided by Python code that GCP Cloud Scheduler triggers once an hour. The code runs as a function on GCP Cloud Run Functions. It syncs programme specification PDFs from Google Drive to Cloud Storage for public hosting and generates an index of the PDFs containing their metadata.
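At its core, a one-way sync compares what is in Drive with what is in the bucket. The sketch below computes such a plan from two listings keyed by object name. It is a hypothetical illustration, not the real sync code: it assumes both listings have already been fetched and their MD5 checksums normalised to one encoding (Drive and Cloud Storage report them differently), and it assumes files removed from Drive should also be removed from the bucket, which this page does not state.

```python
from __future__ import annotations


def plan_sync(drive: dict[str, str], bucket: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (to_upload, to_delete) object names for a Drive-to-bucket sync.

    Both arguments map object names (e.g. "2005-6/spec.pdf") to MD5 checksums.
    """
    # Upload anything missing from the bucket or whose content has changed.
    to_upload = sorted(name for name, md5 in drive.items() if bucket.get(name) != md5)
    # Hypothetical mirror semantics: delete bucket objects no longer in Drive.
    to_delete = sorted(name for name in bucket if name not in drive)
    return to_upload, to_delete
```

Comparing checksums rather than modification times means an unchanged file is never re-uploaded, even if its Drive metadata was touched.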

The synchronisation process also ensures that Google Drive contains a folder for the most recent University year for which programme specification PDFs can be provided.
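A minimal sketch of that check, assuming the University year starts on 1 October and that "most recent" means the year containing today's date. Both are assumptions, as is the label format for years other than the documented "2005-6" example (here the end year is shortened to its last two digits without padding, e.g. "2010-11").

```python
from __future__ import annotations

from datetime import date
from typing import Optional


def university_year(today: date, start_month: int = 10) -> str:
    """Folder label for the University year containing `today` (e.g. "2005-6").

    `start_month` is an assumed start of the University year (October).
    """
    start = today.year if today.month >= start_month else today.year - 1
    # Assumed label format: end year reduced to its last two digits, unpadded.
    return f"{start}-{(start + 1) % 100}"


def missing_year_folder(existing: set[str], today: date) -> Optional[str]:
    """Return the folder name that needs creating, or None if it already exists."""
    label = university_year(today)
    return None if label in existing else label
```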

The Python code to perform the synchronisation and the Terraform HCL to deploy it to the cloud are in the APSUA Cloud Infrastructure repository.

Website

The CamData website is hosted on servers (currently on-premise) running Drupal 7. Once a day, overnight, the server fetches the index of PDFs from GCP Cloud Storage and updates its database of programme specifications. Information about the updated programme specifications then appears in the table on the CamData Programme Specification Archive page, which queries the server via its API to populate the table and to implement search/sort functionality.
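The website itself is PHP/Drupal 7; the Python sketch below only illustrates the behaviour described above: replace the local copy with the entries from the fetched index, then filter and sort them for the table. The `documents` wrapper and the `url` and `title` fields are hypothetical, and the in-memory dict stands in for Drupal's database.

```python
from __future__ import annotations

import json


def refresh(db: dict[str, dict], index_json: str) -> None:
    """Replace the local copy with the entries in the fetched index (keyed by URL)."""
    documents = json.loads(index_json)["documents"]
    db.clear()
    db.update({d["url"]: d for d in documents})


def search_and_sort(entries: list[dict], query: str = "", sort_key: str = "title",
                    descending: bool = False) -> list[dict]:
    """Case-insensitive substring search on titles, then sort by one field."""
    q = query.casefold()
    hits = [e for e in entries if q in e["title"].casefold()]
    return sorted(hits, key=lambda e: e[sort_key], reverse=descending)
```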

🦤 SharePoint 2013 PDF Upload (legacy)

This application is run once per year, upon request from users.

It is executed client-side, on a suitable Windows machine, by downloading and running the executable file found in its repository.

To run the app on its production server, SPT-Live-App1, open the command-line console, go to the app folder C:\CommandLineTools\CamDataPDFUpload\ and run Live.bat with the following parameters:

  • PDF folder: \\internal\general\RShare\Applications\SharePointShare\CAMData
  • SharePoint site URL: https://spt-live.admin.cam.ac.uk/Sites/CamDataShare/

🦤 SharePoint 2013 REST API (legacy)

The repository contains PowerShell scripts to deploy this solution as a SharePoint Feature on the SharePoint server farm.

Service Management and Tech Lead

The Service Owner for the CamData service is TBA.

The Service Manager for the CamData service is TBA.

The Tech Lead for the CamData service is TBA.

The following engineers have operational experience with the CamData service and are able to respond to support requests or incidents: