Cambridge Data (CamData)¶
[Team | Jackson Team] [Tech Lead | TBC] [Service Owner | TBC] [Service Manager | TBC] [Product Manager | TBC]
This page gives an overview of the CamData applications, describing their current status, where and how they are developed and deployed, and who is responsible for maintaining them.
Legacy components/services are marked with the dodo (🦤) symbol.
Note that the service can be referred to as either "CamData" or "CamDATA" (the public CamData website uses both forms).
Service Description¶
The CamData website provides a publicly accessible archive of programme specifications for University awards, including those which have been discontinued, and of other courses such as Diplomas and Certificates.
CamData on Google Drive represents the service for managing programme specification PDF files to be made available on the CamData website. The files are actually hosted for public access by Google Cloud Platform (GCP) Cloud Storage, with synchronisation scripts periodically running to update Cloud Storage from Google Drive and then update the index used by Drupal 7 to provide search/sort functionality.
CamData SharePoint is the legacy document repository which previously served programme specification PDF files for the CamData website.
Service Status¶
The CamData service is currently live.
Environments¶
Google Drive¶
Name | Drive URL |
---|---|
Production | https://drive.google.com/drive/folders/0AAb2tRec0X1VUk9PVA |
Staging | https://drive.google.com/drive/folders/0ACyX-1h4wx41Uk9PVA |
Development | https://drive.google.com/drive/folders/0AMcH1XnA2Bu0Uk9PVA |
Google Cloud Platform (GCP)¶
Name | GCP front page | Cloud Storage public bucket |
---|---|---|
Production | GCP front page (production) | GCP Cloud Storage public bucket (production) |
Staging | GCP front page (staging) | GCP Cloud Storage public bucket (staging) |
Development | GCP front page (development) | GCP Cloud Storage public bucket (development) |
Meta (common to all environments) | GCP front page (meta) | n/a |
Website (Drupal)¶
Name | Service | URL |
---|---|---|
Production | Public website | https://www.camdata.admin.cam.ac.uk/ |
Production | Admin | https://www.camdata.admin.cam.ac.uk/admin?q=user/login |
Staging | Public website | http://camdata-2.staging.drupal.uis.cam.ac.uk/ |
Staging | Admin | http://camdata-2.staging.drupal.uis.cam.ac.uk?q=user/login |
SharePoint (legacy)¶
The legacy CamData SharePoint service is currently still deployed to the following environments:
Name | Service | Supporting VMs |
---|---|---|
Production (legacy) | Web Server | spt-live-web1.internal.admin.cam.ac.uk |
Production (legacy) | Application Server | spt-live-app1.internal.admin.cam.ac.uk |
Production (legacy) | Database | spt-live-db1.internal.admin.cam.ac.uk |
Staging (legacy) | Web Server | spt-test-web1.internal.admin.cam.ac.uk |
Staging (legacy) | Application Server | spt-test-app1.internal.admin.cam.ac.uk |
Staging (legacy) | Database | spt-test-db1.internal.admin.cam.ac.uk |
Source code¶
Source code for the CamData service is spread over the following repositories:
- Cloud Infrastructure repository for the Archive of Programme Specifications for University Awards (APSUA)
- PDF upload and document metadata extraction (legacy)
- SharePoint REST API for CamData document repository
Technologies used¶
The CamData service is built using the following technologies:
Category | Language | Framework(s) |
---|---|---|
PDF hosting | Terraform HCL 1.7 | GCP Cloud Storage |
PDF management | n/a | Google Drive |
Synchronisation | Python 3.11 and Terraform HCL 1.7 | GCP Cloud Run Functions |
Website | PHP | Drupal 7 |
PDF Upload (legacy) | C# | SharePoint Client-side Object Model |
REST API (legacy) | C# | SharePoint REST API and Server-side Object Model |
Operational documentation¶
The following sections provide an overview of how the CamData service is deployed and maintained.
PDF hosting¶
GCP Cloud Storage is used to host the programme specification PDFs via a publicly accessible bucket. The bucket also holds the JSON index document that is used by Drupal to provide search/sort functionality over the programme specifications.
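To illustrate the kind of index document described above, the following minimal sketch builds a JSON index from a list of PDF records. The field names (`title`, `award`, `year`, `url`), the `build_index` helper and the example URLs are all hypothetical; the real index schema is defined by the synchronisation code in the APSUA Cloud Infrastructure repository.

```python
import json


def build_index(records):
    """Return a JSON string indexing the given PDF records.

    Entries are ordered by year, then title, so the output is stable
    between runs. All field names here are illustrative assumptions.
    """
    entries = sorted(records, key=lambda r: (r["year"], r["title"]))
    return json.dumps({"count": len(entries), "documents": entries}, indent=2)


# Hypothetical records; the URLs are placeholders, not real bucket paths.
records = [
    {"title": "History", "award": "BA", "year": "2005-6",
     "url": "https://storage.example/2005-6/history.pdf"},
    {"title": "Chemistry", "award": "MSci", "year": "2004-5",
     "url": "https://storage.example/2004-5/chemistry.pdf"},
]

index_json = build_index(records)
```

Keeping the index as a single JSON document in the same public bucket means a consumer such as the website only needs one fetch to learn about every hosted PDF.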
GCP Cloud Storage buckets also exist for other purposes connected with the deployment of the CamData service (such as for holding environment settings and caching the code used when performing the synchronisation process).
The Terraform HCL to deploy the Cloud Storage buckets to the cloud is in the APSUA Cloud Infrastructure repository.
PDF management¶
Google Drive is used for managing the programme specification PDFs that are to be published on the public CamData website. A drive exists for each environment, and within each drive there is a folder for each University year. Because a University year spans two calendar years, the folders are named accordingly (e.g. 2005-6). Each folder contains the programme specification PDFs applicable to the University year indicated by its name.
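The folder naming can be made concrete with a small sketch that expands a name such as 2005-6 into its two calendar years. The `parse_year_folder` helper is hypothetical, and the handling of longer suffixes (e.g. 2009-10) is an assumption extrapolated from the single example given above.

```python
import re

# Four-digit start year, a hyphen, then the trailing digits of the end year.
_FOLDER_RE = re.compile(r"^(\d{4})-(\d{1,4})$")


def parse_year_folder(name):
    """Expand a folder name such as '2005-6' into (2005, 2006).

    Raises ValueError if the name is malformed or the two years are
    not consecutive.
    """
    match = _FOLDER_RE.match(name)
    if match is None:
        raise ValueError(f"not a University year folder: {name!r}")
    start_text, suffix = match.groups()
    start = int(start_text)
    # The suffix replaces the trailing digits of the start year,
    # e.g. '2005' + '6' gives 2006 and '2009' + '10' gives 2010.
    end = int(start_text[: 4 - len(suffix)] + suffix)
    if end != start + 1:
        raise ValueError(f"years are not consecutive: {name!r}")
    return start, end
```

A check like this could be used to flag folders whose names do not describe a valid University year before any PDFs are placed in them.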
To publish a programme specification PDF, a suitable PDF should be generated with appropriate PDF metadata describing the specification (see existing PDFs for reference). The PDF should then be placed in the folder for the appropriate year in the CamData Archive of Programme Specifications for University Awards drive (the production drive). The document should be available on the public CamData website the following day.
Roles should be managed for the drive so that only users who should be uploading programme specifications have permission to do so. Note that the script-gdrive-to-storage-sync service account must always have the Manager role in order for the synchronisation process to function correctly.
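The Manager requirement can be checked mechanically. The sketch below assumes the drive's permissions have already been retrieved as a list of dictionaries with `emailAddress` and `role` keys, in the shape returned by the Google Drive API permissions listing, where the Manager role shown in the Drive UI is represented as `organizer`. The `has_manager_role` helper and the service account's email domain are hypothetical.

```python
# In the Drive API, the shared drive "Manager" role is named "organizer".
MANAGER_ROLE = "organizer"


def has_manager_role(permissions, account_email):
    """Return True if account_email holds the Manager role.

    `permissions` is a list of dicts with 'emailAddress' and 'role' keys.
    Email addresses are compared case-insensitively.
    """
    wanted = account_email.lower()
    return any(
        p.get("emailAddress", "").lower() == wanted
        and p.get("role") == MANAGER_ROLE
        for p in permissions
    )


# Hypothetical data: the project domain below is a placeholder.
SYNC_ACCOUNT = "script-gdrive-to-storage-sync@example-project.iam.gserviceaccount.com"
permissions = [
    {"emailAddress": "uploader@example.org", "role": "writer"},
    {"emailAddress": SYNC_ACCOUNT, "role": "organizer"},
]
sync_account_ok = has_manager_role(permissions, SYNC_ACCOUNT)
```

Running a check of this kind after any change to the drive's roles would catch an accidental downgrade of the service account before the next synchronisation run fails.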
Synchronisation¶
The synchronisation is performed by Python code that GCP Cloud Scheduler triggers once per hour. The code runs as a function on GCP Cloud Run Functions. It syncs programme specification PDFs from Google Drive to Cloud Storage, where they are hosted for public access, and generates an index of the PDFs containing their metadata.
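The core of a one-way sync like this can be sketched as a comparison of two listings. The example below is a simplified illustration, not the code from the APSUA Cloud Infrastructure repository: the `plan_sync` helper, the use of checksums to detect changes, and the assumption that files removed from Google Drive are also removed from the bucket are all hypothetical.

```python
def plan_sync(drive_files, bucket_objects):
    """Work out what a one-way sync from Drive to the bucket must do.

    Both arguments map a path to a checksum. Returns a tuple
    (to_upload, to_delete) of sorted path lists.
    """
    # Upload anything that is new or whose content has changed.
    to_upload = sorted(
        path
        for path, checksum in drive_files.items()
        if bucket_objects.get(path) != checksum
    )
    # Assumption: objects with no counterpart in Drive are removed.
    to_delete = sorted(path for path in bucket_objects if path not in drive_files)
    return to_upload, to_delete


# Hypothetical listings with placeholder checksums.
drive_files = {"2005-6/history.pdf": "a1", "2005-6/physics.pdf": "b2"}
bucket_objects = {
    "2005-6/history.pdf": "a1",
    "2005-6/physics.pdf": "old",
    "2004-5/removed.pdf": "c3",
}
to_upload, to_delete = plan_sync(drive_files, bucket_objects)
```

Separating the planning step from the actual uploads and deletions keeps the logic easy to test without access to Google Drive or Cloud Storage.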
The synchronisation process also ensures that there is a folder in Google Drive for the most recent University year that programme specification PDFs can be provided for.
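A minimal sketch of the folder check follows. It rests on several assumptions that are not confirmed by this page: that a University year starts on 1 October, that the "most recent" year is the one containing today's date, and that folder names use the shortest end-year suffix that differs from the start year (2005-6, 2009-10). All three helper functions are hypothetical.

```python
import datetime


def current_start_year(today, start_month=10):
    """Return the calendar year in which the current University year began.

    Assumption: the University year starts on the 1st of `start_month`.
    """
    return today.year if today.month >= start_month else today.year - 1


def folder_name(start_year):
    """Format a University year as e.g. '2005-6' or '2009-10'."""
    start, end = str(start_year), str(start_year + 1)
    # Keep only the digits of the end year from the first differing position.
    first_diff = next(i for i in range(len(start)) if start[i] != end[i])
    return f"{start}-{end[first_diff:]}"


def missing_folder(existing_names, today):
    """Return the folder name to create, or None if it already exists."""
    name = folder_name(current_start_year(today))
    return None if name in existing_names else name


to_create = missing_folder({"2004-5"}, datetime.date(2005, 11, 1))
```

In this example `to_create` is the name of the folder that the synchronisation run would need to add; a result of `None` means no action is required.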
The Python code to perform the synchronisation and the Terraform HCL to deploy it to the cloud are in the APSUA Cloud Infrastructure repository.
Website¶
The CamData website is hosted on servers (currently on-premise) running Drupal 7. Once a day, overnight, the server fetches the index of PDFs from GCP Cloud Storage and updates its database of programme specifications. Information about the updated programme specifications then appears in the table on the CamData Programme Specification Archive page, which queries the server via its API to populate the table and to implement search/sort functionality.
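The search/sort behaviour is implemented in Drupal 7 (PHP), but its logic can be illustrated with a short Python sketch over index entries. The `search_specs` helper and the `title` and `year` fields are hypothetical and do not reflect the real API or database schema.

```python
def search_specs(entries, query="", sort_key="title", descending=False):
    """Filter entries by a case-insensitive title match, then sort them.

    `entries` is a list of dicts; `sort_key` names the field to sort by.
    An empty query matches every entry.
    """
    needle = query.casefold()
    matches = [e for e in entries if needle in e["title"].casefold()]
    return sorted(matches, key=lambda e: e[sort_key], reverse=descending)


# Hypothetical entries, as they might look after the nightly index import.
entries = [
    {"title": "History", "year": "2005-6"},
    {"title": "History of Art", "year": "2004-5"},
    {"title": "Physics", "year": "2005-6"},
]
by_year = search_specs(entries, query="history", sort_key="year")
```

Here `by_year` contains the two entries whose titles include "history", ordered by their `year` field.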
SharePoint 2013 PDF Upload (legacy)¶
This application is run once per year, upon request from users.
It is executed client-side, on a suitable Windows machine, by downloading and running the executable file found in its repository.
To run the app on its production server SPT-Live-App1, use the command-line console:
- Go to the app folder C:\CommandLineTools\CamDataPDFUpload\ and run Live.bat with the following parameters:
  - Specify PDF folder as \\internal\general\RShare\Applications\SharePointShare\CAMData
  - Specify SharePoint site URL as https://spt-live.admin.cam.ac.uk/Sites/CamDataShare/
SharePoint 2013 REST API (legacy)¶
The repository contains PowerShell scripts to deploy this solution as a SharePoint Feature on the SharePoint server farm.
Service Management¶
The Team responsible for this service is Jackson Team.
The Tech Lead for this service is TBC.
The Service Owner for this service is TBC.
The Service Manager for this service is TBC.
The Product Manager for this service is TBC.
The following engineers have operational experience with this service and are able to respond to support requests or incidents: