
UCS Development Group Services

The University Computing Service (UCS) was one of the parent organisations that merged to form University Information Services (UIS).

The UCS development group wrote services, some of which are still in use today. As the UCS and its dev group no longer exist, responsibility for running these has been transferred to DevOps.

The services run by DevOps currently include:

Deploying changes to UCS development group services

Note

DevOps are currently migrating these services from SLES to RHEL. Please make sure you use the deployment method that matches the server's operating system.

Current Deployments

Application                          Operating System   HA provider
Lookup                               RedHat 8           Keepalived
Password Changing Application        RedHat 8           Traffic Manager
Human Tissue Tracking Application    RedHat 8           Traffic Manager
Network Access Tokens Application    SLES 11            Pacemaker
Streaming Media Service              SLES 11            Manual
University Training Booking System   SLES 11            None

RedHat Deployments

New releases are deployed via Ansible. Put each server into maintenance mode in turn, then run run-ansible-playbook.sh limited to the node that is in maintenance mode, as described in the steps below.

Note

Which node is on standby can be determined by running curl -s https://www.lookup.cam.ac.uk/adm/status | grep 'HOSTNAME:\|Overall status\|Application' | sed 's/.*<pre>//g' or by looking at the service's /adm/status page.

  1. Put the first node into maintenance mode by touching /maintenance_mode. This file is read by /usr/share/[app]/[app]-monitor-rhel; if the file exists, the script returns a 500 result. Depending on the configuration, either keepalived or Traffic Manager polls the script (via [host]/adm/liveness) and takes the server out of service if the script returns anything other than a 200 result.

    touch /maintenance_mode
    
  2. On your client machine, run the playbook, limiting it to the server that is currently out of service. You must have completed the correct setup before running this command. See https://gitlab.developers.cam.ac.uk/uis/devops/grails-application-ansible-deployment#deployment for more details.

    ./run-ansible-playbook.sh -i ibis-production ibis-playbook.yml --diff --limit lookup-live1
    
  3. Once the deployment has completed, test the changes, then bring the node back into service by removing the maintenance file.

    rm /maintenance_mode
    
  4. Repeat for the second node.

    # On lookup-live2
    touch /maintenance_mode
    
    # On client machine
    ./run-ansible-playbook.sh -i ibis-production ibis-playbook.yml --diff --limit lookup-live2
    
    # On lookup-live2
    rm /maintenance_mode
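
Before running the playbook against a node, it can be useful to confirm that the node really is reporting itself as out of service. The following is a minimal sketch rather than part of the deployment repository: the function names are hypothetical, and it assumes the liveness endpoint described in step 1 is reachable over HTTPS at https://[host]/adm/liveness.

```shell
# Sketch only: helper names are hypothetical and the https:// scheme is an assumption.

# Print the HTTP status code returned by a node's liveness endpoint.
# curl prints 000 if the node cannot be reached at all.
liveness_code() {
    local host="$1"
    curl -s -o /dev/null -w '%{http_code}' "https://${host}/adm/liveness"
}

# Succeed (exit 0) only if the node answers with something other than 200,
# i.e. keepalived or Traffic Manager should be treating it as out of service.
is_out_of_service() {
    local code
    code=$(liveness_code "$1")
    [ "$code" != "200" ]
}

# Example (replace <host> with the node's hostname):
#   is_out_of_service <host> && echo "node is out of service, safe to deploy"
```

Because an unreachable node also counts as "not 200", this check only says that the node is not serving traffic, not why.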
    

Confirm which node is in service

You can confirm which nodes are serving traffic with the following command:

    curl -s https://www.lookup.cam.ac.uk/adm/status | grep 'HOSTNAME:\|Overall status\|Application' | sed 's/.*<pre>//g'
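
The same check can be wrapped in a small function so that it is easier to reuse. This is a sketch under assumptions: the function names are hypothetical, and it assumes that any other service it is pointed at exposes a status page in the same format as Lookup under /adm/status.

```shell
# Sketch only: function names are hypothetical.

# Keep only the hostname, overall status and application lines
# from a status page read on standard input.
filter_status() {
    grep -E 'HOSTNAME:|Overall status|Application' | sed 's/.*<pre>//g'
}

# Fetch a service's status page (base URL as first argument) and filter it.
service_status() {
    curl -s "$1/adm/status" | filter_status
}

# Example:
#   service_status https://www.lookup.cam.ac.uk
```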

SLES Deployments

The majority of SLES deployments are clustered using Pacemaker.

To make a release without incurring downtime, the following steps can be taken. The examples refer to the Lookup service (ucs-ibis package); substitute names as appropriate for other services.

Start with the standby node

Note

Which node is on standby can be determined by running crm_mon -1 or looking at the service's /adm/status page.

  1. Put the cluster into maintenance mode. This ensures Pacemaker does not try to make changes to the cluster in response to a node being unavailable.

    crm configure property maintenance-mode=true
    # Check for "unmanaged" status
    crm_mon -1
    
  2. Release the lock on the package.

    # List locked packages
    zypper ll
    # Release lock on appropriate package
    zypper rl ucs-ibis
    
  3. Run the software upgrade.

    # Refresh repositories
    zypper ref
    # List updates available
    zypper lu
    # Update application specific package
    zypper up ucs-ibis
    # Restart tomcat (a single "restart" doesn't always work)
    service tomcat6 stop
    service tomcat6 start
    
  4. Check that the service is running on the updated node.

    • see https://{node url}/adm/status
    • check application functionality
  5. Move the service out of maintenance mode.

    crm configure property maintenance-mode=false
    # Check for removal of "unmanaged" status
    crm_mon -1
    
  6. Reapply lock to package.

    # Reapply lock on appropriate package
    zypper al ucs-ibis
    # List locked packages to check
    zypper ll
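
Steps 1 to 6 can be collected into two functions, which makes it harder to forget the re-lock or to leave the cluster unmanaged. This is a sketch under stated assumptions, not an existing script: the function names are hypothetical, it must be run as root on the node being upgraded, and it assumes the output of crm_mon -1 contains the word "unmanaged" while maintenance mode is on. Step 4 stays a manual check between the two calls.

```shell
# Sketch only: function names are hypothetical. Run as root on the node being upgraded.

# Steps 1-3: enter maintenance mode, unlock the package, upgrade it, restart Tomcat.
# Stops at the first failing step so the state can be inspected by hand.
begin_upgrade() {
    local pkg="$1"
    crm configure property maintenance-mode=true || return 1
    # Refuse to continue unless the cluster now reports "unmanaged".
    crm_mon -1 | grep -qi 'unmanaged' || { echo "cluster is still managed" >&2; return 1; }
    zypper rl "$pkg" || return 1
    zypper ref || return 1
    zypper up "$pkg" || return 1
    # A single "restart" doesn't always work, so stop and start separately.
    service tomcat6 stop
    service tomcat6 start
}

# Steps 5-6: leave maintenance mode and re-lock the package.
# Run only after checking the node's /adm/status page and the application (step 4).
finish_upgrade() {
    local pkg="$1"
    crm configure property maintenance-mode=false || return 1
    zypper al "$pkg" || return 1
    zypper ll
}

# Example:
#   begin_upgrade ucs-ibis
#   ...check the service by hand...
#   finish_upgrade ucs-ibis
```

If begin_upgrade stops part-way, the cluster is left in maintenance mode and the package may be unlocked, so both need to be checked before retrying.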
    

Move to current live node

  1. Move the service to the already updated standby node.

    crm configure edit
    # shift node weights so that the current standby node is the preferred node for the service
    # verify that the service has moved to the previous standby:
    crm_mon -1
    
  2. Repeat steps 1 to 6 above on this node, i.e.

    • put the service back in maintenance mode
    • unlock the package
    • update the package
    • check success
    • remove the service from maintenance mode
    • relock the package
  3. Move the service back to this node.

    crm configure edit
    # shift node weights so that this node is the preferred node for the service
    # verify that the service has moved back:
    crm_mon -1
    

Following the above should allow a software upgrade to be deployed without any downtime. These steps will not work if the upgrade includes a breaking change to an external data source, e.g. a database migration which is not compatible with the previous version of the software. In this case downtime may need to be scheduled.

TLS certificates on UCS development group services

Installation of TLS certificates on UCS dev group services is a manual process on SLES servers. The certificates on RedHat-based servers are managed by Ansible.

Certificate locations

Some services have dedicated directories (ssl.crt and ssl.key) for the certificate and key files; others use the Tomcat config directory. Running grep -i certificate /srv/www/tomcat6/base/conf/server.xml should show the path in use.

To install new certificates

Obtain the new certificates from the TLS certificate application.

Copy the new certificate and key files to the certificate location on the target system.

Update the tomcat configuration to use the new certificate, edit /srv/www/tomcat6/base/conf/server.xml.

Ensure that the certificate and key have the correct ownership and permissions with chown ucstomcat <file> and chmod 600 <file>.
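
Ownership and mode can be checked once the files are in place. A minimal sketch, assuming GNU stat (as shipped with SLES and RHEL); check_cert_file is a hypothetical helper, not an existing script, and the file names in the example are placeholders.

```shell
# Sketch only: check_cert_file is a hypothetical helper. Requires GNU stat.

# Succeed only if the file is owned by the expected user (default: ucstomcat)
# and has mode 600.
check_cert_file() {
    local file="$1" owner="${2:-ucstomcat}"
    [ "$(stat -c '%U %a' "$file")" = "$owner 600" ]
}

# Example, run in the certificate location (file names are placeholders):
#   chown ucstomcat server.crt server.key
#   chmod 600 server.crt server.key
#   check_cert_file server.key || echo "server.key has the wrong owner or mode" >&2
```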

To update the intermediate certificate

Create a new file, qvsslg3.crt, in the certificate location containing the new intermediate certificate, and remove any blank lines from it.

Ensure that the new file has the correct ownership and permissions with chown ucstomcat <file> and chmod 600 <file>.

In /srv/www/tomcat6/base/conf/server.xml, edit the line that says (path might be different on your system):

certificateChainFile="/srv/www/tomcat6/base/conf/QuoVadisGlobalSSLICAG3.crt"

to:

certificateChainFile="/srv/www/tomcat6/base/conf/qvsslg3.crt"

Restart Tomcat:

service tomcat6 restart
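
The blank-line clean-up from the first step of this procedure can be done with a small helper before Tomcat is restarted. This is a sketch: strip_blank_lines is a hypothetical name, and it treats whitespace-only lines as blank.

```shell
# Sketch only: strip_blank_lines is a hypothetical helper.

# Rewrite a file in place without empty or whitespace-only lines.
strip_blank_lines() {
    local file="$1" tmp
    tmp=$(mktemp) || return 1
    grep -v '^[[:space:]]*$' "$file" > "$tmp"
    # Write back through the original path so its owner and mode are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example, run in the certificate location:
#   strip_blank_lines qvsslg3.crt
```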

Database backup and restore on UCS dev group services

The examples refer to the Lookup service (ucs-ibis package); substitute names as appropriate for other services.

Database backups and restores are managed by a pair of scripts in /usr/share/ibis/bin, ibis-backup and ibis-restore.

Database backups

ibis-backup is generally run from a cron job, /etc/cron.d/ibis. When passed one of the arguments hour, day, week, month or year, it writes a bzip2-compressed database dump to the corresponding /usr/share/ibis/backup/hour|day|week|month|year directory. The script also accepts the argument now, which creates a database dump in the current directory.

Database restores

The ibis-restore script takes a bzip2 compressed backup file which was created by ibis-backup and optionally the name of the database to restore to.

ibis-restore does not restore to an existing database. To replace an existing database, run DROP DATABASE on it first.
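
To restore the most recent backup, the newest dump in one of the backup directories has to be located first. A sketch under assumptions: newest_backup is a hypothetical helper that picks by modification time, so it assumes nothing about how ibis-backup names its files; the example assumes ibis-restore takes the file and then the database name, in the order described above, and ibis_restored is a placeholder database name.

```shell
# Sketch only: newest_backup is a hypothetical helper.

# Print the most recently modified regular file in a directory.
# Returns non-zero if the directory contains no regular files.
newest_backup() {
    local dir="$1" newest="" f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example: check the archive is intact, then restore it into a new database.
#   dump=$(newest_backup /usr/share/ibis/backup/day)
#   bzip2 -t "$dump" && /usr/share/ibis/bin/ibis-restore "$dump" ibis_restored
```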

Configuration

It's a common pattern across all the dev-group apps to keep most of the local config in a /usr/share/<app>/conf/params.yml file, with a page at http://<app>/adm/config that displays the raw and parsed contents of that file, along with a button to reload it from disk.

In most of the apps, the page is publicly visible (though not advertised), but the reload button requires admin rights. In some apps (passwords), the page itself is protected for added security.

Each of the apps is different in how it manages roles and permissions. Top-level admins are often listed explicitly in params.yml. Some apps use lookup groups for finer-grained permissions. Some store permissions in their own databases. E.g. the UTBS has quite a detailed system for managing fine-grained access controls across the different service providers.