Identifiers

Warning

This page is provisional pending UIS-wide agreement.

"Naming things" is one of the great unsolved problems in Computer Science. Often we find that different systems need to refer to the same thing. For example, almost all of our systems need a way to refer to a person who is in some sense a member of the University. Traditionally we have used the CRSid for this purpose.

On this page we use the term identifier to mean a data structure which names an entity in one system such that another system can refer to it in a long-lived manner.

Systems, both bespoke and off-the-shelf, have a bewildering array of formats for identifiers ranging from unstructured text through to rich data structures with rigid schemas.

We have taken a pragmatic approach: we define a common structure for identifiers but leave the precise formatting of that structure flexible, to allow for integration with a wider range of systems. We discuss this in more detail below.

There is a separate page of well-known identifier scopes used within UIS. This page explains how we use identifiers in DevOps.

Scoped and unscoped identifiers

A CRSid is an example of an unscoped identifier. An unscoped identifier does not guarantee uniqueness between classes of entities. For example, a single person may have both a CHRIS payroll number and a CamSIS student number. At a glance it is not clear whether the identifier "12345678" is their CHRIS payroll number or their CamSIS student number.

A scoped identifier adds text to an unscoped identifier which makes the class of thing it refers to explicit. This text is called the scope.

Some examples of scoped identifiers we use in our systems:

  • abc123@v1.person.identifiers.cam.ac.uk - A CRSid referring to a person.
  • 12345678@person.v1.student-records.university.identifiers.cam.ac.uk - The Unique Student Number (USN) of a current or former student at the University.
  • UIS@insts.lookup.cam.ac.uk - The Lookup UIS institution.

Schemes

A rule detailing a scope and an allowed format for an unscoped identifier is called a scheme. For example the scheme for USNs could be:

  • Scope: person.v1.student-records.university.identifiers.cam.ac.uk.
  • Identifier: matches the regular expression ^[0-9]{8}$.
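
To make the idea concrete, a scheme can be modelled as a scope paired with a pattern. The following is a minimal sketch only: the `Scheme` class, its `allows` method and the `USN_SCHEME` constant are hypothetical names chosen for illustration and are not part of any existing library.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A scope plus the allowed format for unscoped identifiers in that scope."""

    scope: str
    pattern: str  # regular expression the unscoped identifier must match

    def allows(self, identifier: str) -> bool:
        # fullmatch requires the whole string to match, so a trailing
        # newline or other extra text is rejected.
        return re.fullmatch(self.pattern, identifier) is not None


# Illustrative scheme for USNs, using the scope and pattern given above.
USN_SCHEME = Scheme(
    scope="person.v1.student-records.university.identifiers.cam.ac.uk",
    pattern=r"^[0-9]{8}$",
)

print(USN_SCHEME.allows("12345678"))  # True
print(USN_SCHEME.allows("abc123"))    # False
```

Keeping the scope and the pattern together means a system can check both halves of a scoped identifier against one rule.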

Formatting

In the examples above we have used email-address style formatting: the local part corresponds to the unscoped identifier and the domain corresponds to the scope. Systems generally use their own internal methods to record the scope of an identifier and use some agreed interchange format for sharing scoped identifiers.

Our concept of a scoped identifier need not use email-address style scoping. For example, the identifiers above may also be rendered using Uniform Resource Name (URN)-style scoping:

  • urn:uniofcam:v1.person.identifiers.cam.ac.uk:abc123
  • urn:uniofcam:person.v1.student-records.university.identifiers.cam.ac.uk:12345678
  • urn:uniofcam:insts.lookup.cam.ac.uk:UIS

URN namespaces

Here we have used the URN namespace uniofcam. This is not a formal namespace registered with the Internet Assigned Numbers Authority (IANA). General practice has evolved to tolerate the use of unregistered namespaces within an organisation, but the uniofcam namespace should not be used outside of the University.
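
The two renderings differ only in how the scope and the unscoped identifier are joined. The sketch below shows this under simple assumptions: the function names are hypothetical, and no escaping is applied, so it assumes neither part contains characters that would need encoding in a URN.

```python
def to_email_style(identifier: str, scope: str) -> str:
    """Render a scoped identifier as identifier@scope."""
    return f"{identifier}@{scope}"


def to_urn_style(identifier: str, scope: str, namespace: str = "uniofcam") -> str:
    """Render a scoped identifier as urn:namespace:scope:identifier."""
    return f"urn:{namespace}:{scope}:{identifier}"


print(to_email_style("UIS", "insts.lookup.cam.ac.uk"))
# UIS@insts.lookup.cam.ac.uk
print(to_urn_style("UIS", "insts.lookup.cam.ac.uk"))
# urn:uniofcam:insts.lookup.cam.ac.uk:UIS
```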

Some of our systems explicitly provide the scope and the unscoped identifier separately. For example, a hypothetical identity system may return a JSON record for the query abc123@cam.ac.uk in the following format:

{
  "query": "abc123@cam.ac.uk",
  "identities": [
    { "scope": "cam.ac.uk", "identifier": "abc123" },
    { "scope": "person.v1.student-records.university.identifiers.cam.ac.uk", "identifier": "12345678" }
  ],
  "memberships": [
    { "scope": "insts.lookup.cam.ac.uk", "identifier": "UIS" }
  ]
}
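
A consumer of such a record can rebuild single-string identifiers by joining each pair. This is a sketch against the hypothetical record above, not a real API response; `scoped_identifiers` is an illustrative helper name.

```python
import json

# The hypothetical record from the example above, as valid JSON.
RECORD = """
{
  "query": "abc123@cam.ac.uk",
  "identities": [
    { "scope": "cam.ac.uk", "identifier": "abc123" },
    { "scope": "person.v1.student-records.university.identifiers.cam.ac.uk", "identifier": "12345678" }
  ],
  "memberships": [
    { "scope": "insts.lookup.cam.ac.uk", "identifier": "UIS" }
  ]
}
"""


def scoped_identifiers(entries):
    """Join each scope/identifier pair into an email-address style string."""
    return [f'{entry["identifier"]}@{entry["scope"]}' for entry in entries]


record = json.loads(RECORD)
identities = scoped_identifiers(record["identities"])
memberships = scoped_identifiers(record["memberships"])

print(memberships)  # ['UIS@insts.lookup.cam.ac.uk']
```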

Generally, if we need to share identifiers as single text strings, we prefer email-address formatting. Identity systems often support recording a person as, for example, abc123@cam.ac.uk, so this format tends to integrate well with them.

Parsing identifiers

We have a standard library which can be used to parse email-address formatted identifiers. This is used by a number of our APIs to ensure consistency in identifier representation.
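
The sketch below illustrates the kind of parsing involved. It is not the standard library mentioned above and does not reflect its API: `ScopedIdentifier` and `parse_scoped_identifier` are hypothetical names, and splitting on the last "@" is an assumption made for this example.

```python
from typing import NamedTuple


class ScopedIdentifier(NamedTuple):
    identifier: str  # the unscoped identifier, e.g. "abc123"
    scope: str       # the scope, e.g. "insts.lookup.cam.ac.uk"


def parse_scoped_identifier(text: str) -> ScopedIdentifier:
    """Split an email-address formatted identifier into its two parts."""
    # Assumption: the scope is everything after the last "@".
    identifier, separator, scope = text.rpartition("@")
    if not separator or not identifier or not scope:
        raise ValueError(f"not a scoped identifier: {text!r}")
    return ScopedIdentifier(identifier, scope)


parsed = parse_scoped_identifier("UIS@insts.lookup.cam.ac.uk")
print(parsed.identifier)  # UIS
print(parsed.scope)       # insts.lookup.cam.ac.uk
```

Rejecting strings with a missing identifier or scope early means malformed values do not spread into downstream systems.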