Task Deployment¶
This page documents how we deploy task runners built on ucam-faas to Cloud Run to support
asynchronous or scheduled workloads.
Task Containers¶
More details on the internal architecture of the tasks collections can be found in the [developer environment explanation for tasks collections].
The key concept for deployment is that each task is built into a Docker image running a Flask application, which responds to HTTP POST requests from a GCP PubSub push subscription.
The task images are therefore deployed to Cloud Run in a similar manner to our web applications.
It is expected that each task runner is deployed as a separate Cloud Run service.
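For context, a PubSub push subscription delivers each message to the runner as an HTTP POST whose JSON body looks roughly like the following (all values here are illustrative; `data` is the base64-encoded message payload):

```json
{
  "message": {
    "data": "eyJrZXkiOiAidmFsdWUifQ==",
    "messageId": "1234567890",
    "publishTime": "2024-01-01T00:00:00Z",
    "attributes": {}
  },
  "subscription": "projects/example-project/subscriptions/example-subscription"
}
```

The framework unwraps this envelope and passes the message contents on to the task function, so task code does not normally need to parse this structure itself.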
The ucam-faas terraform module¶
To deploy a task, a Terraform module is provided. It is configured similarly to the standard web application boilerplate: the container image for the task runner must be provided, along with some other minimal configuration.
A number of example function deployments can be seen in the dev directory of that repository.
Triggering the function¶
Tasks can be triggered either by PubSub messages being published to a topic, or regularly on a set cron-style schedule.
To trigger a task on a schedule, the module's triggers variable can be configured with a
cron_schedules key, for example:
```terraform
module "faas_service_cron" {
  source  = "gitlab.developers.cam.ac.uk/uis/ucam-faas/devops"
  version = "2.3.2"

  name = "faas-test-cron"
  function = {
    container_image = "europe-west2-docker.pkg.dev/shared-code-meta-2daffe11/public/ucam-faas-python/example:0.21.2"
  }
  triggers = {
    cron_schedules = ["0 0 */7 * *", "0 12 */7 * *"]
  }
  concurrency = {
    max_concurrent_functions_per_instance = 1
  }
  alerting = {
    monitoring_scoping_project = local.product_meta_project
    notification_channels      = local.notification_channels
  }
}
```
Here we are configuring two schedules, so the function will run at 00:00 every 7 days and also at 12:00 every 7 days.
For PubSub-triggered jobs, the triggers variable is instead given the ID of the topic to
subscribe to:
```terraform
resource "google_pubsub_topic" "faas_test" {
  name    = "faas-test"
  project = local.project
}

module "faas_service" {
  source  = "gitlab.developers.cam.ac.uk/uis/ucam-faas/devops"
  version = "2.3.2"

  name = "faas-test-pubsub"
  function = {
    container_image = "europe-west2-docker.pkg.dev/shared-code-meta-2daffe11/public/ucam-faas-python/example:0.21.2"
  }
  triggers = {
    pubsub_topic_id = google_pubsub_topic.faas_test.id
  }
  alerting = {
    monitoring_scoping_project = local.product_meta_project
    notification_channels      = local.notification_channels
  }
}
```
Now, whenever a message is published to the faas-test topic in GCP, the function will be triggered
and the contents of the message will be passed to the function itself. More details on processing
PubSub messages can be found in the developer explanation.
Concurrency¶
Tasks can also be configured with a concurrency block, which describes how many tasks can run at the same time. For tasks triggered on a regular schedule, it is recommended to limit concurrency to a single function running at a time.
The default concurrency settings have been chosen to minimise cost for asynchronous workloads: by
default, up to 80 tasks can run concurrently, all within a single Cloud Run instance.
For workloads requiring higher concurrency, the concurrency variable will need to be
modified. The optimal settings for a task depend on the amount of processing resource
it requires and the time it takes to execute a single task.
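As a sketch of the kind of override described above, reusing the `max_concurrent_functions_per_instance` key shown in the earlier example (the module name here is hypothetical, and 200 is an illustrative value for a lightweight, I/O-bound task):

```terraform
module "faas_service_busy" {
  source  = "gitlab.developers.cam.ac.uk/uis/ucam-faas/devops"
  version = "2.3.2"

  name = "faas-busy-worker"
  function = {
    container_image = "europe-west2-docker.pkg.dev/shared-code-meta-2daffe11/public/ucam-faas-python/example:0.21.2"
  }
  triggers = {
    pubsub_topic_id = google_pubsub_topic.faas_test.id
  }
  # Raise the per-instance concurrency limit above the default of 80.
  concurrency = {
    max_concurrent_functions_per_instance = 200
  }
}
```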
Dead letters¶
When a function fails to execute after being triggered it is important to retain the triggering message. To support this, the terraform module automatically configures a "dead letter" queue and default subscription. By default these messages are retained in the dead letter queue for 14 days.
By default, alerting will be set up to notify the configured notification channels when a message enters the dead letter queue.
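For example, messages retained in the dead letter queue can be inspected by pulling from the automatically created subscription with gcloud (the subscription and project names below are hypothetical; check the resources created by the module for the actual names):

```shell
# Pull up to 5 undelivered messages without acknowledging them, so they
# remain in the dead letter queue for later reprocessing.
gcloud pubsub subscriptions pull faas-test-dead-letter \
  --project=my-project \
  --limit=5
```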
Long-running tasks¶
Task workloads that need to run for longer than 1 hour must run in a separate environment, as this is the maximum timeout for a Cloud Run request.
Warning
It is strongly recommended to avoid creating any task that runs for longer than an hour. A task that needs this long to run is typically indicative of a design or architecture failure; treat this deployment option as a last resort when system limitations genuinely require a single task to run for this long.
Long-running tasks are deployed into a GCP Workflows environment, using an equivalent dedicated Terraform module.
They can be configured in much the same way as normal tasks: generally speaking, the long-running module attempts to achieve feature parity with the standard module.
Summary¶
In summary,
- Task containers run functions when triggered on a schedule or on messages published to a PubSub topic.
- Most workloads will deploy task containers into GCP Cloud Run using a dedicated terraform module.
- They can be configured with the triggering schedule or PubSub topic.
- Their function concurrency can be customised, but defaults to a cost-sensible configuration.
- Functions that need to run for longer than an hour should be avoided, but if necessary can be deployed into a GCP Workflow-based environment using a separate dedicated module.