Gunicorn tuning

Our boilerplate web application comes with a default Dockerfile which packages the webapp to serve incoming requests with gunicorn. This note discusses how gunicorn may be tuned in production.

Our default configuration is as follows:

  • gthread worker (--worker-class gthread)
  • 4 worker processes (--workers 4)
  • 1 thread per process (--threads 1)

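For reference, these defaults can equivalently be expressed in a gunicorn configuration file; a minimal sketch:

    # gunicorn.conf.py -- equivalent to the default command-line flags above
    worker_class = "gthread"
    workers = 4
    threads = 1
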
The correct choice of --workers and --threads here is non-trivial and may need to be tweaked in production.

The rationale for using the gthread worker is not, as you might first suspect, its threading capability, but that threaded workers allow individual connections to live beyond the worker timeout. This matters because on lightly-loaded services we were getting a fair chunk of log spam from gunicorn repeatedly launching and culling workers that had hit the inactivity timeout.
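
The timeout in question is gunicorn's --timeout setting. As a sketch, the relevant knobs, with gunicorn's documented defaults shown:

    # gunicorn.conf.py -- timeout-related settings (gunicorn defaults shown)
    worker_class = "gthread"
    timeout = 30    # seconds a worker may go silent before being killed
    keepalive = 2   # seconds to hold an idle keep-alive connection open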

The gthread worker does have the advantage of allowing us to tune the --workers setting (number of worker processes) and the --threads setting (number of threads within each worker) separately. Django holds its persistent database connections per thread, so care must be taken when choosing these values: once every thread has served a request, the service will hold at least "# workers" x "# threads" connections. On db-f1-micro instances the maximum number of connections is around 15, with some reserved for system use, so setting --workers 4 --threads 4 would immediately exhaust the connection limit.
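
A back-of-the-envelope check makes the constraint concrete (the connection limit and reserved allowance here are assumptions based on the figures above):

    # Illustrative connection-budget check; the limits are assumed, not queried.
    workers, threads = 4, 4
    max_connections = 15        # approximate db-f1-micro limit
    reserved = 3                # assumed allowance for system connections
    needed = workers * threads  # one persistent connection per thread

    # 16 needed vs ~12 usable: this configuration exhausts the limit.
    print(f"need {needed} connections, have {max_connections - reserved}")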

The gunicorn docs suggest that only a small handful of worker "entities" (threads or workers) is needed to handle many thousands of requests. As a "sensible" default, given the connection-limit constraints noted above, we aim for "# workers" x "# threads" to be around 4.
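
Since the best split may only become apparent in production, it can help to make both knobs adjustable without rebuilding the image. A sketch, assuming environment variable names of our own invention (GUNICORN_WORKERS and GUNICORN_THREADS are not built-in gunicorn names):

    # gunicorn.conf.py -- hypothetical env-var overrides for the worker split
    import os

    worker_class = "gthread"
    workers = int(os.environ.get("GUNICORN_WORKERS", "4"))
    threads = int(os.environ.get("GUNICORN_THREADS", "1"))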

The choice of workers versus threads is discussed further in another section of the documentation, which notes that the optimal split will depend on the relative threading performance of your Python implementation. CPython has historically fared poorly here because of the GIL.

Threads carry a smaller per-worker memory overhead than full worker processes, whose cost lies mainly in in-kernel memory structures and non-shared pages. If you encounter memory pressure in production, you may wish to shift the balance from workers towards threads, keeping the product the same, and pay the small performance penalty instead.
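
For example, halving the worker count while doubling the thread count keeps the same four worker entities but drops two processes' worth of non-shared pages; a sketch:

    # gunicorn.conf.py -- trading workers for threads under memory pressure
    worker_class = "gthread"
    workers = 2   # was 4: two fewer copies of non-shared pages
    threads = 2   # was 1: product stays at 4 worker entities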