Advanced Django Performance

24 Oct 2018

My latest work project has involved writing a custom Django API from scratch. Due to the numerous business logic and front-end requirements, something like Django Restful Framework wasn’t really a great option. I learned a great deal about the finer points of Django performance while delivering an API capable of delivering thousands of results quickly.

I’ve consolidated some of my tips below.

Model Managers are useful - but beware of chaining them with other queries

Be careful using model managers, especially when working with Django Prefetch data. You will incur additional lookup queries for the operations that your manager performs, as well as any other operations your manager performs on the data (exclude, order_by, filter, etc.).

Avoid bringing Python into it whenever possible

Do everything you can with properly written Models, queries, and prefetch objects. Once you start using Python, you will significantly impact the performance of your application.

Django is fast. Databases are fast. Python is slow.

Learning to use select_related and prefetch_related will save you a ton of time and debugging. It will also improve your query speeds! As I mentioned above, be careful mixing Model managers with these utilities - also, whenever you begin introducing multiple relationships in a query, you will want to use distinct() and order_by(). Having said that…

Watch out for distinct() gotchas

If you are using advanced Django queries that span multiple relationships, you may notice that duplicate rows are returned. No problem, we’ll just call .distinct() on the queryset, right?

If you only call distinct(), and you forget to call order_by() on your queryset, you will still receive duplicate results! This is a known Django “thing” - beware.

"When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order."
- Django Docs

Profile your Django queries

You can’t fix what you don’t measure. Make sure DEBUG=True in your Django settings.py file, and then drop this snippet into your code to output the queries being run.

from django.db import connection

# Add this block after your queries have been executed
if len(connection.queries) > 0:
    count, time = (0, 0)
    for query in connection.queries:
        count += 1
        print "%s: %s" % (count, query)
        time += float(query['time'])
    print 'Total queries: %s' % count
    print 'Total time: %s' % time