Advanced Django and Python Performance

24 Oct 2018

My latest work project has involved writing a custom Django API from scratch. Because of the amount of custom business logic and the front-end requirements, something like Django REST Framework wasn’t really a great option. I learned a great deal about the finer points of Python and Django performance while building an API capable of delivering thousands of results quickly.

I’ve consolidated some of my tips below.


Model Managers are useful - but beware of chaining them with other queries

Be careful using model managers, especially when working with Django Prefetch objects. Prefetched results are cached on the instance, but any manager method that builds a new queryset (exclude, order_by, filter, etc.) ignores that cache, so you incur an additional lookup query every time such a method runs.

Avoid bringing Python into it whenever possible

Do everything you can with properly written models, queries, and Prefetch objects. Once you start filtering, sorting, or reshaping results in Python instead of in the query, you will significantly impact the performance of your application.

Django is fast. Databases are fast. Python is slow.

Learning to use select_related and prefetch_related will save you a ton of time and debugging. It will also improve your query speeds! As I mentioned above, be careful mixing model managers with these utilities. Also, whenever a query begins to span multiple relationships, you will want to use distinct() and order_by(). Having said that…
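To make the query-count difference concrete, here is a minimal sketch in plain SQL using Python's built-in sqlite3 module rather than the Django ORM. The author and book tables and the run() helper that counts statements are hypothetical names invented for illustration. The first function mimics related objects being fetched lazily inside a loop; the second mimics the single JOIN that select_related generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
""")

query_count = 0

def run(sql, params=()):
    """Execute one statement and count it (hypothetical helper)."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

def titles_with_authors_lazy():
    # One query for the books, then one more per book for its author
    rows = run("SELECT title, author_id FROM book ORDER BY id")
    result = []
    for title, author_id in rows:
        name = run("SELECT name FROM author WHERE id = ?", (author_id,))[0][0]
        result.append((title, name))
    return result

def titles_with_authors_joined():
    # A single JOIN, which is roughly what select_related produces
    return run(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    )

query_count = 0
lazy = titles_with_authors_lazy()
lazy_queries = query_count

query_count = 0
joined = titles_with_authors_joined()
joined_queries = query_count

print(lazy_queries, joined_queries)
```

Both functions return the same rows, but the lazy version needs one query per book on top of the first one, so its cost grows with the size of the result set while the joined version stays at a single query.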

Watch out for distinct() gotchas

If you are using advanced Django queries that span multiple relationships, you may notice that duplicate rows are returned. No problem, we’ll just call .distinct() on the queryset, right?

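If you only call distinct(), and you forget to call order_by() on your queryset, you can still receive duplicate results! This is a known Django “thing”: any field used for ordering is added to the SELECT columns, so when the ordering follows a relation, rows that look identical to you are still distinct to the database. Beware. The field-name form of distinct() (PostgreSQL only) is stricter still: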

"When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order."
- Django Docs
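The duplicate-rows effect is easier to see in raw SQL. The sketch below uses sqlite3 with hypothetical post and tag tables (not the ORM, and not my real schema). Once a column from the joined table is part of the SELECT list, which is what ordering by a related field causes in Django, DISTINCT no longer collapses the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT, post_id INTEGER);
    INSERT INTO post VALUES (1, 'Hello');
    INSERT INTO tag VALUES (1, 'django', 1), (2, 'python', 1);
""")

# Joining across the relation yields one row per matching tag
joined = conn.execute(
    "SELECT post.id, post.title FROM post "
    "JOIN tag ON tag.post_id = post.id"
).fetchall()

# DISTINCT over the post columns alone collapses the duplicates
distinct_only = conn.execute(
    "SELECT DISTINCT post.id, post.title FROM post "
    "JOIN tag ON tag.post_id = post.id"
).fetchall()

# Here tag.name is written into the SELECT list by hand, standing in for
# what Django does when you order by a related field. Each (post, tag)
# pair now counts as a distinct row.
distinct_with_related = conn.execute(
    "SELECT DISTINCT post.id, post.title, tag.name FROM post "
    "JOIN tag ON tag.post_id = post.id ORDER BY tag.name"
).fetchall()

print(len(joined), len(distinct_only), len(distinct_with_related))
```

The practical takeaway is to set the ordering explicitly on any queryset you call distinct() on, and to keep that ordering on the model's own fields when you can.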

Profile your Django queries

You can’t fix what you don’t measure. Make sure DEBUG=True in your Django settings file (connection.queries is only populated in debug mode), and then drop this snippet into your code to output the queries being run.

from django.db import connection

# Add this block after your queries have been executed.
# The parenthesized print calls work on both Python 2 and Python 3.
if connection.queries:
    total_time = 0.0
    for count, query in enumerate(connection.queries, 1):
        print("%s: %s" % (count, query))
        total_time += float(query['time'])
    print('Total queries: %s' % len(connection.queries))
    print('Total time: %s' % total_time)
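When the raw dump gets long, it can help to group identical statements, since the same SQL repeated many times usually points at a missing select_related or prefetch_related. The helper below is only a sketch: summarize_queries is a hypothetical name, and the sample list merely imitates the shape of connection.queries (dicts with 'sql' and 'time' keys). In a real project you would pass connection.queries itself.

```python
from collections import Counter

def summarize_queries(queries, top=5):
    """Return (total_count, total_time, most_repeated) for a list of
    dicts shaped like Django's connection.queries entries."""
    total_time = sum(float(q["time"]) for q in queries)
    repeated = Counter(q["sql"] for q in queries).most_common(top)
    return len(queries), total_time, repeated

# Made-up sample data in the same shape as connection.queries
sample = [
    {"sql": "SELECT * FROM book", "time": "0.002"},
    {"sql": "SELECT * FROM author WHERE id = 1", "time": "0.001"},
    {"sql": "SELECT * FROM author WHERE id = 1", "time": "0.001"},
]

count, total_time, repeated = summarize_queries(sample)
print("Total queries: %s" % count)
print("Total time: %.3f" % total_time)
for sql, n in repeated:
    print("%sx %s" % (n, sql))
```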


Use map when performance matters AND the functions are complex AND you are using named functions. Use list comprehensions for everything else.

map is a built-in function written in C. Using map produces performance benefits over using list comprehensions in certain cases.

Please note that if you pass an anonymous lambda to map, you lose the benefit: map has to call the lambda once per element, while an equivalent list comprehension evaluates the expression inline, so map ends up much slower. The example below shows how expensive those per-element function calls are.

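def square_entry(x):
    return x ** 2

def map_it(arr):
    # Python 2: map() returns a list. On Python 3 it is lazy,
    # so wrap it in list() to force the work to happen.
    return map(square_entry, arr)

def list_comp(arr):
    # Named function called once per element
    return [square_entry(x) for x in arr]

def list_comp_lambda(arr):
    # Despite the name, there is no lambda here: the expression is
    # written inline, so there is no per-element function call.
    return [x ** 2 for x in arr]

def for_loop(arr):
    response = []
    a = response.append  # Cache the bound method outside the loop
    for i in arr:
        a(i ** 2)
    return response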

To test the performance of these functions, we create an array with 10,000 numbers, and go through the array squaring each value. Pretty simple stuff. Check out the wild differences in runtime and performance:

  1. List Comprehension with inline expression (no function call): 5 function calls in 0.001 seconds
  2. For Loop: 10005 function calls in 0.048 seconds
  3. List Comprehension using named function: 10005 function calls in 0.049 seconds
  4. map with named function: 10006 function calls in 0.050 seconds

Moral of the story? If you are doing simple list operations, use list comprehensions with the expression written inline rather than wrapped in a function or lambda. They are faster, more readable, and more pythonic. Keep in mind that the profiler adds overhead to every function call, so these numbers exaggerate the gap somewhat.
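Because profiler overhead can distort results like these, a second opinion from the timeit module is worth having. The sketch below is written for Python 3, where map() is lazy and must be wrapped in list() to do the same work, and it adds the map-with-lambda case. The function names are hypothetical, and no timings are quoted here because they vary by machine and interpreter version.

```python
import timeit

def square_entry(x):
    return x ** 2

def map_named(arr):
    return list(map(square_entry, arr))

def map_lambda(arr):
    return list(map(lambda x: x ** 2, arr))

def list_comp_named(arr):
    return [square_entry(x) for x in arr]

def list_comp_inline(arr):
    return [x ** 2 for x in arr]

VARIANTS = [map_named, map_lambda, list_comp_named, list_comp_inline]

def benchmark(size=10000, number=20):
    """Time each variant on the same input; returns {name: seconds}."""
    data = list(range(size))
    return {
        func.__name__: timeit.timeit(lambda: func(data), number=number)
        for func in VARIANTS
    }

if __name__ == "__main__":
    # Print the variants from fastest to slowest
    for name, seconds in sorted(benchmark().items(), key=lambda kv: kv[1]):
        print("%-18s %.4f s" % (name, seconds))
```

Run it a few times before drawing conclusions; small differences between variants can flip from one run to the next.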

When you’re munging complex data in Python, it’s a good idea to handle the data modification in a named function and then use map to call that function. You must always profile your code before and after using map to ensure that you are actually gaining performance and not losing it!

You might be asking: so, when should I use map?

A good candidate for map is any long or complex function that will perform conditional operations on the provided arguments. map functions are great for iterating through objects and assigning properties based on data attributes, for example.

Here’s an example of map beating a list comprehension by roughly 13% (shamelessly taken from Stack Overflow). Note that on Python 3 map is lazy, so the first command would need list(map(hex, xs)) to do the same work:

$ python -mtimeit -s'xs=range(10)' 'map(hex, xs)'
100000 loops, best of 3: 4.86 usec per loop
$ python -mtimeit -s'xs=range(10)' '[hex(x) for x in xs]'
100000 loops, best of 3: 5.58 usec per loop

Abuse try/except when necessary - but be careful

If you’re using inline try/except statements (where it’s no big deal if the try block fails), just attempt to do the thing you want to do, rather than using extraneous if statements.

Here’s some sample code and real profiling results to guide your decisions.

import os
import profile
import pstats

# This is a typical example of extraneous if statements
def get_from_array_slow(array, index):
    # A typical `if` statement here might check to make sure
    # that our array is long enough for the index to be valid.
    # A perfectly reasonable statement, right?
    if len(array) > index:
        # Unfortunately, we incur an unnecessary performance penalty due to calling len()
        return array[index]
    return None

# This is functionally the same at runtime,
# but without the additional len() operation
def get_from_array_fast(array, index):
    try:
        return array[index]
    except IndexError:
        return None

NUM_TRIALS = 10000

def with_if():
    for i in range(0, NUM_TRIALS):
        get_from_array_slow([], 99)  # Out of index

def without_if():
    for i in range(0, NUM_TRIALS):
        get_from_array_fast([], 99)  # Out of index

# This is a simple way of using the profile module available within Python
def profileIt(func_name):
    tmp_file = 'profiler'
    output_file = 'profiler'
    run_str = '%s()' % func_name
    tf = '%s_%s_tmp.tmp' % (tmp_file, func_name)
    of = '%s_%s_output.log' % (output_file, func_name)
    profile.run(run_str, tf)  # Run the function and dump raw stats to the tmp file
    p = pstats.Stats(tf)
    p.sort_stats('tottime').print_stats(30)  # Print stats to console
    with open(of, 'w') as stream:  # Save to file
        stats = pstats.Stats(tf, stream=stream)
        stats.sort_stats('tottime').print_stats(30)
    os.remove(tf)  # Remove the tmp file
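As an aside, the standard library also ships cProfile, a C implementation with the same interface as profile and much lower overhead. Here is a minimal sketch that profiles a callable and returns the report as a string instead of going through a temp file. profile_to_string and busy_work are hypothetical names used only for this example, and the code targets Python 3.

```python
import cProfile
import io
import pstats

def profile_to_string(func, limit=30):
    """Run func() under cProfile and return the stats report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("tottime").print_stats(limit)
    return stream.getvalue()

def busy_work():
    # Stand-in workload: 10,000 len() calls
    for _ in range(10000):
        len([])

report = profile_to_string(busy_work)
print(report)
```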


The profiler results for with_if() and without_if() are below. Using an if took 0.098 seconds; using only try/except shaved off one-third of the compute time, down to 0.065 seconds.


20004 function calls in 0.098 seconds

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
10000    0.049    0.000    0.073    0.000
    1    0.025    0.025    0.098    0.098
10000    0.024    0.000    0.024    0.000 :0(len)
    1    0.000    0.000    0.000    0.000 :0(setprofile)
    1    0.000    0.000    0.098    0.098 profile:0(with_if())
    1    0.000    0.000    0.098    0.098 <string>:1(<module>)
    0    0.000             0.000          profile:0(profiler)



10004 function calls in 0.065 seconds

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
10000    0.032    0.000    0.032    0.000
    1    0.032    0.032    0.064    0.064
    1    0.000    0.000    0.000    0.000 :0(setprofile)
    1    0.000    0.000    0.065    0.065 profile:0(without_if())
    1    0.000    0.000    0.064    0.064 <string>:1(<module>)
    0    0.000             0.000          profile:0(profiler)

Notice that our function using if incurs twice as many function calls (20004 vs. 10004) as our plain old try/except block, because every len() is a function call of its own.
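The benchmark above only exercises the failing path, where every lookup is out of range. Raising and catching an exception has a cost of its own, so it is worth timing both the hit and the miss case before choosing. The sketch below does that with timeit on Python 3. get_with_if and get_with_try are hypothetical stand-ins for the two approaches, and no timings are quoted because they depend on the machine and interpreter.

```python
import timeit

def get_with_if(array, index):
    # Look before you leap: check the length first
    if len(array) > index:
        return array[index]
    return None

def get_with_try(array, index):
    # Just try it, and handle the failure
    try:
        return array[index]
    except IndexError:
        return None

def compare(number=100000):
    """Time both functions on a hit (index exists) and a miss."""
    full, empty = list(range(100)), []
    cases = {
        "if/hit": lambda: get_with_if(full, 99),
        "try/hit": lambda: get_with_try(full, 99),
        "if/miss": lambda: get_with_if(empty, 99),
        "try/miss": lambda: get_with_try(empty, 99),
    }
    return {name: timeit.timeit(fn, number=number) for name, fn in cases.items()}

if __name__ == "__main__":
    for name, seconds in compare().items():
        print("%-9s %.4f s" % (name, seconds))
```

If misses are rare in your data, the try version only pays for the exception occasionally. If they are common, measure before assuming it wins.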