My latest work project has involved writing a custom Django API from scratch. Due to numerous business logic and front-end requirements, something like Django REST Framework wasn't really a great option. I learned a great deal about the finer points of Python and Django performance while building an API capable of delivering thousands of results quickly.
I’ve consolidated some of my tips below.
Be careful using model managers, especially when working with Django Prefetch objects. You will incur additional lookup queries for the operations your manager performs, as well as for any further operations your manager performs on the prefetched data.
Do everything you can with properly written models, queries, and prefetch objects. Once you start manipulating data in Python instead of in the database, you will significantly hurt the performance of your application.
Django is fast. Databases are fast. Python is slow.
Learning to use prefetch_related will save you a ton of time and debugging. It will also improve your query speeds! As I mentioned above, be careful mixing model managers with these utilities. Also, whenever you begin introducing multiple relationships in a query, you will want to use order_by(). Having said that…
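To make the idea concrete, here is a minimal sketch of prefetch_related avoiding the classic N+1 query problem. The Book and Author models are hypothetical, not from the project described in this post:

```python
# Hypothetical models: Book with a many-to-many "authors" relationship.
# Without prefetch_related, the loop below would issue one extra query
# per book to fetch its authors (the N+1 problem).
books = Book.objects.prefetch_related("authors")

# Two queries total: one for all books, one for all related authors.
for book in books:
    names = [author.name for author in book.authors.all()]  # served from the prefetch cache
```

The prefetch cache only helps if you call the unmodified related manager; filtering book.authors inside the loop would trigger fresh queries again.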
If you are using advanced Django queries that span multiple relationships, you may notice that duplicate rows are returned. No problem, we'll just call .distinct() on the queryset, right?
If you only call distinct(), and you forget to call order_by() on your queryset, you will still receive duplicate results! This is a known Django "thing" - beware.
"When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order."
- Django Docs
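A sketch of the pairing the docs describe, using the same hypothetical Book/Author models as above:

```python
# A filter across a many-to-many relationship can return the same
# Book row once per matching author.
qs = Book.objects.filter(authors__name__startswith="A")

# Pair distinct() with an explicit order_by() on the base model.
qs = qs.order_by("title").distinct()

# With field names in distinct() (PostgreSQL only), order_by() must
# start with those same fields, in the same order:
qs = Book.objects.order_by("title", "id").distinct("title")
```

Note that ordering by a related field adds that field's column to the SELECT, which can make otherwise-identical rows "distinct" again - another reason to keep order_by() on the base model when deduplicating.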
You can't fix what you don't measure. Make sure DEBUG=True is set in your Django settings.py file, and then drop this snippet into your code to output the queries being run.
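The original snippet was not preserved in this copy of the post; here is a minimal stand-in (the helper name is mine), assuming the documented shape of django.db.connection.queries - with DEBUG=True, Django records each query as a dict with "sql" and "time" keys:

```python
# Stand-in for the post's missing snippet. summarize_queries is a
# hypothetical helper name; the connection.queries format is Django's.
def summarize_queries(queries):
    """Format Django's logged queries (dicts with 'sql' and 'time' keys)."""
    total = sum(float(q["time"]) for q in queries)
    lines = [f"{q['time']}s  {q['sql']}" for q in queries]
    lines.append(f"{len(queries)} queries in {total:.3f}s total")
    return "\n".join(lines)

# In a Django shell or at the end of a view, you might run:
# from django.db import connection
# print(summarize_queries(connection.queries))
```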
Use map when performance matters AND the functions are complex AND you are using named functions. Use list comprehensions for everything else.
map is a built-in function written in C. Using map produces performance benefits over using list comprehensions in certain cases.
Please note that if you pass an anonymous lambda as your map function, rather than a named function, you lose the optimization benefits of map, and it will in fact be much slower than the equivalent list comprehension. I will give you an example of this gotcha below.
To test the performance of these approaches, we create a list of 10,000 numbers and square each value. Pretty simple stuff. Check out the wild differences in runtime and performance:
map with named function: 10006 function calls in 0.050 seconds
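Only one of the original profiling results survived in this copy, so here is a timeit sketch you can run yourself to compare the three variants. Absolute numbers will differ from the cProfile figures above and from machine to machine; the ranking is what matters:

```python
import timeit

def square(n):  # named function: map's loop stays in C
    return n * n

data = list(range(10_000))

named_map = timeit.timeit(lambda: list(map(square, data)), number=100)
lambda_map = timeit.timeit(lambda: list(map(lambda n: n * n, data)), number=100)
comprehension = timeit.timeit(lambda: [n * n for n in data], number=100)

# Numbers vary by machine; typically map + lambda is the slowest of the three.
print(f"map + named function: {named_map:.3f}s")
print(f"map + lambda:         {lambda_map:.3f}s")
print(f"list comprehension:   {comprehension:.3f}s")
```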
Moral of the story? If you are doing simple list operations, use list comprehensions rather than map with an anonymous lambda. They are faster, more readable, and more pythonic.
When you're munging complex data in Python, it's a good idea to handle the data modification in a named function and then use map to call that function. Always profile your code before and after introducing map to ensure that you are actually gaining performance, not losing it!
You might be asking: so, when should I use map?
A good candidate for map is any long or complex function that performs conditional operations on the provided arguments.
map functions are great for iterating through objects and assigning properties based on data attributes, for example.
Here's an example of map being significantly faster than a list comprehension (shamelessly taken from Stack Overflow):
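The original Stack Overflow example did not survive in this copy of the post. As a stand-in, here is a comparable benchmark of my own: when every element requires a Python-level function call anyway, map with a named (here, built-in) function usually beats the equivalent comprehension because its loop runs in C:

```python
import timeit

# Stand-in example, not the post's original. Converting every element
# with str: map calls the built-in directly, while the comprehension
# pays per-iteration bytecode overhead for the same call.
data = list(range(10_000))

map_time = timeit.timeit(lambda: list(map(str, data)), number=100)
comp_time = timeit.timeit(lambda: [str(n) for n in data], number=100)

print(f"map(str, data):         {map_time:.3f}s")
print(f"[str(n) for n in data]: {comp_time:.3f}s")
```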
If you're using inline try/except statements (where it's no big deal if the try block fails), just attempt to do the thing you want to do, rather than guarding it with extraneous if checks first.
Here’s some sample code and real profiling results to guide your decisions.
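The sample code itself was not preserved in this copy; here is a hedged stand-in illustrating the pattern (the function and dict names are mine, and your timings will differ from the figures below):

```python
import timeit

settings = {"debug": True}  # hypothetical data for repeated lookups

def with_if(d):
    # Check first, then access: two dict operations on the happy path.
    if "debug" in d:
        return d["debug"]
    return None

def with_try(d):
    # Just attempt the access; a try block costs almost nothing
    # when no exception is raised.
    try:
        return d["debug"]
    except KeyError:
        return None

if_time = timeit.timeit(lambda: with_if(settings), number=1_000_000)
try_time = timeit.timeit(lambda: with_try(settings), number=1_000_000)
print(f"if check:   {if_time:.3f}s")
print(f"try/except: {try_time:.3f}s")
```

Note the trade-off: try/except wins when the key is usually present, but raising and catching a KeyError on every miss is far more expensive than a failed if check.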
Our profiler results are below: using an if check took 0.098 seconds, while using only try/except shaved off one-third of the compute time, down to 0.065 seconds.
Notice that our function using if incurs twice as many function calls as our plain old try/except version does.