Advanced Python Performance

24 Oct 2018

Here are some performance hints I learned from doing a deep dive into Python for a work project.

Use map when performance matters AND the functions are complex AND you are using named functions. Use list comprehensions for everything else.

map is a built-in function written in C. When it is handed a named function, the loop that drives the calls runs in C rather than as Python bytecode, which can make it slightly faster than the equivalent list comprehension.

Please note that if you pass map an anonymous lambda just to wrap a simple expression, you lose that advantage: map still pays for a function call on every element, while the equivalent list comprehension evaluates the expression inline, so map can end up much slower. The example below shows how much a per-element function call costs.

def map_it(arr):
    return map(square_entry, arr)

def square_entry(x):
    return x ** 2

def list_comp(arr):
    return [square_entry(x) for x in arr]

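def list_comp_lambda(arr):
    # Despite the name, no lambda is involved: x ** 2 is evaluated inline,
    # so there is no function call per element
    return [x ** 2 for x in arr]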

def for_loop(arr):
    response = []
    a = response.append
    for i in arr:
        a(i ** 2)
    return response

To test the performance of these functions, we create a list of 10,000 numbers and go through it squaring each value. Pretty simple stuff. Check out the wild differences in function calls and runtime:

  1. List comprehension with an inline expression (list_comp_lambda): 5 function calls in 0.001 seconds
  2. For loop (for_loop): 10005 function calls in 0.048 seconds
  3. List comprehension using named function (list_comp): 10005 function calls in 0.049 seconds
  4. map with named function (map_it): 10006 function calls in 0.050 seconds

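Moral of the story? If you are doing simple list operations, use list comprehensions with an inline expression. They are faster, more readable, and more pythonic. Keep in mind that most of the gap here comes from the 10,000 extra function calls, and the profiler adds overhead to every call it records, so the difference outside the profiler will be smaller.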
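The lambda gotcha mentioned earlier is not covered by the four functions above. Here is a minimal sketch that times it with timeit instead of the profiler, assuming Python 3 (where map is lazy, so it is wrapped in list() to force the work). The names VARIANTS and time_variants are made up for this illustration, and the numbers will vary by machine and interpreter version.

```python
import timeit

def square_entry(x):
    return x ** 2

# Hypothetical set of variants to compare; each takes a list and returns a list.
VARIANTS = {
    "map + named function": lambda arr: list(map(square_entry, arr)),
    "map + lambda": lambda arr: list(map(lambda x: x ** 2, arr)),
    "comprehension + named function": lambda arr: [square_entry(x) for x in arr],
    "comprehension, inline expression": lambda arr: [x ** 2 for x in arr],
}

def time_variants(size=10000, repeats=5, loops=20):
    """Return {label: best seconds per loop} for each variant."""
    arr = list(range(size))
    results = {}
    for label, func in VARIANTS.items():
        timer = timeit.Timer(lambda: func(arr))
        # Take the minimum over several repeats to reduce noise.
        results[label] = min(timer.repeat(repeat=repeats, number=loops)) / loops
    return results

if __name__ == "__main__":
    for label, seconds in sorted(time_variants().items(), key=lambda kv: kv[1]):
        print("%-35s %.1f usec" % (label, seconds * 1e6))
```

Because timeit does not instrument individual calls, the gaps it reports are usually much smaller than the ones the profiler shows.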

When you’re munging complex data in Python, it’s a good idea to handle the data modification in a named function and then use map to call that function. You must always profile your code before and after using map to ensure that you are actually gaining performance and not losing it!

You might be asking: so when should I use map?

A good candidate for map is any long or complex function that will perform conditional operations on the provided arguments. map functions are great for iterating through objects and assigning properties based on data attributes, for example.
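As a minimal sketch of that kind of candidate, here is a named function with a few conditional branches applied through map. The record fields (score, tier) and the name assign_tier are invented for illustration; list() is there because map is lazy on Python 3.

```python
def assign_tier(record):
    """Return a copy of the record with a 'tier' derived from its 'score'."""
    score = record.get("score", 0)
    if score >= 90:
        tier = "gold"
    elif score >= 50:
        tier = "silver"
    else:
        tier = "bronze"
    labeled = dict(record)  # copy so the input record is left untouched
    labeled["tier"] = tier
    return labeled

records = [{"name": "a", "score": 95}, {"name": "b", "score": 60}, {"name": "c"}]
labeled_records = list(map(assign_tier, records))
```

Whether this beats the equivalent list comprehension depends on the workload, so it is worth measuring both.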

Here’s an example of map being faster than a list comprehension, by about 13% in this run (shamelessly taken from Stack Overflow):

$ python -mtimeit -s'xs=range(10)' 'map(hex, xs)'
100000 loops, best of 3: 4.86 usec per loop
$ python -mtimeit -s'xs=range(10)' '[hex(x) for x in xs]'
100000 loops, best of 3: 5.58 usec per loop
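The commands above look like Python 2, where map returns a list. On Python 3, map returns a lazy iterator, so timing 'map(hex, xs)' alone would measure almost nothing. Here is a hedged sketch of the equivalent comparison using the timeit module, with list() forcing the work; best_usec is a helper name made up here, and no particular result is implied.

```python
import timeit

xs = range(10)

lazy = map(hex, xs)          # on Python 3: an iterator, no hex() calls yet
forced = list(map(hex, xs))  # the calls happen when the iterator is consumed

def best_usec(stmt, number=100000, repeat=3):
    """Best time per loop, in microseconds, for stmt evaluated against xs."""
    timer = timeit.Timer(stmt, globals={"xs": xs})
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

if __name__ == "__main__":
    print("list(map(hex, xs))   %.2f usec" % best_usec("list(map(hex, xs))"))
    print("[hex(x) for x in xs] %.2f usec" % best_usec("[hex(x) for x in xs]"))
```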

Abuse try/except when necessary - but be careful

If you’re using inline try/except statements (where it’s no big deal if the try block fails), just attempt to do the thing you want to do, rather than using extraneous if statements.

Here’s some sample code and real profiling results to guide your decisions.

import os
import profile
import pstats

# This is a typical example of an extraneous if statement
def get_from_array_slow(array, index):
    # A typical `if` statement here might check to make sure
    # that our array is long enough for the index to be valid.
    # A perfectly reasonable statement, right?
    if len(array) > index:
        # Unfortunately, we incur an unnecessary performance penalty due to calling len()
        return array[index]
    return None

# This returns the same result for non-negative indexes,
# but without the additional len() operation
def get_from_array_fast(array, index):
    try:
        return array[index]
    except IndexError:
        # The index was out of range
        return None

NUM_TRIALS = 10000

# range works on both Python 2 and 3 (xrange exists only on Python 2)
def with_if():
    for i in range(NUM_TRIALS):
        get_from_array_slow([], 99)  # Out of index

def without_if():
    for i in range(NUM_TRIALS):
        get_from_array_fast([], 99)  # Out of index

# This is a simple way of using the profile module available within Python
def profileIt(func_name):
    tmp_file = 'profiler'
    output_file = 'profiler'
    run_str = '%s()' % func_name
    tf = '%s_%s_tmp.tmp' % (tmp_file, func_name)
    of = '%s_%s_output.log' % (output_file, func_name)
    profile.run(run_str, tf)  # Run the function and dump the raw stats to the tmp file
    p = pstats.Stats(tf)
    p.sort_stats('tottime').print_stats(30)  # Print stats to console
    with open(of, 'w') as stream:  # Save to file
        stats = pstats.Stats(tf, stream=stream)
        stats.sort_stats('tottime').print_stats(30)
    os.remove(tf)  # Remove the tmp file
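If the temp-file bookkeeping is not needed, a similar report can be produced in memory. This sketch swaps in cProfile (the C implementation in the standard library, with the same interface and lower overhead than profile) and is not the setup used for the numbers in this post. The names profile_callable and busy_work are hypothetical.

```python
import cProfile
import io
import pstats

def profile_callable(func, limit=30):
    """Run func() under cProfile and return the stats report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("tottime").print_stats(limit)
    return buf.getvalue()

def busy_work():
    # Placeholder workload for the illustration.
    return sum(len(str(i)) for i in range(1000))

if __name__ == "__main__":
    print(profile_callable(busy_work))
```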


Our profiler results are below. Using an if took 0.098 seconds; using only try/except shaved off about one-third of the run time, bringing it down to 0.065 seconds.


20004 function calls in 0.098 seconds

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
10000    0.049    0.000    0.073    0.000
    1    0.025    0.025    0.098    0.098
10000    0.024    0.000    0.024    0.000 :0(len)
    1    0.000    0.000    0.000    0.000 :0(setprofile)
    1    0.000    0.000    0.098    0.098 profile:0(with_if())
    1    0.000    0.000    0.098    0.098 <string>:1(<module>)
    0    0.000             0.000          profile:0(profiler)



10004 function calls in 0.065 seconds

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
10000    0.032    0.000    0.032    0.000
    1    0.032    0.032    0.064    0.064
    1    0.000    0.000    0.000    0.000 :0(setprofile)
    1    0.000    0.000    0.065    0.065 profile:0(without_if())
    1    0.000    0.000    0.064    0.064 <string>:1(<module>)
    0    0.000             0.000          profile:0(profiler)

Notice that our function using if incurs roughly twice as many function calls (20004 versus 10004) as our plain old try/except block: the extra 10,000 are the calls to len().
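One reason to be careful: every call in the benchmark above takes the failing path, and the profiler counts len() as an extra call. Here is a sketch for timing both the hit and the miss path with timeit, assuming Python 3. The getters mirror the ones above but are redefined under hypothetical names so the block is self-contained, and no particular outcome is implied.

```python
import timeit

def get_with_if(array, index):
    # Look before you leap: check the bound first.
    if len(array) > index:
        return array[index]
    return None

def get_with_try(array, index):
    # Just attempt the lookup and handle the failure.
    try:
        return array[index]
    except IndexError:
        return None

def compare(array, index, number=100000):
    """Return (seconds_if, seconds_try) for `number` calls of each getter."""
    t_if = timeit.timeit(lambda: get_with_if(array, index), number=number)
    t_try = timeit.timeit(lambda: get_with_try(array, index), number=number)
    return t_if, t_try

if __name__ == "__main__":
    print("miss (index out of range): if=%.4fs try=%.4fs" % compare([], 99))
    print("hit  (index in range):     if=%.4fs try=%.4fs" % compare(list(range(100)), 99))
```

Running both cases on your own data shows whether the try/except version still wins when failures are rare or frequent.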