Python: how far have we come in terms of speed

Edit: The article has been updated with Python version 3.13. From my testing, I didn’t find any performance gain from 3.13 over its previous versions.

Today, let’s investigate how far Python has come in terms of performance. Specifically, we are going to run a few benchmark programs on Python versions 3.9, 3.10, 3.11, 3.12, and 3.13 and see how much faster each version runs compared to its predecessors. In addition, we are going to compare against PyPy, a drop-in replacement for CPython with a faster implementation of Python.

Python is notorious for its slow execution speed. However, the Python community has made significant strides in improving performance with each new version. In particular, Microsoft hired a team of engineers, including Python’s creator Guido van Rossum, to make Python faster, with the goal of achieving a 5x speedup through various optimizations. Let’s see how far we have come in comparison to PyPy, a separate effort to make Python run faster that dates back to 2007. In this article, I will use the term Python to refer to CPython, the default implementation of Python that most people run.

Benchmark setup

To run different versions of Python without contaminating the existing environment, we are going to set up a temporary environment with Python 3.9, 3.10, 3.11, 3.12, and 3.13 as well as PyPy 3.10, all installed with the Nix package manager:

# install different versions of python3 and pypy in a temporary shell environment
nix-shell -p python39 python310 python311 python312 python313 pypy310
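Before benchmarking, it can help to confirm that each interpreter is actually on the PATH and which implementation it is. The following is a minimal sketch, not part of the original setup: the helper name `describe_interpreter` is hypothetical, and the executable names are simply the ones used in the benchmark commands in this article.

```python
import shutil
import subprocess

# Executable names as used in this article's benchmark commands.
INTERPRETERS = ["python3.9", "python3.10", "python3.11",
                "python3.12", "python3.13", "pypy3.10"]

# One-liner executed inside each interpreter so it reports what it is.
PROBE = ("import platform; "
         "print(platform.python_implementation(), platform.python_version())")


def describe_interpreter(name):
    """Return the implementation and version string, or None if not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path, "-c", PROBE],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    for name in INTERPRETERS:
        print(name, "->", describe_interpreter(name) or "not found")
```

Inside the nix-shell, each line should name either CPython or PyPy together with its version; anything reported as "not found" is not available in the current shell.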

Recursion benchmark

We will start with a very simple Fibonacci number function that is implemented with recursion.

# fibo.py
import sys


def fibo(x: int) -> int:
    if x <= 1:
        return x
    return fibo(x-1) + fibo(x-2)


if __name__ == '__main__':
    print(fibo(int(sys.argv[1])))

Let’s run this simple script with the different implementations of Python:

for py in python3.9 python3.10 python3.11 python3.12 python3.13 pypy3.10; do echo $py; time $py fibo.py 40; done

Fibonacci recursion benchmark results

A few interesting notes:

Python 3.9 runs faster than 3.10, while 3.12 runs faster than 3.13
Python 3.12 has a significant performance boost over 3.9
PyPy runs about 5x faster than Python 3.12

For a recursive function like this, PyPy is expected to run much faster owing to its just-in-time (JIT) compilation.
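Note that the shell `time` measurement covers the whole process, including interpreter startup and, on a JIT-based implementation, any warm-up. One way to look at the function alone is to time repeated calls in-process. This is a rough sketch rather than the method used for the results above: `time_calls` is a hypothetical helper, and the argument 20 is an arbitrary small value chosen to keep the run short.

```python
import time


def fibo(x: int) -> int:
    if x <= 1:
        return x
    return fibo(x - 1) + fibo(x - 2)


def time_calls(func, arg, repeats=5):
    """Call func(arg) `repeats` times; return the wall-clock seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # On a JIT-based implementation, earlier calls may be slower than later
    # ones; on CPython the calls are usually close to each other.
    for i, seconds in enumerate(time_calls(fibo, 20)):
        print(f"call {i}: {seconds:.6f}s")
```

Running the same file under each interpreter gives a per-call view that complements the whole-process numbers.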

Non-recursive benchmark

Let’s run a somewhat more realistic, non-recursive benchmark: Mandelbrot. Here is the fastest Python implementation from the Computer Language Benchmarks Game (hosted by Debian), slightly modified to take the number of CPU cores (cpu_count) as an explicit argument:

# mandelbrot.py

from contextlib import closing
from itertools import islice
from sys import argv, stdout

def pixels(y, n, abs):
    range7 = bytearray(range(7))
    pixel_bits = bytearray(128 >> pos for pos in range(8))
    c1 = 2. / float(n)
    c0 = -1.5 + 1j * y * c1 - 1j
    x = 0
    while True:
        pixel = 0
        c = x * c1 + c0
        for pixel_bit in pixel_bits:
            z = c
            for _ in range7:
                for _ in range7:
                    z = z * z + c
                if abs(z) >= 2.: break
            else:
                pixel += pixel_bit
            c += c1
        yield pixel
        x += 8

def compute_row(p):
    y, n = p

    result = bytearray(islice(pixels(y, n, abs), (n + 7) // 8))
    result[-1] &= 0xff << (8 - n % 8)
    return y, result

def ordered_rows(rows, n):
    order = [None] * n
    i = 0
    j = n
    while i < len(order):
        if j > 0:
            row = next(rows)
            order[row[0]] = row
            j -= 1

        if order[i]:
            yield order[i]
            order[i] = None
            i += 1

def compute_rows(cpu_count, n, f):
    row_jobs = ((y, n) for y in range(n))

    if cpu_count < 2:
        yield from map(f, row_jobs)
    else:
        from multiprocessing import Pool
        with Pool(cpu_count) as pool:
            unordered_rows = pool.imap_unordered(f, row_jobs)
            yield from ordered_rows(unordered_rows, n)

def mandelbrot(cpu_count, n):
    write = stdout.buffer.write

    with closing(compute_rows(cpu_count, n, compute_row)) as rows:
        write("P4\n{0} {0}\n".format(n).encode())
        for row in rows:
            write(row[1])

if __name__ == '__main__':
    mandelbrot(int(argv[1]), int(argv[2]))
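The `ordered_rows` generator above is needed because `imap_unordered` yields rows in completion order, while the image has to be written top to bottom. The same idea, buffering out-of-order results and releasing them once the next expected index arrives, can be sketched in isolation. This is a simplified illustration with a hypothetical name (`in_order`), not a replacement for the benchmark code.

```python
def in_order(indexed_items):
    """Yield (index, value) pairs in index order from an out-of-order stream.

    Assumes the indices are exactly 0, 1, 2, ... with no gaps or duplicates.
    """
    pending = {}      # results that arrived before their turn
    next_index = 0    # index that must be emitted next
    for index, value in indexed_items:
        pending[index] = value
        # Release every buffered result that is now contiguous.
        while next_index in pending:
            yield next_index, pending.pop(next_index)
            next_index += 1


if __name__ == "__main__":
    arrived = [(2, "c"), (0, "a"), (1, "b"), (3, "d")]
    print(list(in_order(arrived)))
    # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
```

The benchmark version uses a preallocated list instead of a dictionary, which works because the number of rows is known up front.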

Again, let’s run this and measure how fast each version of Python runs with different numbers of cores:

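# usage: mandelbrot.py <cpu_count> <image_size>; size below is an assumed example value
size=1000
for n in 1 2 4; do for py in python3.9 python3.10 python3.11 python3.12 python3.13 pypy3.10; do echo $n $py; time $py mandelbrot.py $n $size > /dev/null; done; done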
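The shell loop can also be written as a small Python driver that records the wall-clock time of each run, which makes it easier to collect the numbers afterwards. This is a sketch under assumptions: `run_timed` is a hypothetical helper, it measures the whole child process just like the shell's `time`, and the workload shown is a placeholder rather than the benchmark itself.

```python
import subprocess
import sys
import time


def run_timed(command):
    """Run `command` (a list of strings) and return elapsed wall-clock seconds.

    Standard output is discarded, similar to redirecting to /dev/null.
    """
    start = time.perf_counter()
    subprocess.run(command, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder workload; swap in an interpreter name and a benchmark
    # script with its arguments to measure real runs.
    command = [sys.executable, "-c", "print(sum(range(10**6)))"]
    timings = [run_timed(command) for _ in range(3)]
    print(f"best of 3: {min(timings):.3f}s")
```

Taking the best of several runs is one common way to reduce noise from other processes on the machine.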

Benchmark with mandelbrot.py

This time, the performance gain of Python 3.11 over 3.9 is not as large as in the recursive benchmark: only about 20% with 1 core and 10% with 4 cores. However, PyPy is still much faster than every CPython version, at around 3.5x the speed of 3.11 with 4 cores.
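Figures such as "20%" and "3.5x" are easy to mix up, so it is worth being explicit about how each can be derived from two wall-clock times. The sketch below uses made-up placeholder timings, not measurements from this article; `speedup` and `percent_less_time` are hypothetical helper names.

```python
def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def percent_less_time(baseline_seconds, candidate_seconds):
    """Percentage reduction in run time relative to the baseline."""
    return 100.0 * (baseline_seconds - candidate_seconds) / baseline_seconds


if __name__ == "__main__":
    # Placeholder numbers for illustration only (not measured results).
    baseline, candidate = 10.0, 8.0
    print(f"{speedup(baseline, candidate):.2f}x faster")               # 1.25x faster
    print(f"{percent_less_time(baseline, candidate):.0f}% less time")  # 20% less time
```

The two numbers describe the same pair of timings, so it helps to state which one a given percentage refers to.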

After running the benchmarks, here are my takeaways:

Python 3.11 is currently the fastest version of CPython, even faster than 3.13
However, the real winner is PyPy, which runs roughly 3.5x to 5x faster in these benchmarks without any modification to the existing codebase
CPython has a long way to go to catch up with PyPy