Python: how far have we come in terms of speed
Edit: The article has been updated with Python version 3.13. In my testing, I didn’t find any performance gain from 3.13 over previous versions.
Today, let’s investigate how far Python has come in terms of performance. Specifically, we are going to run a few benchmark programs on Python versions 3.9, 3.10, 3.11, 3.12, and 3.13 and see how much faster each version runs compared to its predecessors. In addition, we are going to compare against pypy, an alternative implementation of Python that is designed to be a faster drop-in replacement.
Python is notorious for its slow execution speed. However, the Python community has made significant strides in improving performance with each new version. In particular, Microsoft hired a team of engineers, including Python’s creator Guido van Rossum, to make Python faster, with the goal of achieving a 5x speedup through various optimizations. Let’s see how far we have come in comparison to pypy, a separate effort to make Python run faster that has been under way since 2007. In this article, I will use the term Python to refer to CPython, the default implementation of Python that most people run.
Benchmark setup
To run different versions of Python without contaminating the existing environment, we are going to set up a temporary environment with Python 3.9, 3.10, 3.11, 3.12, and 3.13 as well as pypy 3.10, installed using the nix package manager:
# install different versions of python3 and pypy in a temporary shell environment
nix-shell -p python39 python310 python311 python312 python313 pypy310
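Before benchmarking, it can help to confirm which interpreters actually ended up on the PATH inside that shell. The following is a minimal sketch, not part of the original setup; the helper name `available_interpreters` is made up for illustration, and the interpreter names are assumed to match the ones used in the benchmark loops later in this article.

```python
import shutil
import subprocess

# Interpreter names assumed to be on PATH inside the nix-shell (an assumption).
INTERPRETERS = ["python3.9", "python3.10", "python3.11",
                "python3.12", "python3.13", "pypy3.10"]


def available_interpreters(names):
    """Return {name: version string} for each interpreter found on PATH."""
    found = {}
    for name in names:
        path = shutil.which(name)
        if path is None:
            continue  # not installed in this environment
        result = subprocess.run([path, "--version"],
                                capture_output=True, text=True)
        # Fall back to stderr in case an interpreter prints its version there.
        found[name] = (result.stdout or result.stderr).strip()
    return found


if __name__ == "__main__":
    for name, version in available_interpreters(INTERPRETERS).items():
        print(f"{name}: {version}")
```

Interpreters that are missing are simply left out of the result, so an incomplete environment shows up as a shorter list rather than an error.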
Recursion benchmark
We will start with a very simple Fibonacci number function that is implemented with recursion.
# fibo.py
import sys

def fibo(x: int) -> int:
    if x <= 1:
        return x
    return fibo(x-1) + fibo(x-2)

if __name__ == '__main__':
    print(fibo(int(sys.argv[1])))
Let’s run this simple script with different implementations of Python:
for py in python3.9 python3.10 python3.11 python3.12 python3.13 pypy3.10; do echo $py; time $py fibo.py 40; done
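The shell loop above runs each interpreter once. As a rough alternative, the same measurement can be scripted in Python and repeated a few times to reduce noise. This is only a sketch: the file name `bench.py`, the helper `time_command`, and the choice of keeping the best of three runs are assumptions, not something the original benchmark did. Like `time`, it measures the whole process, including interpreter startup.

```python
import subprocess
import sys
import time

# Interpreter names as used in the shell loop; adjust to what is on your PATH.
INTERPRETERS = ["python3.9", "python3.10", "python3.11",
                "python3.12", "python3.13", "pypy3.10"]


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the program's output; raise if it exits with an error.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    script_args = sys.argv[1:]  # e.g. fibo.py 40
    if not script_args:
        print("usage: python bench.py SCRIPT [ARGS...]")
    else:
        for py in INTERPRETERS:
            try:
                seconds = time_command([py] + script_args)
            except FileNotFoundError:
                print(f"{py}: not found, skipping")
                continue
            print(f"{py}: {seconds:.2f} s")
```

Saved as `bench.py` (a hypothetical name), it would be invoked as `python bench.py fibo.py 40`.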

A few interesting notes:
- Python 3.9 runs faster than 3.10, while 3.12 runs faster than 3.13
- Python 3.12 has a significant performance boost over 3.9
- pypy runs about 5x faster than Python 3.12
For a recursive function, pypy is expected to run much faster owing to its just-in-time (JIT) compilation technique.
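One way to see why this benchmark is so sensitive to the interpreter is to count how many function calls the naive recursion makes. The sketch below is an illustration added here, with made-up helper names; it counts the calls directly for small inputs and uses the identity calls(x) = 2 * F(x + 1) - 1 for larger ones. The work per call is tiny, so the run time is plausibly dominated by call overhead, which is the kind of cost a JIT can reduce.

```python
def fibo_calls(x: int) -> int:
    """Count the calls made by the naive recursive fibo(x), including the first."""
    if x <= 1:
        return 1
    return 1 + fibo_calls(x - 1) + fibo_calls(x - 2)


def fibo_calls_closed_form(x: int) -> int:
    """Same count via calls(x) = 2 * F(x + 1) - 1, computed iteratively."""
    a, b = 0, 1  # F(0), F(1)
    for _ in range(x + 1):
        a, b = b, a + b  # after the loop, a == F(x + 1)
    return 2 * a - 1


if __name__ == "__main__":
    # Cross-check the two approaches on small inputs.
    for x in range(15):
        assert fibo_calls(x) == fibo_calls_closed_form(x)
    print(fibo_calls_closed_form(40))  # 331160281
```

So `fibo(40)` makes roughly 331 million calls, each doing little more than a comparison and an addition.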
Non-recursive benchmark
Let’s run a slightly more realistic, non-recursive benchmark: Mandelbrot. Here is the fastest Python implementation from the Computer Language Benchmarks Game (hosted by Debian), with a slight modification so that the number of worker processes is passed explicitly as the cpu_count argument:
# mandelbrot.py
from contextlib import closing
from itertools import islice
from sys import argv, stdout

def pixels(y, n, abs):
    range7 = bytearray(range(7))
    pixel_bits = bytearray(128 >> pos for pos in range(8))
    c1 = 2. / float(n)
    c0 = -1.5 + 1j * y * c1 - 1j
    x = 0
    while True:
        pixel = 0
        c = x * c1 + c0
        for pixel_bit in pixel_bits:
            z = c
            for _ in range7:
                for _ in range7:
                    z = z * z + c
                if abs(z) >= 2.: break
            else:
                pixel += pixel_bit
            c += c1
        yield pixel
        x += 8

def compute_row(p):
    y, n = p
    result = bytearray(islice(pixels(y, n, abs), (n + 7) // 8))
    result[-1] &= 0xff << (8 - n % 8)
    return y, result

def ordered_rows(rows, n):
    order = [None] * n
    i = 0
    j = n
    while i < len(order):
        if j > 0:
            row = next(rows)
            order[row[0]] = row
            j -= 1
        if order[i]:
            yield order[i]
            order[i] = None
            i += 1

def compute_rows(cpu_count, n, f):
    row_jobs = ((y, n) for y in range(n))
    if cpu_count < 2:
        yield from map(f, row_jobs)
    else:
        from multiprocessing import Pool
        with Pool(cpu_count) as pool:
            unordered_rows = pool.imap_unordered(f, row_jobs)
            yield from ordered_rows(unordered_rows, n)

def mandelbrot(cpu_count, n):
    write = stdout.buffer.write
    with closing(compute_rows(cpu_count, n, compute_row)) as rows:
        write("P4\n{0} {0}\n".format(n).encode())
        for row in rows:
            write(row[1])

if __name__ == '__main__':
    mandelbrot(int(argv[1]), int(argv[2]))
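The least obvious part of the script is `ordered_rows`: `imap_unordered` hands back rows in whatever order the workers finish them, so they have to be put back in row order before being written. The sketch below illustrates the same idea in isolation. It is a simplified variant, not the benchmark's code: it buffers early arrivals in a dict instead of a preallocated list, and the name `in_order` is made up for this example.

```python
import random


def in_order(results):
    """Yield (index, value) pairs in index order from an unordered iterable.

    Assumes indices are 0, 1, 2, ... with each index appearing exactly once.
    """
    pending = {}      # results that arrived before their turn
    next_index = 0    # the index we are allowed to emit next
    for index, value in results:
        pending[index] = value
        # Flush every buffered result that is now contiguous.
        while next_index in pending:
            yield next_index, pending.pop(next_index)
            next_index += 1


if __name__ == "__main__":
    rows = [(i, i * i) for i in range(10)]
    shuffled = rows[:]
    random.shuffle(shuffled)  # simulate workers finishing out of order
    assert list(in_order(shuffled)) == rows
    print("rows restored to original order")
```

Either way, the consumer can start writing as soon as row 0 arrives, rather than waiting for every row to finish.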
Again, let’s run this and measure how fast each version of Python runs with different numbers of cores:
for n in 1 2 4; do for py in python3.9 python3.10 python3.11 python3.12 python3.13 pypy3.10; do echo $n $py; time $py mandelbrot.py $n > /dev/null; done; done

This time, the performance gain of Python 3.11 over 3.9, for example, is smaller than in the recursive benchmark: only about 20% with 1 core and 10% with 4 cores. However, pypy is still much faster than the CPython versions, running around 3.5x faster than 3.11 with 4 cores.
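For readers who want to reproduce figures like "20%" or "3.5x" from their own timings, here is one common way to do the arithmetic. It is a sketch under an assumption: "N% gain" is read as the ratio of run times minus one, which may not be exactly how the numbers above were derived. The timings in the example are made-up placeholders, not measurements from this article.

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def percent_faster(baseline_seconds: float, candidate_seconds: float) -> float:
    """Speedup expressed as a percentage gain over the baseline."""
    return (speedup(baseline_seconds, candidate_seconds) - 1.0) * 100.0


if __name__ == "__main__":
    # Placeholder timings in seconds, for illustration only.
    old, new = 12.0, 10.0
    print(f"{speedup(old, new):.2f}x, {percent_faster(old, new):.0f}% faster")
```

With these placeholder values the script reports a 1.20x speedup, i.e. 20% faster.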
After running the benchmarks, here are my takeaways:
- Python 3.11 is currently the fastest version of CPython, even faster than 3.13
- However, the real winner is pypy, which runs roughly 3.5x to 5x faster in these benchmarks without any modification to the existing codebase
- CPython has a long way to go to catch up with pypy