Python Cython Tutorial – Speeding up your Code 1000x

As one of the most popular languages, Python is constantly compared and contrasted with other popular languages like C/C++. The most common complaint made against Python is how slow it is. You will often see performance benchmarks showing that C/C++ is 10x (or more) faster than Python. In today’s tutorial, we will explore “Cython”, which allows us to significantly narrow the gap between Python and other languages in terms of performance.


Python Cython Tutorial

Throughout this tutorial we will teach you how to use Cython to “cythonize” your Python code, and also show you several benchmarks that measure the resulting speedup.

But first, what is Cython and how do we use it?


What is Cython?

Cython is a super-set of the Python programming language, which acts as a middle-man between Python and C/C++. In short, Cython gives us a way to compile our Python code to C/C++. So it’s not really optimizing Python directly, rather it’s compiling it to a lower level language which runs faster.

This of course means that Cython can never be faster than C/C++, rather it’s a bit slower due to overhead, and the fact that usually some Python elements remain in the code (like if only certain parts were converted to C/C++).

But it’s still 100% worth it, as it allows us to write fast code using Python, without too much hassle.

Another interesting piece of trivia regarding Cython is that major Python libraries such as NumPy and Pandas already use Cython to improve performance. This shows just how much Cython is used in the industry, and should reassure you that learning it is really worth it.


How does Cython improve performance?

Saying that Cython simply compiles Python code to C/C++ is a bit of an over-simplification. As programmers, we should know how exactly Cython achieves these performance gains and why it usually beats pure Python in benchmarks.

Simply put, there are multiple optimizations applied by Cython. Most of it has to do with “typing information”. This is because Python is a dynamically typed language which means the type of the variables can change during run-time. This however, comes at the cost of performance, and in certain situations can cause performance to take a massive hit.
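To make the cost of dynamic typing concrete, here is a small pure-Python illustration (the function name add is only for this example). The same name can hold values of different types, and the meaning of “+” is decided at run time from the operands it receives.

```python
# Illustration only: the same name can hold different types over time,
# so the interpreter has to check types every time a value is used.
x = 0
print(type(x).__name__)   # int
x = "zero"
print(type(x).__name__)   # str


def add(a, b):
    # "+" is resolved at run time from the types of a and b.
    return a + b


int_sum = add(1, 2)        # integer addition
str_sum = add("1", "2")    # string concatenation
list_sum = add([1], [2])   # list concatenation
print(int_sum, str_sum, list_sum)
```

Every call to add has to work out which kind of “+” applies before doing any real work. That repeated check is the overhead static types let the compiler skip.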

Cython gives us the ability to define static types for Python variables. So instead of:

x = 0

we now write:

cdef int x = 0

And just like in statically typed languages, this will raise an error if we try to assign anything other than an int to x.

Another gain comes from the compilation step itself. Simply compiling unmodified Python code with Cython produces a slight performance benefit, even if you don’t use any Cython syntax. Other optimizations can be gained from using C/C++ compatible objects, such as arrays from NumPy.

Instead of just adding all the various optimizations at once, we will do them one at a time. This way we will be able to monitor how the performance is affected at each step. This will help you understand which optimizations have a greater effect, and most importantly you will understand how Cython improves performance.


Compiling a Python Program using Cython

Here we have some code to generate the Fibonacci series in Python. Let’s name the file this code is in, “program1.py”. We will explore more programs later in the tutorial.

def fib(n):
    n1, n2 = 0, 1

    for i in range(1, n + 1):
        temp = n1 + n2
        n1 = n2
        n2 = temp
    return n2

We won’t make any changes for now. Let’s just explore how to compile this using Cython first, and see if that has an impact on performance.

Setting up Cython can be rather annoying, but it’s going to be worth it.

  1. The first thing you need to do is install Cython, using pip install cython or any equivalent method.
  2. Secondly, make a duplicate of your Python file, and change the name and extension slightly to “program1_cy.pyx“. You can also choose to keep the same name, but we are doing it this way for benchmarking purposes, as you will see later.
  3. Create a file called setup.py and paste the following code inside.
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize(
        "program1_cy.pyx",
        compiler_directives={"language_level": "3"},
    ),
)

The first argument is the name of the file that Cython will compile to C/C++, and the second (the language_level directive) defines whether the source is treated as Python 2 or Python 3.

Now run the following command: (make sure this is all happening in the same directory)

python setup.py build_ext --inplace

This should generate the required files. You will notice a build folder, a compiled extension module (a .so shared library on Linux and macOS, or a .pyd file on Windows) and a .c or .cpp file. Our code is now compiled and ready. Let’s try running it. In a new Python file, running the following code will give us our output.
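If you are unsure which of these files were produced on your platform, a small sketch like the following can help. It assumes the module name program1_cy used in this tutorial and that it is run from the folder containing setup.py. The list importlib.machinery.EXTENSION_SUFFIXES is part of the standard library and holds the file suffixes your interpreter accepts for compiled modules.

```python
import importlib.machinery
from pathlib import Path

# Suffixes this interpreter accepts for compiled extension modules
# (".so"-style names on Linux/macOS, ".pyd"-style names on Windows).
suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(suffixes)

# Look for build outputs of the assumed module name "program1_cy" in the
# current directory. The list is simply empty if nothing was built here.
built = [
    p.name
    for p in Path(".").glob("program1_cy*")
    if p.name.endswith(tuple(suffixes)) or p.suffix in {".c", ".cpp"}
]
print(built)
```

If the second list is empty, the build most likely ran in a different directory or failed before producing the extension module.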

import program1_cy

print(program1_cy.fib(100))

It will give us the value, 573147844013817084101 which is the correct output. But how do we know whether this was faster in Cython than it would be in Pure Python? Let’s do a benchmark test.
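Before benchmarking, it is worth confirming that the compiled module and the pure Python version agree. The sketch below recomputes the value with an independent copy of the same loop (fib_reference is a name chosen for this example), so it runs even without the compiled module.

```python
def fib_reference(n):
    # Same loop as program1.fib, written independently of the compiled module.
    n1, n2 = 0, 1
    for _ in range(n):
        n1, n2 = n2, n1 + n2
    return n2


expected = fib_reference(100)
print(expected)  # 573147844013817084101
```

With the extension built, comparing program1_cy.fib(100) against expected is a quick sanity check that compilation did not change the result.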


Benchmark #1

This is the first benchmark in this Python Cython Tutorial. We will make a new file called test.py where we will write the following code. We will be using the timeit library for benchmarking.

import timeit

python = timeit.timeit('program1.fib(10000)',
                       setup='import program1', number=100)
cython = timeit.timeit('program1_cy.fib(10000)',
                       setup='import program1_cy', number=100)

print("Python Time: ", python)
print("Cython Time: ", cython)
print(f"{python/cython}x times faster")

The first benchmark computes the 10000th term, and each version is called 100 times. (We repeat the call 100 times so that one-off delays have less influence on the total, which makes our results more reliable.)

Python Time:  0.3353964
Cython Time:  0.21417430000000004
1.565997414255585x times faster

Here we can already see an improvement of over 50%. Let’s run that again.

Python Time:  0.2971565
Cython Time:  0.3253751
0.9132736340303853x times faster

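Here we see Cython lose out. With runs this short, background activity on the machine can outweigh the real difference between the two versions, so an occasional loss is expected. You will notice that Cython wins most tests.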
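One way to make such comparisons less sensitive to random noise is to take several measurements and keep the fastest one. This is a sketch using timeit.repeat from the standard library. The fib function is a stand-in for program1.fib so that the example runs on its own, and the repeat and number values are arbitrary choices.

```python
import timeit


def fib(n):
    # Stand-in for program1.fib so the sketch is self-contained.
    n1, n2 = 0, 1
    for _ in range(n):
        n1, n2 = n2, n1 + n2
    return n2


# Five independent measurements of 100 calls each.
runs = timeit.repeat(lambda: fib(1000), repeat=5, number=100)

best = min(runs)            # the run least disturbed by other processes
spread = max(runs) - best   # how much the measurements vary
print(f"best: {best:.6f}s, spread: {spread:.6f}s")
```

The minimum is usually the most useful figure, since slower runs mostly reflect interference from other processes rather than the code being measured.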

Let’s compute a larger Fibonacci term. This will tilt the result in Cython’s favor.

For the 100000th term:

Python Time:  3.9538857000000003
Cython Time:  1.1979654000000002
3.300500749019963x times faster

Here we see that Cython is more than three times faster! And this was without any changes from our side.


Adding Type Information with Cython

Now let’s begin adding typing information to Python using Cython. Normally Python has the def keyword, but Cython introduces two new ones called cdef and cpdef.

cdef is meant to only be used with C. When this declaration is used, only a C version of the function/object is generated.

Variables/Functions declared with cpdef can be used with both Python and C. There are some exceptions, such as when using C pointers, but we will discuss those in a later tutorial.

So which are we going to use? Well, from Cython 3.0 onwards, cpdef variables are no longer supported (as they behave no differently from cdef variables). So we will use cpdef for functions, and cdef for variables.


Now let’s add some type information. (Don’t forget to add type information for the function arguments.)

cpdef int fib(int n):
    cdef int n1 = 0
    cdef int n2 = 1 
    cdef int temp, i

    for i in range(1, n + 1):
        temp = n1 + n2
        n1 = n2
        n2 = temp
    return n2

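Here is our updated code. We have given the function a return type of int, and declared all the other variables as int too. The benefit is that the generated C code works with plain C integers, so Python does not have to constantly ask itself, “what is the type of this variable?”. Keep in mind that a C int has a fixed width (typically 32 bits), so unlike a Python integer it cannot grow to hold arbitrarily large values.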

This appears to be a very minor operation, and it is! But what happens when you have to constantly look up the type of a variable 1000000 times? And since we have more than one variable inside the for loop, you can multiply that number by about 4 – 5.
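One consequence of declaring everything as a C int deserves a closer look: Fibonacci numbers grow quickly, and a fixed-width integer cannot hold them for long. The sketch below uses pure Python to find where a 32-bit signed integer stops being large enough. It assumes a 32-bit int, which is typical but platform dependent, and to_int32 only emulates the common wraparound behaviour (in C itself, signed overflow is undefined).

```python
INT32_MAX = 2**31 - 1  # largest value of a 32-bit signed C int (assumed width)


def fib(n):
    # Pure-Python version: integers grow as large as needed.
    n1, n2 = 0, 1
    for _ in range(n):
        n1, n2 = n2, n1 + n2
    return n2


def to_int32(value):
    # Emulate two's-complement wraparound of a 32-bit signed integer.
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


# Largest n whose result still fits in 32 bits.
largest_safe_n = max(n for n in range(100) if fib(n) <= INT32_MAX)
print(largest_safe_n)      # 45
print(fib(46))             # 2971215073
print(to_int32(fib(46)))   # -1323752223
```

Under these assumptions, the typed fib only matches the pure Python version up to n = 45. For larger n it returns an overflowed value, so a wider type or a Python object is needed if exact results matter.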


Benchmark #2

So how much of a performance increase does this give us?

For the 1000th term: (remember to recompile first)

Python Time:  0.0008732000000000045
Cython Time:  6.8000000000012495e-06
128.4117647058594x times faster

Wow! 128 times faster. Let’s try this for the 10000th term:

Python Time:  0.0311598
Cython Time:  5.9000000000003494e-05
528.1322033897992x times faster

528 times faster! Even more incredible! Now one last time, for the 100000th term:

Python Time:  4.3451835
Cython Time:  0.0015122000000005187
2873.418529294081x times faster

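These benchmark results are the main part of this tutorial: they show just how much a computation can be sped up in Python using Cython. In the last test, our code ran over 2,800 times faster than the original. Bear in mind that Fibonacci terms this large do not fit in a C int, so the typed version is doing fixed-width arithmetic on overflowed values while pure Python computes the exact big integers. Part of the measured gap comes from that difference.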


The best part about using Cython is that you don’t need to drastically change your code or the way it is structured. Even the simple act of adding typing information results in a massive speed boost for your Python program.

So what do you think? Is Cython worth it? How much did it speed up your Python code?


This marks the end of the Python Cython Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.
