Cython vs CPython - Comparing the Speed Difference

In this Cython vs CPython Article, we will be conducting a speed comparison using 10 different benchmarks, covering diverse scenarios and edge cases.

Python is a popular programming language known for its simplicity and readability. However, it is an interpreted language, which can sometimes result in slower execution speeds compared to compiled languages like C. To address this limitation, developers have introduced Cython, an optimizing static compiler for Python. Cython allows you to write Python-like code that can be compiled to C, offering potential performance gains.

Table Of Contents

Testing Environment
Benchmark 1: Fibonacci Sequence
Benchmark 2: Fibonacci Sequence (Iterative)
Benchmark 3: Matrix Multiplication
Benchmark 4: Prime Number Generation
Benchmark 5: String Concatenation
Benchmark 6: Computing Column Means
Benchmark 7: Computing Column Means (Unoptimized)
Benchmark 8: Arithmetic Operations
Benchmark 9: File Operations
Benchmark 10: Linear Searching
Benchmark 11: Bubble Sorting
Conclusion:

Testing Environment

Before examining any benchmarks, you should be aware of what environment and what methods were used to conduct the testing. This helps in reproducibility, and in gaining a better understanding of the results (as results will vary from platform to platform).

Python version: 3.10

Hardware: Ryzen 7 5700U + 16GB RAM + SSD

Operating System: Windows 11

Benchmark Library: Timeit

Technique: We used the Python timeit library to apply several techniques to reduce randomness and improve our timing accuracy. The timeit library allows us to repeat the benchmarks “x” number of times, which we can then average out to reduce randomness (controlled by the repeat parameter). We can also consecutively perform the benchmarks “x” number of times and add all their times together to further reduce randomness (controlled by the number parameter).

Here is a code snippet of our testing code from one of our benchmarks.

import timeit
import program1_cy
from statistics import mean 

python= mean(timeit.repeat('program1.fibonacci(25)',
             setup='import program1',
             number=10, 
             repeat=10))

cython= mean(timeit.repeat('program1_cy.fibonacci(25)', 
             setup='import program1_cy',
             number=10, 
             repeat=10))

print(f"Python Time: {python:.10f}")
print(f"Cython Time: {cython:.10f}")

if python/cython > 1:
    print(f"Cython was {python/cython:.3f} times faster")
else:
    print(f"Cython was {python/cython:.3f} times slower")

Benchmark 1: Fibonacci Sequence

Python Code:

def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)

Cython Code:

def fibonacci(int n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)

Benchmark#1 Result

Both the Python and Cython versions of the Fibonacci sequence use recursive calls to calculate the value. However, the Cython code, with the explicit declaration of the integer type, allows for more efficient execution and avoids the interpreter overhead of Python, resulting in faster execution.

10th Fibonacci Number	25th Fibonacci Number
Python Time: 0.0001332700 Cython Time: 0.0000027100 Cython was 49.177 times faster	Python Time: 0.3970897400 Cython Time: 0.0028510200 Cython was 139.280 times faster

Cython is also just alot better at handling recursion than Python, which is why we see such a big performance boost. Doing this on the non-recursion version will not yield such a large difference, as we will see in the next benchmark.

Overall: Cython Wins

Benchmark 2: Fibonacci Sequence (Iterative)

Python Code:

def fib(n):
    n1, n2 = 0, 1

    for i in range(1, n + 1):
        temp = n1 + n2
        n1 = n2
        n2 = temp
    return n2

Cython Code:

cpdef int fib(int n):
    cdef int n1 = 0
    cdef int n2 = 1 
    cdef int temp, i

    for i in range(1, n + 1):
        temp = n1 + n2
        n1 = n2
        n2 = temp
    return n2

Benchmark#2 Result:

Here we can observe some massive speedups as well. Not as good as the recursive fibonacci (compare the 10th fibonacci benchmarks). We can’t go above 25th fibonacci number in recursive fibonacci, because it is incredibly slow (especially in a language like Python).

10th Fibonacci Number	100th Fibonacci Number	10000th Fibonacci Number
Python Time: 0.0000183300 Cython Time: 0.0000019300 Cython was 9.497 times faster	Python Time: 0.0060053901 Cython Time: 0.0000407100 Cython was 147.516 times faster	Python Time: 1.5474137500 Cython Time: 0.0002609300 Cython was 5930.379 times faster

Overall: Cython Wins

Benchmark 3: Matrix Multiplication

Python Code:

import numpy as np

def matrix_multiply(a, b):
    return np.dot(a, b)

Cython Code:

import numpy as np
cimport numpy as np

cpdef np.ndarray[np.float64_t, ndim=2] matrix_multiply(np.ndarray[np.float64_t, ndim=2] a, np.ndarray[np.float64_t, ndim=2] b):
    return np.dot(a, b)

Benchmark#3 Result:

In this benchmark, both the Python and Cython codes use NumPy’s dot product function to perform matrix multiplication. Cython here actually performs worse than the native python code.

5×5 Matrix	100×100 Matrix	500×500 Matrix
Python Time: 0.0000171100 Cython Time: 0.0000182500 Cython was 0.938 times faster	Python Time: 0.0027420900 Cython Time: 0.0025540200 Cython was 1.074 times faster	Python Time: 0.0296036600 Cython Time: 0.0290222200 Cython was 1.020 times faster

The explanation for this result, would be that numpy is already a highly optimized library written in C/C++. Thus, it has similar performance to Cython, but without the overhead Cython has, causing it the native Python version to take the lead.

As the size of the matrix increases, we can see Cython overtake the native Python version a bit. This is because the overhead of using Cython is becoming smaller (in comparison to the time taken for the operation)

We can learn from this, that Cython will not help too much when using already optimized libraries such as numpy. If we were to remove numpy arrays here, and use normal python lists, perhaps the results would change significantly.

Overall: Draw

Benchmark 4: Prime Number Generation

Python Code:

def generate_primes(n):
    primes = []
    for num in range(2, n):
        if all(num % i != 0 for i in range(2, int(num**0.5) + 1)):
            primes.append(num)
    return primes

Cython Code:

cpdef generate_primes(int n):
    cdef list primes = []
    cdef int num
    cdef int i

    for num in range(2, n):
        for i in range(2, int(num**0.5) + 1):
            if num % i == 0:
                break
        else:
            primes.append(num)

    return primes

Benchmark#4 Result:

The Cython version of the prime number generation code benefits from the use of static typing. By declaring the variable types explicitly, the Cython code eliminates the dynamic type checking overhead of Python.

All primes till 100	All primes till 10000
Python Time: 0.0017563000 Cython Time: 0.0001182100 Cython was 14.857 times faster	Python Time: 0.3009807000 Cython Time: 0.0220213001 Cython was 13.668 times faster

This leads to significant speed improvements, especially when dealing with large prime numbers. The larger the prime number is, the more iterations are needed. Loops and Iterations benefit greatly from static type checking, as can be seen here.

Interestingly however, the rate of improvement does not improve as we increase the range of prime numbers.

Overall: Cython Wins

Benchmark 5: String Concatenation

Python Code:

def concatenate_strings(a, b):
    return a + b

Cython Code:

cpdef str concatenate_strings(str a, str b):
    return a + b

Benchmark#5 Result:

The string concatenation benchmark demonstrates a small difference between Python and Cython. Since both Python and Cython handle string operations in a similar manner, the performance gains are not very significant. In fact, performance seems to decrease as the length of strings increase.

100 length strings	1000 length strings
Python Time: 0.0000041500 Cython Time: 0.0000029400 Cython was 1.412 times faster	Python Time: 0.0000084700 Cython Time: 0.0000071500 Cython was 1.185 times faster

Overall: Cython Wins

Benchmark 6: Computing Column Means

Python Code:

import numpy as np

def compute_column_means(data):
    num_cols = len(data[0])
    means = np.average(data, axis=1)
    return means

Cython Code:

import numpy as np
cimport numpy as np

def compute_column_means(np.ndarray[np.float64_t, ndim=2] data):
    cdef int num_cols = data.shape[1]
    cdef np.ndarray[np.float64_t] means = np.zeros(num_cols)

    means = np.average(data, axis=1)
    return means

Benchmark#6 Result:

The above code was deliberately designed to be as optimized as possible using numpy and some of it’s highly optimized functions (written in C/C++).

rows = 100, cols = 10	rows = 1000, cols = 100
Python Time: 0.0002119100 Cython Time: 0.0002120200 Cython was 0.999 times faster	Python Time: 0.0008491300 Cython Time: 0.0008754800 Cython was 0.970 times faster

By looking at the results, we can observe that Cython is slower (overhead). This is the second benchmark where we have observed that highly optimizing our code renders any beenft by Cython null.

Overall: Draw

Benchmark 7: Computing Column Means (Unoptimized)

Python Code:

def compute_column_means(data):
    num_cols = len(data[0])
    means = [0.0] * num_cols

    for col in range(num_cols):
        total = 0.0
        for row in data:
            total += row[col]
        means[col] = total / len(data)

    return means

Cython Code:

cpdef list compute_column_means(data):
    cdef int num_cols = len(data[0])
    cdef list means = [0.0] * num_cols
    cdef double total
    cdef list row
    cdef int col

    for col in range(num_cols):
        total = 0.0
        for row in data:
            total += row[col]
        means[col] = total / len(data)

    return means

Benchmark#7 Result:

What we have done here, is created an unoptimized version of Benchmark#6 without the use of numpy. Now we will observe that Cython pulls ahead of regular Python by a significant margin.

row = 100, col = 10	row = 1000, col = 10
Python Time: 0.0006929700 Cython Time: 0.0005150800 Cython was 1.345 times faster	Python Time: 0.0864737500 Cython Time: 0.0592738300 Cython was 1.459 times faster

Overall: Cython Wins

Benchmark 8: Arithmetic Operations

Python Code:

import math

def compute_math():
    result = 0
    for i in range(10_000_000):
        result += math.sin(i) + math.cos(i)
    return result

Cython Code:

import math

def compute_math():
    cdef double result = 0
    for i in range(10_000_000):
        result += math.sin(i) + math.cos(i)
    return result

Benchmark#8 Result:

Here we have compiled a few common arithmetic and geometric operations, without the presence of loops. Some operations like division are actually more computationally expensive than you think. The goal of this benchmark was to check whether the use of Cython can speed these up (which it clearly can).

Mathematical Computations
Python Time: 0.0000170500 Cython Time: 0.0000124900 Cython was 1.365 times faster

Overall: Cython Wins

Benchmark 9: File Operations

Python Code:

def read_file(filename):
    with open(filename, 'r') as f:
        contents = f.read()
    return contents

Cython Code:

def read_file(filename):
    cdef str contents
    with open(filename, 'r') as f:
        contents = f.read()
    return contents

Benchmark#9 Result:

In the file operations benchmark, both Python and Cython exhibit similar performance since the file reading operation itself relies on low-level system calls. Therefore, the performance difference between the two is negligible. The overhead actually causes Cython to be a bit slower than native Python.

500 words text file	5000 words text file
Python Time: 0.0016529600 Cython Time: 0.0018463700 Cython was 0.895 times faster	Python Time: 0.0042956000 Cython Time: 0.0040852899 Cython was 1.051 times faster

Strangely enough, at 5000+ words the speed gap between Cython and Python decreased quite a bit, and Cython even outperformed Python sometimes (after running the tests multiple times). This might be because of the overhead becoming negligible.

Overall: Draw

Benchmark 10: Linear Searching

Python Code:

def linear_search(lst, target):
    for i, element in enumerate(lst):
        if element == target:
            return i
    return -1

Cython Code:

cpdef int linear_search(list lst, int target):
    cdef int n = len(lst)
    for i in range(n):
        if lst[i] == target:
            return i
    return -1

Benchmark#10 Result:

Here we have a program for linear searching. This involves no arithmetic operations, just some loops and a single comparison operations. This is the kind of code which really benefits from the use of Cython.

1000 numbers	10,000 numbers	100,000 numbers
Python Time: 0.0012754300 Cython Time: 0.0000313500 Cython was 40.684 times faster	Python Time: 0.0041481200 Cython Time: 0.0000133900 Cython was 309.792 times faster	Python Time: 0.0438640100 Cython Time: 0.0000172000 Cython was 2550.231 times faster

Overall: Cython Wins

Benchmark 11: Bubble Sorting

Python Code:

def bubbleSort(arr): 
    n = len(arr)
   
    for i in range(n-1): 
        for j in range(0, n-1): 
            if arr[j] > arr[j+1] :
                temp = arr[j]
                arr[j]=  arr[j+1]
                arr[j + 1] = temp
    return arr

Cython Code:

cpdef list bubbleSort(list arr): 
    cdef int n = len(arr)
    cdef int i, j
   
    for i in range(n-1): 
        for j in range(0, n-1): 
            if arr[j] > arr[j+1] :
                temp = arr[j]
                arr[j]=  arr[j+1]
                arr[j + 1] = temp
    return arr

Benchmark#11 Result:

Here we have the classic bubblesort algorithm. As we can see, Cython is able to complete these operations several times faster than regular Python. Another great place to be using Cython, especially because using libraries like numpy will not help much over here (since the majority of the work goes into iterations and comparisons).

100 numbers	1000 numbers
Python Time: 0.0053411199 Cython Time: 0.0014683400 Cython was 3.638 times faster	Python Time: 0.2775652600 Cython Time: 0.0572400799 Cython was 4.849 times faster

Sorting algorithms which use recursion will benefit even more from the use of Cython, as we saw in our very first benchmark.

Conclusion:

In conclusion, Cython offers significant speed improvements over CPython in certain scenarios. By leveraging static typing, explicit variable declaration, and efficient use of libraries like NumPy, Cython can eliminate the interpreter overhead and enhance performance.

However, it is important to note that not all benchmarks exhibit substantial speed gains with Cython. The choice between Python and Cython depends on the specific requirements of the application, the complexity of the code, and the need for performance optimization.

Code that is already heavily optimized or uses C/C++ libraries under the hood will not see significant performance boosts.

I would recommend programmers to focus more on actually optimizing their code first, before resorting to techniques like Cython. Cython does not negatively impact performance (overhead is negligible in large operations), and thus can be applied at the very end as the cherry on top.

This marks the end of the Cython vs CPython Speed Comparison. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content be asked in the comments section below.

Share on Facebook

5 thoughts on “Cython vs CPython – Comparing the Speed Difference”

Marcos DIone

June 12, 2023 at 7:56 am

Hi. Thanks for the post. What about a final graph showing the numbers?
Pingback: Python Cython Tutorial – Speeding up your Code 1000x
Simone Flavio

March 27, 2024 at 8:25 pm

I was hoping to find a comparison between Cython and Python/C API (CPython). Not a comparison between native Python vs Cython.

But a great post.
Greg Mackinnon

January 16, 2025 at 10:17 pm
I tried to replicate your results, and I think the benchmarks that you have reported for the Fibonacci and Prime number functions are misleading. Most of the performance benefit you are seeing is happening because Cython is overflowing the C “int” data type.

In the case of the Fibonacci iterative test, any value of “n” beyond 46 will cause an overflow. The function will continue to run, but will return incorrect values. It will run MUCH FASTER than pure python, but will return VERY INCORRECT data.

You can get up to the 92nd sequence using the “unsigned long long” data type (cython.ulonglong). After that… same problem.

If you stick to the pure python “int” data type, Cython and Python will return the same results to the millionth sequence and beyond, but you don’t get much performance benefit, and there are diminishing performance returns as the numbers get larger.

My benchmark results:

n = 1

Python: 0.0000027880 seconds

Cython: 0.0000014210 seconds

Cython was 1.962 times faster than Python.

n = 10

Python: 0.0000021490 seconds

Cython: 0.0000009590 seconds

Cython was 2.241 times faster than Python.

n = 100

Python: 0.0000052910 seconds

Cython: 0.0000032840 seconds

Cython was 1.611 times faster than Python.

n = 1000

Python: 0.0000516110 seconds

Cython: 0.0000305500 seconds

Cython was 1.689 times faster than Python.

n = 10000

Python: 0.0012270410 seconds

Cython: 0.0009932760 seconds

Cython was 1.235 times faster than Python.

n = 100000

Python: 0.0761558350 seconds

Cython: 0.0758864920 seconds

Cython was 1.004 times faster than Python.

n = 1000000

Python: 6.3353421940 seconds

Cython: 6.6032461620 seconds

Cython was 1.042 times slower than Python.

For what it is worth, I am using Cython in “Pure Python” mode.

To get to the 92nd sequence:
```
'''Fibonacci functions implemented in Cython.'''
import cython


def fib(n: cython.int) -> cython.ulonglong:
   """Print the Fibonacci series up to the 92-nd sequence."""
   a: cython.ulonglong = 0
   b: cython.ulonglong = 1
   _i: cython.int
   temp: cython.ulonglong

   for _i in range(1, n + 1):
       temp = a
       a = b
       b = temp + b
       # print(b, end=' ')

   return b
```
To go beyond that:
```
'''Fibonacci functions implemented in Cython.'''
import cython


def fib(n: cython.int) -> cython.ulonglong:
    """Print the Fibonacci series up to the n-th sequence."""
    a: int = 0
    b: int = 1
    _i: cython.int

    for _i in range(1, n + 1):
        temp: int = a
        a = b
        b = temp + b

    return b
```
Greg Mackinnon

January 16, 2025 at 10:25 pm

I should note the benchmarks below were for the second version of the script, that uses the native python int type. I do see that native C data types outperform native python, at least up to the 92nd sequence number. Here are the results when using the “ulonglong” data type, up to the 92nd Fibonacci number:

n = 1
Python: 0.0000023590 seconds
Cython: 0.0000014430 seconds
Cython was 1.635 times faster than Python.
n = 10
Python: 0.0000021600 seconds
Cython: 0.0000008260 seconds
Cython was 2.615 times faster than Python.
n = 20
Python: 0.0000023800 seconds
Cython: 0.0000008050 seconds
Cython was 2.957 times faster than Python.
n = 40
Python: 0.0000026560 seconds
Cython: 0.0000009920 seconds
Cython was 2.677 times faster than Python.
n = 80
Python: 0.0000049050 seconds
Cython: 0.0000007610 seconds
Cython was 6.445 times faster than Python.
n = 92
Python: 0.0000046840 seconds
Cython: 0.0000007490 seconds
Cython was 6.254 times faster than Python.

So Cython does it faster… as long as the numbers you are calculating are in the range of your selected data types. If your numbers are “really big”, we get into messy territory and need things like the GNU Multiple Precision Arithmetic Library (GMP). At that point, I would rather stick with pure python, or maybe switch to Golang.

Testing Environment

Benchmark 1: Fibonacci Sequence

Benchmark 2: Fibonacci Sequence (Iterative)

Benchmark 3: Matrix Multiplication

Benchmark 4: Prime Number Generation

Benchmark 5: String Concatenation

Benchmark 6: Computing Column Means

Benchmark 7: Computing Column Means (Unoptimized)

Benchmark 8: Arithmetic Operations

Benchmark 9: File Operations

Benchmark 10: Linear Searching

Benchmark 11: Bubble Sorting

Conclusion:

5 thoughts on “Cython vs CPython – Comparing the Speed Difference”

Leave a Comment Cancel reply