Cython vs CPython – Comparing the Speed Difference

In this Cython vs CPython article, we will conduct a speed comparison using 11 different benchmarks, covering diverse scenarios and edge cases.

Python is a popular programming language known for its simplicity and readability. However, it is an interpreted language, which can sometimes result in slower execution speeds compared to compiled languages like C. To address this limitation, developers have introduced Cython, an optimizing static compiler for Python. Cython allows you to write Python-like code that can be compiled to C, offering potential performance gains.

Testing Environment

Before examining any benchmarks, you should be aware of what environment and what methods were used to conduct the testing. This helps in reproducibility, and in gaining a better understanding of the results (as results will vary from platform to platform).

Python version: 3.10

Hardware: Ryzen 7 5700U + 16GB RAM + SSD

Operating System: Windows 11

Benchmark Library: Timeit

Technique: We used Python's timeit library, which offers two ways to reduce randomness and improve timing accuracy. The repeat parameter runs the whole measurement several times, and we average those results. The number parameter runs the statement several times back to back within each measurement and adds the times together.

Here is a code snippet of our testing code from one of our benchmarks.

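import timeit
import program1_cy
from statistics import mean

# The repeat/number values were not preserved in this snippet;
# REPEAT and NUMBER are placeholder assumptions.
REPEAT = 10
NUMBER = 10

python = mean(timeit.repeat('program1.fibonacci(25)',
                            setup='import program1',
                            repeat=REPEAT, number=NUMBER))

cython = mean(timeit.repeat('program1_cy.fibonacci(25)',
                            setup='import program1_cy',
                            repeat=REPEAT, number=NUMBER))

print(f"Python Time: {python:.10f}")
print(f"Cython Time: {cython:.10f}")

if python / cython > 1:
    print(f"Cython was {python/cython:.3f} times faster")
else:
    # Invert the ratio so the "slower" factor reads as a value above 1
    print(f"Cython was {cython/python:.3f} times slower")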
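To make the roles of the two parameters concrete, here is a minimal, self-contained sketch. The `work` function and the `REPEAT`/`NUMBER` values are stand-ins chosen for illustration, not the ones used in the benchmarks. Each entry returned by `timeit.repeat` is the total time for `number` executions, so dividing by `number` gives a rough per-call figure.

```python
import timeit
from statistics import mean

def work():
    # Stand-in workload; any callable can be timed the same way
    return sum(range(1000))

REPEAT = 5    # how many separate measurements to take (assumed value)
NUMBER = 100  # how many calls are summed inside each measurement (assumed value)

# A list of REPEAT floats; each one is the total time for NUMBER calls
totals = timeit.repeat(work, repeat=REPEAT, number=NUMBER)

mean_total = mean(totals)              # the kind of average reported in this article
per_call = mean_total / NUMBER         # average time of a single call
best_per_call = min(totals) / NUMBER   # least-disturbed measurement

print(f"mean total: {mean_total:.10f}")
print(f"per call:   {per_call:.10f}")
print(f"best call:  {best_per_call:.10f}")
```

The timeit documentation suggests that the minimum is often the more informative number, since higher values tend to reflect interference from other processes rather than the code itself. Reporting both costs nothing.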

Benchmark 1: Fibonacci Sequence

Python Code:

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

Cython Code:

def fibonacci(int n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

Benchmark#1 Result

Both the Python and Cython versions of the Fibonacci sequence use recursive calls to calculate the value. However, the Cython code, with the explicit declaration of the integer type, allows for more efficient execution and avoids the interpreter overhead of Python, resulting in faster execution.

10th Fibonacci Number
Python Time: 0.0001332700
Cython Time: 0.0000027100
Cython was 49.177 times faster

25th Fibonacci Number
Python Time: 0.3970897400
Cython Time: 0.0028510200
Cython was 139.280 times faster

Cython is also a lot better at handling recursion than Python, which is why we see such a big performance boost. Applying the same change to the non-recursive version yields a smaller difference at the same input size, as we will see in the next benchmark.
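To get a feel for why the recursive benchmark is so sensitive to per-call cost, it helps to count how many function calls the naive recursion makes. The sketch below is plain Python; `count_calls` is an illustrative helper, not part of the benchmark code.

```python
def count_calls(n):
    """Return (fibonacci(n), number of calls made by the naive recursion)."""
    calls = 0

    def fibonacci(k):
        nonlocal calls
        calls += 1  # one increment per invocation, including the base cases
        if k <= 1:
            return k
        return fibonacci(k - 1) + fibonacci(k - 2)

    value = fibonacci(n)
    return value, calls

print(count_calls(10))  # (55, 177)
print(count_calls(25))  # (75025, 242785)
```

The call count grows exponentially with n, so almost all of the measured time is function-call overhead rather than arithmetic. Any reduction in the cost of a single call is therefore multiplied across hundreds of thousands of calls.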

Overall: Cython Wins

Benchmark 2: Fibonacci Sequence (Iterative)

Python Code:

def fib(n):
    n1, n2 = 0, 1

    for i in range(1, n + 1):
        temp = n1 + n2
        n1 = n2
        n2 = temp
    return n2

Cython Code:

cpdef int fib(int n):
    cdef int n1 = 0
    cdef int n2 = 1 
    cdef int temp, i

    for i in range(1, n + 1):
        temp = n1 + n2
        n1 = n2
        n2 = temp
    return n2

Benchmark#2 Result:

Here we can observe some massive speedups as well, although at the 10th Fibonacci number the gain is smaller than in the recursive version (compare the two 10th Fibonacci results). We can't go above the 25th Fibonacci number in the recursive version, because it is incredibly slow (especially in a language like Python).

10th Fibonacci Number
Python Time: 0.0000183300
Cython Time: 0.0000019300
Cython was 9.497 times faster

100th Fibonacci Number
Python Time: 0.0060053901
Cython Time: 0.0000407100
Cython was 147.516 times faster

10000th Fibonacci Number
Python Time: 1.5474137500
Cython Time: 0.0002609300
Cython was 5930.379 times faster
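One caveat when reading these figures: `cdef int` is a fixed-width C integer (32 bits on most platforms), so the typed version cannot hold large Fibonacci numbers, whereas Python integers grow as needed. A rough way to see where the two diverge is to emulate 32-bit signed wraparound in plain Python. This is only a sketch: `wrap_int32` and `fib_int32` are illustrative helpers, and signed overflow in real C is formally undefined behaviour, although wraparound is what is commonly observed.

```python
def fib_python(n):
    # Same logic as the Python benchmark function above
    n1, n2 = 0, 1
    for _ in range(1, n + 1):
        temp = n1 + n2
        n1 = n2
        n2 = temp
    return n2

def wrap_int32(x):
    # Emulate storing x in a 32-bit signed integer (two's complement)
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

def fib_int32(n):
    # Same loop, but every result is squeezed into 32 bits
    n1, n2 = 0, 1
    for _ in range(1, n + 1):
        temp = wrap_int32(n1 + n2)
        n1 = n2
        n2 = temp
    return n2

print(fib_python(45), fib_int32(45))  # 1836311903 1836311903
print(fib_python(46), fib_int32(46))  # 2971215073 -1323752223
```

Under this assumption, the two versions stop agreeing at n = 46. For the 100th and 10000th entries the typed version is therefore not computing the same value as the Python version, and part of the measured gap comes from Python doing arbitrary-precision arithmetic on very large numbers.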

Overall: Cython Wins

Benchmark 3: Matrix Multiplication

Python Code:

import numpy as np

def matrix_multiply(a, b):
    return np.dot(a, b)

Cython Code:

import numpy as np
cimport numpy as np

cpdef np.ndarray[np.float64_t, ndim=2] matrix_multiply(np.ndarray[np.float64_t, ndim=2] a, np.ndarray[np.float64_t, ndim=2] b):
    return np.dot(a, b)

Benchmark#3 Result:

In this benchmark, both the Python and Cython code use NumPy's dot product function to perform matrix multiplication. For the smallest matrix, Cython actually performs slightly worse than the native Python code.

5×5 Matrix
Python Time: 0.0000171100
Cython Time: 0.0000182500
Cython was 0.938 times faster

100×100 Matrix
Python Time: 0.0027420900
Cython Time: 0.0025540200
Cython was 1.074 times faster

500×500 Matrix
Python Time: 0.0296036600
Cython Time: 0.0290222200
Cython was 1.020 times faster

The explanation for this result is that NumPy is already a highly optimized library written in C/C++. It therefore performs about the same whether it is called from Python or from Cython, and the small overhead Cython adds lets the native Python version take the lead.

As the size of the matrix increases, Cython overtakes the native Python version slightly. This is because the overhead of using Cython becomes smaller relative to the time taken by the operation itself.

We can learn from this that Cython will not help much when using already optimized libraries such as NumPy. If we were to replace the NumPy arrays here with normal Python lists, the results might change significantly.
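For reference, a list-based version of the kind hinted at here might look like the sketch below. It was not part of the benchmarks, and `matrix_multiply_lists` is a hypothetical name. The triple loop is exactly the sort of code where typed loop variables tend to pay off once compiled with Cython.

```python
def matrix_multiply_lists(a, b):
    """Multiply two matrices stored as lists of rows (no NumPy)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            a_ik = a[i][k]  # hoisted so the innermost loop does one lookup less
            for j in range(cols):
                result[i][j] += a_ik * b[k][j]
    return result

product = matrix_multiply_lists([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)  # [[19.0, 22.0], [43.0, 50.0]]
```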

Overall: Draw

Benchmark 4: Prime Number Generation

Python Code:

def generate_primes(n):
    primes = []
    for num in range(2, n):
        if all(num % i != 0 for i in range(2, int(num**0.5) + 1)):
            primes.append(num)
    return primes

Cython Code:

cpdef generate_primes(int n):
    cdef list primes = []
    cdef int num
    cdef int i

    for num in range(2, n):
        for i in range(2, int(num**0.5) + 1):
            if num % i == 0:
                break
        else:
            primes.append(num)

    return primes

Benchmark#4 Result:

The Cython version of the prime number generation code benefits from the use of static typing. By declaring the variable types explicitly, the Cython code eliminates the dynamic type checking overhead of Python.

All primes till 100
Python Time: 0.0017563000
Cython Time: 0.0001182100
Cython was 14.857 times faster

All primes till 10000
Python Time: 0.3009807000
Cython Time: 0.0220213001
Cython was 13.668 times faster

This leads to significant speed improvements, especially when dealing with large prime numbers. The larger the number being tested, the more iterations are needed. Loops benefit greatly from static typing, as can be seen here.

Interestingly, however, the speedup does not grow as we increase the range of prime numbers.
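Note that the two versions express the primality test differently: the Python code uses `all(...)` over a generator expression, while the Cython code uses an explicit inner loop. Before comparing timings it is worth confirming that both forms return the same primes. The sketch below does this in plain Python; the function names are illustrative and the loop form uses a `for`/`else` as one plausible way to write it.

```python
def primes_with_all(n):
    # Form used by the Python benchmark: all() over a generator expression
    primes = []
    for num in range(2, n):
        if all(num % i != 0 for i in range(2, int(num**0.5) + 1)):
            primes.append(num)
    return primes

def primes_with_loop(n):
    # Explicit-loop form: the else branch runs only if no divisor was found
    primes = []
    for num in range(2, n):
        for i in range(2, int(num**0.5) + 1):
            if num % i == 0:
                break
        else:
            primes.append(num)
    return primes

assert primes_with_all(100) == primes_with_loop(100)
print(len(primes_with_all(100)))  # 25
```

Because the generator expression creates a new generator object for every candidate number, some of the measured gap may come from this structural difference rather than from static typing alone. Timing `primes_with_loop` in pure Python would help separate the two effects.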

Overall: Cython Wins

Benchmark 5: String Concatenation

Python Code:

def concatenate_strings(a, b):
    return a + b

Cython Code:

cpdef str concatenate_strings(str a, str b):
    return a + b

Benchmark#5 Result:

The string concatenation benchmark demonstrates a small difference between Python and Cython. Since both Python and Cython handle string operations in a similar manner, the performance gains are not very significant. In fact, the relative gain shrinks as the length of the strings increases.

100 length strings
Python Time: 0.0000041500
Cython Time: 0.0000029400
Cython was 1.412 times faster

1000 length strings
Python Time: 0.0000084700
Cython Time: 0.0000071500
Cython was 1.185 times faster

Overall: Cython Wins

Benchmark 6: Computing Column Means

Python Code:

import numpy as np

def compute_column_means(data):
    num_cols = len(data[0])
    # axis=0 averages down the rows, giving one mean per column
    means = np.average(data, axis=0)
    return means

Cython Code:

import numpy as np
cimport numpy as np

def compute_column_means(np.ndarray[np.float64_t, ndim=2] data):
    cdef int num_cols = data.shape[1]
    cdef np.ndarray[np.float64_t] means = np.zeros(num_cols)

    # axis=0 averages down the rows, giving one mean per column
    means = np.average(data, axis=0)
    return means

Benchmark#6 Result:

The above code was deliberately designed to be as optimized as possible using NumPy and some of its highly optimized functions (written in C/C++).

rows = 100, cols = 10
Python Time: 0.0002119100
Cython Time: 0.0002120200
Cython was 0.999 times faster

rows = 1000, cols = 100
Python Time: 0.0008491300
Cython Time: 0.0008754800
Cython was 0.970 times faster

Looking at the results, we can observe that Cython is marginally slower because of its overhead. This is the second benchmark where code that already relies on highly optimized libraries sees no benefit from Cython.

Overall: Draw

Benchmark 7: Computing Column Means (Unoptimized)

Python Code:

def compute_column_means(data):
    num_cols = len(data[0])
    means = [0.0] * num_cols

    for col in range(num_cols):
        total = 0.0
        for row in data:
            total += row[col]
        means[col] = total / len(data)

    return means

Cython Code:

cpdef list compute_column_means(data):
    cdef int num_cols = len(data[0])
    cdef list means = [0.0] * num_cols
    cdef double total
    cdef list row
    cdef int col

    for col in range(num_cols):
        total = 0.0
        for row in data:
            total += row[col]
        means[col] = total / len(data)

    return means

Benchmark#7 Result:

Here we have created an unoptimized version of Benchmark#6 that does not use NumPy. Cython now pulls ahead of regular Python, running roughly 1.3 to 1.5 times faster.

row = 100, col = 10
Python Time: 0.0006929700
Cython Time: 0.0005150800
Cython was 1.345 times faster

row = 1000, col = 10
Python Time: 0.0864737500
Cython Time: 0.0592738300
Cython was 1.459 times faster

Overall: Cython Wins

Benchmark 8: Arithmetic Operations

Python Code:

import math

def compute_math():
    result = 0
    for i in range(10_000_000):
        result += math.sin(i) + math.cos(i)
    return result

Cython Code:

import math

def compute_math():
    cdef double result = 0
    for i in range(10_000_000):
        result += math.sin(i) + math.cos(i)
    return result

Benchmark#8 Result:

Here we have combined a few common arithmetic and trigonometric operations, without the presence of loops. Some operations, such as division, are more computationally expensive than you might expect. The goal of this benchmark was to check whether Cython can speed these up (which it clearly can).

Mathematical Computations
Python Time: 0.0000170500
Cython Time: 0.0000124900
Cython was 1.365 times faster

Overall: Cython Wins

Benchmark 9: File Operations

Python Code:

def read_file(filename):
    with open(filename, 'r') as f:
        contents = f.read()
    return contents

Cython Code:

def read_file(filename):
    cdef str contents
    with open(filename, 'r') as f:
        contents = f.read()
    return contents

Benchmark#9 Result:

In the file operations benchmark, both Python and Cython exhibit similar performance since the file reading operation itself relies on low-level system calls. Therefore, the performance difference between the two is negligible. The overhead actually causes Cython to be a bit slower than native Python.

500 words text file
Python Time: 0.0016529600
Cython Time: 0.0018463700
Cython was 0.895 times faster

5000 words text file
Python Time: 0.0042956000
Cython Time: 0.0040852899
Cython was 1.051 times faster

Strangely enough, at 5000+ words the speed gap between Cython and Python decreased quite a bit, and Cython even outperformed Python sometimes (after running the tests multiple times). This might be because of the overhead becoming negligible.

Overall: Draw

Benchmark 10: Linear Searching

Python Code:

def linear_search(lst, target):
    for i, element in enumerate(lst):
        if element == target:
            return i
    return -1

Cython Code:

cpdef int linear_search(list lst, int target):
    cdef int n = len(lst)
    for i in range(n):
        if lst[i] == target:
            return i
    return -1

Benchmark#10 Result:

Here we have a program for linear searching. This involves no arithmetic operations, just a loop and a single comparison operation per element. This is the kind of code which really benefits from the use of Cython.

1000 numbers
Python Time: 0.0012754300
Cython Time: 0.0000313500
Cython was 40.684 times faster

10,000 numbers
Python Time: 0.0041481200
Cython Time: 0.0000133900
Cython was 309.792 times faster

100,000 numbers
Python Time: 0.0438640100
Cython Time: 0.0000172000
Cython was 2550.231 times faster
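The time taken by a linear search depends on where the target sits in the list, so it is worth fixing that position when reproducing these numbers. A minimal sketch of the effect follows, counting comparisons instead of measuring time. The helper name and the sample data are illustrative assumptions, not the inputs used for the results above.

```python
def linear_search_counting(lst, target):
    """Return (index, comparisons made); index is -1 if target is absent."""
    comparisons = 0
    for i, element in enumerate(lst):
        comparisons += 1
        if element == target:
            return i, comparisons
    return -1, comparisons

data = list(range(1000))

first = linear_search_counting(data, 0)      # best case: found immediately
last = linear_search_counting(data, 999)     # found at the very end
missing = linear_search_counting(data, -5)   # worst case: every element checked

print(first, last, missing)  # (0, 1) (999, 1000) (-1, 1000)
```

Searching for a value that is absent, or that sits at the end, makes the work scale with the list length in both implementations, which gives the fairest comparison across sizes.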

Overall: Cython Wins

Benchmark 11: Bubble Sorting

Python Code:

def bubbleSort(arr): 
    n = len(arr)
    for i in range(n-1): 
        for j in range(0, n-1): 
            if arr[j] > arr[j+1] :
                temp = arr[j]
                arr[j]=  arr[j+1]
                arr[j + 1] = temp
    return arr

Cython Code:

cpdef list bubbleSort(list arr): 
    cdef int n = len(arr)
    cdef int i, j
    for i in range(n-1): 
        for j in range(0, n-1): 
            if arr[j] > arr[j+1] :
                temp = arr[j]
                arr[j]=  arr[j+1]
                arr[j + 1] = temp
    return arr

Benchmark#11 Result:

Here we have the classic bubble sort algorithm. As we can see, Cython is able to complete these operations several times faster than regular Python. This is another great place to use Cython, especially because libraries like NumPy will not help much here (since the majority of the work goes into iterations and comparisons).

100 numbers
Python Time: 0.0053411199
Cython Time: 0.0014683400
Cython was 3.638 times faster

1000 numbers
Python Time: 0.2775652600
Cython Time: 0.0572400799
Cython was 4.849 times faster

Sorting algorithms which use recursion will benefit even more from the use of Cython, as we saw in our very first benchmark.


In conclusion, Cython offers significant speed improvements over CPython in certain scenarios. By leveraging static typing and explicit variable declarations, Cython can reduce interpreter overhead and enhance performance, particularly in loop-heavy and recursive code.

However, it is important to note that not all benchmarks exhibit substantial speed gains with Cython. The choice between Python and Cython depends on the specific requirements of the application, the complexity of the code, and the need for performance optimization.

Code that is already heavily optimized or uses C/C++ libraries under the hood will not see significant performance boosts.

I would recommend that programmers focus on optimizing their code first, before resorting to techniques like Cython. Cython does not meaningfully hurt performance (the overhead is negligible in large operations), and can thus be applied at the very end as the cherry on top.

This marks the end of the Cython vs CPython Speed Comparison. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.

5 thoughts on “Cython vs CPython – Comparing the Speed Difference”

  1. I was hoping to find a comparison between Cython and Python/C API (CPython). Not a comparison between native Python vs Cython.

    But a great post.

  2. I tried to replicate your results, and I think the benchmarks that you have reported for the Fibonacci and Prime number functions are misleading. Most of the performance benefit you are seeing is happening because Cython is overflowing the C “int” data type.

    In the case of the Fibonacci iterative test, any value of “n” beyond 46 will cause an overflow. The function will continue to run, but will return incorrect values. It will run MUCH FASTER than pure python, but will return VERY INCORRECT data.

    You can get up to the 92nd sequence using the “unsigned long long” data type (cython.ulonglong). After that… same problem.

    If you stick to the pure python “int” data type, Cython and Python will return the same results to the millionth sequence and beyond, but you don’t get much performance benefit, and there are diminishing performance returns as the numbers get larger.

    My benchmark results:

    n = 1

    Python: 0.0000027880 seconds

    Cython: 0.0000014210 seconds

    Cython was 1.962 times faster than Python.

    n = 10

    Python: 0.0000021490 seconds

    Cython: 0.0000009590 seconds

    Cython was 2.241 times faster than Python.

    n = 100

    Python: 0.0000052910 seconds

    Cython: 0.0000032840 seconds

    Cython was 1.611 times faster than Python.

    n = 1000

    Python: 0.0000516110 seconds

    Cython: 0.0000305500 seconds

    Cython was 1.689 times faster than Python.

    n = 10000

    Python: 0.0012270410 seconds

    Cython: 0.0009932760 seconds

    Cython was 1.235 times faster than Python.

    n = 100000

    Python: 0.0761558350 seconds

    Cython: 0.0758864920 seconds

    Cython was 1.004 times faster than Python.

    n = 1000000

    Python: 6.3353421940 seconds

    Cython: 6.6032461620 seconds

    Cython was 1.042 times slower than Python.

    For what it is worth, I am using Cython in “Pure Python” mode.

    To get to the 92nd sequence:

    '''Fibonacci functions implemented in Cython.'''
    import cython
    def fib(n: cython.int) -> cython.ulonglong:  # parameter annotation assumed; it was lost in the original
       """Print the Fibonacci series up to the 92-nd sequence."""
       a: cython.ulonglong = 0
       b: cython.ulonglong = 1
       temp: cython.ulonglong
       for _i in range(1, n + 1):
           temp = a
           a = b
           b = temp + b
           # print(b, end=' ')
       return b

    To go beyond that:

    '''Fibonacci functions implemented in Cython.'''
    import cython
    def fib(n: cython.int) -> cython.ulonglong:  # parameter annotation assumed; it was lost in the original
        """Print the Fibonacci series up to the n-th sequence."""
        a: int = 0
        b: int = 1
        for _i in range(1, n + 1):
            temp: int = a
            a = b
            b = temp + b
        return b
  3. I should note the benchmarks below were for the second version of the script, that uses the native python int type. I do see that native C data types outperform native python, at least up to the 92nd sequence number. Here are the results when using the “ulonglong” data type, up to the 92nd Fibonacci number:

    n = 1
    Python: 0.0000023590 seconds
    Cython: 0.0000014430 seconds
    Cython was 1.635 times faster than Python.
    n = 10
    Python: 0.0000021600 seconds
    Cython: 0.0000008260 seconds
    Cython was 2.615 times faster than Python.
    n = 20
    Python: 0.0000023800 seconds
    Cython: 0.0000008050 seconds
    Cython was 2.957 times faster than Python.
    n = 40
    Python: 0.0000026560 seconds
    Cython: 0.0000009920 seconds
    Cython was 2.677 times faster than Python.
    n = 80
    Python: 0.0000049050 seconds
    Cython: 0.0000007610 seconds
    Cython was 6.445 times faster than Python.
    n = 92
    Python: 0.0000046840 seconds
    Cython: 0.0000007490 seconds
    Cython was 6.254 times faster than Python.

    So Cython does it faster… as long as the numbers you are calculating are in the range of your selected data types. If your numbers are “really big”, we get into messy territory and need things like the GNU Multiple Precision Arithmetic Library (GMP). At that point, I would rather stick with pure python, or maybe switch to Golang.

