In this Cython vs CPython Article, we will be conducting a speed comparison using 10 different benchmarks, covering diverse scenarios and edge cases.
Python is a popular programming language known for its simplicity and readability. However, it is an interpreted language, which can sometimes result in slower execution speeds compared to compiled languages like C. To address this limitation, developers have introduced Cython, an optimizing static compiler for Python. Cython allows you to write Python-like code that can be compiled to C, offering potential performance gains.
- Testing Environment
- Benchmark 1: Fibonacci Sequence
- Benchmark 2: Fibonacci Sequence (Iterative)
- Benchmark 3: Matrix Multiplication
- Benchmark 4: Prime Number Generation
- Benchmark 5: String Concatenation
- Benchmark 6: Computing Column Means
- Benchmark 7: Computing Column Means (Unoptimized)
- Benchmark 8: Arithmetic Operations
- Benchmark 9: File Operations
- Benchmark 10: Linear Searching
- Benchmark 11: Bubble Sorting
- Conclusion:
Testing Environment
Before examining any benchmarks, you should be aware of what environment and what methods were used to conduct the testing. This helps in reproducibility, and in gaining a better understanding of the results (as results will vary from platform to platform).
Python version: 3.10
Hardware: Ryzen 7 5700U + 16GB RAM + SSD
Operating System: Windows 11
Benchmark Library: Timeit
Technique: We used the Python timeit library to apply several techniques to reduce randomness and improve our timing accuracy. The timeit library allows us to repeat the benchmarks “x” number of times, which we can then average out to reduce randomness (controlled by the repeat parameter). We can also consecutively perform the benchmarks “x” number of times and add all their times together to further reduce randomness (controlled by the number parameter).
Here is a code snippet of our testing code from one of our benchmarks.
import timeit
import program1_cy
from statistics import mean
python= mean(timeit.repeat('program1.fibonacci(25)',
setup='import program1',
number=10,
repeat=10))
cython= mean(timeit.repeat('program1_cy.fibonacci(25)',
setup='import program1_cy',
number=10,
repeat=10))
print(f"Python Time: {python:.10f}")
print(f"Cython Time: {cython:.10f}")
if python/cython > 1:
print(f"Cython was {python/cython:.3f} times faster")
else:
print(f"Cython was {python/cython:.3f} times slower")
Benchmark 1: Fibonacci Sequence
Python Code:
def fibonacci(n):
if n <= 1:
return n
else:
return fibonacci(n-1) + fibonacci(n-2)
Cython Code:
def fibonacci(int n):
if n <= 1:
return n
else:
return fibonacci(n-1) + fibonacci(n-2)
Benchmark#1 Result
Both the Python and Cython versions of the Fibonacci sequence use recursive calls to calculate the value. However, the Cython code, with the explicit declaration of the integer type, allows for more efficient execution and avoids the interpreter overhead of Python, resulting in faster execution.
10th Fibonacci Number | 25th Fibonacci Number |
---|---|
Python Time: 0.0001332700 Cython Time: 0.0000027100 Cython was 49.177 times faster | Python Time: 0.3970897400 Cython Time: 0.0028510200 Cython was 139.280 times faster |
Cython is also just alot better at handling recursion than Python, which is why we see such a big performance boost. Doing this on the non-recursion version will not yield such a large difference, as we will see in the next benchmark.
Overall: Cython Wins
Benchmark 2: Fibonacci Sequence (Iterative)
Python Code:
def fib(n):
n1, n2 = 0, 1
for i in range(1, n + 1):
temp = n1 + n2
n1 = n2
n2 = temp
return n2
Cython Code:
cpdef int fib(int n):
cdef int n1 = 0
cdef int n2 = 1
cdef int temp, i
for i in range(1, n + 1):
temp = n1 + n2
n1 = n2
n2 = temp
return n2
Benchmark#2 Result:
Here we can observe some massive speedups as well. Not as good as the recursive fibonacci (compare the 10th fibonacci benchmarks). We can’t go above 25th fibonacci number in recursive fibonacci, because it is incredibly slow (especially in a language like Python).
10th Fibonacci Number | 100th Fibonacci Number | 10000th Fibonacci Number |
---|---|---|
Python Time: 0.0000183300 Cython Time: 0.0000019300 Cython was 9.497 times faster | Python Time: 0.0060053901 Cython Time: 0.0000407100 Cython was 147.516 times faster | Python Time: 1.5474137500 Cython Time: 0.0002609300 Cython was 5930.379 times faster |
Overall: Cython Wins
Benchmark 3: Matrix Multiplication
Python Code:
import numpy as np
def matrix_multiply(a, b):
return np.dot(a, b)
Cython Code:
import numpy as np
cimport numpy as np
cpdef np.ndarray[np.float64_t, ndim=2] matrix_multiply(np.ndarray[np.float64_t, ndim=2] a, np.ndarray[np.float64_t, ndim=2] b):
return np.dot(a, b)
Benchmark#3 Result:
In this benchmark, both the Python and Cython codes use NumPy’s dot product function to perform matrix multiplication. Cython here actually performs worse than the native python code.
5×5 Matrix | 100×100 Matrix | 500×500 Matrix |
---|---|---|
Python Time: 0.0000171100 Cython Time: 0.0000182500 Cython was 0.938 times faster | Python Time: 0.0027420900 Cython Time: 0.0025540200 Cython was 1.074 times faster | Python Time: 0.0296036600 Cython Time: 0.0290222200 Cython was 1.020 times faster |
The explanation for this result, would be that numpy is already a highly optimized library written in C/C++. Thus, it has similar performance to Cython, but without the overhead Cython has, causing it the native Python version to take the lead.
As the size of the matrix increases, we can see Cython overtake the native Python version a bit. This is because the overhead of using Cython is becoming smaller (in comparison to the time taken for the operation)
We can learn from this, that Cython will not help too much when using already optimized libraries such as numpy. If we were to remove numpy arrays here, and use normal python lists, perhaps the results would change significantly.
Overall: Draw
Benchmark 4: Prime Number Generation
Python Code:
def generate_primes(n):
primes = []
for num in range(2, n):
if all(num % i != 0 for i in range(2, int(num**0.5) + 1)):
primes.append(num)
return primes
Cython Code:
cpdef generate_primes(int n):
cdef list primes = []
cdef int num
cdef int i
for num in range(2, n):
for i in range(2, int(num**0.5) + 1):
if num % i == 0:
break
else:
primes.append(num)
return primes
Benchmark#4 Result:
The Cython version of the prime number generation code benefits from the use of static typing. By declaring the variable types explicitly, the Cython code eliminates the dynamic type checking overhead of Python.
All primes till 100 | All primes till 10000 |
---|---|
Python Time: 0.0017563000 Cython Time: 0.0001182100 Cython was 14.857 times faster | Python Time: 0.3009807000 Cython Time: 0.0220213001 Cython was 13.668 times faster |
This leads to significant speed improvements, especially when dealing with large prime numbers. The larger the prime number is, the more iterations are needed. Loops and Iterations benefit greatly from static type checking, as can be seen here.
Interestingly however, the rate of improvement does not improve as we increase the range of prime numbers.
Overall: Cython Wins
Benchmark 5: String Concatenation
Python Code:
def concatenate_strings(a, b):
return a + b
Cython Code:
cpdef str concatenate_strings(str a, str b):
return a + b
Benchmark#5 Result:
The string concatenation benchmark demonstrates a small difference between Python and Cython. Since both Python and Cython handle string operations in a similar manner, the performance gains are not very significant. In fact, performance seems to decrease as the length of strings increase.
100 length strings | 1000 length strings |
---|---|
Python Time: 0.0000041500 Cython Time: 0.0000029400 Cython was 1.412 times faster | Python Time: 0.0000084700 Cython Time: 0.0000071500 Cython was 1.185 times faster |
Overall: Cython Wins
Benchmark 6: Computing Column Means
Python Code:
import numpy as np
def compute_column_means(data):
num_cols = len(data[0])
means = np.average(data, axis=1)
return means
Cython Code:
import numpy as np
cimport numpy as np
def compute_column_means(np.ndarray[np.float64_t, ndim=2] data):
cdef int num_cols = data.shape[1]
cdef np.ndarray[np.float64_t] means = np.zeros(num_cols)
means = np.average(data, axis=1)
return means
Benchmark#6 Result:
The above code was deliberately designed to be as optimized as possible using numpy and some of it’s highly optimized functions (written in C/C++).
rows = 100, cols = 10 | rows = 1000, cols = 100 |
---|---|
Python Time: 0.0002119100 Cython Time: 0.0002120200 Cython was 0.999 times faster | Python Time: 0.0008491300 Cython Time: 0.0008754800 Cython was 0.970 times faster |
By looking at the results, we can observe that Cython is slower (overhead). This is the second benchmark where we have observed that highly optimizing our code renders any beenft by Cython null.
Overall: Draw
Benchmark 7: Computing Column Means (Unoptimized)
Python Code:
def compute_column_means(data):
num_cols = len(data[0])
means = [0.0] * num_cols
for col in range(num_cols):
total = 0.0
for row in data:
total += row[col]
means[col] = total / len(data)
return means
Cython Code:
cpdef list compute_column_means(data):
cdef int num_cols = len(data[0])
cdef list means = [0.0] * num_cols
cdef double total
cdef list row
cdef int col
for col in range(num_cols):
total = 0.0
for row in data:
total += row[col]
means[col] = total / len(data)
return means
Benchmark#7 Result:
What we have done here, is created an unoptimized version of Benchmark#6 without the use of numpy. Now we will observe that Cython pulls ahead of regular Python by a significant margin.
row = 100, col = 10 | row = 1000, col = 10 |
---|---|
Python Time: 0.0006929700 Cython Time: 0.0005150800 Cython was 1.345 times faster | Python Time: 0.0864737500 Cython Time: 0.0592738300 Cython was 1.459 times faster |
Overall: Cython Wins
Benchmark 8: Arithmetic Operations
Python Code:
import math
def compute_math():
result = 0
for i in range(10_000_000):
result += math.sin(i) + math.cos(i)
return result
Cython Code:
import math
def compute_math():
cdef double result = 0
for i in range(10_000_000):
result += math.sin(i) + math.cos(i)
return result
Benchmark#8 Result:
Here we have compiled a few common arithmetic and geometric operations, without the presence of loops. Some operations like division are actually more computationally expensive than you think. The goal of this benchmark was to check whether the use of Cython can speed these up (which it clearly can).
Mathematical Computations |
---|
Python Time: 0.0000170500 Cython Time: 0.0000124900 Cython was 1.365 times faster |
Overall: Cython Wins
Benchmark 9: File Operations
Python Code:
def read_file(filename):
with open(filename, 'r') as f:
contents = f.read()
return contents
Cython Code:
def read_file(filename):
cdef str contents
with open(filename, 'r') as f:
contents = f.read()
return contents
Benchmark#9 Result:
In the file operations benchmark, both Python and Cython exhibit similar performance since the file reading operation itself relies on low-level system calls. Therefore, the performance difference between the two is negligible. The overhead actually causes Cython to be a bit slower than native Python.
500 words text file | 5000 words text file |
---|---|
Python Time: 0.0016529600 Cython Time: 0.0018463700 Cython was 0.895 times faster | Python Time: 0.0042956000 Cython Time: 0.0040852899 Cython was 1.051 times faster |
Strangely enough, at 5000+ words the speed gap between Cython and Python decreased quite a bit, and Cython even outperformed Python sometimes (after running the tests multiple times). This might be because of the overhead becoming negligible.
Overall: Draw
Benchmark 10: Linear Searching
Python Code:
def linear_search(lst, target):
for i, element in enumerate(lst):
if element == target:
return i
return -1
Cython Code:
cpdef int linear_search(list lst, int target):
cdef int n = len(lst)
for i in range(n):
if lst[i] == target:
return i
return -1
Benchmark#10 Result:
Here we have a program for linear searching. This involves no arithmetic operations, just some loops and a single comparison operations. This is the kind of code which really benefits from the use of Cython.
1000 numbers | 10,000 numbers | 100,000 numbers |
---|---|---|
Python Time: 0.0012754300 Cython Time: 0.0000313500 Cython was 40.684 times faster | Python Time: 0.0041481200 Cython Time: 0.0000133900 Cython was 309.792 times faster | Python Time: 0.0438640100 Cython Time: 0.0000172000 Cython was 2550.231 times faster |
Overall: Cython Wins
Benchmark 11: Bubble Sorting
Python Code:
def bubbleSort(arr):
n = len(arr)
for i in range(n-1):
for j in range(0, n-1):
if arr[j] > arr[j+1] :
temp = arr[j]
arr[j]= arr[j+1]
arr[j + 1] = temp
return arr
Cython Code:
cpdef list bubbleSort(list arr):
cdef int n = len(arr)
cdef int i, j
for i in range(n-1):
for j in range(0, n-1):
if arr[j] > arr[j+1] :
temp = arr[j]
arr[j]= arr[j+1]
arr[j + 1] = temp
return arr
Benchmark#11 Result:
Here we have the classic bubblesort algorithm. As we can see, Cython is able to complete these operations several times faster than regular Python. Another great place to be using Cython, especially because using libraries like numpy will not help much over here (since the majority of the work goes into iterations and comparisons).
100 numbers | 1000 numbers |
---|---|
Python Time: 0.0053411199 Cython Time: 0.0014683400 Cython was 3.638 times faster | Python Time: 0.2775652600 Cython Time: 0.0572400799 Cython was 4.849 times faster |
Sorting algorithms which use recursion will benefit even more from the use of Cython, as we saw in our very first benchmark.
Conclusion:
In conclusion, Cython offers significant speed improvements over CPython in certain scenarios. By leveraging static typing, explicit variable declaration, and efficient use of libraries like NumPy, Cython can eliminate the interpreter overhead and enhance performance.
However, it is important to note that not all benchmarks exhibit substantial speed gains with Cython. The choice between Python and Cython depends on the specific requirements of the application, the complexity of the code, and the need for performance optimization.
Code that is already heavily optimized or uses C/C++ libraries under the hood will not see significant performance boosts.
I would recommend programmers to focus more on actually optimizing their code first, before resorting to techniques like Cython. Cython does not negatively impact performance (overhead is negligible in large operations), and thus can be applied at the very end as the cherry on top.
This marks the end of the Cython vs CPython Speed Comparison. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content be asked in the comments section below.