Python Ctypes is a foreign function library that allows Python code to call functions in compiled C libraries directly (C++ functions work too, provided they are exported with C linkage). This can be useful for improving the performance of Python code, particularly when working with large data sets or computationally intensive tasks. In this article we compare the performance of native Python code against Python code that calls C through Ctypes.
Testing Environment
In order to make this comparison as fair and accurate as possible, we will be using 3 different test cases.
- For-loops (finding sum)
- Recursion-heavy Code
- Quicksort Algorithm
Each test case is benchmarked using the timeit library, which runs the same code repeatedly and reports the time taken by each run. We run each test case 100 times and average the results.
These tests were performed on a fairly powerful and modern laptop (Ryzen 7 5700U, 8 cores, 16 threads). Execution times will vary based on the system being used. All driver code is provided so you can try out the code yourselves too.
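As a rough illustration of how the measurements are collected, here is a minimal sketch of timeit.repeat on its own. The function work is a hypothetical stand-in, not one of the test cases from this article, and the small repeat count is only there to keep the example quick.

```python
import timeit


def work():
    # Hypothetical stand-in workload, not one of the article's test cases.
    return sum(range(1000))


# repeat=5 produces five separate timings; number=1 means one call per timing.
times = timeit.repeat(stmt=work, number=1, repeat=5)

mean_time = sum(times) / len(times)  # the figure this article reports
best_time = min(times)               # often less sensitive to background noise

print(f"mean: {mean_time:.3e} s, best: {best_time:.3e} s")
```

timeit.repeat returns a list with one timing per repetition, so the averaging is done by hand. Taking the minimum instead of the mean is a common alternative when other processes may interfere with the measurements.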
We have three types of files being used. “Driver” files with the benchmarking code, “Python” files with the python functions and “Ctypes” files with the C-library code.
We use the following notation in our code: “Python_1” indicates the Python file for the first test case, and “Ctypes_1” the C code for the first test case. Likewise, “Ctypes_2” means the C code for the second test case.
Now let’s begin this Ctypes and Python Comparison!
Ctypes and Python Comparison #1
Here we have our Python code for calculating the sum of the integers from 0 up to (but not including) “n”.
def calc_sum(n):
    sum = 0
    for i in range(n):
        sum += i
    return sum
Here is the same in the C programming language.
int calc_sum(int n) {
    // Caution: with a 32-bit int, sum overflows once n exceeds 65536
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;
}
And here is our driver code (you can skip this part if you aren’t interested and go straight to the results).
import timeit
from python_1 import calc_sum

# Benchmark the pure-Python implementation
code = '''\
calc_sum(100)
'''
setup = '''\
from __main__ import calc_sum'''
times = timeit.repeat(setup=setup,
                      stmt=code,
                      number=1,
                      repeat=100)
print('Python: calculate sum: {}'.format(sum(times) / 100))

# Benchmark the C implementation (the library is loaded inside the timed statement)
code = '''\
path = os.getcwd()
clibrary = ctypes.CDLL(os.path.join(path, 'ctypes_1.so'))
clibrary.calc_sum(100)
'''
setup = '''\
import os
import ctypes'''
times = timeit.repeat(setup=setup,
                      stmt=code,
                      number=1,
                      repeat=100)
print('Ctypes: calculate sum: {}'.format(sum(times) / 100))
Results
First we have the results for n = 100. Don’t let the format of the numbers fool you: Python is actually faster here (compare the exponents; e-06 is smaller than e-05). Remember, less time means faster.
# Calculate sum of all numbers till 100
Python: calculate sum: 6.319014355540275e-06
Ctypes: calculate sum: 2.1877996623516082e-05
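So why is Python faster here? Because the Ctypes version carries fixed overhead: every timed run in our driver loads the shared library with ctypes.CDLL, and each call into C has to convert its arguments and cross the Python/C boundary. For such a small loop, that overhead outweighs the time saved.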
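To get a feel for this overhead on your own machine, here is a minimal sketch that times a trivial C function. It does not use our ctypes_1.so. Instead, it assumes a Linux or macOS system where ctypes.CDLL(None) exposes the C standard library, and it calls that library's abs function. The names libc, runs and the three timing variables are made up for this illustration.

```python
import ctypes
import timeit

# Assumption: a POSIX system, where CDLL(None) gives access to the symbols of
# the C standard library already loaded into the Python process.
libc = ctypes.CDLL(None)
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

runs = 1000

# Create the library handle on every run, as the driver's timed statement does.
load_and_call = timeit.timeit(lambda: ctypes.CDLL(None).abs(-5), number=runs) / runs

# Reuse the handle created once above and time only the call.
call_only = timeit.timeit(lambda: libc.abs(-5), number=runs) / runs

# The Python built-in, for reference.
builtin = timeit.timeit(lambda: abs(-5), number=runs) / runs

print(f"load + call: {load_and_call:.3e} s")
print(f"call only:   {call_only:.3e} s")
print(f"built-in:    {builtin:.3e} s")
```

The exact numbers depend on the system. Loading ctypes_1.so from disk is also not the same operation as CDLL(None), so treat this only as a way to see that both the load and the call carry a cost of their own.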
We can verify that some basic overhead exists by trying out another Performance Comparison between Python and Ctypes on higher numbers.
# Calculate sum of all numbers till 1000
Python: calculate sum: 7.438200525939465e-05
Ctypes: calculate sum: 5.2979998290538785e-05
At n = 1000 we can see that the performance gains from using Ctypes are now greater than the overhead involved. Ctypes takes about 29% less time than Python here, which makes it roughly 1.4x faster.
Let’s try some higher numbers to make this difference more obvious. As the number of operations grows, the fixed overhead of Ctypes is spread over more work and becomes negligible.
Here are the results for 100000. Here we can see a much larger performance increase. Ctypes is about 16x faster than Python here.
# Calculate sum of all numbers till 100000
Python: calculate sum: 0.007767976995091885
Ctypes: calculate sum: 0.00047917700838297607
Python is famously slow when it comes to loops, and the results above show it clearly. The gap widens with the number of iterations, as the fixed overhead is spread over more work.
Ctypes and Python Comparison #2
Next up we have a slightly odd scenario. We will be comparing Ctypes and Python on the recursive implementation of the Fibonacci series. (Yes, I know it’s inefficient; this is just for testing purposes.)
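The speedup figures can be recomputed directly from the timings reported above. The short sketch below does that; the dictionary names timings and speedups are only for illustration.

```python
# Mean timings in seconds, copied from the results above: n -> (python, ctypes).
timings = {
    100: (6.319014355540275e-06, 2.1877996623516082e-05),
    1000: (7.438200525939465e-05, 5.2979998290538785e-05),
    100000: (0.007767976995091885, 0.00047917700838297607),
}

# Speedup = Python time / Ctypes time; a value below 1 means Python was faster.
speedups = {n: py / ct for n, (py, ct) in timings.items()}

for n, speedup in speedups.items():
    print(f"n = {n}: Ctypes runs at {speedup:.2f}x the speed of Python")
```

A ratio below 1 at n = 100 reflects the overhead discussed earlier, while the ratios at n = 1000 and n = 100000 come out at roughly 1.4 and 16.2.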
def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)
Here is the same in the C language.
int fib(int n) {
    if (n <= 1)
        return n;
    return fib(n - 1) + fib(n - 2);
}
Here is our driver code.
import timeit
from python_2 import fib

# Benchmark the pure-Python implementation
code = '''\
fib(40)
'''
setup = '''\
from __main__ import fib'''
times = timeit.repeat(setup=setup,
                      stmt=code,
                      number=1,
                      repeat=100)
print('Python: calculate sum: {}'.format(sum(times) / 100))

# Benchmark the C implementation (the library is loaded inside the timed statement)
code = '''\
path = os.getcwd()
clibrary = ctypes.CDLL(os.path.join(path, 'ctypes_2.so'))
clibrary.fib(40)
'''
setup = '''\
import os
import ctypes'''
times = timeit.repeat(setup=setup,
                      stmt=code,
                      number=1,
                      repeat=100)
print('Ctypes: calculate sum: {}'.format(sum(times) / 100))
Results
# Calculate 10th Fibonacci number
Python: calculate sum: 1.4245007187128066e-05
Ctypes: calculate sum: 2.6811982970684766e-05
In the first result, for the 10th Fibonacci number, Python is faster than Ctypes. As in the first test, the fixed overhead dominates at small inputs, so let’s try higher numbers.
Calculating the 30th Fibonacci number shows that Ctypes performs much better than Python in recursive situations. Ctypes here is a whopping 20x faster than Python.
# Calculate 30th Fibonacci number
Python: calculate sum: 0.20831336600938813
Ctypes: calculate sum: 0.010219879988580942
Calculating the 40th Fibonacci number widens the gap even further, to about 85x!
# Calculate 40th Fibonacci number
Python: calculate sum: 44.40553419990465
Ctypes: calculate sum: 0.5199361999984831
From this we can conclude that Python is not very efficient when it comes to deep recursion with many function calls.
Notes: I also tried this for the 50th Fibonacci number, but gave up after 30 minutes of waiting. (With a 32-bit int, the C version would also overflow from the 47th Fibonacci number onward.) Also, if you benchmark memoized versions of the Fibonacci code, the results are very similar to those of the first test case.
Ctypes and Python Comparison #3
Finally, we have the Quicksort algorithm for our last test. It also makes use of recursion, but not as extensively as Fibonacci (which was O(2^n)). We also have loops here along with quite a bit of “regular” code, so this will be a good test.
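For reference, a memoized version of the Python function could look like the sketch below. It uses functools.lru_cache from the standard library; the name fib_memo is ours, and this is not necessarily the memoized variant that was benchmarked.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib_memo(n):
    # Each distinct n is computed once; repeated calls are served from the cache.
    if n <= 1:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


print(fib_memo(40))  # prints 102334155
```

With the cache in place, the work grows linearly with n rather than exponentially, which is why the behaviour ends up resembling the simple loop from the first test.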
Here is our Python code as usual.
def partition(array, low, high):
    pivot = array[high]
    i = low - 1
    for j in range(low, high):
        if array[j] <= pivot:
            i = i + 1
            (array[i], array[j]) = (array[j], array[i])
    (array[i + 1], array[high]) = (array[high], array[i + 1])
    return i + 1

def quick_sort(array, low, high):
    if low < high:
        pi = partition(array, low, high)
        quick_sort(array, low, pi - 1)
        quick_sort(array, pi + 1, high)
And here is the same in the C programming language.
void swap(int *a, int *b) {
    int t = *a;
    *a = *b;
    *b = t;
}

int partition(int array[], int low, int high) {
    int pivot = array[high];
    int i = (low - 1);
    for (int j = low; j < high; j++) {
        if (array[j] <= pivot) {
            i++;
            swap(&array[i], &array[j]);
        }
    }
    swap(&array[i + 1], &array[high]);
    return (i + 1);
}

void quick_sort(int array[], int low, int high) {
    if (low < high) {
        int pi = partition(array, low, high);
        quick_sort(array, low, pi - 1);
        quick_sort(array, pi + 1, high);
    }
}
Finally our driver code.
import timeit
from python_3 import quick_sort

# Pure-Python quicksort; the random list is built inside the timed statement
code = '''\
new_array = []
for i in range(100000):
    new_array.append(randint(0, 1000))
quick_sort(new_array, 0, 99999)
'''
setup = '''\
from __main__ import quick_sort
from random import randint'''
times = timeit.repeat(setup=setup,
                      stmt=code,
                      number=1,
                      repeat=100)
print('Python: calculate sum: {}'.format(sum(times) / 100))

# C quicksort; the library load and the array fill are also inside the timed statement
code = '''\
path = os.getcwd()
clibrary = ctypes.CDLL(os.path.join(path, 'ctypes_3.so'))
array = (ctypes.c_int * 100000)()
for i in range(100000):
    array[i] = randint(0, 1000)
clibrary.quick_sort(array, 0, 99999)
'''
setup = '''\
import os
import ctypes
from random import randint'''
times = timeit.repeat(setup=setup,
                      stmt=code,
                      number=1,
                      repeat=100)
print('Ctypes: calculate sum: {}'.format(sum(times) / 100))
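The driver fills the C array one element at a time. As a small aside, a Ctypes array can also be built from an existing Python list in a single step. The sketch below shows this without needing the compiled library; size, values and IntArray are illustrative names, and the size is kept small on purpose.

```python
import ctypes
from random import randint

size = 1000  # hypothetical size, smaller than the driver's 100000
values = [randint(0, 1000) for _ in range(size)]

# (ctypes.c_int * size) creates an array type; calling it with the unpacked
# list initializes every element in one go.
IntArray = ctypes.c_int * size
array = IntArray(*values)

# The array supports len(), indexing and iteration like a fixed-size sequence.
print(len(array), list(array) == values)
```

An array built this way can be passed to clibrary.quick_sort(array, 0, size - 1) exactly as in the driver, since it is the same kind of object as (ctypes.c_int * 100000)().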
Results
Here are the results for an array of size 100. For once, Ctypes outperformed Python in the initial test, though not by a very big margin (about 1.3x).
# Quick sort on array of size 100
Python: calculate sum: 0.0001310600037686527
Ctypes: calculate sum: 0.00010179100325331093
Let’s try and observe how much the margin between Python and Ctypes grows as we increase the size of the array.
# Quick sort on array of size 1000
Python: calculate sum: 0.0016809910046868025
Ctypes: calculate sum: 0.0007084359927102923
At size = 1000, we can observe a speedup of about 2.4x for Ctypes over Python.
# Quick sort on array of size 10000
Python: calculate sum: 0.02391906899632886
Ctypes: calculate sum: 0.007066867975518107
At size = 10000, we can observe a speedup of about 3.4x over Python.
# Quick sort on array of size 100000
Python: calculate sum: 1.3064013520046138
Ctypes: calculate sum: 0.21337365100858732
Finally, for an array size of 100000 we begin seeing some big gains of Ctypes over Python. We now have a speedup of about 6.1x!
The difference is not as great as in the Fibonacci test, or even the sum test, but it is still substantial, and the results above suggest it keeps growing with the size of the array.
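One thing to keep in mind when reading these numbers: both timed statements in the driver also build the random array, so data generation is part of every measurement. The sketch below times the two steps separately for the Python version and checks the result against the built-in sorted. The helper make_data, the fixed seed and the smaller SIZE are assumptions made to keep the example quick and repeatable.

```python
import random
import timeit


def partition(array, low, high):
    pivot = array[high]
    i = low - 1
    for j in range(low, high):
        if array[j] <= pivot:
            i += 1
            array[i], array[j] = array[j], array[i]
    array[i + 1], array[high] = array[high], array[i + 1]
    return i + 1


def quick_sort(array, low, high):
    if low < high:
        pi = partition(array, low, high)
        quick_sort(array, low, pi - 1)
        quick_sort(array, pi + 1, high)


SIZE = 10_000            # smaller than the largest run above
rng = random.Random(0)   # fixed seed so repeated runs use the same data


def make_data():
    return [rng.randint(0, 1000) for _ in range(SIZE)]


# Time only the data generation.
build_time = min(timeit.repeat(make_data, number=1, repeat=5))

# Time only the sort, on data prepared beforehand.
data = make_data()
expected = sorted(data)
sort_time = timeit.timeit(lambda: quick_sort(data, 0, SIZE - 1), number=1)

assert data == expected  # the in-place quicksort matches the built-in sort
print(f"build: {build_time:.3e} s, sort: {sort_time:.3e} s")
```

Since the cost of building the array is paid by both versions, it takes up a larger share of the faster (Ctypes) measurement, so the speedup of the sort itself may be higher than the ratios reported above.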
You can suggest some additional test-cases in the comments section below and we might add them! Let us know what you think.
This marks the end of the Ctypes vs Python – Performance Comparison Article. Any suggestions or contributions for CodersLegacy are more than welcome. Questions about the tutorial content can be asked in the comments section below.