Numba Tutorial: Accelerating Python Code with JIT Compilation

In today’s data-driven world, performance optimization plays a crucial role in computational tasks. Python, being an interpreted language, may not always provide the desired speed for computationally intensive operations. However, Numba, a powerful library, comes to the rescue by offering just-in-time (JIT) compilation capabilities that significantly accelerate Python code execution. This tutorial aims to provide a comprehensive understanding of Numba’s functionality, installation, and usage, along with the common challenges you may face when using it.



JIT Compilers: How They Work

Just-in-Time (JIT) compilation is a technique used by certain programming language implementations to improve the runtime performance of code. While traditional compilers translate the entire source code into machine code before execution, JIT compilers take a different approach.

JIT compilers work by combining aspects of both interpretation and compilation. When a program is executed, the JIT compiler analyzes the code at runtime, identifies frequently executed portions (hotspots), and dynamically compiles them into machine code. This process occurs just before the code is executed, hence the term “just-in-time.”

The JIT compilation process involves several steps:

  1. Parsing and Lexical Analysis: The source code is parsed and converted into a syntax tree or an intermediate representation (IR).
  2. IR Optimization: The IR is optimized to eliminate redundancies, perform constant folding, and apply other optimizations.
  3. Just-in-Time Compilation: The optimized IR is translated into machine code specific to the target platform.
  4. Code Execution: The compiled machine code is executed, providing significant performance improvements over interpreted code.

JIT compilers offer the advantage of dynamically optimizing code based on runtime information, enabling them to adapt to specific execution patterns and make tailored optimizations.


Tutorial: What is Numba?

Numba is a just-in-time (JIT) compiler specifically designed for Python. It aims to enhance the performance of Python code by compiling it to efficient machine code, thus eliminating the overhead associated with Python’s interpreted execution.

Numba achieves this by leveraging the LLVM compiler infrastructure. When a Python function decorated with @jit is encountered, Numba analyzes the function’s bytecode and performs type inference to determine the optimal data types to use. Numba then generates optimized machine code for the identified hotspots, replacing the interpreted execution with compiled execution.

Numba offers two compilation modes:

  1. Nopython mode is the default compilation mode in Numba and aims for the highest performance gains. When a function is compiled in nopython mode, Numba generates machine code without relying on the Python runtime. In this mode, the function and its dependencies must be written in a subset of Python that can be fully compiled to machine code.
  2. Object mode, on the other hand, provides more flexibility at the cost of potential performance optimizations. In this mode, Numba retains full Python runtime semantics and falls back to using Python objects and runtime calls when necessary. Object mode is useful when working with code that cannot be fully compiled to machine code due to dynamic or unsupported Python features.

Unlike some other JIT compilers, Numba seamlessly integrates with NumPy, allowing efficient execution of NumPy array operations. By combining the power of NumPy’s vectorized operations with Numba’s JIT compilation, developers can achieve significant speedups in numerical computations.

Furthermore, Numba supports parallel execution through the prange function, enabling developers to take advantage of multi-threading. By parallelizing loops or array operations, Numba distributes the workload across multiple threads, further enhancing performance in scenarios where parallelization is applicable.


Installing and Importing Numba

To begin with, let’s install Numba on your system. Numba can be easily installed using pip or conda package managers. Open your terminal and run the appropriate command based on your preferred package manager. For pip, use:

pip install numba

For conda, use:

conda install numba

Once installed, import the necessary modules and functions for Numba in your Python script or notebook. Importing Numba is as simple as:

import numba
from numba import jit

Numba Tutorial: Understanding the Inner Workings of Numba

Numba’s core functionality revolves around the @jit decorator, which stands for “just-in-time.” By applying the @jit decorator to Python functions, Numba compiles them to machine code, resulting in performance improvements. Here’s an example:

import numba
from numba import jit

# Use nopython=True where possible for best performance
@jit(nopython=True)
def square(x):
    return x ** 2

In this case, the square function will be compiled by Numba the first time it is called. Benchmarking Numba against regular Python gives the following results.

Time taken with Numba JIT: 0.9326467 seconds
Time taken without Numba JIT: 0.0000041 seconds

As you can see here, Numba performs much worse than the regular Python code. Why is this?

As mentioned earlier, Numba compiles a function the first time it is executed. This initial compilation adds significant overhead to a single function call. Is there a way of checking how long it takes to compile the Numba function? Yes, there is!

All we need to do is run the function a second time and measure that second execution. Performing the benchmarks this way left us with the following results.

Numba JIT (compilation + execution) = 0.5045291
Numba JIT (only execution) = 0.0000022
Normal Python = 0.0000033

As we can see here, the execution time of the compiled Numba function is a tiny fraction of the compilation time. It is also about 1.5x as fast as the native Python code, even though the code is extremely simple with little room for optimization.

We will present a solution to this “compilation time” problem in the next section.

For reference, here is the benchmarking code.

import time
from numba import jit

@jit(nopython=True)
def square(x):
    return x ** 2

def normal_square(x):
    return x ** 2

x = 100

start = time.perf_counter()
square(x)
end = time.perf_counter()
print(f"Numba JIT (compilation + execution) = {end - start:.7f}")

start = time.perf_counter()
square(x)
end = time.perf_counter()
print(f"Numba JIT (only execution) = {end - start:.7f}")

start = time.perf_counter()
normal_square(x)
end = time.perf_counter()
print(f"Normal Python = {end - start:.7f}")

Numba Tutorial: Explicit Type Declaration

In Python, variables are dynamically typed, meaning their types can change at runtime. While this flexibility is convenient, it can also result in performance overhead. Numba mitigates this by allowing developers to explicitly specify the types of variables, enabling the compiler to generate specialized machine code tailored to those types.

Explicitly typing variables in Numba offers several advantages:

  1. Improved Performance: When Numba has type information, it can generate highly optimized machine code that bypasses Python’s dynamic typing system. This leads to significant performance improvements, especially in computationally intensive tasks.
  2. Reduced Overhead: With type information, Numba eliminates the need for runtime type checks and conversions. It also means that compilation to machine code happens at the very beginning of program execution, when the function is defined, rather than when the function is first called.
  3. Code Safety: Explicit typing helps catch potential errors at compile time, allowing you to identify and fix type-related issues early in the development process.

Here is a code snippet from the documentation of Numba, to which we will apply “explicit type hinting” and compare the differences.

import numpy as np
from numba import jit

@jit(nopython=True)
def go_fast(a): # Function is compiled to machine code when called the first time
    trace = 0.0
    for i in range(a.shape[0]):   # Numba likes loops
        trace += np.tanh(a[i, i]) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting

Benchmarking this code, produces the following results:

Numba JIT (compilation + execution) = 0.6488045
Numba JIT (only execution) = 0.0000237
Normal Python = 0.0001621

Now let’s add some type information.

@jit('float64[:,:](float64[:,:])', nopython=True)
def go_fast(a): 
    trace = 0.0
    for i in range(a.shape[0]):   
        trace += np.tanh(a[i, i]) 
    return a + trace              

Benchmarking this updated code, produces the following results:

Numba JIT (compilation + execution) = 0.0002219
Numba JIT (only execution) = 0.0000535
Normal Python = 0.0010343

As we can see here, the compilation + execution and execution-only times are now very similar. (Measurement noise accounts for the small remaining difference.) The main point is that the compilation time has disappeared from the first call, leaving only the execution time.

So where did the compilation time go?

The compilation was handled at the very beginning of the program, before this function was ever called. Since Numba had the type information available, it was able to generate the machine code beforehand. Otherwise it would have had to wait until the function call was made, determine the types being used, and then generate the machine code based on that information.


Numba Tutorial: Parallelization

Numba also excels at optimizing loops and array operations. By using Numba’s capabilities to parallelize execution, you can further boost performance by leveraging multiple threads. Here’s an example showcasing the parallelization with Numba:

from numba import jit, prange
import numpy as np

@jit(parallel=True, nopython=True)
def parallel_sum(a):
    result = 0
    for i in prange(len(a)):
        result += a[i]
    return result

x = np.random.randint(0, 1001, size=1000, dtype=np.int64)

By benchmarking the above code, we get the following timings.

Numba JIT (compilation + execution) = 0.8292531
Numba JIT (only execution) = 0.0000407
Normal Python = 0.0003943

As we can see from these benchmarks, Numba proved to be roughly 10x faster than Python. The only downside here is the initial compilation time.

Another important thing to keep in mind is the overhead introduced by parallel code. For example, in the above code, for arrays of length 1000 or less, parallel=False yields better performance. For larger sized arrays (10000+) parallel=True performs better.

With an array of size 1,000,000 we observed the following benchmarks:

Numba JIT (parallel=True, execution only) = 0.0002723
Numba JIT (parallel=False, execution only) = 0.0005995

When to Use Numba, and When Not To:

Numba is typically used when there is a need for accelerating the execution of numerical computations in Python. It is especially useful when working with large arrays, mathematical operations, and loops. Numba achieves performance improvements by just-in-time (JIT) compiling the Python code, resulting in efficient execution on the CPU or GPU.

Don’t forget our lesson from earlier: avoid using Numba on small operations, as the compilation overhead makes it slower.

However, Numba’s focus is primarily on computation, and it does not provide significant optimizations for I/O operations such as reading or writing files. Therefore, if your code involves extensive I/O operations, other libraries or approaches may be more suitable.


Common Challenges and Solutions with Numba

While Numba provides substantial performance gains, there are a few challenges that users may encounter. One common challenge is compatibility issues with CPython, the reference implementation of Python. Numba’s just-in-time compilation relies on low-level LLVM infrastructure, which may not support all Python features. In such cases, alternative solutions or workarounds are required.

For instance, Numba does not support certain Python constructs, such as generators or some types of nested functions, directly within JIT-compiled functions. To overcome this, it is recommended to refactor the code or use Numba’s object mode, which allows more flexibility but may not offer the same level of optimization.

Additionally, Numba’s performance heavily relies on type inference. Explicit typing can enhance performance further, but it may require additional effort. Sometimes, Numba’s type inference may fail due to complex code logic or unsupported Python constructs, requiring manual type specification. By using explicit typing, you can provide type information to Numba, ensuring optimal performance.


This marks the end of the Numba Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.
