Converting PDF to Images using pypdfium2

In this tutorial, we’ll walk through how to convert a PDF into images using the Python library pypdfium2. Whether you need individual pages as images for further processing, web display, or archiving, this guide gets you started with a short, complete code example.


Introduction

pypdfium2 is a Python binding for PDFium, the fast, open-source PDF rendering engine used in Chromium (it is based on Foxit technology and was open-sourced by Google). It lets you load, manipulate, and render PDF pages from Python.

Before diving into the code, make sure you have the following libraries installed.

pypdfium2 Library:

Bash
pip install pypdfium2

Pillow (PIL fork): the example below produces Pillow images from the rendered pages, so Pillow must be installed as well:

Bash
pip install Pillow
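
To confirm that both packages are importable before running the main script, a small check like the following can help. It is only a sketch: the helper name missing_modules is made up for this example, and it relies on the fact that Pillow is imported under the module name PIL rather than Pillow.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    # find_spec() returns None for a top-level module that is not installed.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Pillow installs under the import name "PIL".
missing = missing_modules(["pypdfium2", "PIL"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("pypdfium2 and Pillow are both available.")
```

If the check reports a missing module, rerun the matching pip install command in the same Python environment that will run the script.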

Code Example

Below is a complete Python script that opens a PDF, renders each page as an image, and saves each image as a PNG file.

Python
import pypdfium2 as pdfium

# Load the PDF document
pdf = pdfium.PdfDocument("my_pdf_file.pdf")

# Get the total number of pages
n_pages = len(pdf)

# Iterate through each page and render it as an image
for page_number in range(n_pages):
    page = pdf[page_number]
    # pypdfium2 4.x API: render() returns a bitmap, and to_pil() converts it to a Pillow image.
    # Older releases used page.render_topil(...) with differently named parameters.
    bitmap = page.render(
        scale=1,                          # 1 pixel per PDF point (72 DPI). Increase for higher resolution.
        rotation=0,                       # No additional rotation. Change to 90, 180, or 270 as needed.
        crop=(0, 0, 0, 0),                # No cropping. Adjust if you want to trim margins.
        fill_color=(255, 255, 255, 255),  # Background color set to white (with full opacity).
        draw_annots=True,                 # Include annotations (e.g., comments, highlights) if any.
        grayscale=False,                  # Render in color. Set True for grayscale images.
        optimize_mode=None,               # No optimizations applied.
    )
    pil_image = bitmap.to_pil()
    # Save the rendered image to a file
    pil_image.save(f"image_{page_number+1}.png")

Parameters Explained:

  • scale: Controls the resolution. A value of 1 renders one pixel per PDF point, which corresponds to 72 DPI.
  • rotation: Additional rotation in degrees (0, 90, 180, or 270). Set to 0 for no extra rotation.
  • crop: A tuple (left, bottom, right, top) giving how much to cut off each page border, in PDF points. Here, no cropping is applied.
  • fill_color: Background color defined as an RGBA tuple. (255, 255, 255, 255) represents opaque white.
  • draw_annots: When True, any annotations in the PDF are rendered.
  • grayscale: Set to False to render in color.
  • optimize_mode: Controls rendering optimizations. None means no special optimizations.
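
Since the scale parameter is relative to PDF points, and one point is 1/72 inch, a target resolution in DPI can be turned into a scale factor by dividing by 72. The helpers below are a minimal sketch of that arithmetic; the names dpi_to_scale and approx_pixel_size are invented for this example, and the pixel size is an estimate, since the library's own rounding may differ by a pixel.

```python
POINTS_PER_INCH = 72  # one PDF point is 1/72 inch


def dpi_to_scale(dpi):
    """Convert a target resolution in DPI to a scale factor."""
    return dpi / POINTS_PER_INCH


def approx_pixel_size(width_pt, height_pt, dpi):
    """Estimate the rendered image size in pixels for a page measured in points."""
    scale = dpi_to_scale(dpi)
    return round(width_pt * scale), round(height_pt * scale)


# A US Letter page is 612 x 792 points (8.5 x 11 inches).
print(dpi_to_scale(300))
print(approx_pixel_size(612, 792, 300))
```

The result of dpi_to_scale(300) can be passed as the scale argument in the script above to render at roughly print resolution. Higher values produce larger files and use more memory.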

Because the result is a Pillow image, you can process it further before saving it. Pillow images are also compatible with many other Python libraries.
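
As one illustration of such post-processing, the sketch below shrinks an image to a preview size and drops the alpha channel so it could be saved as JPEG. The function name make_preview and the 500-pixel limit are arbitrary choices for this example, and a blank image stands in for a rendered page so the snippet runs without a PDF.

```python
from PIL import Image


def make_preview(image, max_side=500):
    """Return a downscaled RGB copy of image whose longest side is at most max_side."""
    preview = image.convert("RGB")           # new image; removes any alpha channel
    preview.thumbnail((max_side, max_side))  # resizes in place, keeping the aspect ratio
    return preview


# Stand-in for a rendered page; in the script above this would be pil_image.
page_image = Image.new("RGBA", (800, 1000), (255, 255, 255, 255))
preview = make_preview(page_image)
print(preview.size, preview.mode)
```

The original image is left untouched, so the full-resolution PNG and the smaller preview can both be saved from the same render.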


Conclusion

Using pypdfium2, converting a PDF into images is both straightforward and customizable. This tutorial walked through a complete example and explained each rendering parameter so that you can adapt the script to your own requirements. Experiment with different parameters to see how they affect the output, and integrate this technique into your projects wherever PDF-to-image conversion is needed.

Happy coding!