How to read password-protected Excel files with Pandas

Owing to their familiar interface and use of structured data, Excel spreadsheets are widely used for data storage and analysis. However, in order to protect sensitive information, many Excel files are password-protected, restricting access to unauthorized individuals. This strengthens data privacy and integrity but can lead to more challenges for the average user. Thankfully, Pandas provides several different techniques to read password-protected Excel files.

Since we have already covered how to read Excel files, we’ll focus more on how we can decrypt protected Excel files.

The sample Excel file that we will be using for this tutorial is titled “ProtectedExcel.xlsx”. It contains only one sheet and its contents can be found below. It is also password-protected with the password being “123”.

Image for "How to read password-protected Excel files with Pandas"

Read password-protected Excel files with msoffcrypto & Pandas

Let’s start off by using the msoffcrypto module and Pandas to read password-protected Excel files. msoffcrypto is used for decrypting MS Office files that have been encrypted using a password or a private key.

Prerequisites:

First, before we can use it, we’ll have to install the msoffcrypto library. To do this, open your command prompt and type the following command.

pip install msoffcrypto-tool

After all the necessary packages are downloaded, restart your code editor.

Specifying the path of the Excel file:

Using the os module, we assign the current directory’s path to cdir and then use the join() method to add the Excel file’s name. Finally, we store the full file path in fpath.

import os

cdir = os.path.dirname(__file__)
fpath = os.path.join(cdir, "ProtectedExcel.xlsx")

Creating a BytesIO object:

The io module provides a set of tools and classes for working with input and output operations. After importing it, we need to access the BytesIO class.

The BytesIO class provides a flexible and convenient way to work with these I/O operations in Python. It is a specialized class that allows you to create in-memory binary buffers and treat them as file-like objects. It provides a way to read from and write to a byte-based buffer in a manner similar to reading from and writing to a file.

In our case, we’ll create a BytesIO object called unlocked_file, where we will later store our decrypted file.

import io

unlocked_file = io.BytesIO()

Decrypting the Excel file:

Now let’s start the decryption process. First, we’ll use the with statement to create a new code block where we open the Excel file through fpath and assign its file object to file. We’ll also use the "rb" mode to allow us to open a file in read mode and read its contents as binary data.

Then we’ll utilize the OfficeFile() method which takes a file object as a parameter and examines its file format to return a corresponding office file object.

Next, we’ll implement the load_key() and decrypt() methods. load_key() takes password as a parameter so we’ll pass “123”. decrypt() will take the BytesIO object (unlocked_file) which we already created and store the new decrypted Excel file in it.

import msoffcrypto

with open(fpath, "rb") as file:
    excel_file = msoffcrypto.OfficeFile(file)
    excel_file.load_key(password = "123")
    excel_file.decrypt(unlocked_file)

Reading the Excel file:

Finally, we’ll read the new decrypted Excel file using the read_excel() method.

import pandas as pd

df = pd.read_excel(unlocked_file)

Compiled code:

You can find the full code below:

import os
import io
import msoffcrypto
import pandas as pd

cdir = os.path.dirname(__file__)
fpath = os.path.join(cdir, "ProtectedExcel.xlsx")

unlocked_file = io.BytesIO()

with open(fpath, "rb") as file:
    excel_file = msoffcrypto.OfficeFile(file)
    excel_file.load_key(password = "123")
    excel_file.decrypt(unlocked_file)

df = pd.read_excel(unlocked_file)
print(df)

Output:

      Name  Age  Gender           City      Occupation
0  Vincent   25    Male       New York    Software Eng
1      Ada   32  Female    Los Angeles    Data Analyst
2  Gilbert   41    Male        Chicago      Accountant
3   Elliot   28    Male  San Francisco  Marketing Spec

Read password-protected Excel files with win32com.client & Pandas

Now we’ll move on to discussing how to read password-protected Excel files with win32com.client and Pandas. The win32com.client module is a Python library that allows you to interact with Windows APIs. This allows us to control various software applications, such as Microsoft Office Excel using Python code.

Prerequisites:

Start off by downloading the necessary files. Type the following command in your command prompt.

pip install pypiwin32

Restart your code editor after the dependencies are installed.

Specifying the paths of the original and new Excel files:

Unlike the previous technique where we stored the decrypted contents of “ProtectedExcel.xlsx” in an in-memory buffer, we will now store these contents in a completely new Excel file. Thus we will need to specify the file paths of the original and new Excel files. For the sake of simplicity, we’ll store the new Excel file, titled “UnlockedExcel.xlsx” in the same directory as “ProtectedExcel.xlsx”.

Aside from this small change, we will use the same methods showcased in the previous technique. fpath represents the file path to “ProtectedExcel.xlsx” and newfpath represents the file path to “UnlockedExcel.xlsx”

import os

cdir = os.path.dirname(__file__)
fpath = os.path.join(cdir, "ProtectedExcel.xlsx")
newfpath = os.path.join(cdir, "UnlockedExcel.xlsx")

Decrypting the original Excel file:

After importing win32com.client, we’ll use the Dispatch() method to create a connection to the Excel application by passing “Excel.Application” as a parameter. Dispatch() will return an object that represents the Excel application which we will store in excel_file to interact with the application’s functionalities.

Moreover, we’ll use Workbooks.Open() and pass fpath and the password (“123”) along with some other technical parameters to decrypt and open “ProtectedExcel.xlsx”. We’ll store the returned file-like object in unlocked_file.

import win32com.client as win32

excel_file = win32.Dispatch("Excel.Application")
unlocked_file = excel_file.Workbooks.Open(fpath, False, False, None, "123")

Creating a new Excel file:

To utilize this decrypted Excel file, we will save it as a new file. The path for this new file is stored in newfpath. The SaveAs() method will be used to create a new Excel file using the file path we pass into it.

The application may display alerts during this process. We’ll use the DisplayAlert property to control whether or not the application displays alert messages/prompts to the user. Before using SaveAs(), we’ll turn alerts off and then set it back on after we’re done using it.

Lastly, we’ll quit the unlocked Excel file after we are done reading from it.

excel_file.DisplayAlerts = False
unlocked_file.SaveAs(newfpath, None, "", "")
excel_file.DisplayAlerts = True

excel_file.Quit()

Reading the new Excel file:

Similar to the previous technique, we’ll call read_excel() to obtain a Pandas DataFrame containing the contents of the new Excel file.

import pandas as pd

df = pd.read_excel(newfpath)

Deleting the new Excel file:

This final step is optional because it deletes the new Excel file to avoid unnecessary data duplication. If you open a password-protected Excel file multiple times, you don’t want to create multiple copies of the same Excel file.

The os module contains remove() which takes a file path as a parameter and deletes the file stored at that file path.

os.remove(newfpath)

Compiled code:

The full code for this technique can be found below:

import os
import win32com.client as win32
import pandas as pd

cdir = os.path.dirname(__file__)
fpath = os.path.join(cdir, "ProtectedExcel.xlsx")
newfpath = os.path.join(cdir, "UnlockedExcel.xlsx")

excel_file = win32.Dispatch("Excel.Application")
unlocked_file = excel_file.Workbooks.Open(fpath, False, False, None, "123")

excel_file.DisplayAlerts = False
unlocked_file.SaveAs(newfpath, None, "", "")
excel_file.DisplayAlerts = True
excel_file.Quit()

df = pd.read_excel(newfpath)

os.remove(newfpath)

print(df)

Output:

      Name  Age  Gender           City      Occupation
0  Vincent   25    Male       New York    Software Eng
1      Ada   32  Female    Los Angeles    Data Analyst
2  Gilbert   41    Male        Chicago      Accountant
3   Elliot   28    Male  San Francisco  Marketing Spec

This marks the end of the “How to read password-protected Excel files with Pandas” Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.

Subscribe
Notify of
guest
0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments