When we need to store data on disk for later use, we can do so by writing it to a Pickle file. In this tutorial, we will explore how to write a Pandas DataFrame to a Pickle file. Pickle is Python's built-in format for serializing and deserializing Python objects.
A Pickle file lets us load a stored object back later with its contents intact. It supports the majority of Python data types, such as lists and dictionaries. The format is also platform-independent, meaning you can generally pickle an object on one platform and unpickle it on another, provided the Python and library versions on both sides are compatible.
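To make the idea of serializing and deserializing concrete, here is a minimal sketch using only the standard pickle module. The record dictionary is made-up sample data, not part of this tutorial's DataFrame.

```python
import pickle

# Hypothetical sample data: a dictionary holding a list and a nested dictionary.
record = {"name": "Rufus", "cities": ["Paris", "London"], "meta": {"year": 1998}}

# dumps() serializes the object into a bytes string.
payload = pickle.dumps(record)

# loads() rebuilds an equal but separate object from those bytes.
restored = pickle.loads(payload)

print(type(payload))        # <class 'bytes'>
print(restored == record)   # True: same contents
print(restored is record)   # False: a new object was created
```

Writing to a file works the same way, except that the bytes go to disk instead of staying in memory.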
For this tutorial, we will be using the following Pandas DataFrame:
import pandas as pd
df = pd.DataFrame([["Break", "London", "25/09/1986"],
                   ["Xai", "Toronto", "22/04/1971"],
                   ["Rufus", "Paris", "08/08/1998"]])
Using to_pickle() to write a Pandas DataFrame to a Pickle File:
This is arguably the simplest approach. The to_pickle() method takes path as a parameter, which refers to the file path of the pickle file we wish to create. All we have to do is call it on the Pandas DataFrame object and pass the file path as a parameter.
Example 1:
For this example, we will simply store the contents of df to a pickle file named "sample.pkl".
df.to_pickle("sample.pkl")
As a result, a new file titled "sample.pkl" will be created in our current directory.
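One way to check that the file really holds our data is to read it back with pd.read_pickle() and compare. The sketch below is self-contained and writes into a temporary directory (an assumption made here so that running it does not overwrite files in your working directory).

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame([["Break", "London", "25/09/1986"],
                   ["Xai", "Toronto", "22/04/1971"],
                   ["Rufus", "Paris", "08/08/1998"]])

# Work inside a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.pkl")

    # Write the DataFrame, then confirm the file exists.
    df.to_pickle(path)
    file_created = os.path.exists(path)

    # Read the pickle back into a new DataFrame.
    restored = pd.read_pickle(path)

print(file_created)
print(restored.equals(df))
```

If both values print as True, the DataFrame survived the round trip unchanged.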
Example 2:
to_pickle() also takes additional optional parameters. These include compression, protocol, and storage_options.
compression specifies the compression algorithm to be used when saving the pickled object to a file. It accepts a string or a dictionary as values, and its default value is “infer”. This means that Pandas picks the algorithm from the file extension of path (for example, “.gz” selects gzip, while a plain “.pkl” file is written uncompressed). read_pickle() also defaults to “infer”, so when reading a compressed pickle file we only need to specify the algorithm ourselves if the file extension does not reveal it. Possible options include “infer”, “gzip”, “bz2”, “zip”, and “xz”; recent Pandas versions also accept “zstd” and “tar”.
protocol specifies the pickle protocol version to be used by the pickler. It accepts an integer as a value, and its default value is 5, which is the highest protocol in Python 3.8 and later.
storage_options is used to configure various storage-specific options when saving the pickled object to a file. Its usage is primarily relevant for cloud storage services like Amazon S3, Google Cloud Storage, or Azure Blob Storage. It accepts a dictionary as a value and its default value is None.
df.to_pickle(path="sample.pkl", compression="zip", protocol=4)
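The following sketch contrasts the two ways of handling compression: letting Pandas infer it from the file extension, and naming it explicitly when the extension gives no hint. The file name "sample.pkl.gz" and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame([["Break", "London", "25/09/1986"],
                   ["Xai", "Toronto", "22/04/1971"],
                   ["Rufus", "Paris", "08/08/1998"]])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Case 1: the ".gz" extension lets the default "infer" choose gzip
    # both when writing and when reading.
    inferred_path = os.path.join(tmp_dir, "sample.pkl.gz")
    df.to_pickle(inferred_path)
    with open(inferred_path, "rb") as raw:
        magic = raw.read(2)  # gzip files start with the bytes 1f 8b
    from_inferred = pd.read_pickle(inferred_path)

    # Case 2: ".pkl" says nothing about compression, so "zip" has to be
    # passed explicitly on both the write and the read.
    explicit_path = os.path.join(tmp_dir, "sample.pkl")
    df.to_pickle(explicit_path, compression="zip", protocol=4)
    from_explicit = pd.read_pickle(explicit_path, compression="zip")

print(magic == b"\x1f\x8b")
print(from_inferred.equals(df))
print(from_explicit.equals(df))
```

Choosing an extension that matches the compression (such as ".pkl.gz" or ".pkl.zip") keeps the read side simple, since read_pickle() can then infer the algorithm on its own.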
Using the pickle module to write a Pandas DataFrame to a Pickle File:
The pickle module allows us to serialize Python objects into byte streams that can be stored in pickle files, and to deserialize them again. It contains a variety of functions, but for now we will only concern ourselves with dump(). We pass it the Pandas DataFrame object and the pickle file’s object as parameters.
Before we can call dump(), we need to open a pickle file for writing. To do so, we call open() and pass "sample.pkl" as the name of the pickle file and "wb" (writing in binary mode) as the file mode. This results in the creation of pfile, a new file object. Then we call dump() and pass df and pfile as parameters. This writes df into sample.pkl. Finally, we close the pickle file.
Example:
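import pickle
# Open (or create) the pickle file for writing in binary mode
pfile = open("sample.pkl", "wb")
# Serialize df and write it to the file
pickle.dump(df, pfile)
# Close the file so that everything is flushed to disk
pfile.close()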
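As an alternative sketch, the same steps can be written with a with statement, which closes the file automatically even if dump() raises an error. The example also reads the file back with pickle.load() to confirm the result; the temporary directory is an assumption made so the example does not touch your working directory.

```python
import os
import pickle
import tempfile

import pandas as pd

df = pd.DataFrame([["Break", "London", "25/09/1986"],
                   ["Xai", "Toronto", "22/04/1971"],
                   ["Rufus", "Paris", "08/08/1998"]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.pkl")

    # "wb" opens the file for writing in binary mode.
    with open(path, "wb") as pfile:
        pickle.dump(df, pfile)

    # Leaving the with block has already closed the file.
    file_closed = pfile.closed

    # "rb" opens the file for reading in binary mode.
    with open(path, "rb") as pfile:
        restored = pickle.load(pfile)

print(file_closed)
print(restored.equals(df))
```

A file written this way without compression can also be read with pd.read_pickle(), since both approaches produce the same pickle format.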
Writing a Pandas Series to a Pickle File:
We have discussed how to write a DataFrame to a Pickle file. Writing a Series involves the same steps. Use the to_pickle() method and specify the path. You may also specify compression, protocol, and storage_options if need be.
Example:
ser = pd.Series(["Rufus", "Paris", "08/08/1998"])
ser.to_pickle("series_sample.pkl")
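A short round-trip sketch shows that the object read back is again a Series, not a DataFrame. As before, the temporary directory is an assumption made to keep the example from writing into your working directory.

```python
import os
import tempfile

import pandas as pd

ser = pd.Series(["Rufus", "Paris", "08/08/1998"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "series_sample.pkl")

    # Write the Series, then load it back from the same file.
    ser.to_pickle(path)
    restored = pd.read_pickle(path)

print(type(restored).__name__)
print(restored.equals(ser))
```

pd.read_pickle() returns whatever object was stored, so the same function reads both DataFrames and Series.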
This marks the end of the “Write Pandas DataFrame to Pickle file” Tutorial.