Python dill Library Tutorial (Pickle Extension)

The Python dill library is used for object serialization. It extends the standard Python pickle library with support for more object types and for some complex situations that pickle cannot handle. That extra coverage comes at a cost: dill is generally somewhat slower than pickle, so it is most useful when pickle alone is not enough.

We’ll start off with the basic features of dill, and towards the end discuss some differences between it and its parent library, pickle.


What is Object Serialization?

Serialization is the process of converting an object into a byte stream. Deserialization is the inverse: converting a byte stream back into a Python object.

In simpler words, object serialization converts actual Python objects into bytes, allowing the whole object to be preserved (with all its current values). This is commonly known as “pickling” or “dumping”, and the byte stream is usually saved to a file.

The reverse process, where we convert these bytes back into a Python object, is known as “un-pickling” or “loading”.

Object Serialization is super useful in many scenarios, such as creating save files to store things like game data or trained models for AI/Machine Learning problems. It can take a long time for AI algorithms to generate a model, so instead of regenerating it every time you run the program, you can dump it to a file once and read it back on each later run. Loading a file is usually far quicker than retraining, so this can speed up your program considerably.
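To make the "dump once, load every time after" idea concrete, here is a minimal sketch. The train_model() function and the file name model.pkl are made up for illustration, and the dill.dump() and dill.load() calls it relies on are explained in the sections that follow.

```python
import os
import tempfile

import dill

train_calls = []  # records every time the "slow" step actually runs


def train_model():
    # Hypothetical stand-in for an expensive training step.
    train_calls.append(1)
    return {"weights": [0.1, 0.2, 0.3], "bias": 0.5}


def load_or_train(path):
    # Reuse the saved model if the cache file already exists.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return dill.load(f)
    # Otherwise do the slow work once and save the result for next time.
    model = train_model()
    with open(path, "wb") as f:
        dill.dump(model, f)
    return model


cache_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
first = load_or_train(cache_path)   # trains, then writes the cache file
second = load_or_train(cache_path)  # reads the cache file, no training
print(first == second, len(train_calls))
```

The second call never reaches train_model(), which is where the time saving would come from in a real program.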


Dumping (Pickling) Objects

Now let’s take a look at how we can use the Python dill library to serialize data into byte streams. Typically we save these byte streams into a file, which we can read later on when we need the Python object(s) again.

The code below is basically our “Testing Data”. We’ll create 3 objects from the class Car and append them to a list, so we only have to dump a single object (the list). Otherwise we would have to dump each Car object individually, which is why having a container object like a list is highly recommended.

import dill

class Car:
    def __init__(self, model, year, color, name):
        self.model = model
        self.year = year
        self.color = color
        self.name = name

    def display(self):
        print("Name: ", self.name)
        print("Model:", self.model)
        print("Year:", self.year)
        print("Color:",self.color,"\n")

data = []
data.append(Car("Regular", 2017, "Grey", "Toyota"))
data.append(Car("Special", 2019, "White", "BMV"))
data.append(Car("Limited", 2016, "Green", "Honda"))

Method# 1

In this first method, we’ll take a look at how to directly dump this data into a file using a file stream.

ofile = open("BinaryData", "wb")
dill.dump(data, ofile)
ofile.close()

Let’s take a look at the above code step-by-step. First we opened a file stream using the standard open() function that Python uses for File Handling. The second parameter in the open() function is the mode. We’ve used the “wb” mode, which stands for “write binary”. Normally this would be “w”, but we are dealing with binary data here.

The second line is where we use the dump() function in Dill, where the first parameter is the Object to be dumped, and the second is the file stream to which it is to be written.

The third line is just us closing the file stream.
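As a side note, the same three steps can also be written with a with block, which closes the file automatically even if dump() raises an error. This is only a sketch of that variant: the list of names is placeholder data, and the file is written to a temporary folder so the example does not clutter your working directory.

```python
import os
import tempfile

import dill

data = ["Toyota", "BMV", "Honda"]  # placeholder data for this sketch

path = os.path.join(tempfile.mkdtemp(), "BinaryData")

# The with block closes the file stream for us when it ends.
with open(path, "wb") as ofile:
    dill.dump(data, ofile)

print(ofile.closed)
```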


Method# 2

Before we move on to the “Un-Pickling” or “Loading” part, let’s take a look at a slightly different way of converting Python objects to a byte stream.

print(dill.dumps(data))

Instead of using dill.dump() we can use dill.dumps(), which converts the Python object passed to it into a byte stream and returns it as a bytes object instead of writing it to a file. You may find this useful if you want to use the byte stream inside your program, for example to send it over a network.

The output of the above code looks something like this: (Only the first few lines are included)

b'\x80\x03]q\x00(cdill._dill\n_create_type\nq\x01(cdill._dill\n_load_type\nq\x02X\x04\x00\x00\x00typeq\x03\x85q\x04Rq\x05X\x03\x00\x00\x00Carq\x06h\x02X\x06\x00\x00\x00objectq\x07\x85q\x08Rq\t\x85q\n}q\x0b(X\n\x00\x00\x00__module__q\x0cX\x08\x00\x00\x00__main__q\rX\x08\x00\x00\x00__init__q\x0ecdill._dill\n_create_function\nq\x0f(cdill._dill\n_create_code\nq\x10(K\x05K\x00K\x05K\x02KCC\x1c|\x01|\x00_\x00|\x02|\x00_\x01|\x03|\x00_\x02|\x04|\x00_\x03d\x00S\x00q\x11N\x85q\x12(X\x05\x00\x00\x00modelq\x13X\x04\x00\x00\x00yearq\x14X\x05\x00\x00\x00colorq\x15X\x04\x00\x00\x00nameq\x16tq\x17(X\x04\x00\x00\x00selfq\x18h\x13h\x14h\x15h\x16tq\x19X$\x00\x00\x00

This is basically what your Python objects look like when converted to byte form. Loading these strange characters back into Python with dill will create actual Python objects with the same values they had when they were dumped.
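A quick way to check both claims (that dumps() gives you bytes, and that loading them gives back the same values) is the small sketch below. It uses a plain dictionary as stand-in data and dill.loads(), which is covered properly in the loading section.

```python
import dill

original = {"name": "Toyota", "year": 2017, "colors": ["Grey", "White"]}

payload = dill.dumps(original)
print(type(payload))  # a bytes object, not a str

restored = dill.loads(payload)
print(restored == original)  # same values
print(restored is original)  # but a brand new object
```

Note that the restored dictionary is a copy. Changing it afterwards does not affect the original object.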


Loading (Un-Pickling) Objects

Now let’s explore how to load objects back from a byte stream.

Method# 1

Just like how there is a dill.dump() function, there is also dill.load(). This function takes only a single parameter which is the file stream. It will return a single “Pickled” or “Dumped” object. This is rather important to remember if you have made multiple dumps to the same file.

Basically the number of times you called dill.dump() should equal the number of times you call dill.load() to read all the data.

ifile = open("BinaryData", "rb")
newdata = dill.load(ifile)
ifile.close()

for x in newdata:
    x.display()

The output of the above code, displaying the objects we saved earlier:

Name:  Toyota
Model: Regular
Year: 2017
Color: Grey 

Name:  BMV
Model: Special
Year: 2019
Color: White 

Name:  Honda
Model: Limited
Year: 2016
Color: Green
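The point about multiple dumps is easier to see in code. The sketch below dumps three separate objects to one file and then keeps calling dill.load() until the file runs out, which shows up as an EOFError. The dictionaries and the temporary file path are placeholders chosen for this example.

```python
import os
import tempfile

import dill

records = [{"name": "Toyota"}, {"name": "BMV"}, {"name": "Honda"}]
path = os.path.join(tempfile.mkdtemp(), "ManyDumps")

# Three separate dump() calls write three separate objects to the file.
with open(path, "wb") as ofile:
    for record in records:
        dill.dump(record, ofile)

# Each load() call returns exactly one of them, in the order written.
loaded = []
with open(path, "rb") as ifile:
    while True:
        try:
            loaded.append(dill.load(ifile))
        except EOFError:
            break  # nothing left to read

print(len(loaded), loaded == records)
```

If you know in advance how many dumps were made, a simple for loop with that many load() calls works just as well.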

Method# 2

Just like how there is a dill.dumps() function, there is also a dill.loads() function. Instead of taking a file stream object as a parameter though, it takes a bytes object. It’s basically meant to convert a byte stream you already have in memory into a Python object, rather than reading that byte stream from a file first.

pickledData = dill.dumps(data)

newdata = dill.loads(pickledData)

for x in newdata:
    x.display()

This too has the same output as the first Method we discussed.


Dill Features:

Dill can pickle all normal types of standard data such as Lists and Dictionaries, just like Pickle. Unlike Pickle however, Dill is able to pickle some more exotic types of data listed below:

  • functions with yields, nested functions, lambdas,
  • cell, method, unboundmethod, module, code, methodwrapper,
  • dictproxy, methoddescriptor, getsetdescriptor, memberdescriptor,
  • wrapperdescriptor, xrange, slice,
  • notimplemented, ellipsis, quit

Dill cannot (yet) pickle these types of data:

  • frame
  • generator
  • traceback
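Here is a short sketch that tries out a few entries from the two lists above. The function names are made up for this example, and the exact exception types can differ between Python and dill versions, so the code catches errors broadly and only prints their names.

```python
import pickle

import dill


def make_multiplier(factor):
    # A nested function that remembers `factor` from the outer call.
    def multiply(value):
        return value * factor
    return multiply


double = lambda x: x * 2

# dill can dump and reload both the lambda and the nested function.
restored_double = dill.loads(dill.dumps(double))
restored_triple = dill.loads(dill.dumps(make_multiplier(3)))
print(restored_double(21))  # 42
print(restored_triple(5))   # 15

# The standard pickle module is expected to reject the same lambda.
try:
    pickle.dumps(double)
    pickle_ok = True
except Exception as exc:
    pickle_ok = False
    print("pickle failed:", type(exc).__name__)

# Generators are on the "cannot pickle" list, so dill should fail here.
gen = (n * n for n in range(3))
try:
    dill.dumps(gen)
    generator_ok = True
except Exception as exc:
    generator_ok = False
    print("dill failed on generator:", type(exc).__name__)
```

A common workaround for generators is to convert them to a list first and dump the list instead.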

This marks the end of the Python Dill Library Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.
