The Python dill library is a library used for object serialization. It is an extension of the popular pickle library, with extra features and support for some complex objects that pickle cannot handle on its own.
We’ll start off with the basic features of dill, and towards the end discuss some differences between it and its parent library, pickle.
What is Object Serialization?
Serialization is the process of converting an object into a byte stream. Deserialization is the inverse: converting a byte stream back into a Python object.
In simpler words, object serialization converts actual Python objects into bytes, preserving the whole object with all its current values. This is commonly known as “pickling” or “dumping”, and the byte stream is usually saved to a file.
The reverse process, known as “un-pickling” or “loading”, converts these bytes back into a Python object.
Object serialization is useful in many scenarios, such as creating save files for game data or storing trained models for AI/machine learning problems. It can take a long time for AI algorithms to generate a model, so instead of doing it every time you run the program, you can dump the model to a file once and then load it from there on each run, which can make start-up dramatically faster.
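The “dump once, load every time” idea can be sketched roughly as follows. This is only an illustration: train_model(), load_or_train() and the file name model.dill are hypothetical names made up for this example, and the “model” is just a small dictionary standing in for something expensive to compute.

```python
import os
import tempfile

import dill


def train_model():
    # Hypothetical stand-in for a slow training step.
    return {"weights": [0.1, 0.2, 0.3], "bias": 0.5}


def load_or_train(path):
    # Reuse the saved model if the file exists; otherwise build and save it.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return dill.load(f)
    model = train_model()
    with open(path, "wb") as f:
        dill.dump(model, f)
    return model


# A temporary directory keeps the example from touching your project files.
cache_path = os.path.join(tempfile.mkdtemp(), "model.dill")
first = load_or_train(cache_path)   # trains, then writes the file
second = load_or_train(cache_path)  # reads the file instead of retraining
print(first == second)
```

In a real program the file would live somewhere permanent, so the saving happens on the first run and every later run only has to load.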
Dumping (Pickling) Objects
Now let’s take a look at how we can use the Python dill library to serialize data into byte streams. Typically we save these byte streams to a file, which we can read later on when we need the Python object(s) again.
The code below is our “testing data”. We create three objects from the class Car and append them to a list, so we only have to dump a single object (the list). Otherwise we would have to dump each Car object individually, which is why a container object such as a list is highly recommended.
import dill
import random as rand

class Car:
    def __init__(self, model, year, color, name):
        self.model = model
        self.year = year
        self.color = color
        self.name = name

    def display(self):
        # Print every attribute, followed by a blank line
        print("Name: ", self.name)
        print("Model:", self.model)
        print("Year:", self.year)
        print("Color:", self.color, "\n")

# Collect the Car objects in one list so a single dump saves them all
data = []
data.append(Car("Regular", 2017, "Grey", "Toyota"))
data.append(Car("Special", 2019, "White", "BMV"))
data.append(Car("Limited", 2016, "Green", "Honda"))
In this first method, we’ll take a look at how to directly dump this data into a file using a file stream.
ofile = open("BinaryData", "wb")
dill.dump(data, ofile)
ofile.close()
Let’s take a look at the above code step by step. First we opened a file stream using the standard open() function that Python uses for file handling. The second parameter of open() is the mode. We’ve used the “wb” mode, which stands for “write binary”. Normally this would be “w”, but we are dealing with binary data here.
The second line is where we use the dump() function in dill, where the first parameter is the object to be dumped, and the second is the file stream to which it is written.
The third line is just us closing the file stream.
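As a side note, the same three steps are often written with a with block, which closes the file stream for you even if the dump raises an error. The sketch below assumes a plain list of names as placeholder data instead of the Car objects, and writes into a temporary directory so that it can be run on its own.

```python
import os
import tempfile

import dill

# Placeholder data; in the tutorial this would be the list of Car objects.
data = ["Toyota", "BMV", "Honda"]

path = os.path.join(tempfile.mkdtemp(), "BinaryData")

# The with block closes the file stream automatically when it ends.
with open(path, "wb") as ofile:
    dill.dump(data, ofile)

print(ofile.closed)
```

Whether you prefer this over calling close() yourself is mostly a matter of style, but it does remove the risk of forgetting the third line.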
Before we move on to the “un-pickling” or “loading” part, let’s take a look at a slightly different way of converting Python objects to a byte stream.
Instead of using dill.dump() we can use dill.dumps(), which converts the Python object passed to it into a byte stream and returns it as a bytes object instead of writing it to a file. You may find this useful if you want to use the byte stream inside your program, for example to send it over a network.
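A small sketch of dill.dumps() in action is shown here. The dictionary is placeholder data chosen for this example, and only the first few bytes are printed because the full stream is long and not very readable.

```python
import dill

# Placeholder data for the example.
data = {"name": "Toyota", "year": 2017}

# dumps() returns the serialized form directly instead of writing to a file.
payload = dill.dumps(data)

print(type(payload))   # <class 'bytes'>
print(len(payload))    # size of the byte stream
print(payload[:20])    # the first few raw bytes
```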
Printed, the result of dill.dumps() is a long run of escaped byte values and stray characters.
This is basically what your Python objects look like when converted to byte form. Re-loading these strange characters into Python using dill will create actual Python objects with the same values they had when they were dumped.
Loading (Un-Pickling) Objects
Now let’s explore how to load objects back.
Just like there is a dill.dump() function, there is also dill.load(). In its simplest form this function takes a single parameter, which is the file stream. It will return a single “pickled” or “dumped” object. This is rather important to remember if you have made multiple dumps to the same file.
Basically, the number of times you called dill.dump() should equal the number of times you call dill.load() to read all the data.
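To make the “one load per dump” rule concrete, here is a short sketch that dumps two separate objects to one file and reads them back in the same order. The file name MultiDump and the two objects are assumptions made up for this example.

```python
import os
import tempfile

import dill

path = os.path.join(tempfile.mkdtemp(), "MultiDump")

# Two separate dumps to the same file stream...
with open(path, "wb") as ofile:
    dill.dump([1, 2, 3], ofile)
    dill.dump({"color": "Grey"}, ofile)

# ...need two separate loads, which return the objects in dump order.
with open(path, "rb") as ifile:
    first = dill.load(ifile)
    second = dill.load(ifile)

print(first)
print(second)
```

This is also why the tutorial keeps everything in one list: a single container means a single dump and a single load.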
# Open the same file that was written with dill.dump() earlier
ifile = open("BinaryData", "rb")
newdata = dill.load(ifile)
ifile.close()

for x in newdata:
    x.display()
The output of the above code, displaying the objects we saved earlier:
Name: Toyota
Model: Regular
Year: 2017
Color: Grey

Name: BMV
Model: Special
Year: 2019
Color: White

Name: Honda
Model: Limited
Year: 2016
Color: Green
Just like there is a dill.dumps() function, there is also a dill.loads() function. Instead of taking a file stream object as its parameter, though, it takes the bytes object produced by dill.dumps(). It’s basically meant to convert a byte stream held in memory directly into a Python object, rather than reading that byte stream from a file first.
pickledData = dill.dumps(data)
newdata = dill.loads(pickledData)

for x in newdata:
    x.display()
This produces the same output as the first method we discussed.
Dill can pickle all normal types of standard data such as Lists and Dictionaries, just like Pickle. Unlike Pickle however, Dill is able to pickle some more exotic types of data listed below:
functions with yields, nested functions, lambdas,
cell, method, unboundmethod, module, code, methodwrapper,
dictproxy, methoddescriptor, getsetdescriptor, memberdescriptor,
wrapperdescriptor, xrange, slice,
notimplemented, ellipsis, qui
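As a quick illustration of one entry from this list, the sketch below tries a lambda with both libraries. The lambda named square is made up for this example, and the pickle attempt is wrapped in a try block because it is expected to fail.

```python
import pickle

import dill

# A lambda has no importable name, which the standard pickle relies on.
square = lambda x: x * x

try:
    pickle.dumps(square)
    pickle_ok = True
except (pickle.PicklingError, AttributeError):
    pickle_ok = False

# dill serializes the function itself, so the round trip works.
restored = dill.loads(dill.dumps(square))

print("pickle could serialize the lambda:", pickle_ok)
print("restored(4) =", restored(4))
```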
Dill cannot (yet) pickle these types of data:
This marks the end of the Python Dill Library Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.