The struct module is part of Python’s standard library. It converts native Python values such as integers, floats, and byte strings into a bytes object laid out like a C struct, and vice versa. In other words, it allows us to convert Python data into binary data.
Binary data is commonly found in files, databases, and network connections, and the struct module helps when reading or writing such data.
Packing Python Data into Binary
Let’s first take a look at how to convert Python Datatypes into Binary streams.
Syntax
To convert one or more Python values to binary, we use the pack() function. The following line shows its syntax.
binary_data = struct.pack(format, value1, value2, ...)
The format argument is a string that describes the type of each value. The struct module uses C data types, each of which has a fixed size on a given platform. The table below shows some of these C types and the format character that represents each one.
Format Character | C data type |
---|---|
c | char |
s | char[] |
h | short |
i | int |
l | long |
f | float |
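As a rough check of the sizes behind the table, the sketch below asks calcsize() for the size of each format character. It assumes the “=” prefix, which is not used elsewhere in this tutorial and selects the standard sizes instead of the platform’s native ones, so the numbers are the same on every machine.

```python
import struct

# '=' selects standard sizes (an assumption made here so that the
# results do not depend on the platform).
format_chars = ['c', 's', 'h', 'i', 'l', 'f']
sizes = {ch: struct.calcsize('=' + ch) for ch in format_chars}

for ch, size in sizes.items():
    print(ch, size)
```

Without the prefix, “l” is 8 bytes on many 64-bit Linux and macOS builds, which is one reason native sizes can differ between machines.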
Python Struct Module – Examples
Let’s take a look at a small example. We pass three values to pack(), but before they can be converted to binary, pack() must know the type and size of each. That’s where the format string comes in. The format string here has three characters, each of which is “h”. This tells pack() that there are three values of type “short” to be packed.
import struct
var = struct.pack('hhh', 1, 2, 3)
print(var)
b'\x01\x00\x02\x00\x03\x00'
Each “\x” is followed by two hexadecimal digits, which together represent a single byte. A short occupies two bytes, which is why there are 6 bytes in total.
The order of the bytes depends on whether the machine is little-endian or big-endian, because the format string above uses the machine’s native byte order by default. The previous output came from a little-endian machine. On a big-endian machine, the same call would produce the following.
b'\x00\x01\x00\x02\x00\x03'
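The byte order can also be set explicitly rather than left to the machine. A minimal sketch, assuming the “<” (little-endian) and “>” (big-endian) prefixes of the format string, which this tutorial has not introduced so far:

```python
import struct

# Same three shorts, packed with an explicit byte order
little = struct.pack('<hhh', 1, 2, 3)
big = struct.pack('>hhh', 1, 2, 3)

print(little)  # b'\x01\x00\x02\x00\x03\x00'
print(big)     # b'\x00\x01\x00\x02\x00\x03'
```

With an explicit prefix, the result no longer depends on the machine that runs the code, which matters when the bytes are written to a file or sent over a network.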
In some cases, the size of the byte stream does not match the sum of the sizes in the format string. Here, a short and two ints add up to 10 bytes, yet 12 bytes are produced.
var = struct.pack('hii',1,2,3)
print(var)
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'
This is because of pad bytes. By default, struct uses native alignment, so it inserts null pad bytes the same way a C compiler would, typically so that each value starts at an offset that is a multiple of its own size. Here, two pad bytes follow the 2-byte short so that the first 4-byte int starts at offset 4. Aligned data is faster for the machine to read.
We can also verify the number of bytes by using the calcsize() function from the struct module.
print(struct.calcsize('hhh'))
print(struct.calcsize('hii'))
6
12
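The pad bytes can be made visible by comparing the native layout with an unpadded one. This is a sketch that assumes the “=” prefix, which keeps the native byte order but uses standard sizes and turns off alignment. The native result of 12 is what most common platforms produce and may differ elsewhere.

```python
import struct

native_size = struct.calcsize('hii')    # native alignment, typically 12
packed_size = struct.calcsize('=hii')   # no padding: 2 + 4 + 4 = 10

print(native_size, packed_size)
print(native_size - packed_size, "pad byte(s)")
```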
Here is a small example that converts a byte string and a float to binary. Spaces between the items of the format string are ignored.
var = struct.pack('h 5s f', 2, b"Sarah", 5.2)
print(var)
b'\x02\x00Sarah\x00ff\xa6@'
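The number in front of “s” is the length of the field in bytes, not a repeat count. The sketch below shows what happens when the value is shorter or longer than the field, and how a regular Python string can be encoded to bytes first. The names used are only examples.

```python
import struct

short_name = struct.pack('5s', b'Al')                  # padded with null bytes
long_name = struct.pack('5s', b'Sarah Connor')         # truncated to 5 bytes
encoded = struct.pack('5s', 'Sarah'.encode('utf-8'))   # a str must be encoded first

print(short_name)  # b'Al\x00\x00\x00'
print(long_name)   # b'Sarah'
print(encoded)     # b'Sarah'
```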
Unpack Binary Data
Now let’s take a look at how to unpack Binary data with Python struct.
For the examples, let’s try to unpack the binary streams produced from the previous examples. The unpack() function returns a tuple containing the original values.
var = struct.unpack('h 5s f', b'\x02\x00Sarah\x00ff\xa6@')
print(var)
(2, b'Sarah', 5.199999809265137)
The float does not come back as exactly 5.2 because 5.2 has no exact binary representation, and the “f” format stores it as a 32-bit float, which keeps only about 7 significant decimal digits. A larger type improves accuracy, such as the double (format character “d”), which is twice the size of a float.
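To compare the two types, the following sketch round-trips the same value through “f” and through “d”. It relies on the fact that a Python float is already a 64-bit double, so the “d” format loses nothing.

```python
import struct

value = 5.2
as_float = struct.unpack('f', struct.pack('f', value))[0]    # 32-bit float
as_double = struct.unpack('d', struct.pack('d', value))[0]   # 64-bit double

print(as_float)            # 5.199999809265137
print(as_double)           # 5.2
print(as_double == value)  # True
```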
Here is another example, where we pass the binary stream produced by pack() directly into unpack().
binary = struct.pack('hii',1,2,3)
non_binary = struct.unpack('hii', binary)
print(non_binary)
(1, 2, 3)
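unpack() expects the buffer to be exactly as long as the format requires. The following sketch shows the error raised when one byte is missing. Catching struct.error is only one possible way to handle it.

```python
import struct

binary = struct.pack('hii', 1, 2, 3)

try:
    # Drop the last byte so the data no longer matches the format
    struct.unpack('hii', binary[:-1])
    result = "unpacked"
except struct.error as exc:
    result = "failed"
    print("Could not unpack:", exc)
```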
Additional Functions from Struct Module
Let’s take a look at two other functions from the Python struct module, pack_into() and unpack_from(). Both of these rely on a previously created buffer, which we pass to the function as a parameter.
The buffer will be created using the ctypes library in Python.
Syntax
struct.pack_into(format, buffer, offset, val1, val2, ...)
struct.unpack_from(format, buffer, offset = 0)
These functions are variants of pack() and unpack() respectively. pack_into() writes the packed bytes into an existing buffer starting at the given offset, and unpack_from() reads values from a buffer starting at the given offset, instead of working on the whole binary stream.
Let’s take a look at a small example demonstrating the use of these two functions.
import struct
import ctypes
# Calculate size for Buffer
size = struct.calcsize('hii')
print(size)
# Buffer is created with +2 in size, to accommodate offset
buffer = ctypes.create_string_buffer(size + 2)
# Values 1,2,3 packed into the buffer at offset of 2
struct.pack_into('hii',buffer, 2, 1, 2, 3)
# unpack_from() returns tuple of values
print(struct.unpack_from('hii', buffer, offset = 2))
# If unpacked without the same offset as pack_into,
# then values will appear incorrectly
print(struct.unpack_from('hii', buffer))
12
(1, 2, 3)
(0, 131072, 196608)
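An offset is useful when several records share one buffer. The sketch below is one possible layout, not part of the original example: it uses a bytearray instead of a ctypes buffer and a hypothetical record format '<hi' (little-endian, no padding), and stores two records back to back.

```python
import struct

record_format = '<hi'                          # hypothetical record: short + int
record_size = struct.calcsize(record_format)   # 2 + 4 = 6 bytes

# Any writable buffer works with pack_into(); a bytearray is used here
buffer = bytearray(record_size * 2)

struct.pack_into(record_format, buffer, 0, 1, 100)            # first record
struct.pack_into(record_format, buffer, record_size, 2, 200)  # second record

# Read only the second record by starting at its offset
second = struct.unpack_from(record_format, buffer, record_size)
print(second)  # (2, 200)
```

Because the offset is a multiple of the record size, record number n can be read with an offset of n * record_size without unpacking the records before it.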
This marks the end of the Python Struct Module Tutorial.