Python ZIP Files

This a tutorial on the Python ZipFile library

What is a ZipFile?

ZIP is an archive file format that uses lossless data compression. This means that unlike lossy data compression, the original data can be reconstructed perfectly and restored to it’s original form. Python ZIP files allow us compress many smaller files into one single file, making it an ideal way to transfer data.

If you’re a beginner, you should first go through the file handling guide!

ZIP Files Module

Before we proceed, make sure you have the python ZIP files library installed. It is called the “zipfile” library.

import zipfile

Creating ZIP files

Creating a zip file isn’t all that different from making a regular file.

from zipfile import Zipfile

Zippy = ZipFile('Backup.zip", 'w')
# Code
Zippy.close()

Similarly to how we write lines of data into text files, we write file paths to zip files. These files are then “written” into the zip file and compressed automatically.

from zipfile import Zipfile

Zippy = ZipFile('Backup.zip", 'w')
Zippy.write("C:\\Users\User1\File.txt")
Zippy.close()

In the code above, The file called “file.txt” is written to the zipfile. As no directory was given for the zipfile, it shall save itself in the same directory as your python script. You can now compare your zipfile with the orignal file to see the size difference.

Note: The difference will be more noticeable on files of greater size, or many files.


Zipping a large number of files

ZIP files true potential is shown when dealing with multiple files. It might get a bit tedious writing several dozen write statements for each File you wish to include in the zip. This is where the os.walk function comes in handy.

Using os.walk we can crawl through every single directory, folder and subfolder in the PC. It takes as parameter a directory path. Run the following code and proceed to see every single file and subfolder name printed on screen.

import os

for folder, subfolders, files in os.walk("C:\\"):
    for subfolders in subfolders:
        print(subfolders)
    for file in files:
        print(file)

The os.walk function returns three values, folder, subfolders and files. Hence why we write three variables there to store all values. Afterwards its simply a matter of iterating through the subfolders and files.

If you’ve made it this far, it’s just a simple matter of selecting which files to save. Maybe you want all the files with the word, “Email” in them to be written to the zipfile, or perhaps all the files with a specific extension. Below is the code for a python program that takes as input a extension and proceeds to write all those files to a Zipfile.

from zipfile import ZipFile
import os

extension = input('Input file extension: ')

Zippy = ZipFile("Backup.zip", 'w')

for folder, subfolders, file in os.walk("C:\\Users"):
    for subfolders in subfolders:
        path = folder + subfolders 
    for x in file:
        if x.endswith(extension):
            filepath = folder + "\\" + x
            print(filepath)
            Zippy.write(filepath, compress_type=zipfile.ZIP_DEFLATED)
Zippy.close()
  • filepath variables contains a concatenated string comprised of directory name + file name which gives a complete file path.
  • Double slashes (“\\”) are compulsory as “\” is a special character. The first “\”, “escapes” the second “\”.
  • The path variable remains unused. In fact, it was not necessary to include the for loop for sub folders.
  • compress_type is an optional parameter, it does not have to be included. compress_level is another optional parameter that takes values from 0 – 9. (sometimes 1 – 9, depending on compress type).

Extracting from ZIP files

We can use the extractall function to extract all items from a ZIP file, or the extract function, to extract a specific file.

zip.setpassword


This marks the end of the Python Zip Files article. Any suggestions or contributions for CodersLegacy are more than welcome. Any questions can be asked in the comments section below.

Leave a Comment