I needed to create some tarballs from within Python code. Unsurprisingly, it was straightforward.

Python’s standard library provides the tarfile module to read and write tar archives.

To add some files, one can use absolute paths or relative ones:

from pathlib import Path
import tarfile

fstab = Path("/etc/fstab")
os_release = Path("/etc/os-release")
local_readme = Path("README.md")

my_files = Path("my_files.tar.gz")

# "w:gz" creates a new gzip-compressed archive, overwriting any existing file
with tarfile.open(my_files, "w:gz") as tar:
    for afile in [fstab, os_release, local_readme]:
        tar.add(afile)

This results in the following tarball, assuming there’s a README.md in the current directory:

$ tar --list --file my_files.tar.gz
etc/fstab
etc/os-release
README.md
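To check the member names from Python rather than with the tar CLI, TarFile.getnames() returns them in archive order. The sketch below is self-contained: instead of relying on /etc, it assumes a throwaway file (notes.txt, a hypothetical name) added by its absolute path inside a temporary directory. As in the listing above, the stored name comes back without the leading /.

```python
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    # A throwaway file addressed by an absolute path
    sample = tmp_path / "notes.txt"
    sample.write_text("hello\n")

    archive = tmp_path / "sample.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(sample)  # no arcname: the member name is derived from the path

    # Read the archive back and collect the stored member names
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()

# On Linux this prints something like ['tmp/tmpabc123/notes.txt']
print(names)
```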

The paths in the tar file mirror the ones we supplied; tarfile only strips the leading / from absolute paths. This can be annoying if we pass something like /home/the_user/projects/foo/bar.md, because the tarball will then contain the whole home/the_user/projects/foo/ hierarchy. If all we want is to have the files at the root of the tar file, we pass the arcname argument to tar.add():

with tarfile.open(my_files, "w:gz") as tar:
    for afile in [fstab, os_release, local_readme]:
        # arcname is the name stored in the archive; afile.name is the bare file name
        tar.add(afile, arcname=afile.name)

That results in:

$ tar --list --file my_files.tar.gz
fstab
os-release
README.md
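Flattening paths this way has one catch: two files with the same base name (say, two README.md files from different projects) end up with the same member name, and the later one overwrites the earlier one on extraction. Below is a minimal sketch of a guard for that; flat_tarball and its prefix parameter are hypothetical names, not part of tarfile. The prefix also shows that arcname can be any relative path, for example to group the files under a top-level directory.

```python
import tarfile
import tempfile
from pathlib import Path


def flat_tarball(files, dest, prefix=""):
    """Hypothetical helper: store every file as prefix + its base name."""
    arcnames = [f"{prefix}{Path(f).name}" for f in files]
    # Refuse up front if two files would end up with the same member name
    duplicates = {name for name in arcnames if arcnames.count(name) > 1}
    if duplicates:
        raise ValueError(f"duplicate member names: {sorted(duplicates)}")
    with tarfile.open(dest, "w:gz") as tar:
        for afile, arcname in zip(files, arcnames):
            tar.add(afile, arcname=arcname)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "a").mkdir()
    (base / "b").mkdir()
    first = base / "a" / "README.md"
    second = base / "b" / "notes.txt"
    clash = base / "b" / "README.md"
    for path in (first, second, clash):
        path.write_text(f"{path}\n")

    # Distinct base names: stored under docs/ in the archive
    dest = base / "flat.tar.gz"
    flat_tarball([first, second], dest, prefix="docs/")
    with tarfile.open(dest) as tar:
        names = tar.getnames()

    # Two README.md files: rejected before any archive is written
    try:
        flat_tarball([first, clash], base / "clash.tar.gz")
        collided = False
    except ValueError:
        collided = True
    clash_written = (base / "clash.tar.gz").exists()

print(names)  # ['docs/README.md', 'docs/notes.txt']
print(collided, clash_written)  # True False
```

Checking the names before opening the archive means a clash never leaves a half-written tarball behind.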

Directories can be passed to tar.add() as well; they are added recursively by default. Pass recursive=False to store only the directory entry itself, without its contents.
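The difference is easy to see by archiving the same directory both ways. A sketch, assuming a hypothetical project/src/main.py layout created in a temporary directory:

```python
import tarfile
import tempfile
from pathlib import Path

listings = {}
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical layout: project/src/main.py
    project = base / "project"
    (project / "src").mkdir(parents=True)
    (project / "src" / "main.py").write_text("print('hi')\n")

    for recursive in (True, False):
        dest = base / f"project-{recursive}.tar"
        with tarfile.open(dest, "w") as tar:
            tar.add(project, arcname="project", recursive=recursive)
        with tarfile.open(dest) as tar:
            listings[recursive] = tar.getnames()

print(listings[True])   # ['project', 'project/src', 'project/src/main.py']
print(listings[False])  # ['project']
```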

Calling tar.add() on a non-existent file raises a FileNotFoundError.
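Whether a missing file should abort the whole archive depends on the use case. If skipping it is acceptable, the exception can be caught per file. A minimal sketch, where add_if_present is a hypothetical helper name; in CPython the error is raised while tar.add() stats the path, so nothing is written for that member and the archive stays usable.

```python
import tarfile
import tempfile
from pathlib import Path


def add_if_present(tar, path):
    """Hypothetical helper: add path under its base name, or report it missing."""
    try:
        tar.add(path, arcname=path.name)
    except FileNotFoundError:
        # Raised before anything is written for this member
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    present = base / "present.txt"
    present.write_text("here\n")
    missing = base / "missing.txt"  # never created

    with tarfile.open(base / "partial.tar.gz", "w:gz") as tar:
        skipped = [p.name for p in (present, missing) if not add_if_present(tar, p)]

    with tarfile.open(base / "partial.tar.gz") as tar:
        names = tar.getnames()

print(skipped)  # ['missing.txt']
print(names)    # ['present.txt']
```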

Extracting a tarball is also straightforward:

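with tarfile.open(my_files, "r") as tar:  # "r" detects the compression automatically
    # filter="data" refuses unsafe members, e.g. ones that would escape /tmp
    tar.extractall(path="/tmp", filter="data")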

If not specified, path defaults to the current working directory.

The tarfile module also provides a bonus command-line interface:

$ python -m tarfile -l my_files.tar.gz
fstab
os-release
README.md

It is a pure Python alternative to tar --list --file my_files.tar.gz.
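Under the hood, the -l option appears to call TarFile.list(), which can also be used directly when a script needs the listing; verbose=True switches to an ls -l style output with permissions, owner, size and date. A sketch, assuming a throwaway archive in a temporary directory; stdout is captured here only so the result can be inspected.

```python
import contextlib
import io
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    readme = base / "README.md"
    readme.write_text("hello\n")

    archive = base / "my_files.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(readme, arcname=readme.name)

    # Capture what TarFile.list() prints so it can be inspected
    buffer = io.StringIO()
    with tarfile.open(archive) as tar, contextlib.redirect_stdout(buffer):
        tar.list(verbose=False)  # names only, like `python -m tarfile -l`

listing = buffer.getvalue().split()
print(listing)  # ['README.md']
```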