Handling tarballs in Python
I needed to create some tarballs from within Python code. Unsurprisingly, it was straightforward.
Python’s standard library provides the tarfile module to read and write tar archives.
To add some files, one can use absolute paths or relative ones:
from pathlib import Path
import tarfile

# Two absolute paths and one relative path
fstab = Path("/etc/fstab")
os_release = Path("/etc/os-release")
local_readme = Path("README.md")

my_files = Path("my_files.tar.gz")

# "w:gz" creates a gzip-compressed archive
with tarfile.open(my_files, "w:gz") as tar:
    for afile in [fstab, os_release, local_readme]:
        tar.add(afile)
This results in the following tarball, assuming there’s a README.md in the current directory:
$ tar --list --file my_files.tar.gz
etc/fstab
etc/os-release
README.md
The paths in the tar file are the same as the ones we supplied, minus the leading slash. This can be annoying if we pass something like /home/the_user/projects/foo/bar.md, as the final tarball will contain the full path to the file. If all we want is to have the files in the root of the tar file, we pass the arcname argument to tar.add():
with tarfile.open(my_files, "w:gz") as tar:
    for afile in [fstab, os_release, local_readme]:
        # Store only the file name, dropping the directory part
        tar.add(afile, arcname=afile.name)
That results in:
$ tar --list --file my_files.tar.gz
fstab
os-release
README.md
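The arcname argument is not limited to bare file names. As a minimal sketch, assuming we want to keep the layout below a project directory rather than flatten everything, the path relative to that directory can be passed instead. The directory and file names here are hypothetical and created in a temporary directory so the example is self-contained:

```python
from pathlib import Path
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical project layout: <tmp>/projects/foo/docs/bar.md
    project = Path(tmp) / "projects" / "foo"
    (project / "docs").mkdir(parents=True)
    bar = project / "docs" / "bar.md"
    bar.write_text("# bar\n")

    archive = Path(tmp) / "foo.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store the path relative to the project directory;
        # as_posix() keeps forward slashes on every platform
        tar.add(bar, arcname=bar.relative_to(project).as_posix())

    with tarfile.open(archive, "r:gz") as tar:
        relative_names = tar.getnames()

print(relative_names)
```

The archive then holds docs/bar.md instead of the full path leading up to the project.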
You can also pass directories to tar.add(), and they will be added recursively by default. Pass recursive=False to add only the directory entry itself.
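To see the difference, here is a small sketch that tars the same directory twice, once with each setting, and compares the member names. The archive_names helper and the docs/intro.md layout are made up for this illustration:

```python
from pathlib import Path
import tarfile
import tempfile


def archive_names(source: Path, archive: Path, recursive: bool) -> list[str]:
    """Tar `source` into `archive` and return the member names."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name, recursive=recursive)
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: a docs/ directory containing a single file
    docs = Path(tmp) / "docs"
    docs.mkdir()
    (docs / "intro.md").write_text("hello\n")

    full = archive_names(docs, Path(tmp) / "full.tar.gz", recursive=True)
    shallow = archive_names(docs, Path(tmp) / "shallow.tar.gz", recursive=False)

print(full)     # the directory entry plus its contents
print(shallow)  # the directory entry only
```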
If you tar.add() a file that does not exist, you get a FileNotFoundError exception.
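Whether a missing file should abort the whole archive depends on the use case. One possible approach, shown here as a sketch with hypothetical file names, is to catch the exception per file and keep track of what was skipped:

```python
from pathlib import Path
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical inputs: one file that exists, one that does not
    existing = Path(tmp) / "notes.txt"
    existing.write_text("some notes\n")
    missing = Path(tmp) / "does_not_exist.txt"

    skipped = []
    archive = Path(tmp) / "my_files.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for afile in [existing, missing]:
            try:
                tar.add(afile, arcname=afile.name)
            except FileNotFoundError:
                # Remember the file and carry on with the rest
                skipped.append(afile.name)

    with tarfile.open(archive, "r:gz") as tar:
        archived = tar.getnames()

print(archived)
print(skipped)
```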
Extracting a tarball is also straightforward:
with tarfile.open(my_files, "r") as tar:
    # The "data" filter guards against unsafe archive members
    tar.extractall(path="/tmp", filter="data")
If not specified, path defaults to the current working directory.
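The filter argument is only available on Python versions that ship extraction filters. As a hedged sketch, assuming the code may also run on an older interpreter, the presence of tarfile.data_filter can be checked before passing it. The file names below are placeholders, and everything happens in a temporary directory:

```python
from pathlib import Path
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small archive to extract (placeholder content)
    src = Path(tmp) / "README.md"
    src.write_text("# demo\n")
    archive = Path(tmp) / "my_files.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname=src.name)

    dest = Path(tmp) / "out"
    dest.mkdir()
    with tarfile.open(archive, "r") as tar:
        if hasattr(tarfile, "data_filter"):
            # Extraction filters are supported: use the "data" filter
            tar.extractall(path=dest, filter="data")
        else:
            # Older interpreter without filters: only do this with trusted archives
            tar.extractall(path=dest)

    extracted = (dest / "README.md").read_text()

print(extracted)
```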
The tarfile module also gives us a bonus command:
$ python -m tarfile -l my_files.tar.gz
fstab
os-release
README.md
A pure Python alternative to tar --list --file my_files.tar.gz.
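The same listing is available from within a program through getnames() and getmembers(). A minimal sketch, using placeholder files created in a temporary directory rather than the real /etc files:

```python
from pathlib import Path
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "my_files.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for name in ["fstab", "os-release", "README.md"]:
            placeholder = Path(tmp) / name
            placeholder.write_text("placeholder\n")
            tar.add(placeholder, arcname=name)

    with tarfile.open(archive, "r") as tar:
        # Member names, in the order they were added
        names = tar.getnames()
        # TarInfo objects carry metadata such as the size in bytes
        sizes = {member.name: member.size for member in tar.getmembers()}

for name in names:
    print(name, sizes[name])
```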