
Often, I want to compress one or multiple files or directories into an archive. Maybe I want to upload experimental results to Zenodo or exchange them with my students. Maybe I want to offer a package with the slides of one of my classes. For this, I always use tar.xz archives. They take longer to create, but they usually compress the data best.

The Bash script xzCompress.sh, listed in full below, compresses one or multiple files or directories into such an archive.

Using it is fairly simple: If you want to compress a single file or folder named X, just invoke xzCompress.sh "X" and this will create the archive X.tar.xz. If you want to store multiple files or folders, say, A, B, and C, into an archive named Y.tar.xz, then you would write xzCompress.sh "Y" "A" "B" "C".
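The archive name is derived from the first argument by stripping one trailing slash and any leading directory components. The following is a minimal sketch of that step, mirroring the first lines of the script. The function name archive_name is chosen only for illustration and does not appear in the script itself.

```shell
#!/bin/bash

# Hypothetical helper, not part of xzCompress.sh: derive the archive
# name from the first argument the same way the script does.
archive_name() {
    local name="${1%/}"                         # drop one trailing slash, if any
    printf '%s.tar.xz\n' "$(basename "$name")"  # keep only the last path component
}

archive_name "results/"            # prints results.tar.xz
archive_name "/home/me/data/run1"  # prints run1.tar.xz
```

Because only the last path component is kept, the archive is written to the current working directory, even if X lies elsewhere.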

The script tries to use a reasonable number of CPU cores, striking a balance between speed and keeping the computer usable during compression. Still, if you have lots of data, the compression can take quite some time. But usually it is well worth it.
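To make the thread count concrete: for N logical CPU cores, the script uses max{1, ((N-1)/2)-1} threads, computed with integer division. A small sketch of that rule follows. The function name pick_threads is an assumption made for this example and is not part of the script.

```shell
#!/bin/bash

# Hypothetical helper, not part of xzCompress.sh: compute the thread
# count max{1, ((N-1)/2)-1} for N logical CPU cores.
pick_threads() {
    local n="$1"
    local t=$(( (n - 1) / 2 - 1 ))  # integer division, as in the script
    if [ "$t" -lt 1 ]; then
        t=1                         # never go below one thread
    fi
    echo "$t"
}

pick_threads 4    # prints 1
pick_threads 8    # prints 2
pick_threads 16   # prints 6
```

On a machine with 8 logical cores, this leaves most cores free for other work, which is why the system stays usable.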

The resulting archives Z.tar.xz can later be unpacked using tar -xf Z.tar.xz.
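If you would rather look into an archive first, or unpack it somewhere other than the current directory, something like the following sketch may help. It assumes a tar implementation that detects the xz compression automatically when reading, as GNU tar does. The function name unpack_to and the directory name in the example call are made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper, not part of xzCompress.sh: list the contents of
# an archive and then unpack it into a target directory.
unpack_to() {
    local archive="$1"
    local target="$2"
    mkdir -p "$target"               # create the target directory if needed
    tar -tf "$archive"               # list the archive members
    tar -xf "$archive" -C "$target"  # extract them below $target
}

# Example call, assuming Z.tar.xz exists in the current directory:
# unpack_to Z.tar.xz unpacked
```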

#!/bin/bash

# Compress files and folders to .tar.xz archives, using the strongest
# possible compression.
# Later, you can decompress the generated archive with the command
# "tar -xf archive.tar.xz".
#
# The script can be called in two ways:
#
# 1. With a single parameter 'X', which can be either a file or directory.
#    Then, an archive with name 'X.tar.xz' is created and the contents of 'X'
#    are packaged into it.
#
# 2. With multiple parameters 'Y', 'A', 'B', 'C', and so on.
#    Then, an archive with name 'Y.tar.xz' is created and the contents of 'A',
#    'B', and 'C', and so on are packaged into it.
#    'Y' is treated solely as archive name, not as source.
#
# This script may take a lot of memory and time.
# If you have N logical CPU cores, this script attempts to use
# max{1, ((N-1)/2)-1} threads and launches the compressor with niceness of 19.
# Therefore, the system should still be usable during compression.

# strict error handling
set -o pipefail  # a pipeline fails if any command in it fails, not only the last one
set -o errtrace  # set -E : ERR traps are inherited by functions, command substitutions, and subshells
set -o nounset   # set -u : exit the script if you try to use an unset variable
set -o errexit   # set -e : exit the script if any command returns a non-zero status

dest="${1%/}"
dest="$(basename "$dest").tar.xz"
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Destination archive name is '$dest'."

if command -v nproc &> /dev/null; then
    nthreads="$(nproc --all)"
    nthreads="$((nthreads - 1))"
    nthreads="$((nthreads / 2))"
    nthreads="$((nthreads - 1))"
    if [ $nthreads -le 1 ]; then
        nthreads=1
    fi
else
    nthreads=1
fi
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Using $nthreads threads."

if [ "$#" -gt 1 ]; then
    echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Beginning to compress sources '${*:2}' to destination '$dest'"
    # run both tar and xz at niceness 19; "-f -" makes tar write to stdout explicitly
    nice -n 19 tar -c -f - -- "${@:2}" | nice -n 19 xz --threads="$nthreads" -v -9e -c > "$dest"
else
    echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Beginning to compress sources '$1' to destination '$dest'"
    nice -n 19 tar -c -f - -- "$1" | nice -n 19 xz --threads="$nthreads" -v -9e -c > "$dest"
fi

echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Done compressing to destination '$dest'."