Compressing Files and Directories into Archives

Often, I want to compress one or more files or directories into an archive. Maybe I want to upload experimental results to Zenodo or exchange them with my students. Maybe I want to offer a package with the slides of one of my classes. In these cases, I always use tar.xz archives. They take longer to create, but usually achieve the best compression.

Here you can find the Linux/Bash script xzCompress.sh that can compress one or multiple files or directories into such an archive. You can download the script from here.

Using it is fairly simple: If you want to compress a single file or folder named X, just invoke xzCompress.sh "X" and this will create the archive X.tar.xz. If you want to store multiple files or folders, say, A, B, and C, into an archive named Y.tar.xz, then you would write xzCompress.sh "Y" "A" "B" "C".

The script will try to use a reasonable number of CPU cores, striking a balance between speed and keeping the computer usable during compression. Still, if you have lots of data, the compression can take quite some time. Usually, it is well worth the wait.
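The thread-count rule that the script applies can be isolated in a small function. The following is only a minimal sketch: the function name threads_for_cores is chosen here for illustration and does not appear in the script itself.

```shell
#!/bin/bash

# Hypothetical helper (name not from the script): compute
# max{1, ((N-1)/2)-1} for a given number N of logical CPU cores,
# using Bash integer arithmetic.
threads_for_cores() {
    local cores="$1"
    local threads="$(( (cores - 1) / 2 - 1 ))"
    if [ "$threads" -lt 1 ]; then
        threads=1
    fi
    echo "$threads"
}
```

For example, threads_for_cores 8 prints 2 and threads_for_cores 16 prints 6, while machines with up to 6 cores end up with a single thread.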

A resulting archive Z.tar.xz can later be unpacked using tar -xf Z.tar.xz.
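Before unpacking an archive, tar can also list its contents. The sketch below packs a small demo directory by piping tar into xz, lists the archive, and unpacks it into a separate folder. The function name roundtrip_demo and all file names are made up for illustration, and the sketch assumes that GNU tar and xz are installed.

```shell
#!/bin/bash

# Illustrative round trip; 'roundtrip_demo' and all paths are hypothetical.
roundtrip_demo() {
    local work
    work="$(mktemp -d)"
    mkdir -p "$work/results" "$work/unpacked"
    echo "run 1: ok" > "$work/results/log.txt"

    # Pack: -C makes tar store the relative path 'results/...'.
    tar -C "$work" -cf - results | xz -c > "$work/results.tar.xz"

    # List the contents without extracting anything.
    tar -tf "$work/results.tar.xz"

    # Unpack into another directory and print the restored file.
    tar -xf "$work/results.tar.xz" -C "$work/unpacked"
    cat "$work/unpacked/results/log.txt"

    # Clean up the temporary demo folder.
    rm -rf "$work"
}
```

Calling roundtrip_demo prints the archive listing followed by the content of the restored log file.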

#!/bin/bash

# Compress files and folders to .tar.xz archives, using the strongest
# possible compression.
# Later, you can decompress the generated archive with the command
# "tar -xf archive.tar.xz".
#
# The script can be called in two ways:
#
# 1. With a single parameter 'X', which can be either a file or directory.
#    Then, an archive with name 'X.tar.xz' is created and the contents of 'X'
#    are packaged into it.
#
# 2. With multiple parameters 'Y', 'A', 'B', 'C', and so on.
#    Then, an archive with name 'Y.tar.xz' is created and the contents of 'A',
#    'B', and 'C', and so on are packaged into it.
#    'Y' is treated solely as archive name, not as source.
#
# This script may take a lot of memory and time.
# If you have N logical CPU cores, this script attempts to use
# max{1, ((N-1)/2)-1} threads and launches the compressor with niceness of 19.
# Therefore, the system should still be usable during compression.

# strict error handling
set -o pipefail  # trace ERR through pipes
set -o errtrace  # trace ERR through 'time command' and other functions
set -o nounset   # set -u : exit the script if you try to use an uninitialized variable
set -o errexit   # set -e : exit the script if any statement returns a non-true return value

dest="${1%/}"
dest="$(basename "$dest").tar.xz"
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Destination archive name is '$dest'."

if command -v nproc &> /dev/null; then
    nthreads="$(nproc --all)"
    nthreads="$((nthreads - 1))"
    nthreads="$((nthreads / 2))"
    nthreads="$((nthreads - 1))"
    if [ $nthreads -le 1 ]; then
        nthreads=1
    fi
else
    nthreads=1
fi
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Using $nthreads threads."

if [ "$#" -gt 1 ]; then
    echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Beginning to compress sources '${*:2}' to destination '$dest'"
    # 'nice' only affects the command it starts, so the compressor gets its own.
    # '-f -' makes tar write the archive to stdout explicitly.
    nice -n 19 tar -cf - "${@:2}" | nice -n 19 xz --threads="$nthreads" -v -9e -c > "$dest"
else
    echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Beginning to compress sources '$1' to destination '$dest'"
    nice -n 19 tar -cf - "$1" | nice -n 19 xz --threads="$nthreads" -v -9e -c > "$dest"
fi

echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Done compressing to destination '$dest'."

Downloading a File via a Script

Sometimes, we want to download a file from the Linux terminal. A download can always fail due to connection issues, so we want a command that tries again and again until it succeeds. Here I post a small Bash script that does exactly that, which you may store as download.sh. The script takes one or multiple URLs as parameters and downloads the corresponding files. You could, for example, type download.sh "https://thomasweise.github.io/programmingWithPython/programmingWithPython.pdf" and it will download our book on the Python programming language. The script uses wget internally, but wraps an infinite loop around it. It first tries 10 times to download the file without rate limitation and limits the connection speed in all follow-up attempts. Note that if the URL is simply invalid or the file does not exist, the script loops forever. Here you can download this script.
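The retry logic described here could look roughly like the following sketch. It is not the actual download.sh: the function names fetch and download_until_success, the rate limit of 500k, and the delay of 5 seconds between attempts are all assumptions made for illustration. Only the wget options --continue, --tries, and --limit-rate are standard.

```shell
#!/bin/bash

# Hypothetical wrapper around wget that keeps the retry logic separate.
# --continue resumes partial files; --tries=1 leaves retrying to our loop.
fetch() {
    wget --continue --tries=1 "$@"
}

# Try to download one URL until it succeeds.
# Attempts 1 to 10 run at full speed; later attempts are rate-limited.
# The limit of 500k and the default delay of 5 seconds are assumed values.
download_until_success() {
    local url="$1"
    local attempt=0
    while true; do
        attempt="$((attempt + 1))"
        if [ "$attempt" -le 10 ]; then
            if fetch "$url"; then return 0; fi
        else
            if fetch --limit-rate=500k "$url"; then return 0; fi
        fi
        echo "Attempt $attempt for '$url' failed, retrying." >&2
        sleep "${RETRY_DELAY:-5}"
    done
}
```

A script could then process several URLs with a loop such as for url in "$@"; do download_until_success "$url"; done. Like any unbounded retry loop, this one never ends for a URL that is permanently invalid.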

Updating all Packages and Snaps

I use Linux for my work and try to keep my system up-to-date. This means that, whenever I shut down my computer, I run a little Bash script update.sh that updates all installed packages, both deb and snap packages. Sometimes, for whatever reason, there may be a problem with some inconsistent package state. My script basically applies all methods I know to fix such a state. The script is very heavy-handed and loops over the update process four times. If, for whatever reason, one update requires another one first, so that everything can only be installed over multiple update rounds, then that is no problem. It also sleeps for one second between any two update steps, just in case. I usually run something like sudo update.sh && shutdown now, which then shuts down my computer. So I do not really need to care how long the script takes anyway. Here you can download this script.
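The structure of such an update loop might be sketched as follows. This is not the actual update.sh: the selection and order of the commands, the function names step and update_all, and the DRY_RUN and STEP_DELAY variables are assumptions for illustration. The commands themselves are standard dpkg, apt-get, and snap invocations.

```shell
#!/bin/bash

# Run one step, or only print it when DRY_RUN=1; then pause briefly.
# A failing step is reported, but does not stop the remaining steps.
step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@" || echo "step failed (continuing): $*" >&2
    fi
    sleep "${STEP_DELAY:-1}"
}

# Assumed sequence of repair and update commands, repeated four times.
update_all() {
    local round
    for round in 1 2 3 4; do
        echo "update round $round of 4"
        step dpkg --configure -a
        step apt-get -y --fix-broken install
        step apt-get update
        step apt-get -y dist-upgrade
        step apt-get -y autoremove
        step snap refresh
    done
}
```

Setting DRY_RUN=1 before calling update_all prints the steps instead of executing them, which is useful for checking the sequence. A real run needs root privileges, e.g., via sudo.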

Repeating a Command until it Succeeds under Linux

Assume that we have a command that we want to execute in a Linux terminal. We know that the command may sometimes fail, but it should actually succeed. So we would try the command again and again until it does. An example is a download process. If we know that the URL is reachable, then the command should succeed. However, maybe there is a lost connection or another temporary disturbance. Another such command could be a build process which downloads and installs dependencies.
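In Bash, this idea maps directly onto an until loop. The following is a minimal sketch under stated assumptions: the function name retry_until_success and the RETRY_DELAY variable with its default of 2 seconds are hypothetical choices made for illustration.

```shell
#!/bin/bash

# Hypothetical helper: run the given command until it exits with status 0.
# RETRY_DELAY (assumed default: 2 seconds) is the pause between attempts.
retry_until_success() {
    local attempt=1
    until "$@"; do
        echo "Attempt $attempt of '$*' failed, trying again." >&2
        attempt="$((attempt + 1))"
        sleep "${RETRY_DELAY:-2}"
    done
    echo "'$*' succeeded after $attempt attempt(s)." >&2
}
```

The function is invoked as retry_until_success followed by the command and its arguments. It should only be used for commands whose failures are temporary, since a command that always fails would keep the loop running forever.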

Recursively Deleting Empty Files and Directories under Linux

Sometimes, we need to delete empty files and empty directories under Linux. This happens, for example, when we want to restart an experiment with moptipy. Here I post a small Bash script that you may store as file deleteZeroSizeFiles.sh. It does exactly that: It searches the current directory and all subdirectories for empty files, i.e., files of size zero, and deletes all of them. Then, it recursively looks for empty directories, i.e., directories that contain neither files nor other directories, and deletes them as well. All files and directories that get deleted are also printed, so you can see what happened. Here you can download this script.
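The two passes described here can be expressed with GNU find. This is a minimal sketch, not necessarily how deleteZeroSizeFiles.sh is written: the function name delete_empty and its optional directory argument are assumptions for illustration.

```shell
#!/bin/bash

# Hypothetical helper: delete empty files, then empty directories, below
# the given root (default: the current directory), printing each deletion.
delete_empty() {
    local root="${1:-.}"
    # Pass 1: remove all files of size zero.
    find "$root" -type f -empty -print -delete
    # Pass 2: remove empty directories. '-delete' implies a depth-first
    # traversal, so a directory that only held empty directories is removed
    # too. '-mindepth 1' protects the root directory itself.
    find "$root" -mindepth 1 -type d -empty -print -delete
}
```

Running delete_empty without arguments cleans the current directory tree, while delete_empty "some/folder" restricts the cleanup to that folder.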

Thomas Weise (汤卫思)
