3 minute read

Sometimes we have PDF documents and want to submit through a web form, but it is too big. Often, this size is caused by the images inside the PDF. We may reduce their resolution, their “dots per inch” (DPI) value, and then the PDF gets smaller. This can be done via Ghostscript. Here I provide the little script scalePdf.sh, which does this in the terminal. It basically just executes Ghostscript with the correct parameters.

All you have to do is to provide the path to the source PDF document and the DPI to which you want to scale it. If you call scalePdf.sh mydoc.pdf 200, then it will create a file mydoc_200dpi.pdf, which is the same document, but scaled to 200 DPI. Additionally, you can also provide a destination path as optional third argument. Invoking scalePdf.sh mydoc.pdf 33 ../result.pdf creates the document result.pdf in the parent folder of the current folder. This new document would be the same as mydoc.pdf, but with all images inside scaled to 33 DPI. You can play around with different resolutions (DPI values) until you reach an acceptable file size.

The script will check if GhostScript is installed and abort if not. In that case, an appropriate error message is printed. You can install GhostScript via sudo apt-get install ghostscript. The script will also fail with an error if either the input file does not exist or if the expected output file is not produced for some reason.

Here you can download this script and the complete collection of my personal scripts is available here.

#!/bin/bash -

# (Down)scale a PDF file to the given DPI value.
#
# The script expects the following parameters:
# 1. The path to a source PDF document.
# 2. The DPI value to scale to.
# 3. Optional: The output file name.
#
# If the output file name is not specified, we will create a document in the
# current directory based on the input document name, i.e., "123.pdf" becomes
# "123_200dpi.pdf" if 200 were specified as second parameter.
#
# This script is basically a wrapper around Ghostscript.

# strict error handling
set -o pipefail  # trace ERR through pipes
set -o errtrace  # trace ERR through 'time command' and other functions
set -o nounset   # set -u : exit the script if you try to use an uninitialized variable
set -o errexit   # set -e : exit the script if any statement returns a non-true return value

if [ $# -lt 2 ]; then
    echo "$(date +'%0Y-%0m-%0d %0R:%0S'): (Down)Scale the images in a PDF document."
    echo "Parameters:"
    echo " 1. path to the source PDF document"
    echo " 2. the DPI value to scale to"
    echo " 3. OPTIONAL: output file name"
    exit 1
fi

if ! ( command -v gs &> /dev/null ); then
    echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Ghostscript is not installed but needed."
    echo "$(date +'%0Y-%0m-%0d %0R:%0S'): You can install it via 'sudo apt-get install ghostscript'."
    exit 1
fi

srcDocument="$(realpath "$1")"
if [ -f "$srcDocument" ]; then
    echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Source document is '$srcDocument'."
else
    echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Source document $srcDocument' does not exist."
    exit 1
fi

dpi="$2"
dstDocument="${3:-}"
if [[ -n "$dstDocument" ]]; then
    dstDocument="$(realpath $dstDocument)"
    echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Scaling '$srcDocument' to the specified destination document '$dstDocument' using $dpi dpi."
else
    dstDocument="$(basename "${srcDocument%.*}_${dpi}dpi.pdf")"
    dstDocument="$(realpath $dstDocument)"
    echo "$(date +'%0Y-%0m-%0d %0R:%0S'): No destination document specified, therefore converting '$srcDocument' to  '$dstDocument' using $dpi dpi."
fi

echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Now piping document '$srcDocument' through GhostScript, creating '$dstDocument'."

gs -dAntiAliasColorImages=true \
   -dAntiAliasGrayImages=true \
   -dAntiAliasMonoImages=true \
   -dAutoFilterColorImages=false \
   -dAutoFilterGrayImages=false \
   -dAutoRotatePages=/None \
   -dBATCH \
   -dCannotEmbedFontPolicy=/Error \
   -dColorConversionStrategy=/LeaveColorUnchanged \
   -dColorImageDownsampleType=/Bicubic \
   -dColorImageFilter=/FlateEncode \
   -dColorImageResolution="$dpi" \
   -dCompatibilityLevel="1.7" \
   -dCompressFonts=true \
   -dCompressStreams=true \
   -dCreateJobTicket=false \
   -dDetectDuplicateImages=true \
   -dDoThumbnails=false \
   -dDownsampleColorImages=true \
   -dDownsampleGrayImages=true \
   -dDownsampleMonoImages=true \
   -dEmbedAllFonts=true \
   -dFastWebView=false \
   -dGrayImageFilter=/FlateEncode \
   -dGrayImageDownsampleType=/Bicubic \
   -dGrayImageResolution="$dpi" \
   -dHaveTransparency=true \
   -dMaxBitmap=2147483647 \
   -dMonoImageDownsampleType=/Subsample \
   -dMonoImageResolution="$dpi" \
   -dNOPAUSE \
   -dNOPROMPT \
   -dOptimize=true \
   -dPassThroughJPEGImages=true \
   -dPassThroughJPXImages=true \
   -dPDFSTOPONERROR=true \
   -dPDFSTOPONWARNING=true \
   -dPreserveCopyPage=false \
   -dPreserveEPSInfo=false \
   -dPreserveHalftoneInfo=false \
   -dPreserveOPIComments=false \
   -dPreserveOverprintSettings=false \
   -dPreserveSeparation=false \
   -dPreserveDeviceN=false \
   -dPreserveMarkedContent=false \
   -dPrinted=false \
   -dOmitInfoDate=true \
   -dOmitID=true \
   -dOmitXMP=true \
   -dQUIET \
   -dSAFER \
   -dSubsetFonts=true \
   -dUCRandBGInfo=/Remove \
   -dUNROLLFORMS \
   -sDEVICE=pdfwrite \
   -sOutputFile="$dstDocument" \
   "$srcDocument" \
   -q

if [ -f "$dstDocument" ]; then
    echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Finished scaling '$srcDocument' to $dpi DPI and writing output to '$dstDocument'."
else
    echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Destination document '$dstDocument' was not created."
    exit 1
fi