Converting a PDF Document to a Series of Images
Sometimes we have a PDF document but want to convert it to a series of JPG or PNG images page-by-page.
This can be done via Ghostscript.
Here I provide the little Bash script pdf2imgs.sh, which wraps around Ghostscript and does this in the Linux terminal.
It takes as parameters
- The path to the source PDF document.
- OPTIONAL: The resolution for the output images in DPI, default is 300 DPI
- OPTIONAL: The output image type, which must be either
jpgorpng, wherejpgis the default. - OPTIONAL: The destination folder.
The default is a subfolder with the name of the original document with
-imagesattached, in the current folder.
For example, pdf2imgs.sh myDoc.pdf will create a folder named myDoc-images inside the current folder.
It will create JPG images with the names myDoc-00001.jpg, myDoc-00002.jpg, … inside this folder, one of each page of myDoc.pdf, and each with the resolution of 300 DPI.
pdf2imgs.sh otherDoc.pdf 128 png X, on the other hand, creates the folder X inside the current folder and places PNG images with the names otherDoc-00001.png, otherDoc-00002.png, and so on into it.
Again, each image holds one page of otherDoc.pdf and having a resolution of 128 dots per inch.
Here you can download this script and the complete collection of my personal scripts is available here.#!/bin/bash -
# Convert a PDF to a series of images, page-by-page.
#
# The script expects the following parameters:
# 1. The path to a source document.
# 2. The resolution (DPI) of the images to generate
# 3. OPTIONAL: The destination file type (default: jpg)
# 4. OPTIONAL: The destination folder path
# strict error handling
set -o pipefail # trace ERR through pipes
set -o errtrace # trace ERR through 'time command' and other functions
set -o nounset # set -u : exit the script if you try to use an uninitialized variable
set -o errexit # set -e : exit the script if any statement returns a non-true return value
if [ $# -lt 1 ]; then
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Convert a PDF document to a sequence of images."
echo "Parameters:"
echo " 1. path to source document"
echo " 2. OPTIONAL: resolution (DPI) of destination images, default: 300"
echo " 3. OPTIONAL: file type [png|jpg], default: jpg"
echo " 4. OPTIONAL: path to destination folder, default: source name + '-images'"
exit 0
fi
if ! ( command -v gs &> /dev/null ); then
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Ghostscript is not installed but needed."
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): You can install it via 'sudo apt-get install ghostscript'."
exit 1
fi
srcDocument="$(realpath "$1")"
if [ -f "$srcDocument" ]; then
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Got source document '$srcDocument'."
else
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Source document $srcDocument' does not exist."
exit 1
fi
dpi="${2:-}"
if [ -n "$dpi" ]; then
if [ "$dpi" -lt 1 ]; then
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Destination image resolution must be at least 1, but is '$dpi'."
else
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Destination image resolution is specified as '$dpi'."
fi
else
dpi="300"
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Using default destination image resolution '$dpi'."
fi
outType="${3:-}"
if [ -n "$outType" ]; then
if [ "$outType" == "jpg" ]; then
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Will create JPEG images."
device="jpeg"
elif [ "$outType" == "jpeg" ]; then
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Will create JPEG images."
device="jpeg"
elif [ "$outType" == "png" ]; then
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Will create PNG images."
device="pngalpha"
else
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Output type '$outType' not supported."
exit 1
fi
else
outType="jpg"
device="jpeg"
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Using default output type '$outType'."
fi
destFolder="${4:-}"
srcPattern="$(basename "$srcDocument")"
srcPattern="${srcPattern%.*}"
if [ -n "$destFolder" ]; then
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Destination folder '$destFolder' specified."
else
destFolder="${srcPattern}-images"
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Using default destination folder '$destFolder'."
fi
destFolder="$(realpath "$destFolder")"
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Creating destination folder '$destFolder'."
mkdir -p "$destFolder"
destPattern="${destFolder}/${srcPattern}-%05d.${outType}"
destPattern="$(realpath "$destPattern")"
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Destination file name pattern is '$destPattern'."
gs -dAntiAliasColorImages=true \
-dAntiAliasGrayImages=true \
-dAntiAliasMonoImages=true \
-dAutoFilterColorImages=false \
-dAutoFilterGrayImages=false \
-dAutoRotatePages=/None \
-dBATCH \
-dColorConversionStrategy=/LeaveColorUnchanged \
-dCreateJobTicket=false \
-dDownsampleColorImages=false \
-dDownsampleGrayImages=false \
-dDownsampleMonoImages=false \
-dEPSCrop \
-dGraphicsAlphaBits=4 \
-dHaveTransparency=true \
-dMaxBitmap=2147483647 \
-dNOPAUSE \
-dNOPROMPT \
-dPassThroughJPEGImages=true \
-dPassThroughJPXImages=true \
-dPDFSTOPONERROR=true \
-dPDFSTOPONWARNING=true \
-dPrinted=false \
-dOmitInfoDate=true \
-dOmitID=true \
-dOmitXMP=true \
-dQUIET \
-dSAFER \
-dTextAlphaBits=4 \
-dUCRandBGInfo=/Remove \
-r${dpi}*${dpi} \
-sDEVICE="$device" \
-sOutputFile="$destPattern" \
"$srcDocument" \
-q
echo "$(date +'%0Y-%0m-%0d %0R:%0S'): Done converting '$srcDocument' to '$outType' images in folder '$destFolder'."