Download Alpine Tar Archive
The most recent releases include Alpine tar archives built on a recent Alpine release.
These Alpine tar archives come as two parts:
- The tar archive itself, which contains a pdf2htmlEX executable compiled inside an Alpine docker image (and hence using musl).
- A /bin/sh script which will install the pdf2htmlEX Alpine dependencies using the apk command, and then unpack the associated tar archive into the /usr/local/bin directory.
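The two steps the install script performs can be sketched roughly as follows. This is a minimal illustration, not the script shipped with the releases: the function name install_from_archive is hypothetical, and the archive path, destination directory, and package names are all left to the caller because the actual values are not listed here.

```shell
#!/bin/sh
# Sketch only: NOT the project's actual install script.
# install_from_archive is a hypothetical helper name.
# Usage: install_from_archive ARCHIVE DEST [APK_PACKAGE...]
install_from_archive() {
    archive="$1"
    dest="$2"
    shift 2

    # Step 1: install the runtime dependencies with apk,
    # but only if the caller named any packages.
    if [ "$#" -gt 0 ]; then
        apk add --no-cache "$@" || return 1
    fi

    # Step 2: unpack the tar archive into the destination directory.
    mkdir -p "$dest" || return 1
    tar -xf "$archive" -C "$dest"
}
```

On an Alpine system the destination would be /usr/local/bin, as described above. Passing no package names skips the apk step, which makes it possible to try the unpacking step on a non-Alpine machine.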
This option should only be chosen by someone who is comfortable using shell scripts in an Alpine Linux environment.
However, it can be used to build your own customized Alpine based docker image.
NOTE: at the moment, the statically linked FontForge and Poppler libraries are using the standard Alpine version of iconv. Unfortunately, the Alpine version of iconv is unable to deal with some 'standard' fonts, and so you might find these fonts are not transferred into the resulting html. See Compile Alpine version of pdf2htmlEX using gnu-iconv for more details and discussion.
When page images are stored as base64-encoded WebP instead of PNG, the size of the resulting HTML output is significantly reduced. If the images are referenced externally as WebP files instead of being embedded as base64, the size is reduced by approximately 30% more. Below, I’m sharing an example Bash code block that converts the PNGs to WebP and embeds the base64-encoded WebP images into all pages.
# Directory holding the pdf2htmlEX output (.page files and bg*.png images)
folder_path="/path/to/your/directory"
for img in "$folder_path"/bg*.png; do
    # Skip the unexpanded pattern when no PNG matches
    [ -f "$img" ] || continue
    # Extract the image filename without the extension (.png)
    img_name=$(basename "$img" .png)
    # Convert the .png image to .webp format with quality 75 and save it in the same directory
    convert "$img" -quality 75 "$folder_path/$img_name.webp"
done
for file in "$folder_path"/*.page; do
    if [ -f "$file" ]; then
        # Extract the src URL of the (first) image in the .page file and replace the .png extension with .webp
        x=$(grep -oP 'src="\K[^"]+' "$file" | head -n 1 | sed 's/\.png$/.webp/')
        # Encode the .webp image file to base64 and save it to encode.txt
        base64 "$folder_path/$x" > "$folder_path/encode.txt"
        # Remove any newlines from the base64-encoded content and save to a temporary file
        tr -d '\n' < "$folder_path/encode.txt" > "$folder_path/temp_base64.txt"
        # Update the .page file to use the .webp extension instead of .png
        sed -i 's/\(src="[^"]*\)\.png"/\1.webp"/g' "$file"
        # Replace the image src in the .page file with the base64-encoded data URI for the .webp image
        awk -v x="$x" 'NR==FNR{base64=$0; next} {gsub(x, "data:image/webp;base64," base64)}1' "$folder_path/temp_base64.txt" "$file" > "$folder_path/temp.page" && mv "$folder_path/temp.page" "$file"
    fi
done
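For the externally referenced variant mentioned above, only the src attributes need to change and nothing is embedded. Base64 encodes every 3 bytes as 4 characters, so embedded image data is about a third larger than the raw file, which is in line with the extra saving quoted above. The following is a minimal sketch under some assumptions: the function name link_external_webp is hypothetical, the .webp files are assumed to already sit next to the .page files, and sed -i is assumed to behave as in GNU or BusyBox sed.

```shell
#!/usr/bin/env bash
# Sketch only: point every .page file in a directory at external .webp
# files instead of embedding them as base64 data URIs.
# link_external_webp is a hypothetical helper name.
# Usage: link_external_webp DIRECTORY
link_external_webp() {
    local dir="$1" file
    for file in "$dir"/*.page; do
        # Skip the unexpanded pattern when no .page file matches
        [ -f "$file" ] || continue
        # Rewrite src="....png" to src="....webp"; the image stays external
        sed -i 's/\(src="[^"]*\)\.png"/\1.webp"/g' "$file"
    done
}
```

It would be called as link_external_webp /path/to/your/directory after the PNG to WebP conversion loop has run. The .webp files then have to be deployed alongside the HTML, which is the trade-off for the smaller size.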