# PDF

The "Portable Document Format" is almost good, in that it rather neatly achieves its crucial mission of providing a way to produce documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. It is not perfect, but it works well enough. Nowadays, one even generates PDF directly with pdflatex, and things like "dvipdf -sPAPERSIZE=a4" are bad memories of the past.

## Tools

### pdftk

This is a great command-line tool to handle PDF files.

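• Typical uses
1. To remove sheets from a document: burst it into single pages (pg_0001.pdf, pg_0002.pdf, ...), delete the unwanted ones, then reassemble (using pg_*.pdf rather than *.pdf, so that figures.pdf itself is not merged back in):
pdftk figures.pdf burst
pdftk pg_*.pdf cat output combined.pdf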
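
The burst-and-reassemble route above can also be collapsed into a single call, since pdftk's cat operation accepts page ranges. A minimal sketch, assuming a hypothetical helper named keep_pages (not part of pdftk) and, in the example, that page 5 is the one to drop:

```shell
# keep_pages SRC DST RANGE...
# Hypothetical helper: copy only the listed page ranges of SRC into DST.
keep_pages() {
    src=$1
    dst=$2
    shift 2
    # "cat" takes page ranges such as 1-4 or 6-end ("end" is the last page).
    pdftk "$src" cat "$@" output "$dst"
}

# Example (assumes figures.pdf has at least 6 pages): drop page 5.
# keep_pages figures.pdf combined.pdf 1-4 6-end
```

This avoids the intermediate single-page files, at the cost of having to know the page numbers in advance.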


If PDF is electronic paper, then pdftk is an electronic staple-remover, hole-punch, binder, secret-decoder-ring, and X-Ray-glasses. Pdftk is a command-line tool for doing everyday things with PDF documents. Keep one in the top drawer of your desktop and use it to:

Merge PDF Documents
Split PDF Pages into a New Document
Decrypt Input as Necessary (Password Required)
Encrypt Output as Desired
Fill PDF Forms with FDF Data and/or Flatten Forms
Apply a Background Watermark
Report on PDF Metrics such as Metadata, Bookmarks, and Page Labels
Attach Files to PDF Pages or the PDF Document
Unpack PDF Attachments
Burst a PDF Document into Single Pages
Uncompress and Re-Compress Page Streams
Repair Corrupted PDF (Where Possible)

Note that pdfshuffler offers the page-level operations (merging, splitting, removing and reordering pages) through a GUI, which can be much more agreeable and/or convenient to use.
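
A few of the pdftk operations listed above can be sketched as thin shell wrappers. The function names (pdf_merge, pdf_info, pdf_decrypt, pdf_watermark) are hypothetical and only serve to label each invocation; the pdftk keywords themselves (cat, dump_data, input_pw, background) are the tool's own.

```shell
# pdf_merge DST SRC...: concatenate the SRC files, in order, into DST.
pdf_merge() {
    dst=$1
    shift
    pdftk "$@" cat output "$dst"
}

# pdf_info SRC: print metadata, bookmarks and page count on stdout.
pdf_info() {
    pdftk "$1" dump_data
}

# pdf_decrypt SRC PASSWORD DST: write an unprotected copy of SRC.
# Note: a password given on the command line is visible to other users.
pdf_decrypt() {
    pdftk "$1" input_pw "$2" output "$3"
}

# pdf_watermark SRC STAMP DST: put page 1 of STAMP behind every page of SRC.
pdf_watermark() {
    pdftk "$1" background "$2" output "$3"
}

# Example: pdf_merge thesis.pdf cover.pdf chapter1.pdf chapter2.pdf
```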

## Converting from a picture

ImageMagick does a poor job of this. Use sam2p instead (available through apt-get).
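
As an illustration, sam2p takes an input image and an output file. The loop below is a minimal sketch, assuming a hypothetical helper named images_to_pdf that writes each PDF next to its source image:

```shell
# images_to_pdf IMAGE...
# Hypothetical helper: write photo.pdf next to photo.png, and so on.
images_to_pdf() {
    for img in "$@"; do
        # Strip only the last extension, then let sam2p pick the
        # output format from the ".pdf" suffix.
        sam2p "$img" "${img%.*}.pdf" || return 1
    done
}

# Example: images_to_pdf *.png *.jpg
```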

## Converting to a picture

ImageMagick's convert works sometimes:

convert file.pdf fig.png


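This creates one image file per page of the PDF (fig-0.png, fig-1.png, ...). Quality can be poor and the transparent background is kept, so these options can be used to render at a higher resolution and drop the alpha channel: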

convert -alpha off  -density 150 file.pdf -quality 90 fig.png


Still, some pages may not be exported well. This online tool works better.
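
When convert gives poor results, a local alternative worth trying is pdftoppm, from poppler-utils. This is a different tool from the one used above, not a fix for it. A minimal sketch, assuming a hypothetical helper pdf_to_png and a default of 150 dpi:

```shell
# pdf_to_png SRC PREFIX [DPI]
# Hypothetical helper: render every page of SRC to PREFIX-<page>.png.
pdf_to_png() {
    src=$1
    prefix=$2
    dpi=${3:-150}   # resolution in dots per inch (assumed default)
    pdftoppm -png -r "$dpi" "$src" "$prefix"
}

# Example: pdf_to_png file.pdf fig 300
```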

To add a border:

for f in *.png; do convert "$f" -bordercolor black -border 2x2 "${f%.*}-border.png"; done


## Extracting from pdf

pdfimages file.pdf fig


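This extracts the images embedded in file.pdf, as PPM by default (other formats are available through options, e.g. -j to keep JPEG images as JPEG), using fig as the filename prefix. It writes one file per image, and possibly does not find all of them.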
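
To see what pdfimages has found before extracting, the poppler version of the tool has a -list option, and it can write PNG instead of PPM with -png. A minimal sketch, assuming a reasonably recent poppler-utils (older or xpdf builds may lack these two options) and a hypothetical helper named extract_images:

```shell
# extract_images SRC PREFIX
# Hypothetical helper: list the images in SRC, then extract them as PNG.
extract_images() {
    src=$1
    prefix=$2
    # Table of the images found: page, size, colour space, encoding...
    pdfimages -list "$src" || return 1
    # Writes PREFIX-000.png, PREFIX-001.png, ...
    pdfimages -png "$src" "$prefix"
}

# Example: extract_images file.pdf fig
```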

## Fonts

In particular, the problem of "Embedded fonts".

Font embedding refers to the fact that fonts are part of the PDF document, so that no local copy is assumed on the machine where the document is processed (viewed, modified, etc.). In principle, fonts should always be embedded (although they tend to make the document larger). Typically, they are not. And then good luck finding them online...

Subset embedding includes only the characters that are actually used in the document: this is enough to view the document, but not to change (edit) it.
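
One way to try embedding fonts after the fact is to pass the document through Ghostscript's pdfwrite device. This is a sketch under assumptions, not a guaranteed fix: it can only embed fonts that Ghostscript finds on the machine, and may embed a substitute font when the original is missing. The helper name embed_fonts is hypothetical.

```shell
# embed_fonts SRC DST
# Hypothetical helper: rewrite SRC into DST, asking Ghostscript to embed
# every font it can find, as subsets of the characters actually used.
embed_fonts() {
    src=$1
    dst=$2
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
       -dEmbedAllFonts=true -dSubsetFonts=true \
       -sOutputFile="$dst" "$src"
}

# Example: embed_fonts paper.pdf paper-embedded.pdf
```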

### Identify the missing fonts

• In acroread, File Menu > Document Properties > Fonts
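
From the command line, pdffonts (from poppler-utils, the same package as pdfimages) prints one row per font, with an "emb" column saying whether it is embedded. The filter below is a minimal sketch: the helper name unembedded_fonts is hypothetical, and the parsing assumes the usual pdffonts layout (two header lines, "emb" as the fifth column from the right, font names without spaces).

```shell
# unembedded_fonts SRC
# Hypothetical helper: print the names of the fonts that SRC references
# without embedding them.
unembedded_fonts() {
    # Columns end with: emb sub uni object-number generation.
    # Counting from the right avoids the spaces in types like "Type 1".
    pdffonts "$1" | awk 'NR > 2 && $(NF-4) == "no" { print $1 }'
}

# Example: unembedded_fonts paper.pdf
```

An empty output means every font listed by pdffonts is embedded.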

### Programs not embedding

Old versions of Mathematica exported EPS files without embedding the fonts (namely the math fonts). See [1] for some background. The remedy is to run the program emmathfnt. After that, dvipdf might still complain, but the fonts are apparently embedded.

## Embedding animations

A possibility is to use pdflatex.

The source is:

\documentclass{article}
\usepackage{hyperref}
\usepackage[pdftex]{graphicx}

\begin{document}

\title{Test animation}
\author{\href{http://laussy.org}{F.P. Laussy}}

\maketitle

\href{run:movie.mpeg}{\includegraphics{screenshot}}

\end{document}

Then run (in a console):

pdflatex anim.tex

Clicking on the image (here, the screenshot) will then run the animation externally.

## Reducing size

There are various tools to reduce the size of a PDF, including online ones. The simplest command-line one is:

ps2pdf input.pdf output.pdf


which apparently doesn't affect the quality but strips out a lot of unnecessary content (forms, redundant or undisplayed material, etc.).
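
If that is not enough, ps2pdf also passes Ghostscript options through, among them the -dPDFSETTINGS presets (/screen, /ebook, /printer, /prepress), which trade image resolution for size. Unlike the plain call, the lower presets do reduce image quality. A minimal sketch, assuming a hypothetical helper shrink_pdf and /ebook as a default:

```shell
# shrink_pdf SRC DST [PRESET]
# Hypothetical helper: rewrite SRC into DST with a Ghostscript preset
# (screen, ebook, printer or prepress), then report both sizes.
shrink_pdf() {
    src=$1
    dst=$2
    preset=${3:-ebook}   # assumed default: medium-resolution images
    ps2pdf -dPDFSETTINGS="/$preset" "$src" "$dst" || return 1
    printf '%s: %s bytes -> %s: %s bytes\n' \
        "$src" "$(wc -c < "$src")" "$dst" "$(wc -c < "$dst")"
}

# Example: shrink_pdf input.pdf output.pdf screen
```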

Embedded fonts are a problem for the size, as they can make a small document very bulky. It is very difficult to replace or remove an embedded font, although apparently this is something one can do with acroread (if available). Otherwise, LibreOffice Draw does something similar: opening the PDF and exporting it (as PDF) can reduce the size a lot (in some cases badly changing the appearance by scrambling the fonts, but for exported Gmail pages, for instance, it works well).

Another way is to import the PDF page into Gimp, compress it as JPEG and then export the JPEG back to PDF. In this way you control the quality and can achieve a drastic reduction. You may have to do this on a page-by-page basis, using the most suitable technique in each case. In this way we managed to reduce a ~46MB file of 383 pages to less than 14MB, as required for its upload to some governmental server.
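
The Gimp route above can be approximated on the command line with tools already mentioned on this page: convert to rasterize the pages, sam2p to wrap each JPEG into a PDF, and pdftk to reassemble. This is a sketch of a substitute technique, not the procedure used for the 383-page file. The helper name rasterize_pdf and its default resolution and quality are assumptions, and the result contains only images (text is no longer selectable).

```shell
# rasterize_pdf SRC DST [DPI] [QUALITY]
# Hypothetical helper: rebuild SRC as a PDF made of one JPEG per page.
rasterize_pdf() {
    src=$1
    dst=$2
    dpi=${3:-150}      # rendering resolution (assumed default)
    quality=${4:-60}   # JPEG quality, 1-100 (assumed default)
    tmp=$(mktemp -d) || return 1
    status=0

    # One JPEG per page: page-0000.jpg, page-0001.jpg, ...
    # (zero-padded so that the shell glob keeps the page order).
    convert -density "$dpi" "$src" -quality "$quality" \
        "$tmp/page-%04d.jpg" || status=1

    # Wrap every JPEG in a one-page PDF.
    if [ "$status" -eq 0 ]; then
        for jpg in "$tmp"/page-*.jpg; do
            sam2p "$jpg" "${jpg%.jpg}.pdf" || { status=1; break; }
        done
    fi

    # Merge the one-page PDFs back together.
    if [ "$status" -eq 0 ]; then
        pdftk "$tmp"/page-*.pdf cat output "$dst" || status=1
    fi

    rm -rf "$tmp"
    return "$status"
}

# Example: rasterize_pdf big.pdf small.pdf 120 50
```

Lowering the resolution or the quality shrinks the file further; checking a few pages by eye after each attempt is the only real test.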