Latest revision as of 07:36, 26 July 2020


PDF

The "Portable Document Format" is almost good, in that it rather neatly achieves its crucial mission of providing a way to produce documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. It is not perfect, but it works well enough. Nowadays, one even generates PDF directly with pdflatex, and things like "dvipdf -sPAPERSIZE=a4" are bad memories of the past.

Tools

pdftk

This is a great command-line tool for handling PDF files.

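  • Typical uses
  1. To remove sheets from a document: burst it into single pages (written as pg_0001.pdf, pg_0002.pdf, ...), delete the unwanted page files, then reassemble the rest. The pg_*.pdf pattern keeps figures.pdf itself out of the merge.
pdftk figures.pdf burst
pdftk pg_*.pdf cat output combined.pdf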
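
As an alternative to bursting, pdftk's cat operation accepts page ranges, so the unwanted sheets can be skipped in a single pass. A minimal sketch, assuming a hypothetical helper named keep_pages (it is not part of pdftk) and illustrative file names:

```shell
# keep_pages is a hypothetical helper, not part of pdftk.
# Usage: keep_pages INPUT.pdf OUTPUT.pdf RANGE...
# Each RANGE is a pdftk page range such as 1-4 or 6-end.
keep_pages() {
    kp_src=$1
    kp_dst=$2
    shift 2
    # "cat" copies the listed ranges, in order, into the output file.
    pdftk "$kp_src" cat "$@" output "$kp_dst"
}
```

For instance, keep_pages figures.pdf trimmed.pdf 1-4 6-end would write a copy of figures.pdf without its page 5.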


pdftk
—If PDF is electronic paper, then pdftk is an electronic staple-remover, hole-punch, binder, secret-decoder-ring, and X-Ray-glasses. Pdftk is a command-line tool for doing everyday things with PDF documents. Keep one in the top drawer of your desktop and use it to:

Merge PDF Documents
Split PDF Pages into a New Document
Decrypt Input as Necessary (Password Required)
Encrypt Output as Desired
Fill PDF Forms with FDF Data and/or Flatten Forms
Apply a Background Watermark
Report on PDF Metrics such as Metadata, Bookmarks, and Page Labels
Update PDF Metadata
Attach Files to PDF Pages or the PDF Document
Unpack PDF Attachments
Burst a PDF Document into Single Pages
Uncompress and Re-Compress Page Streams
Repair Corrupted PDF (Where Possible)

Note that pdfshuffler does much the same with a GUI (merging, splitting and rearranging pages), which can be more agreeable and convenient to use.


Converting from a picture

ImageMagick does this poorly. Use sam2p instead (available through apt-get).
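
A minimal sketch of how sam2p could be applied to several pictures at once. The helper name pictures_to_pdf and the file names are assumptions made for illustration; sam2p itself is simply called with an input file and an output file:

```shell
# pictures_to_pdf is a hypothetical helper: it calls sam2p once per
# picture and writes the PDF next to it (photo.jpg -> photo.pdf).
pictures_to_pdf() {
    for pic in "$@"; do
        # ${pic%.*} is the file name without its last extension.
        sam2p "$pic" "${pic%.*}.pdf" || return 1
    done
}
```

It would be called as pictures_to_pdf photo.jpg scan.png, producing photo.pdf and scan.pdf.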

Converting to a picture

ImageMagick's convert works sometimes:

convert file.pdf fig.png

This creates one image file per page of the PDF (fig-0.png, fig-1.png, ...). Quality can be poor and transparency is preserved, so these options can be used:

convert -alpha off  -density 150 file.pdf -quality 90 fig.png

Still some pages may not be exported well. This online tool works better.
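
When only a few pages come out badly, they can be retried one at a time, since ImageMagick accepts a page index in square brackets after the file name. A sketch under that assumption; the helper name page_to_png, its defaults and the output naming are illustrative choices:

```shell
# page_to_png is a hypothetical helper: it converts a single page of a
# PDF so that a badly exported page can be retried at another density.
# Usage: page_to_png FILE.pdf PAGE [DENSITY]   (PAGE counts from 1)
page_to_png() {
    pp_pdf=$1
    pp_page=$2
    pp_density=${3:-150}
    # ImageMagick numbers pages from 0, hence the "- 1".
    convert -alpha off -density "$pp_density" \
        "${pp_pdf}[$((pp_page - 1))]" -quality 90 "fig-${pp_page}.png"
}
```

For example, page_to_png file.pdf 3 300 would redo the third page at 300 dpi and write it to fig-3.png.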

To add a border:

 for f in *.png; do convert "$f" -bordercolor black -border 2x2 "${f%.*}-border.png"; done
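
The loop relies on shell parameter expansion to derive the output name from the input name. The two suffix-stripping forms differ for names that contain several dots, as this small illustration shows (the file name is made up):

```shell
f="scan.page1.png"

# % removes the shortest matching suffix: only the extension goes.
short="${f%.*}"
# %% removes the longest matching suffix: everything from the first dot goes.
long="${f%%.*}"

echo "$short-border.png"   # prints scan.page1-border.png
echo "$long-border.png"    # prints scan-border.png
```

With %% two different inputs such as scan.page1.png and scan.page2.png would map to the same output name, so % is the safer choice for such names.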

Extracting from pdf

pdfimages file.pdf fig

will extract the images from file.pdf, one file per image, using fig as the filename prefix. The output is ppm by default (other formats are available through options). It possibly does not find all of them.
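
A sketch of the same extraction with PNG output, collected in a separate directory. It assumes the poppler version of pdfimages, which has a -png option; the helper name extract_images and the default directory are hypothetical:

```shell
# extract_images is a hypothetical helper: it extracts the images of a
# PDF as PNG files into a directory of their own.
# Usage: extract_images FILE.pdf [DIRECTORY]
extract_images() {
    ei_pdf=$1
    ei_dir=${2:-images}
    mkdir -p "$ei_dir" || return 1
    # -png asks pdfimages (poppler version) for PNG output instead of ppm;
    # "fig" is the prefix of the files written into the directory.
    pdfimages -png "$ei_pdf" "$ei_dir/fig"
}
```

Called as extract_images file.pdf extracted, it would leave the pictures in the directory extracted.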

Fonts

In particular, the problem of "Embedded fonts".

Font embedding refers to the fact that fonts are part of the PDF document, so that no local copy is assumed on the machine where the document is processed (viewed, modified, etc.). In principle, fonts should always be embedded (although they tend to make the document larger). Typically, they are not. And then good luck finding them online...

Subset embedding embeds only the characters that are actually used: this allows the document to be viewed, but not edited.

Identify the missing fonts

  • In acroread, File Menu > Document Properties > Fonts
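
From the command line, the pdffonts tool of poppler-utils prints a similar list, with an "emb" column saying whether each font is embedded. The following is only a sketch: the helper name fonts_not_embedded is hypothetical, and it assumes the usual column layout of pdffonts (two header lines, "emb" as the fifth column from the right) and font names without spaces:

```shell
# fonts_not_embedded is a hypothetical helper built on pdffonts
# (poppler-utils). It assumes the usual column layout, where "emb" is
# the fifth column from the right, and font names without spaces.
fonts_not_embedded() {
    # Skip the two header lines, keep rows whose "emb" column says "no".
    pdffonts "$1" | awk 'NR > 2 && $(NF - 4) == "no" { print $1 }'
}
```

Running fonts_not_embedded file.pdf would then print one missing font name per line, or nothing if everything is embedded.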

Programs not embedding

Old versions of Mathematica exported EPS files without embedded fonts (namely the math fonts). See http://library.wolfram.com/infocenter/MathSource/628/ for some background. The remedy is to run the program emmathfnt. dvipdf might still complain, but the fonts are apparently embedded.

Embedding animations

A possibility is to use pdflatex.

The source is:

\documentclass{article}
\usepackage{hyperref}
\usepackage[pdftex]{graphicx}

\begin{document}

\title{Test animation}
\author{\href{http://laussy.org}{F.P. Laussy}}

\maketitle

\href{run:movie.mpeg}{\includegraphics{screenshot}}

\end{document}
Then run (in a console):
pdflatex anim.tex
Clicking on the image (here, the screenshot) will run the animation externally.

Reducing size

There are various tools to reduce the size of a PDF, including online ones. The simplest command-line one is:

ps2pdf input.pdf

which apparently doesn't affect the quality but strips out a lot of unnecessary content (forms, redundant material, content that is not displayed, etc.).
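
If the plain call does not shrink the file enough, ps2pdf passes Ghostscript options through, among them the -dPDFSETTINGS presets. A minimal sketch, assuming a hypothetical helper named shrink_pdf and an output name chosen for illustration:

```shell
# shrink_pdf is a hypothetical helper around ps2pdf (Ghostscript).
# Usage: shrink_pdf INPUT.pdf [OUTPUT.pdf]
shrink_pdf() {
    sp_src=$1
    sp_dst=${2:-${sp_src%.pdf}-small.pdf}
    # /ebook is one of Ghostscript's presets; it downsamples images,
    # so unlike the plain call it can lower the quality.
    ps2pdf -dPDFSETTINGS=/ebook "$sp_src" "$sp_dst" || return 1
    # Report both sizes in bytes to see what was gained.
    sp_before=$(wc -c < "$sp_src")
    sp_after=$(wc -c < "$sp_dst")
    echo "$sp_src: $((sp_before)) bytes"
    echo "$sp_dst: $((sp_after)) bytes"
}
```

Called as shrink_pdf input.pdf, it would write input-small.pdf and print the two sizes, which makes it easy to judge whether the loss of image resolution is worth it.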

Embedded fonts are a problem for size, as they can make a small document very bulky. It is very difficult to replace or remove an embedded font, although apparently this is something you can do with acroread (if available). Otherwise, LibreOffice Draw does something similar: opening the PDF and exporting it (as PDF) can reduce the size a lot (in some cases changing the appearance badly by scrambling the fonts, but for exported Gmail pages, for instance, it works well).

Another way is to import the PDF page with Gimp, compress it as jpg and then export the jpg back into PDF. In this way you control the quality and can achieve drastic reductions. You may have to do this page by page, using the most suitable technique in each case. In this way we managed to reduce a ~46MB file of 383 pages to less than 14MB, as required for its upload to some governmental server.
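
The same idea can be scripted for a whole document with ImageMagick instead of Gimp. This is a substitution, not the procedure that was used for the 383-page file. The helper name jpeg_pdf and the default density and quality are assumptions to be tuned:

```shell
# jpeg_pdf is a hypothetical helper: it rasterises every page with
# ImageMagick and stores the pages as JPEG images in a new PDF.
# Text is no longer selectable afterwards, as in the Gimp route.
# Usage: jpeg_pdf INPUT.pdf OUTPUT.pdf [DENSITY] [QUALITY]
jpeg_pdf() {
    convert -density "${3:-100}" "$1" \
        -compress jpeg -quality "${4:-60}" "$2"
}
```

For example, jpeg_pdf input.pdf output.pdf 120 50 trades a little more resolution for stronger JPEG compression. Pages that suffer too much can still be treated separately.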