Removing trailing spaces for makeindex


Makeindex, LaTeX's historical index-building engine, is irritatingly buggy. One recurrent problem that plagues the output from careless index makers is that of duplicated entries:

[Screenshot: an index with duplicated entries]

A sore to the eye! The problem is that makeindex sees

in a paragraph speaking of \index{second quantisation} the
fact that, in your source code, you may have \index{second
quantisation} as a result of indentation,

as two different entries of "second quantisation". Because of the line break inside the second \index{} and the indentation that follows it (e.g., after refilling the paragraph with M-q in Emacs), it appears in the .ind file as:

  \item second   quantisation, 113, 129
  \item second quantisation, \textbf{111}, 112, 113, 115

For makeindex to merge entries, the argument of \index{} must be exactly the same string everywhere, down to the whitespace. That is to say, you should write \index{second~quantisation} everywhere, so that no refilling can ever break the entry across lines. If you didn't know this beforehand, that's a nasty situation.

I used to use the following hackish line of code to replace the whitespace inside an \index{} with a single unbreakable space:

cat chap5.tex | sed -e ':begin;$!N;s/\\index{\(\S*\)\s*\n\s*\(\S*\)}/\\index{\1~\2}/g'

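However, this is not very robust. Nothing ever branches back to the :begin label, so sed only joins lines in pairs (a break between lines 2 and 3 goes unnoticed); \S* cannot match an entry that contains other spaces besides the line break; and it meddles with lines that it shouldn't (not in a way that breaks them, but still). For the sake of safety, I tried to hack my way out with a more robust script, and this is what I came up with, fix-makeindex.sh: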

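#!/bin/bash
# fix-makeindex.sh FILE: writes fixed-index/FILE, with the whitespace (line
# breaks included) inside every \index{...} replaced by a single ~.
set -euo pipefail
f="$1"
mkdir -p fixed-index
perl -pe 's/\n/☠/g;' "$f" > "$f-nobr"
# (?:☠|\s) rather than [☠\s]: without "use utf8", a character class matches each
# of the three UTF-8 bytes of ☠ on its own and would mangle e.g. à or ’.
# /g on the inner s///r replaces every run of whitespace, not just the first.
perl -pe 's{(\\index\{.*?\})}{$1 =~ s/(?:☠|\s)+/~/gr}ge' "$f-nobr" > "$f-pass1"
perl -pe 's{(\\index\{.*?\})}{$1 =~ s/(?:☠|\s)+/~/gr}ge' "$f-pass1" > "$f-pass2"
perl -pe 's/☠/\n/g;' "$f-pass2" > "fixed-index/$f"
rm "$f-nobr" "$f-pass1" "$f-pass2"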

The idea is rudimentary: i) remove the line breaks, storing them as a character unlikely to be present in the text (I chose ☠), so that the whole file becomes one huge line; ii) within the {} of every \index declaration, replace each run of spaces and/or line breaks by a single non-breaking ~; iii) repeat that substitution in a second pass, to catch whitespace left over by the first one (without a /g flag on the inner s///, each pass only replaces the first run of whitespace in an entry); and iv) turn the ☠ back into line breaks.

Then you can happily sanitize every .tex file that uses \index:

grep -la '\\index{' -- *.tex > list-of-files-to-fix
while IFS= read -r file; do ./fix-makeindex.sh "$file"; done < list-of-files-to-fix

A check is recommended:

fldiff chap5.tex fixed-index/chap5.tex
[Screenshot: fldiff comparison of the original and fixed chap5.tex]

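Even so, makeindex may still show split entries: the .idx file that LaTeX writes can mix ~ with a \nobreakspace {} variant (with a varying number of spaces). This typically happens when an \index sits inside the argument of another command (a \footnote, for instance), where ~ gets expanded before it is written out. A sed pass over the .idx file, followed by a new makeindex run, takes care of it: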

makeindex Microcavities
sed -i 's/\\nobreakspace\s\+{}/~/g' Microcavities.idx
makeindex Microcavities.idx

The last line generates your .ind file.