sanitize is a bash script that replaces special and accented characters in a filename with their closest ASCII equivalents.
See a blog post for a discussion of the necessity and merit of this task.
Create a file called sanitize with the source below, make it executable (chmod +x sanitize) and, in the directory whose files are to be sanitized, run:
for f in *; do ./sanitize "$f"; done
This only prints the proposed renames. If you are happy with the output, uncomment the mv line and run the loop again. If required, extend the transliteration table with rules of the form:
s@XXX@YYY@g
where XXX will be replaced by YYY, e.g.,
s@æ@ae@g
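As a quick way to try a new rule before adding it to the table, the sketch below pipes a sample name through sed with two rules. The helper name preview_rules and the sample filename are illustrative only and are not part of the sanitize script.

```shell
# Hypothetical helper: apply a tiny rule table to the name given as $1.
# The two rules stand in for the full transliteration table.
preview_rules() {
    printf '%s\n' "$1" | sed -e 's@æ@ae@g' -e 's@ @_@g'
}

preview_rules "Encyclopædia Britannica.txt"
# prints: Encyclopaedia_Britannica.txt
```

Nothing is renamed here; the function only prints what the rules would produce.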
Octal codes are also possible (e.g., \o351). Use "ls -b" to find the octal codes of the characters in a filename.
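Besides ls -b, od can show the octal value of every byte in a string, which helps when deciding which octal codes a rule needs. The sketch below is one possible way to do it; the function name octal_bytes is made up for this example. Note that the same character can have different byte values depending on the encoding: é is the single byte 351 in Latin-1 but the two bytes 303 251 in UTF-8.

```shell
# Hypothetical helper: print the octal value of each byte read from stdin.
# od prints the bytes, xargs squeezes the output onto one line.
octal_bytes() {
    od -An -t o1 | xargs
}

printf '\351' | octal_bytes        # Latin-1 e-acute, prints: 351
printf '\303\251' | octal_bytes    # UTF-8 e-acute,   prints: 303 251
```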
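To process subdirectories as well, also create the run-sanitize file (source below) and run the following instead. The directory name is passed to sh as an argument rather than pasted into the command string, so names containing quotes are handled, and -depth processes the contents of a directory before the directory itself, so renamed directories do not break the traversal:
find . -depth -type d -exec sh -c 'cd "$1" && run-sanitize "*"' sh {} \;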
If you want to sanitize not only the filenames but also the names of directories, you can use the following trick: in the sanitize script, replace the mv line with:
mkdir -p "$(dirname "$sanitized")"
cp "$1" "$sanitized"
This should be run with something like:
find . -type f -exec sanitize {} \;
This recreates the directory tree, sanitized according to your transliteration table, and copies the (also sanitized) files into it. If you are happy with the result, you can then delete the original structure (the script does not do this itself, for safety).
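The copy-into-a-sanitized-tree idea can be sketched as a pair of shell functions. This is only an illustration under stated assumptions: sanitize_path and copy_sanitized are hypothetical names, and the two sed rules stand in for the full transliteration table.

```shell
# Hypothetical stand-in for the full transliteration table.
sanitize_path() {
    printf '%s\n' "$1" | sed -e 's@ @_@g' -e 's@é@e@g'
}

# Copy one file to its sanitized path, creating the sanitized
# directories on the way. The original file is left untouched.
copy_sanitized() {
    local src dest
    src=$1
    dest=$(sanitize_path "$src")
    if [ "$src" != "$dest" ]; then
        mkdir -p "$(dirname "$dest")"
        cp "$src" "$dest"
        echo "$src --> $dest"
    fi
}

# Possible use from the top of the tree (assumes no newlines in names):
#   find . -type f | while IFS= read -r f; do copy_sanitized "$f"; done
```

Files whose path is already clean are skipped, so copies created during the run are not copied a second time.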
#!/bin/bash
# Usage: sanitize FILE
# Prints the proposed new name; uncomment the mv line to actually rename.
sanitized=$(echo "$1" | sed -e '
/^%/d
s@ @_@g
s@Á@A@g
s@Æ@AE@g
s@Ê@E@g
s@É@E@g
s@Ë@E@g
s@Ì@I@g
s@Ý@Y@g
s@Ù@U@g
s@Ú@U@g
s@Ñ@N@g
s@\o323@O@g
s@à@a@g
s@æ@ae@g
s@á@a@g
s@ê@e@g
s@é@e@g
s@è@e@g
s@ë@e@g
s@ì@i@g
s@ñ@n@g
s@ó@o@g
s@ú@u@g
s@\o350@e@g
s@\o351@e@g
s@\o353@e@g
s@\o364@o@g
s@\o363@o@g
s@\o361@n@g
s@\[@(@g
s@\]@)@g
')
if [[ "$1" != "$sanitized" ]]; then
    echo "$1" "-->" "$sanitized"
    # mv "$1" "$sanitized"
fi
run-sanitize is used for propagating through subdirectories. Both run-sanitize and sanitize must then be callable from anywhere (put them in /usr/local/bin, for instance).
#!/bin/bash
# $* is left unquoted on purpose: a quoted pattern such as "*" is expanded here.
for f in $*; do
    sanitize "$f"
done
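The find command hands run-sanitize a quoted "*", and the unquoted $* in the loop is what expands it inside the directory being processed. The sketch below illustrates that behaviour with a hypothetical function, list_matches, that prints each match instead of calling sanitize. Names produced by the expansion are not split again, so files with spaces stay intact.

```shell
# Hypothetical illustration of the loop in run-sanitize:
# print every name the pattern in $* expands to, one per line.
list_matches() {
    for f in $*; do
        printf '<%s>\n' "$f"
    done
}

# In a directory containing the files "a b" and "c":
#   list_matches "*"
# prints:
#   <a b>
#   <c>
```

The same reasoning means the pattern itself should not contain spaces, since the pattern (unlike its matches) is subject to word splitting.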