= History =

* 5 June 2011: First version.
sanitize is a bash script that replaces special and accented characters in a filename with their closest ASCII match.
See a blog post for a discussion of the necessity and merit of this task.
Save the source below as an executable file named sanitize in the directory whose files are to be fixed, then run:
for f in *; do ./sanitize "$f"; done
By default the script only prints the proposed renames. If you are happy with the output, uncomment the mv line to apply them. If required, extend the transliteration table with lines of the form:
s@XXX@YYY@g
where XXX will be replaced by YYY, e.g.,
s@æ@ae@g
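A new rule can be tried on a sample name before it goes into the table. The sketch below is only an illustration: try_rules is a hypothetical helper name, not part of the script, and it applies just three rules (the space rule and the two Æ/æ rules).

```bash
# try_rules is a hypothetical helper, not part of sanitize:
# it applies three rules from the table to the name given as $1.
try_rules() {
    printf '%s\n' "$1" | sed '
s@ @_@g
s@Æ@AE@g
s@æ@ae@g
'
}

try_rules 'Æon encyclopædia.txt'   # prints AEon_encyclopaedia.txt
```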
Octal codes are also possible, as in the \oNNN entries of the table. Use "ls -b" to find out which codes a filename contains.
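For a name in a legacy single-byte encoding, the offending byte can be matched by its octal code. The following sketch assumes GNU sed (the \oNNN escape is a GNU extension) and a Latin-1 encoded name, where é is the single byte 351 in octal. strip_latin1_e is a hypothetical helper name, and setting LC_ALL=C is an added assumption so that sed treats its input as raw bytes.

```bash
# Hypothetical helper: replace the Latin-1 byte for é (octal 351) with e.
# Assumes GNU sed; LC_ALL=C makes sed operate on bytes, not characters.
strip_latin1_e() {
    printf '%s\n' "$1" | LC_ALL=C sed 's@\o351@e@g'
}

# Build a Latin-1 name with printf (\351 is the octal escape for the byte).
name=$(printf 'caf\351.txt')
strip_latin1_e "$name"   # prints cafe.txt
```

In a directory holding such a file, running ls -b under LC_ALL=C would typically list it as caf\351.txt, which gives the code to put after \o.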
To recurse through subdirectories, also create the run-sanitize file (source below) and run instead:
find . -type d -exec sh -c "cd \"{}\" && ./run-sanitize \"*\"" \;
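Because the command above changes into each directory before calling ./run-sanitize, the scripts have to be reachable from every directory visited. One possible way around this, sketched below, is to pass the script by absolute path. sanitize_tree and its two arguments are hypothetical names, the inner loop is an assumed equivalent of run-sanitize rather than the author's code, and -depth is an added choice so that a directory's contents are handled before the directory itself is renamed.

```bash
# sanitize_tree SCRIPT ROOT  (hypothetical helper)
# Runs SCRIPT on every entry of every directory under ROOT.
# SCRIPT must be an absolute path, since sh changes directory first.
sanitize_tree() {
    find "$2" -depth -type d -exec sh -c '
        cd "$1" || exit 1
        for f in *; do
            [ -e "$f" ] || continue   # skip the literal * of an empty directory
            "$2" "$f"
        done
    ' sh {} "$1" \;
}

# Example (a dry run as long as the mv line in sanitize stays commented):
# sanitize_tree "$PWD/sanitize" .
```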
#!/bin/bash
#  ____              _ _   _
# / ___|  __ _ _ __ (_) |_(_)_______
# \___ \ / _` | '_ \| | __| |_  / _ \
#  ___) | (_| | | | | | |_| |/ /  __/
# |____/ \__,_|_| |_|_|\__|_/___\___|
#
# sanitize v0.1
# FP Laussy -- fabrice.laussy@gmail.com
# http://laussy.org
# Sun Jun 5 17:20:50 CEST 2011
# (building on TeX+ :)
#
# This script removes special characters in filenames
# according to a transliteration table given below.
#
# Usage:
# Caution: this is potentially harmful!
# Use only if you know what you are doing.
#
# To use in the files within the same directory:
#
# for f in *; do ./sanitize "$f"; done
#
# To go recursively through subdirectories:
#
# find . -type d -exec sh -c "cd \"{}\" && ./run-sanitize \"*\"" \;
#
# where run-sanitize is provided separately. (it's essentially the
# command above put in a script).

# Quote "$1" so that runs of spaces and glob characters survive intact.
sanitized=$(printf '%s\n' "$1" | sed '
/^%/d
#begin transliteration table:
s@ @_@g
s@Á@A@g
s@Æ@AE@g
s@Ê@E@g
s@É@E@g
s@Ë@E@g
s@Ì@I@g
s@Ý@Y@g
s@Ù@U@g
s@Ú@U@g
s@Ñ@N@g
s@\o323@O@g
s@à@a@g
s@æ@ae@g
s@á@a@g
s@ê@e@g
s@é@e@g
s@è@e@g
s@ë@e@g
s@ì@i@g
s@ñ@n@g
s@ó@o@g
s@ú@u@g
s@\o350@e@g
s@\o351@e@g
s@\o353@e@g
s@\o364@o@g
s@\o363@o@g
s@\o361@n@g
s@\[@(@g
s@\]@)@g
#end transliteration table
')

# Report the proposed rename; the mv line stays commented out for a dry run.
if [[ "$1" != "$sanitized" ]]
then
    printf '%s --> %s\n' "$1" "$sanitized"
    #mv "`pwd`/$1" "`pwd`/$sanitized"
fi
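Once the mv line is enabled, two different names can map to the same sanitized name (for instance é.txt and è.txt both become e.txt), and a plain mv would then overwrite one file with the other. A minimal guard, assuming a hypothetical helper named safe_rename that is not part of the script, could look like this:

```bash
# safe_rename OLD NEW  (hypothetical helper)
# Renames OLD to NEW unless NEW already exists.
safe_rename() {
    [ "$1" = "$2" ] && return 0          # nothing to do
    if [ -e "$2" ]; then
        printf 'skipped: %s already exists\n' "$2" >&2
        return 1
    fi
    mv -- "$1" "$2"
}
```

In the script, such a helper would be called in place of the mv line, with the original name and the sanitized name as its two arguments.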
The run-sanitize script, used to propagate through subdirectories:
#!/bin/bash
#echo `pwd`
for f in $*; do
    ./sanitize "$f"
done