Revision as of 20:56, 5 June 2011

sanitize is a bash script that replaces special and accented characters in a filename with their closest ASCII equivalents.

See a blog post for a discussion of the necessity and merit of this task.

Usage

Files in a given directory

Save the source below as a file named sanitize, make it executable, and, from the directory containing the files to be fixed, run:

for f in *; do ./sanitize "$f"; done

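As distributed, the script only prints the proposed renames. If you are happy with the output, uncomment the mv line to actually rename the files. If required, extend the transliteration table with lines of the form: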

s@XXX@YYY@g 

where XXX will be replaced by YYY, e.g.,

s@æ@ae@g 
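
Before adding a rule to the table, it can help to try it on a sample name. The following is only a minimal sketch: preview_rules is a hypothetical helper, not part of sanitize, and it applies just two rules taken from the table above.

```shell
#!/bin/bash
# Hypothetical helper (not part of sanitize): apply a couple of
# transliteration rules to a name and print the result.
# Nothing is renamed; this only shows what the rules would do.
preview_rules() {
    printf '%s\n' "$1" | sed -e 's@ @_@g' -e 's@æ@ae@g'
}

preview_rules "Curriculum vitæ.txt"   # prints Curriculum_vitae.txt
```

Characters without a rule pass through unchanged, which makes it easy to spot what is still missing from the table.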

Octal codes are also possible, e.g., s@\o351@e@g. Use "ls -b" to find out which codes a filename contains.
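
Where "ls -b" is not available, the octal codes can probably also be read off with od. This is a sketch under that assumption; octal_bytes is a hypothetical helper, not part of sanitize.

```shell
#!/bin/bash
# Hypothetical helper (not part of sanitize): print each byte of a
# name in octal, separated by single spaces.
octal_bytes() {
    # Word splitting of od's output is intentional: it drops od's
    # padding and line breaks. The output contains only digits.
    set -- $(printf '%s' "$1" | od -An -to1)
    echo "$*"
}

# "café" with é stored as the single Latin-1 byte 0xE9:
octal_bytes "$(printf 'caf\351')"   # prints 143 141 146 351
```

Here the last byte, 351, is the one matched by the rule s@\o351@e@g. The same letter stored as UTF-8 would show up as the two bytes 303 251 instead, so the output also indicates which encoding a filename uses.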

All files in all subdirectories

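To process all subdirectories as well, also create the run-sanitize file (source below), make it executable, and run the command below instead. Note that it invokes ./run-sanitize (which in turn invokes ./sanitize) after changing into each directory, so both scripts must be present in every directory visited.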

find . -type d -exec sh -c "cd \"{}\" && ./run-sanitize \"*\""  \;
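
A possible variant of the recursive command, given only as a sketch and not as the author's method: it calls sanitize through a path passed as an argument, so the script does not need to exist in each subdirectory, and it uses -depth so that the contents of a directory are handled before the directory itself is considered for renaming. The name sanitize_tree and its two arguments are assumptions made for this illustration.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of the original scripts).
#   $1: absolute path to the sanitize script
#   $2: top-level directory to process
sanitize_tree() {
    # -depth visits the contents of a directory before the directory
    # itself, so entries are only renamed after find is done with them.
    find "$2" -depth -type d -exec sh -c '
        cd "$1" || exit 1
        for f in *; do
            [ -e "$f" ] || continue   # empty directory: "*" did not match
            "$2" "$f"                 # $2 is the path to sanitize
        done
    ' sh {} "$1" \;
}

# Usage (assumes sanitize is in the current directory):
#   sanitize_tree "$PWD/sanitize" .
```

The top-level directory itself is never renamed, only what lies inside it.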

Source

sanitize

#!/bin/bash
#  ____              _ _   _         
# / ___|  __ _ _ __ (_) |_(_)_______ 
# \___ \ / _` | '_ \| | __| |_  / _ \
#  ___) | (_| | | | | | |_| |/ /  __/
# |____/ \__,_|_| |_|_|\__|_/___\___|
#                                    
# sanitize v0.1
# FP Laussy -- fabrice.laussy@gmail.com
# http://laussy.org
# Sun Jun  5 17:20:50 CEST 2011
# (building on TeX+ :)
#
# This script replaces special characters in filenames
# according to a transliteration table given below.
# 
# Usage:
# Caution: this is potentially harmful!
# Use only if you know what you are doing.
#
# To use on the files in the current directory:
#
#    for f in *; do ./sanitize "$f"; done
#
# To go recursively through subdirectories:
#
#    find . -type d -exec sh -c "cd \"{}\" && ./run-sanitize \"*\""  \;
#
# where run-sanitize is provided separately.  (it's essentially the
# command above put in a script).

# Quote "$1" so that repeated spaces and glob characters in the name survive
sanitized=`printf '%s\n' "$1" | sed ' 
/^%/d 
#begin transliteration table: 
s@ @_@g
s@Á@A@g 
s@Æ@AE@g 
s@Ê@E@g 
s@É@E@g 
s@Ë@E@g 
s@Ì@I@g 
s@Ý@Y@g 
s@Ù@U@g 
s@Ú@U@g 
s@Ñ@N@g
s@\o323@O@g
s@à@a@g 
s@æ@ae@g 
s@á@a@g 
s@ê@e@g 
s@é@e@g 
s@è@e@g 
s@ë@e@g 
s@ì@i@g 
s@ñ@n@g
s@ó@o@g
s@ú@u@g 
s@\o350@e@g 
s@\o351@e@g
s@\o353@e@g
s@\o364@o@g
s@\o363@o@g
s@\o361@n@g
s@\[@(@g
s@\]@)@g
#end transliteration table 
'`

# Quote both sides: an unquoted right-hand side is treated as a glob pattern
if [[ "$1" != "$sanitized" ]]
then
    printf '%s --> %s\n' "$1" "$sanitized"
    #mv "`pwd`/$1" "`pwd`/$sanitized"
fi
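
Since two different names can map to the same sanitized name (for instance "a b" and "a_b"), a plain mv could overwrite an existing file. The following is a minimal sketch of a more cautious rename step; safe_rename is a hypothetical helper and is not used by the script above.

```shell
#!/bin/bash
# Hypothetical helper (not part of sanitize): rename $1 to $2,
# but refuse to overwrite something that already exists.
safe_rename() {
    # Nothing to do if the name is already clean.
    [ "$1" = "$2" ] && return 0
    if [ -e "$2" ]; then
        printf 'skipping %s: %s already exists\n' "$1" "$2" >&2
        return 1
    fi
    # "--" keeps names starting with "-" from being read as options.
    mv -- "$1" "$2"
}

# Usage, in place of the mv line:
#   safe_rename "$1" "$sanitized"
```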

run-sanitize

Helper script used when recursing through subdirectories.

#!/bin/bash
#echo `pwd`
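# $* is deliberately unquoted: the caller passes a quoted "*", which is
# expanded here, in the directory the script is run from.
for f in $*; do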
    ./sanitize "$f"
done

History

5 June 2011: First version.