
Revision as of 10:05, 15 July 2011

sanitize is a bash script that replaces special and accented characters in a filename with their best match in ASCII.

See a blog post for a discussion of the necessity and merit of this task.

Usage

Files in a given directory

Save the source below in a file named sanitize, make it executable (chmod +x sanitize), put it in the directory whose files are to be fixed, and run there:

for f in *; do ./sanitize "$f"; done

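The script only prints the renamings it would make (old --> new). If you are happy with the output, uncomment the mv line to actually rename the files. If required, extend the transliteration table with lines of the form

s@XXX@YYY@g

where XXX is replaced by YYY, e.g.,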

s@æ@ae@g 
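A quick way to check a rule before adding it to the table is to run it through sed on a sample name. This sketch combines the æ rule above with the table's space rule; the sample name is made up.

```shell
#!/bin/bash
# Try candidate rules on a made-up sample name before adding them to
# the table. Several rules can be separated by ';' instead of newlines.
printf '%s\n' 'Cæsar notes.txt' | sed 's@æ@ae@g; s@ @_@g'   # prints Caesar_notes.txt
```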

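Octal codes are also possible, as in the s@\o351@e@g line of the table (the \oNNN escape is a GNU sed extension). Use "ls -b" to find the octal codes of the bytes in a filename.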
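Where ls -b is not convenient, od prints every byte of a name in octal. A minimal sketch, assuming bash and GNU sed; the file name is hypothetical and Latin-1 encoded, so é is the single byte 351 (octal), the code targeted by the table's s@\o351@e@g line.

```shell
#!/bin/bash
# Hypothetical Latin-1 encoded name: \xe9 is é in ISO-8859-1 (octal 351).
name=$'caf\xe9.txt'

# One three-digit octal number per byte; the é shows up as 351.
printf '%s' "$name" | od -An -to1

# Apply the matching table rule. LC_ALL=C makes sed treat every byte as
# one character, so the lone Latin-1 byte is matched (GNU sed \oNNN escape).
printf '%s\n' "$name" | LC_ALL=C sed 's@\o351@e@g'   # prints cafe.txt
```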

All files in all subdirectories

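Also create the run-sanitize file (source below) and make it executable. Since the command changes into each subdirectory before calling ./run-sanitize, which in turn calls ./sanitize, both scripts must be reachable from every subdirectory (e.g., copied into each one). Then run instead: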

find . -type d -exec sh -c "cd \"{}\" && ./run-sanitize \"*\""  \;
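If copying both scripts into every subdirectory is impractical, the sketch below keeps a single copy of sanitize and calls it by absolute path. sanitize_tree is a hypothetical helper, not one of the scripts on this page; it also adds find's -depth so that, once the mv line is enabled, a directory is only renamed after its contents have been processed.

```shell
#!/bin/bash
# Hypothetical helper: run sanitize on every entry of every directory
# under a root, using a single copy of the script.
#   $1  path to the sanitize script
#   $2  top directory to process (default: .)
sanitize_tree() {
  local script root=${2:-.}
  # Make the script path absolute, since the inner shell changes directory.
  script=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
  # -depth handles a directory's contents before the directory itself, so a
  # subdirectory is renamed only after find has finished with its contents.
  # Paths reach the inner shell as $1 and $2 instead of being pasted into
  # the command string, so quotes or $ in names are harmless.
  find "$root" -depth -type d -exec sh -c '
    cd "$1" || exit 1
    for f in *; do
      # Skip the unexpanded * left by an empty directory.
      [ -e "$f" ] || [ -L "$f" ] || continue
      "$2" "$f"
    done
  ' sh {} "$script" \;
}
```

Usage, from the directory holding sanitize: sanitize_tree ./sanitize .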

Directories

To change the names of directories as well as files, replace the mv line of sanitize with the following two lines, and pass the script each file's path (e.g. ./Dir/file) rather than its bare name, so that the directory part is transliterated too:

mkdir -p "$(dirname "$sanitized")"
cp -- "$1" "$sanitized"

This recreates the directory tree with names sanitized according to your transliteration table and copies the files into it under their sanitized names. If you are happy with the result, you can then delete the original structure; the script does not do this itself, for safety.
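The same idea can be sketched as a self-contained bash function. mirror_sanitized and mirror_tree are hypothetical names, and the three-rule table is a stand-in for the full one; like the mkdir/cp lines, the function copies rather than renames, and find supplies the full relative paths.

```shell
#!/bin/bash
# Hypothetical stand-in for sanitize with the mkdir/cp lines in place of mv,
# using a three-rule excerpt of the transliteration table.
mirror_sanitized() {
  local sanitized
  sanitized=$(printf '%s\n' "$1" | sed 's@ @_@g; s@Á@A@g; s@é@e@g')
  if [[ $1 != "$sanitized" ]]; then
    mkdir -p "$(dirname "$sanitized")"   # recreate the sanitized directory
    cp -- "$1" "$sanitized"              # copy; the original stays in place
  fi
}

# Give every regular file's path (e.g. ./Dir/file) to mirror_sanitized so
# that directory names are transliterated too. -print0 with read -d ''
# keeps names containing spaces intact.
mirror_tree() {
  find . -type f -print0 |
    while IFS= read -r -d '' path; do
      mirror_sanitized "$path"
    done
}
```

Copies made while find is still running may be listed as well; since their paths are already sanitized, they are left alone.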

Source

sanitize

#!/bin/bash
#  ____              _ _   _         
# / ___|  __ _ _ __ (_) |_(_)_______ 
# \___ \ / _` | '_ \| | __| |_  / _ \
#  ___) | (_| | | | | | |_| |/ /  __/
# |____/ \__,_|_| |_|_|\__|_/___\___|
#                                    
# sanitize v0.1
# FP Laussy -- fabrice.laussy@gmail.com
# http://laussy.org
# Sun Jun  5 17:20:50 CEST 2011
# (building on TeX+ :)
#
# This script replaces special characters in filenames
# according to a transliteration table given below.
# 
# Usage:
# Caution: this is potentially harmful!
# Use only if you know what you are doing.
#
# To use in the files within the same directory:
#
#    for f in *; do ./sanitize "$f"; done
#
# To go recursively through subdirectories:
#
#    find . -type d -exec sh -c "cd \"{}\" && ./run-sanitize \"*\""  \;
#
# where run-sanitize is provided separately (it is essentially the
# command above put in a script).

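# Transliterate the name passed as $1. printf (rather than echo) with a
# quoted "$1" hands the name to sed unchanged: no word splitting, no
# globbing, and no option parsing of names such as -n.
sanitized=$(printf '%s\n' "$1" | sed '
#begin transliteration table:
s@ @_@g
s@Á@A@g
s@Æ@AE@g
s@Ê@E@g
s@É@E@g
s@Ë@E@g
s@Ì@I@g
s@Ý@Y@g
s@Ù@U@g
s@Ú@U@g
s@Ñ@N@g
s@\o323@O@g
s@à@a@g
s@æ@ae@g
s@á@a@g
s@ê@e@g
s@é@e@g
s@è@e@g
s@ë@e@g
s@ì@i@g
s@ñ@n@g
s@ó@o@g
s@ú@u@g
s@\o350@e@g
s@\o351@e@g
s@\o353@e@g
s@\o364@o@g
s@\o363@o@g
s@\o361@n@g
s@\[@(@g
s@\]@)@g
#end transliteration table
')

# Act only on names the table changes. "$sanitized" is quoted so that
# [[ ]] compares it as a plain string rather than as a glob pattern.
if [[ $1 != "$sanitized" ]]
then
printf '%s --> %s\n' "$1" "$sanitized"
# Uncomment to rename; -i asks before overwriting an existing file.
#mv -i -- "$1" "$sanitized"
fi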

run-sanitize

Used to propagate sanitize through subdirectories.

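#!/bin/bash
# Called (via find) as: run-sanitize "*"
# The pattern arrives unexpanded and is expanded here by the unquoted $*,
# against the directory find has changed into. Names containing spaces
# survive, because the results of pathname expansion are not split again.
for f in $*; do
    ./sanitize "$f"
done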

History