m (Sanitize CVS files)
m (Sanitize CVS files)
Line 25: Line 25:
 
=== Sanitize CVS files ===
 
=== Sanitize CVS files ===
  
The following will sanitize [http://en.wikipedia.org/wiki/Comma-separated_values CSV files] from trailing text (headers, comments on lines ''following'' the CSV, etc.):
+
The following will sanitize all [http://en.wikipedia.org/wiki/Comma-separated_values CSV files] (here with extension .prf) from trailing text (headers, comments on lines ''following'' the CSV, etc.):
  
 
<code lang='bash'>for f in *.prf; do cat "$f" | perl -ne 'print "$1\n" if /^([ \t]*([-+]?\d*\.?\d+([eE][-+]?\d+)?,)*[ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' > $f.dat ; done</code>
 
<code lang='bash'>for f in *.prf; do cat "$f" | perl -ne 'print "$1\n" if /^([ \t]*([-+]?\d*\.?\d+([eE][-+]?\d+)?,)*[ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' > $f.dat ; done</code>

Revision as of 16:35, 20 September 2012

{{{1}}}

Contents

Code

This page is still largely in progress.

This is a list of code we make available with no guarantee, beside the one that it did once work for its intended purpose.

Beware, version below one (e.g., v°0.1) are $\beta$-version. It might be that's all you find here.

  1. stampit — to stamp pdf files after their name.
  2. sanitize — to remove accents & special characters from filenames.
  3. putInDir — to move files inside directories bearing their name.
  4. uniqname — to generate a timestamp which can be used as a unique name.

Pattern substitutions

See Jukka “Yucca” Korpela's cheatsheet for regexps (archived)

Replace comma-separated digits by their point-separated counterpart

E.g, 123,45 → 123.45. To change in all .dat files:

perl -pi -w -e 's/(\d+),(\d+)/$1\.$2/g;' *dat

Sanitize CVS files

The following will sanitize all CSV files (here with extension .prf) from trailing text (headers, comments on lines following the CSV, etc.):

for f in *.prf; do cat "$f" | perl -ne 'print "$1\n" if /^([ \t]*([-+]?\d*\.?\d+([eE][-+]?\d+)?,)*[ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' > $f.dat ; done

This is a variation to keep all lines which have exactly 2 values (replace {1} by {$n-1$} to have exactly $n$ values per line):

for f in *.prf; do cat "$f" | perl -ne 'print "$1\n" if /^(([ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?,){1}[ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' > $f.dat ; done

Be sure that you understand the script so that the sanitization goes to the depth you wish it to (for instance in its present form the first script-line will keep lines in the CSV file with a number but no comma as a valid line with one value).

Pretty print

We use Paul Grinberg's Code extension for Mediawiki to pretty-print source through GeSHi on our web.