Revision as of 16:09, 25 September 2012

Code

This page is still largely in progress.

This is a list of code we make available with no guarantee other than that it once worked for its intended purpose.

Beware: versions below one (e.g., v°0.1) are $\beta$ versions, and these may be all you find here.

  1. stampit — to stamp pdf files after their name.
  2. sanitize — to remove accents & special characters from filenames.
  3. putInDir — to move files inside directories bearing their name.
  4. uniqname — to generate a timestamp which can be used as a unique name.

Pattern substitutions

See Jukka “Yucca” Korpela's cheatsheet for regexps (http://www.cs.tut.fi/~jkorpela/perl/regexp.html; an archived copy is kept on this wiki).

Replace comma-separated digits by their point-separated counterpart

E.g., 123,45 → 123.45. To apply the change in place to all .dat files:

perl -pi -w -e 's/(\d+),(\d+)/$1.$2/g;' *.dat

Sanitize CSV files

The following strips extraneous text (headers, comments on the lines following the data, etc.) from all CSV files (here with the extension .prf) and writes the cleaned data to a matching .prf.dat file:

for f in *.prf; do perl -ne 'print "$1\n" if /^([ \t]*([-+]?\d*\.?\d+([eE][-+]?\d+)?,)*[ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' "$f" > "$f.dat" ; done

This variation keeps only lines that begin with at least 2 values and prints exactly the first 2 of them (replace {1} by {$n-1$} to get $n$ values per line):

for f in *.prf; do perl -ne 'print "$1\n" if /^(([ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?,){1}[ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' "$f" > "$f.dat" ; done

Be sure that you understand the script, so that the sanitization goes as far as you want it to. For instance, in its present form the first command treats a line containing a number but no comma as a valid line (with one value).

Pretty print

We use Paul Grinberg's Code extension for MediaWiki to pretty-print source code through GeSHi on our website.