This page is still largely a work in progress.
This is a collection of code we make available with no guarantee other than that it once worked for its intended purpose.
Beware: versions below 1 (e.g., v0.1) are $\beta$-versions, and they may be all you find here.
== Pattern substitutions ==
See Jukka “Yucca” Korpela's cheatsheet for regexps, http://goo.gl/C7Wsp (archived).
=== Replace comma-separated digits by their point-separated counterpart ===
E.g., 123,45 → 123.45. To make this change in place in all .dat files:
perl -pi -w -e 's/(\d+),(\d+)/$1\.$2/g;' *.dat
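=== Sanitize CSV files ===
The following strips everything but the leading comma-separated numbers from each line of all CSV files (here with extension .prf): lines that do not start with a number (headers, comments after the data, etc.) are dropped, and any text following the numbers on a line is cut off. The result for file.prf is written to file.prf.dat: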
for f in *.prf; do perl -ne 'print "$1\n" if /^(([ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?,)*[ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' "$f" > "$f.dat"; done
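This variation of the sanitizing command keeps only lines that start with at least 2 values and outputs just the first 2 values of each (further values on the line are dropped); replace {1} by {$n-1$} to get $n$ values per line: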
for f in *.prf; do perl -ne 'print "$1\n" if /^(([ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?,){1}[ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' "$f" > "$f.dat"; done
Make sure you understand what each script keeps, so that the sanitization goes exactly as deep as you want it to. For instance, in its present form the first command treats a line holding a single number and no comma as a valid line (with one value) and keeps it.
We use Paul Grinberg's Code extension for MediaWiki to pretty-print source code through GeSHi on this website.