This page is still largely in progress.
This is a list of code we make available with no guarantee beyond the fact that it once worked for its intended purpose.
Beware: versions below one (e.g., v0.1) are $\beta$ versions. It may be that these are all you find here.
See Jukka “Yucca” Korpela's cheatsheet for regexps (archived)
To convert decimal commas to decimal points (e.g., 123,45 → 123.45) in all .dat files:
perl -pi -w -e 's/(\d+),(\d+)/$1\.$2/g;' *dat
The following strips extraneous text (headers, comments on lines following the data, etc.) from all CSV files (here with extension .prf) and writes the result to a .dat file next to each of them:
for f in *.prf; do perl -ne 'print "$1\n" if /^([ \t]*([-+]?\d*\.?\d+([eE][-+]?\d+)?,)*[ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' "$f" > "$f.dat" ; done
This variation keeps only lines that begin with at least 2 values and outputs exactly the first 2 of them (replace {1} by {$n-1$} to get $n$ values per line):
for f in *.prf; do perl -ne 'print "$1\n" if /^(([ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?,){1}[ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' "$f" > "$f.dat" ; done
Make sure you understand the script so that it sanitizes to the depth you want. For instance, in its present form, the first command keeps any line of the CSV file that starts with a number but contains no comma, treating it as a valid line with one value.
We use Paul Grinberg's Code extension for MediaWiki to pretty-print source through GeSHi on our website.