This page is still largely in progress.
This is a list of code that we make available with no guarantee other than that it once worked for its intended purpose.
Beware: versions below one (e.g., v0.1) are $\beta$-versions. That may be all you find here.
See Jukka “Yucca” Korpela's cheatsheet for regexps (archived)
To replace a comma between digits with a point (e.g., 123,45 → 123.45) in all .dat files:
perl -pi -w -e 's/(\d+),(\d+)/$1.$2/g;' *.dat
The following strips non-numeric text from CSV files (headers, comments on lines following the CSV data, trailing text, etc.), keeping only the comma-separated numbers at the start of each line:
for f in *.prf; do perl -ne 'print "$1\n" if /^([ \t]*([-+]?\d*\.?\d+([eE][-+]?\d+)?,)*[ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' "$f" > "$f.dat" ; done
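This variation keeps only the lines that start with at least 2 values and prints exactly the first 2 of them; because the pattern is not anchored at the end of the line, anything after the second value is dropped as trailing text (replace {1} by {$n-1$} to get exactly $n$ values per line):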
for f in *.prf; do perl -ne 'print "$1\n" if /^(([ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?,){1}[ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' "$f" > "$f.dat" ; done
Make sure you understand the script, so that the sanitization goes exactly as deep as you want it to. For instance, in its present form the first script keeps a line that contains a number but no comma, treating it as a valid line with a single value.
We use Paul Grinberg's Code extension for Mediawiki to pretty-print source through GeSHi on our website.