This page is still very much a work in progress.
This is a list of code we make available with no guarantee, besides the guarantee that it once worked for its intended purpose.
Beware: versions below 1 (e.g., v°0.1) are $\beta$-versions, and they may be all you find here.
See Jukka “Yucca” Korpela's cheat sheet for regular expressions (archived).
To replace decimal commas with decimal points (e.g., 123,45 → 123.45) in all .dat files:
perl -pi -w -e 's/(\d+),(\d+)/$1\.$2/g;' *.dat
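The following sanitizes CSV files by keeping, on each line, only the leading run of comma-separated numbers: headers, comment lines (e.g., those following the data) and trailing text are dropped. Each file.prf is written to file.prf.dat: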
for f in *.prf; do perl -ne 'print "$1\n" if /^([ \t]*([-+]?\d*\.?\d+([eE][-+]?\d+)?[ \t]*,[ \t]*)*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' "$f" > "$f.dat"; done
This variation keeps only lines that start with at least 2 values and writes out exactly the first 2 of them (replace {1} by {$n-1$} to get exactly $n$ values per line):
for f in *.prf; do perl -ne 'print "$1\n" if /^([ \t]*([-+]?\d*\.?\d+([eE][-+]?\d+)?[ \t]*,[ \t]*){1}[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' "$f" > "$f.dat"; done
Make sure you understand the script, so that the sanitization goes exactly as deep as you want it to.
We use Paul Grinberg's Code extension for MediaWiki to pretty-print source code through GeSHi on our website.