This page is still largely in progress.
This is a list of code we make available with no guarantee, besides the one that it once worked for its intended purpose.
Beware: versions below one (e.g., v0.1) are $\beta$ versions. It might be that these are all you find here.
See Jukka “Yucca” Korpela's cheatsheet for regexps (archived)
To turn decimal commas into decimal points (e.g., 123,45 → 123.45) in all .dat files:
perl -pi -w -e 's/(\d+),(\d+)/$1.$2/g;' *.dat
The following will sanitize all CSV files (here with extension .prf) from trailing text (headers, comments on lines following the CSV, etc.):
for f in *.prf; do perl -ne 'print "$1\n" if /^([ \t]*([-+]?\d*\.?\d+([eE][-+]?\d+)?,)*[ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' "$f" > "$f.dat" ; done
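This is a variation that keeps only lines starting with at least 2 values and prints exactly 2 values for each of them (replace {1} by {$n-1$} to get exactly $n$ values per line). Since the pattern is not anchored at the end of the line, longer lines are truncated to their first 2 values rather than discarded:
for f in *.prf; do perl -ne 'print "$1\n" if /^(([ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?,){1}[ \t]*[-+]?\d*\.?\d+([eE][-+]?\d+)?)/' "$f" > "$f.dat" ; done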
Be sure that you understand the script, so that the sanitization goes as deep as you want it to. For instance, in its present form, the first script line keeps any line of the CSV file that starts with a number but contains no comma, treating it as a valid line with one value.
We use Paul Grinberg's Code extension for MediaWiki to pretty-print source code through GeSHi on our website.