awk is a great tool for working with columnar data a line at a time, but the language itself is a bit limited. Usually when it's not quite powerful enough, I turn to perl. But... choice is good. Enter tawk, which uses tcl as the scripting language. It's designed to be very familiar to anyone coming from an awk background.
Usage is much like awk:
tawk [OPTIONS] ['script'] [var=value | filename] ...
Reads from standard input if no filenames are given on the command line.
Dependencies are tcl 8.6 and tcllib. Copy the tawk script to /usr/local/bin or wherever you like; it's a single, self-contained script.
-F regexp
Sets the field separator (FS).
-f filename
Read the script from the given file instead of from the first non-option command line argument.
-safe
Run the script in a safe tcl interpreter. Meant for untrusted code.
-timeout N
Exit with an error if a script takes more than N seconds to complete.
-csv
Turn on CSV line parsing. Prefer this over setting FS to a comma.
-quotechar C
Use the given character instead of a double quote for quoted CSV fields.
-quoteall
Always quote every CSV field when printing.
tawk adds the following commands on top of basic tcl:
BEGIN script
Executed at the beginning of processing, before any data is read.
END script
Executed at the end, after all data has been processed.
BEGINFILE script
Executed at the beginning of each file.
ENDFILE script
Executed at the end of reading each file.
line script
Executed for every line read.
line test script
If test returns true when evaluated by expr, execute the script.
rline [-field N] re script
If the regular expression re matches the line (or the specified field), execute the script.
print [arg ...]
Print out all its arguments joined by $OFS, or $F(0) if called with no arguments.
csv_join arglist [delim] [quotechar] [quotemode]
Return the list joined into a CSV-formatted string.
csv_split string [delim] [quotechar]
Split a CSV-formatted string into a list.
continue stops processing the current line and goes on to the next one, like next in awk. break stops processing the current file and goes on to the next file.
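The order in which these blocks fire, and what continue and break skip, can be modeled in a few lines. This is a Python sketch of the behavior described above, not tawk's tcl implementation; run, rline, Continue and Break are hypothetical names. It assumes that blocks run in the order they appear in the script and that ENDFILE still runs after break, which the text above doesn't state.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple


class Continue(Exception):
    """Models tawk's continue: stop processing the current line."""


class Break(Exception):
    """Models tawk's break: stop processing the current file."""


# A condition looks at the line (like "line test" or "rline re");
# None means the block runs for every line, like a plain "line script".
Condition = Optional[Callable[[str], bool]]
# An action receives NR, FNR and the line text.
Action = Callable[[int, int, str], None]


def rline(pattern: str) -> Callable[[str], bool]:
    """Condition for an rline-style block: does the regex match the line?"""
    rx = re.compile(pattern)
    return lambda line: rx.search(line) is not None


def run(files: Iterable[Tuple[str, List[str]]],
        rules: List[Tuple[Condition, Action]],
        log: List[str]) -> None:
    log.append("BEGIN")
    nr = 0  # NR: counts lines across all files
    for name, lines in files:
        log.append(f"BEGINFILE {name}")
        try:
            for fnr, line in enumerate(lines, start=1):  # FNR restarts per file
                nr += 1
                try:
                    for cond, action in rules:  # assumption: script order
                        if cond is None or cond(line):
                            action(nr, fnr, line)
                except Continue:
                    pass  # skip the remaining blocks; go to the next line
        except Break:
            pass  # skip the rest of this file
        log.append(f"ENDFILE {name}")  # assumption: still runs after break
    log.append("END")


def skip(nr: int, fnr: int, line: str) -> None:
    raise Continue()


def stop(nr: int, fnr: int, line: str) -> None:
    raise Break()


log: List[str] = []
run([("a", ["x", "# c", "y"]), ("b", ["z", "STOP", "w"])],
    [(rline("^#"), skip),
     (rline("^STOP"), stop),
     (None, lambda nr, fnr, line: log.append(f"{nr} {fnr} {line}"))],
    log)
# log == ["BEGIN", "BEGINFILE a", "1 1 x", "3 3 y", "ENDFILE a",
#         "BEGINFILE b", "4 1 z", "ENDFILE b", "END"]
```

The comment line is skipped by continue, while the STOP line ends file b early, so its last line is never seen. NR keeps counting across files while FNR restarts at 1.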
Most of the following variable names are lifted straight from awk.
F
An array holding the columns of the line. $F(0) is the whole line. Setting a new element above NF fills in the missing interval with empty strings. Setting F(0) rebuilds the rest of the array by splitting the new value.
NF
The number of fields in the current line. Modifying this adjusts F.
NR
The current line number, counted across all input files.
FNR
The line number within the current file.
FILENAME
The name of the current file, or - for standard input.
INFILE
The file handle of the current file. Only set in BEGINFILE, line and rline blocks.
FS
If set, a single character or regular expression used to delimit fields. If a single space, or not set, any amount of whitespace separates fields, and leading and trailing whitespace is stripped first. If an empty string, every character becomes its own field. Can only be a single character in CSV mode.
OFS
Used to separate fields in F(0) when other elements of F are written to or NF is changed. Also used to separate the arguments of print. Can only be a single character in CSV mode.
CSV
1 if in CSV mode, 0 in normal mode. (Read-only)
CSVQUOTECHAR
When in CSV mode, the character used to quote fields. Set by the -quotechar option. Defaults to a double quote.
CSVQUOTE
Set to always to always quote CSV fields (turned on by the -quoteall argument), or auto to quote only when needed. Attempting to set any other value raises an error. Defaults to auto.
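The interplay between F, NF, FS and OFS can be sketched as a small model. This is Python, not tawk's implementation, and split_fields and Record are hypothetical names. Two behaviors are assumptions the text doesn't pin down: a single-character FS is treated as a literal character (as awk does) rather than as a regex, and shrinking NF drops the trailing fields.

```python
import re
from typing import Dict, List, Optional


def split_fields(line: str, fs: Optional[str]) -> List[str]:
    """Split a line following the FS rules described above."""
    if fs is None or fs == " ":
        return line.split()  # strip ends, split on runs of whitespace
    if fs == "":
        return list(line)  # every character is its own field
    if len(fs) == 1:
        return line.split(fs)  # assumption: single character is literal
    return re.split(fs, line)  # otherwise FS is a regular expression


class Record:
    """Toy model of the F array, NF and OFS (hypothetical, for illustration)."""

    def __init__(self, line: str, fs: Optional[str] = None, ofs: str = " ") -> None:
        self.fs, self.ofs = fs, ofs
        self.F: Dict[int, str] = {}
        self.set(0, line)

    @property
    def NF(self) -> int:
        return len(self.F) - 1  # keys are always 0..NF

    def _rebuild(self) -> None:
        # F(0) is rejoined with OFS whenever another element or NF changes.
        self.F[0] = self.ofs.join(self.F[i] for i in range(1, self.NF + 1))

    def set(self, i: int, value: str) -> None:
        if i == 0:
            # Setting F(0) re-splits the new value into F(1)..F(NF).
            self.F = {0: value}
            for n, field in enumerate(split_fields(value, self.fs), start=1):
                self.F[n] = field
            return
        for n in range(self.NF + 1, i):
            self.F[n] = ""  # fill the gap above NF with empty strings
        self.F[i] = value
        self._rebuild()

    def set_nf(self, nf: int) -> None:
        for n in range(nf + 1, self.NF + 1):
            del self.F[n]  # assumption: shrinking NF drops trailing fields
        for n in range(self.NF + 1, nf + 1):
            self.F[n] = ""  # growing NF adds empty fields
        self._rebuild()
```

For example, Record("  alpha  beta  ") has an NF of 2 and leaves F[0] exactly as read. Calling set(4, "delta") then gives an NF of 4, an empty F[3], and an F[0] of "alpha beta  delta" (two spaces where the empty third field sits).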
If invoked with the -csv option, the output field separator (OFS) is set to a comma instead of a space, and print joins its arguments with CSV escaping.
When reading fields, the default field separator (FS), if not explicitly set, is a comma, and only single-character separators are supported. Lines are split by a CSV-aware parser, so commas inside quoted fields don't count as separators, unlike when FS is simply set to a comma in normal mode. The CSVQUOTECHAR variable controls the character used to quote fields (defaults to a double quote; set by the -quotechar option).
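The difference between -csv and a plain comma FS shows up as soon as a quoted field contains a comma. The sketch below uses Python's csv module as a stand-in for tawk's CSV parser, purely to illustrate the behavior described above; naive_split and csv_aware_split are hypothetical names, not tawk commands.

```python
import csv
import io
from typing import List


def naive_split(line: str) -> List[str]:
    """Like setting FS to a comma in normal mode: every comma separates."""
    return line.split(",")


def csv_aware_split(line: str, delim: str = ",", quotechar: str = '"') -> List[str]:
    """CSV-aware split: delimiters inside quoted fields don't count."""
    reader = csv.reader(io.StringIO(line), delimiter=delim, quotechar=quotechar)
    return next(reader, [])  # an empty line yields no fields
```

With the line a,"b,c",d the naive split produces four pieces with stray quotes left in them, while the CSV-aware split yields the three fields a, b,c and d. Passing quotechar="'" plays the role of -quotechar '.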
Also, the print command CSV-escapes its arguments, and gets reads a full CSV record, which may span multiple lines.
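On the output side, the auto and always quoting modes and multi-line records can again be illustrated with Python's csv module standing in for tawk's own code. This sketch assumes auto behaves like csv.QUOTE_MINIMAL and always like csv.QUOTE_ALL; csv_join here only borrows the name and argument order of the tawk command listed earlier, and read_records is a hypothetical helper.

```python
import csv
import io
from typing import List


def csv_join(fields: List[str], delim: str = ",", quotechar: str = '"',
             quotemode: str = "auto") -> str:
    """Join fields into one CSV line; quotemode mirrors CSVQUOTE (auto/always)."""
    modes = {"auto": csv.QUOTE_MINIMAL, "always": csv.QUOTE_ALL}
    if quotemode not in modes:
        # Mirrors CSVQUOTE rejecting values other than auto and always.
        raise ValueError(f"bad quote mode: {quotemode}")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delim, quotechar=quotechar,
                        quoting=modes[quotemode], lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue()[:-1]  # drop the single terminator writerow adds


def read_records(text: str) -> List[List[str]]:
    """Read whole CSV records; a quoted field may contain newlines."""
    return list(csv.reader(io.StringIO(text)))
```

In auto mode only fields that need it are quoted, so ["a", "b,c"] becomes a,"b,c", while always mode quotes every field. A record such as 1,"two<newline>lines" spans two physical lines but comes back as a single two-field record.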