title | author | date | lang |
---|---|---|---|
Replacing Spreadsheets - POSIX text utilities | CSC Training | 2019-12 | en |
- Your shell has built-in spreadsheet functions
- You can find, extract, and combine text row-wise or column-wise
- merging lines of several input files
- syntax:
paste [-d del] [-s] file1 file2 [file3 ...]
- `-d del` use delimiter *del* between merged lines instead of the default tabulator
- `-s` serial instead of parallel merging: all lines of `file1` are joined into one output line, then all lines of `file2` into the next, and so on
- let's try the following:
$ paste count.txt sheep.txt > counting_sheep_tab.txt # creates merged file with tabulators as delimiter
$ paste -d ' ' count.txt sheep.txt > counting_sheep_space.txt # creates merged file with spaces as delimiter
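The merge can be reproduced with two tiny input files. This is only a sketch: the contents of `count.txt` and `sheep.txt` below are made up for illustration and may differ from the course files.

```bash
# Work in a scratch directory so nothing existing is overwritten
cd "$(mktemp -d)"

# Assumed sample data (not the original course files)
printf '1\n2\n3\n' > count.txt
printf 'sheep\nsheep\nlamb\n' > sheep.txt

# Parallel merge: line N of each file, joined by a tab
paste count.txt sheep.txt > counting_sheep_tab.txt

# Same merge, but with a single space as delimiter
paste -d ' ' count.txt sheep.txt > counting_sheep_space.txt
cat counting_sheep_space.txt
# 1 sheep
# 2 sheep
# 3 lamb

# Serial merge: each input file becomes one output line
paste -s -d ' ' count.txt sheep.txt
# 1 2 3
# sheep sheep lamb
```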
- extracting fields/columns from each line of files
- syntax:
cut [-d del] -f no [-s] file1 [file2 ...]
- `-d del` use delimiter *del* to identify fields instead of the default tabulator
- `-f no` select field number(s) *no* (e.g., `1`, `1,3` or `2-4`)
- `-s` skip lines not containing delimiters (e.g., header lines)
- let's try the following:
$ cut -f 1 counting_sheep_tab.txt
$ cut -f 1 -d ' ' counting_sheep_space.txt
- both will display the original content of count.txt
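The effect of `-s` and of a custom delimiter is easiest to see on a small file. The following sketch uses an invented file `flock_tab.txt` (a hypothetical name with made-up contents) containing a header line without any tab.

```bash
cd "$(mktemp -d)"

# Assumed sample data: a header without delimiter, then tab-separated rows
printf 'my flock\n1\tsheep\n2\tlamb\n' > flock_tab.txt

# Field 2 of every line; lines without a tab are passed through unchanged
cut -f 2 flock_tab.txt
# my flock
# sheep
# lamb

# -s suppresses lines that contain no delimiter
cut -s -f 2 flock_tab.txt
# sheep
# lamb

# A different delimiter and several fields at once
printf 'a:b:c\n' | cut -d ':' -f 1,3
# a:c
```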
- counting the lines, the words as well as the characters or bytes in a file (`wc` stands for word count):
wc [-l -w -m -c] file1 [file2 ...]
- `-l` count lines
- `-w` count words
- `-m` count characters
- `-c` count bytes
- without options, displays line, word, and byte counts (same as `-l -w -c`)
- a word is a non-zero-length sequence of characters delimited by white space
$ wc -l sheep_space.txt
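In scripts, the counts are often needed as plain numbers. A minimal sketch, assuming a made-up file `sheep_sample.txt` (not one of the course files):

```bash
cd "$(mktemp -d)"

# Assumed sample data: three lines with two words each
printf '1 sheep\n2 sheep\n3 lamb\n' > sheep_sample.txt

# Reading from stdin keeps the file name out of the output
lines=$(wc -l < sheep_sample.txt)
words=$(wc -w < sheep_sample.txt)
bytes=$(wc -c < sheep_sample.txt)

# Arithmetic expansion strips the padding that some wc versions add
echo "lines=$((lines)) words=$((words)) bytes=$((bytes))"
# lines=3 words=6 bytes=23
```

Note that `-m` and `-c` only give different results when the file contains multi-byte characters and the locale recognizes them.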
- concatenates files and prints to stdout
cat [-n -E -v -T] file1 file2 ...
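- `-n` number the output lines (e.g., for a source-code listing)
- `-E` mark the end of each line with a `$`
- `-v` show non-printing characters
- `-T` show tabs as `^I`
- the first command numbers the lines of `sheep_space.txt` and stores the result, with the numbers as a new first column, in `sheep_lines.txt`; the second makes tabs and line ends in `sheep_tab.txt` visible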
$ cat -n sheep_space.txt > sheep_lines.txt
$ cat -T -E sheep_tab.txt
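What these options print can be checked on a two-line file. This sketch assumes GNU `cat` (`-E` and `-T` are not part of POSIX) and uses invented file names and contents.

```bash
cd "$(mktemp -d)"

# Assumed sample data with a tab between the two columns
printf '1\tsheep\n2\tlamb\n' > sheep_sample_tab.txt

# -n prefixes every line with its number, followed by a tab
cat -n sheep_sample_tab.txt > sheep_sample_lines.txt

# Make invisible characters visible: tab -> ^I, line end -> $
cat -T -E sheep_sample_tab.txt
# 1^Isheep$
# 2^Ilamb$
```

This is a quick way to find out whether a file is tab- or space-delimited before choosing the `-d` option of `cut`.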
- extracting head of files
head [-n N] file1 [file2 ...]
- `-n N` display *N* first lines
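A short sketch of `head` on a generated file; `numbers.txt` is a hypothetical name and simply holds the numbers 1 to 20.

```bash
cd "$(mktemp -d)"

# Assumed sample data: the numbers 1 to 20, one per line
awk 'BEGIN { for (i = 1; i <= 20; i++) print i }' > numbers.txt

# First three lines
head -n 3 numbers.txt
# 1
# 2
# 3

# Without -n, head prints the first 10 lines
head numbers.txt | wc -l
```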
- extracting tail of files
tail [-n N -f --pid PID] file1 [file2 ...]
- `-n N` display *N* last lines
- `-f` continuously display updates of the file (useful for following log files)
- `--pid PID` together with `-f`: terminate `tail` once the process with process ID *PID* has exited
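The following sketch shows `tail -n`, a common `head`/`tail` combination for cutting out a line range, and `-f` with `--pid`. The file names and the background loop standing in for a real job are invented, and `--pid` assumes GNU `tail`.

```bash
cd "$(mktemp -d)"

# Assumed sample data: the numbers 1 to 20, one per line
awk 'BEGIN { for (i = 1; i <= 20; i++) print i }' > numbers.txt

# Last two lines
tail -n 2 numbers.txt
# 19
# 20

# head and tail combined: lines 5 to 7
head -n 7 numbers.txt | tail -n 3
# 5
# 6
# 7

# Follow a log while a (hypothetical) background job appends to it
: > job.log
( for i in 1 2; do echo "step $i" >> job.log; sleep 1; done ) &
writer=$!

# tail stops by itself once the writer has exited (GNU extension)
tail -f --pid="$writer" job.log
wait "$writer"
```

Without `--pid`, `tail -f` keeps running until it is interrupted, e.g., with Ctrl-C.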
- sort lines of text files (alphabetical or numerical)
sort [-d -f -g] file1 [file2 ...]
- `-d` dictionary order (only blanks and alphanumeric characters are compared)
- `-f` ignore upper/lower case
- `-g` general numeric order (lines are compared by numeric value)
- Spot the difference:
```bash
$ sort -d sheep_space.txt
$ sort -g sheep_lines.txt
```
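The difference becomes obvious as soon as the numbers have different lengths. A minimal sketch with invented data; `LC_ALL=C` is set only to make the order reproducible, and `-g` is a GNU extension (the POSIX option for numeric sorting is `-n`).

```bash
cd "$(mktemp -d)"

# Assumed sample data: a number followed by a name
printf '10 sheep\n9 lamb\n100 goat\n' > animals.txt

# Dictionary order compares character by character, so "1..." < "9..."
LC_ALL=C sort -d animals.txt
# 10 sheep
# 100 goat
# 9 lamb

# General numeric order compares the leading number by value
LC_ALL=C sort -g animals.txt
# 9 lamb
# 10 sheep
# 100 goat

# -f folds lower case to upper case before comparing
printf 'b\nA\na\nB\n' | LC_ALL=C sort -f
# A
# a
# B
# b
```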
- filter adjacent matching (redundant) lines
uniq [-c -f N -s N -w N] [input [output]]
- `-c` prefix lines by number of their occurrence
- `-f N` skip the first *N* fields when comparing
- `-s N` skip the first *N* characters when comparing
- `-w N` compare no more than *N* characters per line
- the following command skips the first field (the previously inserted line numbers), compares at most 10 characters per line (i.e., ignores the later columns), and prefixes each line with its number of occurrences (hint: also try `-f 2`)
```bash
$ uniq -c -f 1 -w 10 sheep_lines.txt
```
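Because `uniq` only compares neighbouring lines, repeated lines that are not adjacent are kept. The sketch below illustrates this with an invented file `flock_lines.txt`; the `awk` step is there only to strip the padding in front of the counts.

```bash
cd "$(mktemp -d)"

# Assumed sample data: a line number followed by an animal name
printf '1 sheep\n2 sheep\n3 lamb\n4 sheep\n' > flock_lines.txt

# -f 1 skips the number column; only ADJACENT duplicates collapse
uniq -f 1 flock_lines.txt
# 1 sheep
# 3 lamb
# 4 sheep

# To count every animal, sort first so that equal lines become adjacent
cut -d ' ' -f 2 flock_lines.txt | sort | uniq -c | awk '{ print $1, $2 }'
# 1 lamb
# 3 sheep
```

The `sort | uniq -c` pipeline is the shell counterpart of a spreadsheet frequency table.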