title	author	date	lang
Replacing Spreadsheets - POSIX text utilities	CSC Training	2019-12	en

What we will cover

Your shell environment provides spreadsheet-like functions through standard text utilities
You can find, extract, and combine text row-wise or column-wise

Adding files side-by-side: paste

merging corresponding lines of several input files (line 1 with line 1, line 2 with line 2, ...)

syntax:

paste [-d del -s] file1 file2 [file3 ...]

- `-d del` use delimiter *del* instead of the default tab between the merged lines
- `-s` paste serially: join all lines of each file into a single line (one output line per file)

Let's try the following:

$ paste count.txt sheep.txt > counting_sheep_tab.txt           # creates merged file with tabs as delimiter
$ paste -d ' ' count.txt sheep.txt > counting_sheep_space.txt  # creates merged file with space as delimiter
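For a self-contained check of the three variants (default tab, `-d`, `-s`), here is a minimal sketch. The files `numbers.txt` and `animals.txt` and their contents are hypothetical stand-ins for the course files, created in a temporary directory:

```bash
# Work in a throw-away directory so no course files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input files (stand-ins for count.txt and sheep.txt)
printf '1\n2\n3\n' > numbers.txt
printf 'sheep\ngoat\ncow\n' > animals.txt

# Default: corresponding lines are joined with a tab
paste numbers.txt animals.txt > merged_tab.txt

# -d ' ': corresponding lines are joined with a space
paste -d ' ' numbers.txt animals.txt > merged_space.txt

# -s: each input file becomes one line, its lines joined with tabs
paste -s numbers.txt animals.txt > serial.txt

cat merged_space.txt   # 1 sheep / 2 goat / 3 cow
cat serial.txt         # 1<TAB>2<TAB>3 / sheep<TAB>goat<TAB>cow
```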

Trimming files: cut

extracting fields/columns from each line of files

syntax:

cut [-d del -f no -s] file1 file2 ...

- `-d del` use *del* (a single character) instead of the default tab as field delimiter
- `-f no` select field(s) *no*, e.g. `1`, `1,3` or `2-4`
- `-s` skip lines not containing the delimiter (e.g., title lines)

Let's try the following:

$ cut -f 1 counting_sheep_tab.txt
$ cut -f 1 -d ' ' counting_sheep_space.txt

- both display the original content of `count.txt`
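The following sketch shows a field list and the effect of `-s` on a line without any delimiter. The comma-separated file `table.csv` and its contents are hypothetical:

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical comma-separated table; the title line contains no comma
printf 'Farm inventory\nname,count,colour\nsheep,12,white\ngoat,3,brown\n' > table.csv

# Field 1 only; the comma-free title line is passed through unchanged
cut -d ',' -f 1 table.csv > names_all.txt

# Same, but -s drops lines that contain no delimiter
cut -d ',' -s -f 1 table.csv > names.txt

# A field list: fields 1 and 3 (the output keeps ',' as separator)
cut -d ',' -s -f 1,3 table.csv > name_colour.txt

cat names.txt         # name / sheep / goat
cat name_colour.txt   # name,colour / sheep,white / goat,brown
```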

Counting lines [and sheep]: wc

counting the lines, words, and characters or bytes in a file (`wc` stands for *word count*):

wc [-l -w -m -c] file1 [file2 ...]

- `-l` count lines
- `-w` count words
- `-m` count characters
- `-c` count bytes
- without options, `wc` displays line, word, and byte counts (as with `-l -w -c`)
   - a word is a non-zero-length sequence of characters delimited by white space

$ wc -l sheep_space.txt
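A small sketch with a hypothetical file `flock.txt`. It also shows a common scripting trick: when `wc` reads from standard input (`wc -l < file`), it prints the count without the file name:

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical file: 3 lines, 6 words
printf 'one sheep\ntwo sheep\nthree sheep\n' > flock.txt

wc flock.txt      # lines, words and bytes, followed by the file name
wc -l flock.txt   # the line count, followed by the file name

# Reading from stdin omits the file name; $(( )) strips any padding spaces
lines=$(( $(wc -l < flock.txt) ))
words=$(( $(wc -w < flock.txt) ))
bytes=$(( $(wc -c < flock.txt) ))
echo "lines=$lines words=$words bytes=$bytes"   # lines=3 words=6 bytes=32
```

The byte count includes the newline at the end of each line (10 + 10 + 12 = 32).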

Combining files end to start: cat

concatenating files and printing them to standard output (stdout)

cat [-n -E -v -T] file1 file2 ...

- `-n` number all output lines (e.g., for a source-code listing)
- `-E` display `$` at the end of each line
- `-v` show non-printing characters (using `^` and `M-` notation)
- `-T` display tab characters as `^I`

- numbers the lines in `sheep_space.txt`, adding the line numbers as a new first column, and saves the result in `sheep_lines.txt`:

$ cat -n sheep_space.txt > sheep_lines.txt

- shows the tabs (as `^I`) and line ends (as `$`) in `sheep_tab.txt`:

$ cat -T -E sheep_tab.txt
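The concatenation itself, plus `-n`, in a minimal sketch with the hypothetical files `part1.txt` and `part2.txt`. Since `cat -n` separates each number from the text with a tab, `cut -f 2` removes the number column again:

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical parts of one list
printf 'sheep\ngoat\n' > part1.txt
printf 'cow\nhorse\n' > part2.txt

# End to start: all lines of part1.txt, then all lines of part2.txt
cat part1.txt part2.txt > whole.txt

# -n prefixes each line with a right-aligned number and a tab
cat -n whole.txt > numbered.txt
cat numbered.txt

# The text is the second tab-separated field, so cut restores the original
cut -f 2 numbered.txt > restored.txt
```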

Extracting beginning and end of files

extracting the beginning (head) of files

head [-n N] file1 [file2 ...]

- `-n N` display the first *N* lines (default: 10)

extracting the end (tail) of files

tail [-n N -f --pid PID] file1 [file2 ...]

- `-n N` display the last *N* lines (default: 10)
- `-f` follow the file: keep displaying new lines as they are appended (useful for log files)
- `--pid PID` with `-f`, terminate after the process with process ID *PID* ends
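Piping `head` into `tail` extracts an arbitrary range of lines. A minimal sketch using a hypothetical 10-line file; `tail -n +K` (start at line *K*) is a further standard form of `-n`:

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical file with the lines "line 1" ... "line 10"
i=1
while [ "$i" -le 10 ]; do
    echo "line $i"
    i=$((i + 1))
done > ten.txt

head -n 3 ten.txt > first3.txt   # line 1 .. line 3
tail -n 2 ten.txt > last2.txt    # line 9, line 10

# Lines 4 to 6: keep the first 6 lines, then the last 3 of those
head -n 6 ten.txt | tail -n 3 > middle.txt

# tail -n +8 starts at line 8 and prints to the end of the file
tail -n +8 ten.txt > from8.txt
```

`tail -f` is left out here because it keeps running until interrupted (Ctrl+C) or, with `--pid`, until the given process ends.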

Bringing order into files

sorting lines of text files (alphabetically or numerically)

sort [-d -f -g] file1 [file2 ...]

- `-d` dictionary order: consider only blanks and alphanumeric characters
- `-f` ignore upper/lower case
- `-g` general numeric order (also understands decimals and exponents such as `1.5e3`)

Spot the difference:
```bash
$ sort -d sheep_space.txt
$ sort -g sheep_lines.txt
```
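Why numeric order matters: text order compares character by character, so `10` comes before `9`. A sketch with hypothetical values; `LC_ALL=C` is an added assumption that fixes byte-wise ordering so the result does not depend on the locale:

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical numeric column, deliberately unsorted
printf '10\n9\n100\n2.5\n' > values.txt

# Text order: "10" < "100" < "2.5" < "9"
LC_ALL=C sort values.txt > text_order.txt

# -g compares numeric values (including decimals): 2.5 < 9 < 10 < 100
LC_ALL=C sort -g values.txt > numeric_order.txt

# Hypothetical names: in the C locale upper case sorts before lower case
printf 'sheep\nGoat\ncow\n' > names.txt
LC_ALL=C sort names.txt > case_sensitive.txt   # Goat, cow, sheep
LC_ALL=C sort -f names.txt > case_folded.txt   # cow, Goat, sheep
```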

Removing redundancy in files

filtering adjacent matching (redundant) lines

uniq [-c -f N -s N -w N] [input_file [output_file]]

- `-c` prefix each line with its number of occurrences
- `-f N` skip the first *N* fields when comparing
- `-s N` skip the first *N* characters when comparing
- `-w N` compare at most *N* characters per line

- skips the first field (the previously inserted line numbers), compares at most 10 characters (i.e., ignoring the later columns) and prefixes each line with its number of occurrences (hint: also try `-f 2`)
```bash
$ uniq -c -f 1 -w 10 sheep_lines.txt
```
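Because `uniq` only compares adjacent lines, repeated lines that are not next to each other survive. The usual remedy is to `sort` first, which turns `uniq -c` into a simple frequency count. A sketch with a hypothetical file `animals.txt`:

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical list with scattered repetitions
printf 'sheep\ngoat\nsheep\nsheep\ngoat\ncow\n' > animals.txt

# Without sorting, only the adjacent pair "sheep sheep" is merged
uniq animals.txt > adjacent_only.txt   # sheep, goat, sheep, goat, cow

# Sorting first brings equal lines together; -c prefixes the counts
LC_ALL=C sort animals.txt | uniq -c > counts.txt
cat counts.txt                         # 1 cow / 2 goat / 3 sheep (counts right-aligned)
```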
