05-ReplacingSpreadsheets.md

title: Replacing Spreadsheets - POSIX text utilities
author: CSC Training
date: 2019-12
lang: en

What we will cover

  • Your shell has built-in spreadsheet functions
  • You can find, extract, and combine text row-wise or column-wise

Adding files side-by-side: paste

merging lines of several input files

  • syntax:
paste [-d del -s] file1 file2 [file3 ...]
- `-d del` use delimiter *del* between merged lines instead of the default tab
- `-s` serial: join all lines of each file into a single output line (one line per input file) instead of merging the files line by line
  • let's try the following:
$ paste count.txt sheep.txt > counting_sheep_tab.txt           # creates merged file with tab as delimiter
$ paste -d ' ' count.txt sheep.txt > counting_sheep_space.txt  # creates merged file with space as delimiter
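
The difference between the default parallel merge and `-s` is easiest to see on a small example. This is a minimal sketch that assumes hypothetical three-line contents for `count.txt` and `sheep.txt`; the real course files may differ.

```bash
# Hypothetical sample data; the real course files may differ.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1\n2\n3\n' > count.txt
printf 'sheep\nsheep\nsheep\n' > sheep.txt

# Parallel merge: line N of count.txt is joined with line N of sheep.txt.
merged=$(paste -d ' ' count.txt sheep.txt)
echo "$merged"

# Serial merge (-s): each input file is collapsed into one output line.
serial=$(paste -s -d ' ' count.txt sheep.txt)
echo "$serial"
```

The first command prints three lines of the form `N sheep`; the second prints two lines, one per input file.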

Trimming files: cut

extracting fields/columns from each line of files

  • syntax:
cut [-d del -f no -s] file1 file2 ...
- `-d del` use delimiter *del* to identify fields instead of the default tab
- `-f no` select field number(s) *no* (e.g., `1`, `1,3` or `2-4`)
- `-s` skip lines not containing delimiters (e.g., header lines)
  • let's try the following:
$ cut -f 1 counting_sheep_tab.txt
$ cut -f 1 -d ' ' counting_sheep_space.txt
- both will display the original content of count.txt
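
The effect of `-s` can be sketched with a file that has a header line without any delimiter. The file name `sample_tab.txt` and its contents are assumptions made for illustration only.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical file: a header without tabs, then two tab-separated rows.
printf 'counting sheep\n1\tsheep\n2\tsheep\n' > sample_tab.txt

# Without -s, a line that lacks the delimiter is printed unchanged.
with_header=$(cut -f 1 sample_tab.txt)
echo "$with_header"

# With -s, such lines are dropped and only real fields remain.
without_header=$(cut -s -f 1 sample_tab.txt)
echo "$without_header"
```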

Counting lines [and sheep]: wc

  • counting the lines, the words as well as the characters or bytes in a file (wc stands for word count):
wc [-l -w -m -c] file1 [file2 ...]
- `-l` count lines
- `-w` count words
- `-m` count characters
- `-c` count bytes
- without options displays line, word, and byte counts (same as `-l -w -c`)
   - a word is a non-zero-length sequence of characters delimited by white space
$ wc -l sheep_space.txt
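
When the counts are needed in a script, one possible approach is to read the file from standard input so that `wc` does not print the file name after the number. The sketch below uses a hypothetical two-line file `sample.txt`; `tr -d ' '` strips the padding that some `wc` implementations add.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical sample file with two lines of ten bytes each.
printf 'one sheep\ntwo sheep\n' > sample.txt

# Reading from stdin keeps the file name out of the output.
lines=$(wc -l < sample.txt | tr -d ' ')
words=$(wc -w < sample.txt | tr -d ' ')
bytes=$(wc -c < sample.txt | tr -d ' ')
echo "$lines lines, $words words, $bytes bytes"
```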

Combining files end to start: cat

  • concatenates files and prints to stdout
cat [-n -E -v -T] file1 file2 ...
- `-n` numbering output lines (e.g., source-code listing)
- `-E` mark the end of each line with a `$`
- `-v` show non-printing characters
- `-T` show tabs as `^I`

- numbers the lines in `sheep_space.txt`, adding the line number as a new first column
$ cat -n sheep_space.txt > sheep_lines.txt
$ cat -T -E sheep_tab.txt
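
A short sketch of these options in action, using hypothetical files `first.txt`, `second.txt` and `tabbed.txt`. It assumes GNU `cat`, since `-E` and `-T` are not part of POSIX.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical sample files.
printf 'sheep\nsheep\n' > first.txt
printf 'goat\n' > second.txt

# Concatenation: second.txt starts where first.txt ends.
combined=$(cat first.txt second.txt)
echo "$combined"

# cat -n separates the number from the line with a tab,
# so cut -f 2- recovers the original text.
restored=$(cat -n first.txt | cut -f 2-)
echo "$restored"

# GNU cat assumed: a tab is shown as ^I and each line end as $.
printf '1\tsheep\n' > tabbed.txt
marked=$(cat -T -E tabbed.txt)
echo "$marked"
```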

Extracting beginning and end of files

  • extracting head of files
head [-n N] file1 [file2 ...]
- `-n N` display *N* first lines
  • extracting tail of files
tail [-n N -f --pid PID] file1 [file2 ...]
- `-n N` display *N* last lines
- `-f` continuously display updates of file (useful to display log-files)
- `--pid PID` used with `-f`: terminate `tail` once the process with process ID *PID* has exited
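
`head` and `tail` can also be chained to cut a block out of the middle of a file. A minimal sketch, assuming a hypothetical ten-line file `numbers.txt`:

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical file holding the numbers 1 to 10, one per line.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > numbers.txt

first=$(head -n 3 numbers.txt)                 # lines 1-3
last=$(tail -n 2 numbers.txt)                  # lines 9-10
middle=$(head -n 5 numbers.txt | tail -n 2)    # lines 4-5
echo "$middle"
```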

Bringing order into files

  • sort lines of text files (alphabetical or numerical)
sort [-d -f -g] file1 [file2 ...]
- `-d` dictionary (alphanumeric) order
- `-f` ignore upper/lower case
- `-g` general numeric
- Spot the difference:
```bash
$ sort -d sheep_space.txt
$ sort -g sheep_lines.txt
```
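
The difference shows up as soon as the numbers have different lengths. The sketch below uses a hypothetical file `numbers.txt`. Note that `-g` is an extension found in GNU and BSD `sort`; the POSIX option for plain numeric order is `-n`.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical file with numbers of different lengths.
printf '10\n9\n100\n' > numbers.txt

# Dictionary order compares character by character: "100" sorts before "9".
dictionary=$(sort -d numbers.txt)
echo "$dictionary"

# General numeric order compares the values: 9 < 10 < 100.
numeric=$(sort -g numbers.txt)
echo "$numeric"
```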

Removing redundancy in files

  • filter adjacent matching (redundant) lines
uniq [-c -f N -s N -w N] [input [output]]
- `-c` prefix lines by number of their occurrence
- `-f N` avoid comparing the first *N* fields
- `-s N` avoid comparing the first *N* characters
- `-w N` compare not more than *N* characters/line

- skips the first column (the previously inserted numbers), compares at most 10 characters (i.e., ignores the later columns), and prefixes each line with its number of occurrences (Hint: try with `-f 2`)
```bash
$ uniq -c -f 1 -w 10 sheep_lines.txt 
```
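
Because `uniq` only compares neighbouring lines, unsorted input is usually piped through `sort` first. A minimal sketch with hypothetical files `animals.txt` and `numbered.txt`; the `sed` call only strips the leading padding that `uniq -c` prints.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical sample data; the last "sheep" is not adjacent to the first two.
printf 'sheep\nsheep\ngoat\nsheep\n' > animals.txt

# Only adjacent duplicates collapse, so "sheep" appears twice.
adjacent=$(uniq animals.txt)
echo "$adjacent"

# Sorting first makes all duplicates adjacent; -c adds the counts.
counted=$(sort animals.txt | uniq -c | sed 's/^ *//')
echo "$counted"

# -f 1 ignores the first field (here a line number) when comparing.
printf '1 sheep\n2 sheep\n3 goat\n' > numbered.txt
skipped=$(uniq -f 1 numbered.txt)
echo "$skipped"
```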