Kirill Pavlov edited this page Jun 28, 2020 · 5 revisions

Introduction

This project provides a set of scripts for manipulating columnar data directly on the command line. Think of it as SQL on delimited files.

Unlike some similar projects, tabtools has no dependencies other than Python 3.4+. Under the hood, each script analyzes the headers of the input files and the command line arguments, translates them into a "coreutils script" via code syntax translation, and executes that script on the input files.
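As a rough illustration of the header analysis step, the sketch below maps column names to 1-based positions and builds the arguments for a coreutils `cut` call. It is not tabtools' actual code: the function names `column_indices` and `build_cut_args` are hypothetical, and it assumes the first line of the file is a plain tab-delimited header.

```python
def column_indices(header_line, delimiter="\t"):
    """Map each column name in the header to its 1-based position."""
    names = header_line.rstrip("\n").split(delimiter)
    return {name: i for i, name in enumerate(names, start=1)}


def build_cut_args(header_line, wanted, delimiter="\t"):
    """Translate column names into an argument list for coreutils `cut`.

    Hypothetical helper for illustration only. `cut` always prints fields
    in file order, so the indices are sorted to make that explicit.
    An unknown column name raises KeyError.
    """
    index = column_indices(header_line, delimiter)
    fields = ",".join(str(i) for i in sorted(index[name] for name in wanted))
    return ["cut", "-d", delimiter, "-f", fields]
```

With a header of date, open and close, asking for close and date would produce the arguments for `cut -f 1,3`; the list could then be passed to `subprocess.run` together with the input file.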

Benefits are:

  • SQL and streaming queries directly on the command line. See [Quickstart] and [Documentation].
  • Runs fast because the heavy lifting is delegated to coreutils. See [Benchmarks].
  • Self-contained, zero-dependency scripts. Deployment is done by copying files; no "sudo" required.

Install

There are two ways to install tabtools: copy the files from the CircleCI artifacts, or install the scripts as a Python package.

Install: copy-paste latest build

Following the CircleCI artifacts instructions, download the files from the latest build:

curl https://circleci.com/api/v1.1/project/gh/slothai/tabtools/latest/artifacts \
   | grep -o 'https://[^"]*' \
   | wget -v -i -

Install: python package

python3 -m pip install --user tabtools

Why?

The main reason is convenience: most command line tools are great, but they refer to columns by index instead of by name, which makes scripts difficult to maintain. Originally, this project just helped to "replace" column names with column numbers based on the header. Then it became something closer to "SQL in the command line": based on the command line parameters, it generates an awk program and executes it on the input file. No database or intermediate files are required. Also, with awk comes speed of execution.
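The name-to-number replacement described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the translation tabtools really performs: `to_awk_expression` and `build_awk_filter` are hypothetical names, the header is assumed to be the first row, and the naive substitution would also rewrite column names that appear inside awk string literals.

```python
import re


def to_awk_expression(expression, header_names):
    """Replace each column name in `expression` with its awk field reference."""
    index = {name: i for i, name in enumerate(header_names, start=1)}

    def substitute(match):
        word = match.group(0)
        # Identifiers that are not column names (awk functions, variables)
        # are left untouched.
        return "${}".format(index[word]) if word in index else word

    # \b keeps the pattern from matching the tail of a numeric literal.
    return re.sub(r"\b[A-Za-z_]\w*", substitute, expression)


def build_awk_filter(condition, header_names):
    """Build an awk program that keeps the header row and the matching rows."""
    return "NR == 1 || ({})".format(to_awk_expression(condition, header_names))
```

For columns date, open and close, the condition `close > open` becomes the awk program `NR == 1 || ($3 > $2)`, which could be run with something like `awk -F '\t'` on the input file.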

Notes about CircleCI
Helpful bash Text Processing Commands

https://www.tldp.org/LDP/abs/html/textproc.html