Home
This project provides a set of scripts for manipulating columnar data directly on the command line. Think of it as SQL on delimited files.
Unlike some similar projects, tabtools has no dependencies other than Python 3.4+. Under the hood, each script analyzes the headers of the input files together with the command line arguments, translates them into a "coreutils script" using code syntax translation, and executes that script on the input files.
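The header-to-script translation described above can be sketched as follows. This is a minimal illustration, not tabtools' actual code: the helper names `column_index` and `build_sort_script`, the file name `people.tsv`, and the choice of `sort` as the target command are all assumptions made for the example.

```python
import shlex


def column_index(header_line, name, delimiter="\t"):
    """Return the 1-based position of `name` in a delimited header line."""
    names = header_line.rstrip("\n").split(delimiter)
    if name not in names:
        raise KeyError("unknown column: {}".format(name))
    return names.index(name) + 1


def build_sort_script(path, header_line, key, delimiter="\t"):
    """Build a shell script that prints the header, then the body sorted by `key`.

    Hypothetical helper: it only shows how a column name can be turned
    into the numeric key that coreutils `sort` expects.
    """
    k = column_index(header_line, key, delimiter)
    quoted_path = shlex.quote(path)
    sort_cmd = "sort -t {d} -k {k},{k}".format(d=shlex.quote(delimiter), k=k)
    return "head -n 1 {p}; tail -n +2 {p} | {s}".format(p=quoted_path, s=sort_cmd)


# The column name "age" becomes sort key 2 because it is the second header field.
script = build_sort_script("people.tsv", "name\tage\tcity\n", "age")
print(script)
```

The generated string could then be handed to a shell. Only the name-to-position lookup is the point here; a real translator would also need to handle quoting, multiple keys, and numeric versus lexical ordering.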
The benefits are:
- SQL and streaming queries directly on the command line. See [Quickstart] and [Documentation].
- Fast execution, because the heavy lifting is delegated to coreutils. See [Benchmarks].
- Self-contained, zero-dependency scripts. Deploy by copying files; no "sudo" required.
There are two ways to install tabtools: copy the files from CircleCI artifacts, or install the scripts as a Python package.
To copy the files, following the CircleCI artifacts instructions:
curl https://circleci.com/api/v1.1/project/gh/slothai/tabtools/latest/artifacts \
| grep -o 'https://[^"]*' \
| wget -v -i -
To install the scripts as a Python package:
python3 -m pip install --user tabtools
The main motivation for the project is convenience: most command line tools are great, but they refer to columns by index instead of by name, which makes scripts difficult to maintain. Originally, this project just helped to "replace" column names with column numbers based on the header. Then it became something closer to "SQL in the command line": based on the command line parameters, it generates an awk program and executes it on the input file. No database or intermediate files are required. Also, with awk comes the speed of execution.
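As a rough sketch of the idea of generating an awk program from column names, the example below rewrites names into awk field references using the header. It is an illustration under assumptions rather than the project's implementation: `to_awk_expr`, `build_awk_program`, and the `select`/`where` parameters are hypothetical, and the naive identifier replacement would also rewrite column names that appear inside string literals.

```python
import re


def to_awk_expr(expr, names):
    """Replace each known column name in `expr` with its awk field reference ($N)."""
    positions = {name: i + 1 for i, name in enumerate(names)}

    def repl(match):
        word = match.group(0)
        # Identifiers that are not column names are left untouched.
        return "${}".format(positions[word]) if word in positions else word

    return re.sub(r"[A-Za-z_][A-Za-z_0-9]*", repl, expr)


def build_awk_program(header_line, select, where, delimiter="\t"):
    """Build an awk program that skips the header, filters rows, and prints columns."""
    names = header_line.rstrip("\n").split(delimiter)
    fields = ", ".join(to_awk_expr(column, names) for column in select)
    condition = to_awk_expr(where, names)
    return "NR > 1 && ({}) {{ print {} }}".format(condition, fields)


program = build_awk_program("name\tage\tcity\n", ["name", "city"], "age >= 30")
print(program)  # prints: NR > 1 && ($2 >= 30) { print $1, $3 }
```

The resulting program could be passed to awk with tab as both the input and output field separator, for example `awk -F '\t' -v OFS='\t' "$program" people.tsv`, where `people.tsv` is a placeholder file name.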
- How to run CircleCI locally: https://github.com/CircleCI-Public/circleci-cli/issues/212
- CircleCI 2.1 config overview (with orbs): https://discuss.circleci.com/t/circleci-2-1-config-overview/26057
- Storing build artifacts: https://circleci.com/docs/2.0/artifacts/