ETL-JS CLI

Extract, Transform, and Load: sharable and repeatable, from the command line.

mkdir my-etl && cd my-etl
# Initialize
etl-js init
# Use local executor
sed -i "s/executor: remote1/executor: local1/" settings.yml
# Create a simple template downloading Orion nebula from NASA site
echo -e "etlSets:\n default:\n  - step1\nstep1:\n files:\n  /tmp/orion-nebula.jpg:\n   source: https://www.nasa.gov/sites/default/files/thumbnails/image/orion-nebula-xlarge_web.jpg" > download_orion_nebula.yml
# Run template
etl-js run download_orion_nebula.yml

The image is downloaded locally to /tmp/orion-nebula.jpg.
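
The one-line `echo -e` above packs the whole template into escaped newlines, which is easy to mistype. As a sketch, the same template can be written with a here-document; the scratch directory and the `template` variable are only illustrative names, not something etl-js-cli requires.

```shell
# Write into a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d)
template="$workdir/download_orion_nebula.yml"

# The quoted delimiter ('EOF') keeps the body literal: nothing is expanded.
cat > "$template" <<'EOF'
etlSets:
 default:
  - step1
step1:
 files:
  /tmp/orion-nebula.jpg:
   source: https://www.nasa.gov/sites/default/files/thumbnails/image/orion-nebula-xlarge_web.jpg
EOF

# Show the resulting template.
cat "$template"
```

The file can then be passed to `etl-js run` exactly as in the quick start.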

Installation

npm install --global @lpezet/etl-js-cli

Features

  • Template-based process using YML to express steps and activities as part of ETL
  • Built-in modules to leverage already installed software (e.g. mysql, mysqlimport, etc.)
  • Dynamic behavior through the use of tags in activities.

Concept

This command line tool lets you tap into the power of ETL-JS. The idea is to be able to share and easily repeat activities, and leverage existing tools as much as possible.

Steps and activities are basically specified in YML like so:

etlSets:
  default:
    - activity1
    - activity2
  somethingelse:
    - activity3

activity1:
  commands:
    my_command:
      command: echo "Hello..."

activity2:
  commands:
    something:
      command: echo "World!"

activity3:
  commands:
    bye_bye:
      command: echo "Bye bye!"

For more details, have a look at ETL-JS.

Getting started

You can get started right away and figure things out along the way. If anything is unclear or confusing, take a look at ETL-JS. The sample template only uses the Commands Mod.

Once etl-js-cli has been installed, run the following:

mkdir etl-js-test
cd etl-js-test
etl-js init

At this point, 2 new files have been created in your current directory:

  • etl.yml: a sample ETL template to show some of ETL JS features
  • settings.yml: the settings etl-js-cli will be loading to run any ETL template.
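
The quick start at the top of this README switches the executor in settings.yml with `sed -i "..."`, which is GNU sed syntax; BSD sed (macOS) expects a backup suffix after `-i`. A minimal sketch of a form both accept is shown below. It runs against a scratch file whose single line is an assumption inferred from that quick-start command, not the real generated settings.yml.

```shell
# Scratch stand-in for settings.yml; its content is assumed, not the real file.
settings=$(mktemp)
printf 'executor: remote1\n' > "$settings"

# -i.bak (suffix attached) works with both GNU and BSD sed and keeps a backup.
sed -i.bak 's/executor: remote1/executor: local1/' "$settings"

cat "$settings"          # now reads: executor: local1
rm -f "$settings.bak"    # drop the backup once the change looks right
```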

You can open etl.yml in your favorite editor and see its content. It should look similar to this:

etlSets:
  default: ["hello", "world"]
  envTest: ["envTest"]
  varTest: ["varTest"]

hello:
  commands:
    say_hello:
      command: echo "Hello..."
world:
  commands:
    say_world:
      command: echo "...world!"
envTest:
  commands:
    with_env:
      command: 'echo "The value for env variable ''TESTENV'': {{env.TESTENV}}"'
varTest:
  commands:
    001_create_var:
      command: printf "hello"
      var: TESTVAR
    002_use_var:
      command: 'echo "The value for var TESTVAR: {{vars.TESTVAR}}"'

It basically provides 3 different ETL Sets:

  • default: if no special argument is passed to etl-js-cli, this is what it will run by default.
  • envTest: this ETL process simply demonstrates how environment variables can be used in the template.
  • varTest: this ETL process demonstrates how variables generated by other steps can be used within the template.

First run

To execute your first ETL template simply run the following:

etl-js run etl.yml

This will execute the default ETL Set. By default, the complete result of all activities executed by this ETL Set is output, along with other log messages:

{ exit: false,
  activities:
   [ { activity: 'hello',
       steps:
        { commands:
           { exit: false,
             skip: false,
             results:
              [ { command: 'say_hello',
                  results:
                   { exit: false,
                     pass: true,
                     skip: false,
                     _stdout: 'Hello...\n',
                     _stderr: '',
                     result: 'Hello...\n' },
                  exit: false,
                  skip: false } ] } },
       exit: false,
       skip: false
     },
     { activity: 'world',
       steps:
        { commands:
           { exit: false,
             skip: false,
             results:
              [ { command: 'say_world',
                  results:
                   { exit: false,
                     pass: true,
                     skip: false,
                     _stdout: '...world!\n',
                     _stderr: '',
                     result: '...world!\n' },
                  exit: false,
                  skip: false } ] } },
       exit: false,
       skip: false } ] }

This ETL Set consists of two simple activities, each running one echo command: the first prints "Hello..." and the second prints "...world!".

Environment variables

To see how environment variables work, first run the following:

etl-js run etl.yml envTest

By specifying envTest, we are asking etl-js-cli to only run the ETL Set envTest. The final output should look like the following:

{ exit: false,
  activities:
   [ { activity: 'envTest',
       steps:
        { commands:
           { exit: false,
             skip: false,
             results:
              [ { command: 'with_env',
                  results:
                   { exit: false,
                     pass: true,
                     skip: false,
                     _stdout: 'The value for env variable \'TESTENV\': \n',
                     _stderr: '',
                     result: 'The value for env variable \'TESTENV\': \n' },
                  exit: false,
                  skip: false } ] } },
       exit: false,
       skip: false } ] }

Notice that the output shows an empty value for the environment variable TESTENV. This is because TESTENV was not set when we ran the previous command. We can set it using the env command on Linux like so:

env TESTENV="Hello world!" etl-js run etl.yml envTest

The output should then look like the following:

{ exit: false,
  activities:
   [ { activity: 'envTest',
       steps:
        { commands:
           { exit: false,
             skip: false,
             results:
              [ { command: 'with_env',
                  results:
                   { exit: false,
                     pass: true,
                     skip: false,
                     _stdout: 'The value for env variable \'TESTENV\': Hello world!\n',
                     _stderr: '',
                     result: 'The value for env variable \'TESTENV\': Hello world!\n' },
                  exit: false,
                  skip: false } ] } },
       exit: false,
       skip: false } ] }
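
The `env` prefix used above sets TESTENV only for the single process it launches; the calling shell is left untouched. The sketch below shows that scoping with a plain `sh -c` child standing in for etl-js, so it runs without the CLI installed. The `inside` and `outside` variable names are illustrative.

```shell
unset TESTENV  # start from a clean state

# env sets TESTENV only for the child process it launches.
inside=$(env TESTENV="Hello world!" sh -c 'printf "%s" "$TESTENV"')
echo "inside child: $inside"

# Back in the calling shell, the variable is still unset.
outside=${TESTENV-unset}
echo "in this shell: $outside"
```

To keep the value for several runs in a row, `export TESTENV="Hello world!"` in the shell is the usual alternative.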

Variables

The varTest ETL Set in etl.yml is as follows:

varTest:
  commands:
    001_create_var:
      command: printf "hello"
      var: TESTVAR
    002_use_var:
      command: 'echo "The value for var TESTVAR: {{vars.TESTVAR}}"'

It consists of two commands:

  • 001_create_var: This command prints a piece of text (using printf) and saves its output into a variable named TESTVAR.
  • 002_use_var: This command echoes another piece of text, which includes a tag for the variable TESTVAR.

You can run this ETL set and see its output:

etl-js run etl.yml varTest

The final output should look like this:

{ exit: false,
  activities:
   [ { activity: 'varTest',
       steps:
        { commands:
           { exit: false,
             skip: false,
             results:
              [ { command: '001_create_var',
                  results:
                   { exit: false,
                     pass: true,
                     skip: false,
                     _stdout: 'hello',
                     _stderr: '',
                     result: 'hello' },
                  exit: false,
                  skip: false },
                { command: '002_use_var',
                  results:
                   { exit: false,
                     pass: true,
                     skip: false,
                     _stdout: 'The value for var TESTVAR: hello\n',
                     _stderr: '',
                     result: 'The value for var TESTVAR: hello\n' },
                  exit: false,
                  skip: false } ] } },
       exit: false,
       skip: false } ] }

You can see how the variable has been resolved using the output of the first command.
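
As a rough plain-shell analogy (not how ETL-JS implements it), capturing a command's output with `var:` and reusing it through a `{{vars.TESTVAR}}` tag resembles command substitution. The `message` variable is only an illustrative name.

```shell
# Capture stdout of the first command, as 001_create_var does with `var: TESTVAR`.
TESTVAR=$(printf "hello")

# Substitute the captured value into the second command, like {{vars.TESTVAR}}.
message=$(echo "The value for var TESTVAR: $TESTVAR")
echo "$message"
```

One difference to keep in mind: shell command substitution strips trailing newlines, so this analogy hides the distinction between echo and printf that matters in the template.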

NB: The echo command adds a newline at the end of its output by default, so calling echo hello outputs hello\n rather than just hello. Here we use printf instead, which does not append a newline unless the format string contains one.
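
The difference is easy to check by counting the bytes each command writes. This is a small sketch using only standard tools; the variable names are illustrative.

```shell
# Count the bytes each command writes to stdout (tr strips wc's padding).
echo_bytes=$(echo hello | wc -c | tr -d ' ')
printf_bytes=$(printf "hello" | wc -c | tr -d ' ')

echo "echo:   $echo_bytes bytes"    # 6: "hello" plus a trailing newline
echo "printf: $printf_bytes bytes"  # 5: just "hello"
```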

Examples/Tutorials

Examples and tutorials can be found here.

License

MIT

Publishing

To publish the next version of etl-js-cli, run the following:

npm version patch
git push --tags origin master
npm run dist
npm publish dist/ --access public