RisingStack bootcamp

General

Installation

Install Node 8 and the latest npm.

For this, use nvm, the Node version manager.

$ nvm install 8
# optional: set it as default
$ nvm alias default 8
# install latest npm
$ npm install -g npm

Install PostgreSQL on your system

Preferably with Homebrew.

$ brew install postgresql
# create a database
$ createdb risingstack_bootcamp

You should also install a visual tool for PostgreSQL, pick one:

Install Redis on your system

Preferably with Homebrew.

$ brew install redis

Install project dependencies

You only need to install them once; the necessary packages for all of the steps are included.

$ npm install

Set up your development environment

If you already have a favorite editor or IDE, you can skip this step.

  1. Download Visual Studio Code
  2. Install the following extensions:
  3. Read the Node.js tutorial

Steps

1. Create a simple web application and make the test pass

Tasks:

  • Create a GET endpoint /hello that returns Hello Node.js! in the response body, using the middleware of the koa-router package
  • Use the PORT environment variable to set the port, and make it required
  • Make the tests pass (npm run test-web)
  • Run the application (e.g. PORT=3000 npm start) and check that it fails when PORT is not provided
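
One way to make PORT required is to validate it at startup and throw before the server starts listening. The following is a minimal sketch of that idea; the helper name getPort is hypothetical and not part of the project.

```javascript
'use strict'

// Hypothetical helper: reads PORT from an env-like object and fails fast
// when it is missing or not a valid TCP port number.
function getPort(env) {
  const raw = env.PORT
  if (raw === undefined || raw === '') {
    throw new Error('PORT environment variable is required')
  }

  const port = Number(raw)
  if (!Number.isInteger(port) || port < 0 || port > 65535) {
    throw new Error(`PORT must be an integer between 0 and 65535, got "${raw}"`)
  }

  return port
}
```

It could be called as getPort(process.env) right before the application starts listening, so that npm start without PORT exits with an error instead of picking a default.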

Readings:

2. Create a model for the GitHub API

In this step you will implement two functions, wrappers for the GitHub API. You will use them to get information from GitHub later.

Tasks:

  • searchRepositories(query): should search for repositories given certain programming languages and/or keywords
    • The query function parameter is an Object of key-value pairs of the request query parameters (eg. { q: 'language:javascript' }, defaults to {})
    • It returns a Promise of the HTTP response without modification
  • getContributors(repository, query): get contributors list with additions, deletions, and commit counts (statistics)
    • repository function parameter is a String of the repository full name, including the owner (eg. RisingStack/cache)
    • The query function parameter is an Object of key-value pairs of the request query parameters (defaults to {})
    • It returns a Promise of the HTTP response without modification
  • Write unit tests for each function, use nock to intercept HTTP calls to the GitHub API endpoints
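
The two wrappers can be sketched as below. This is only an illustration under assumptions: the HTTP client is injected as a request function (the options object is shaped like request-promise options), and the factory name createGitHubModel and the User-Agent value are made up for the example. The two endpoint paths are the GitHub REST API's repository search and contributor statistics endpoints.

```javascript
'use strict'

const API_URL = 'https://api.github.com'

// Hypothetical factory: `request` is any function that takes an options
// object ({ method, uri, qs, headers, json }) and returns a Promise of the
// HTTP response. Injecting it keeps the sketch independent of a specific
// HTTP client and easy to test.
function createGitHubModel(request) {
  const headers = {
    Accept: 'application/vnd.github.v3+json',
    // GitHub rejects API requests that have no User-Agent header
    'User-Agent': 'risingstack-bootcamp'
  }

  // Resolves to the unmodified HTTP response of the repository search
  function searchRepositories(query = {}) {
    return request({
      method: 'GET',
      uri: `${API_URL}/search/repositories`,
      qs: query,
      headers,
      json: true
    })
  }

  // `repository` is the full name, including the owner (e.g. RisingStack/cache)
  function getContributors(repository, query = {}) {
    return request({
      method: 'GET',
      uri: `${API_URL}/repos/${repository}/stats/contributors`,
      qs: query,
      headers,
      json: true
    })
  }

  return { searchRepositories, getContributors }
}
```

Because the client is passed in, unit tests can use a stub, while tests against the real client can still intercept the same two URLs with nock.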

Readings:

Extra:

3. Implement the database models

In this step you will create the database tables, where the data will be stored, using migrations.

Your model should look like this:

DB schema

It consists of 3 tables: user, repository, contribution. Each row in the repository table has a foreign key, owner, that references a record in the user table. The contribution table manages the many-to-many relationship between the user and repository tables with foreign keys.

Tasks:

  • Edit the config and specify the migrations field in the knex initialization Object, for example:
      {
        client: 'pg',
        connection: '...',
        migrations: {
          directory: path.join(__dirname, './migrations')
        }
      }
  • Create one migration file per table (eg. 1-create-user.js, 2-create-repository.js, 3-create-contribution.js) with the following skeleton:
    • up method has the logic for the migration, down is for reverting it
    • The migrations are executed in transactions
    • The files are executed in alphabetical order
    'use strict'
    
    const tableName = '...'
    
    function up(knex) {
      return knex.schema.createTable(tableName, (table) => {
        // your code goes here
      })
    }
    
    function down(knex) {
      return knex.schema.dropTableIfExists(tableName)
    }
    
    module.exports = {
      up,
      down
    }
  • Add a migrate-db script to the scripts in package.json and edit scripts/migrate-db.js to add the migration call. Finally, run your migration script to create the tables:
    $ npm run migrate-db -- --local

Readings:

4. Implement helper functions for the database models

In this step you will implement and test helper functions for inserting, changing and reading data from the database.

Tasks:

  • Implement the user model:
    • User.insert({ id, login, avatar_url, html_url, type })
      • validate the parameters
    • User.read({ id, login })
      • validate the parameters
      • one is required: id or login
  • Implement the repository model:
    • Repository.insert({ id, owner, full_name, description, html_url, language, stargazers_count })
      • Validate the parameters
      • description and language can be empty Strings
    • Repository.read({ id, full_name })
      • Validate the parameters
      • One is required: id or full_name
      • Return the owner as an object as well (join tables and reorganize fields)
  • Implement the contribution model:
    • Contribution.insert({ repository, user, line_count })
      • Validate the parameters
    • Contribution.insertOrReplace({ repository, user, line_count })
    • Contribution.read({ user: { id, login }, repository: { id, full_name } })
      • Validate the parameters

      • The function parameter should be an Object that contains a user field, a repository field, or both.

        If only the user is provided, then all the contributions of that user will be resolved. If only the repository is provided, then all the users who contributed to that repository will be resolved. If both are provided, then it will match the contribution of a particular user to a particular repo.

      • The function resolves to an Array of contributions (when both a user and a repository identifier are passed, it will have only 1 element)

      • Return the repository and user as objects as well (this requirement is just for the sake of making up a problem; when you actually need this function, you will most likely already have the whole user or repository Object)

        {
          line_count: 10,
          user: { id: 1, login: 'coconut', ... },
          repository: { id: 1, full_name: 'risingstack/repo', ... }
        }
      • Use a single SQL query

      • When you join the tables, there will be conflicting column names (id, html_url). Use the as keyword when selecting columns (eg. repository.id as repository_id) to avoid this
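
The reorganizing step for Contribution.read can be sketched as a small pure function. It assumes the query aliases every joined column with a user_ or repository_ prefix (e.g. repository.id as repository_id); the helper name nestContribution is hypothetical.

```javascript
'use strict'

// Hypothetical helper: turns one flat row of the joined query into the
// nested shape shown above, based on the column alias prefixes.
function nestContribution(row) {
  const result = { user: {}, repository: {} }

  Object.keys(row).forEach((key) => {
    if (key.startsWith('user_')) {
      result.user[key.slice('user_'.length)] = row[key]
    } else if (key.startsWith('repository_')) {
      result.repository[key.slice('repository_'.length)] = row[key]
    } else {
      // columns of the contribution table itself, e.g. line_count
      result[key] = row[key]
    }
  })

  return result
}
```

Mapping the rows of the single SQL query through this function, e.g. rows.map(nestContribution), yields the Array of contributions.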

Notes:

  • user is a reserved keyword in PG, use double quotes where you reference the table in a raw query
  • You can get the columns of a table by querying information_schema.columns, which can be useful to select fields dynamically when joining tables, e.g.:
    SELECT column_name FROM information_schema.columns WHERE table_name='contribution';

5. Create a worker process

In this step you will implement another process of the application, the worker. We will trigger a request to collect the contributions for repositories based on some query. The trigger will send messages to another channel; the handler for this channel is responsible for fetching the repositories. The third channel is used to fetch and save the contributions.

Make a drawing of the message flow, it will help you a lot!
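
The flow can also be sketched in code. The example below uses an in-memory EventEmitter as a stand-in for Redis pub/sub, purely to show which handler publishes to which channel; the channel names and the fake search result are assumptions, and only 2 pages are sent to keep the output short.

```javascript
'use strict'

const EventEmitter = require('events')

// In-memory stand-in for the Redis channels, only to illustrate the flow
const channels = new EventEmitter()
const log = []

// 1. trigger: fans out one message per result page to the repository channel
channels.on('trigger', ({ date, query }) => {
  for (let page = 1; page <= 2; page++) {
    channels.emit('repository', { date, query, page })
  }
})

// 2. repository: would search GitHub and save the results, then publishes
//    one message per repository with the same date
channels.on('repository', ({ date, page }) => {
  const repository = { id: page, full_name: `owner/repo-${page}` } // fake search result
  channels.emit('contributions', { date, repository })
})

// 3. contributions: would fetch and save the contributor statistics
channels.on('contributions', ({ date, repository }) => {
  log.push(`${repository.full_name} @ ${date}`)
})

channels.emit('trigger', { date: '2017-01-01T00:00:00.000Z', query: 'language:javascript' })
```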

Tasks:

  • Start Redis locally
  • Implement the contributions handler:
    • The responsibility of the contributions handler is to fetch the contributions of a repository from the GitHub API and to save the contributors and their line counts to the database
    • Validate the message, it has two fields: date and repository with id and full_name fields
    • Get the contributions from the GitHub API (use your models created in step 2)
    • Count all the lines currently in the repository per user (use lodash and Array functions)
    • Save the users to the database, don't fail if the user already exists (use your models created in step 3)
    • Save the contributions to the database, insert or replace (use your models created in step 3)
  • Implement the repository handler:
    • Validate the message, it has three fields: date, query and page
    • Get the repositories from the GitHub API (use your models created in step 2) with the q, page and per_page (set to 100) query parameters.
    • Modify the response to a format which is close to the database models (try to use lodash/fp)
    • Save the owner to the database, don't fail if the user already exists (use your models created in step 3)
    • Save the repository to the database, don't fail if the repository already exists (use your models created in step 3)
    • Publish a message to the contributions channel with the same date
  • Implement the trigger handler:
    • The responsibility of the trigger handler is to send 10 messages to the repository collect channel implemented above. 10, because GitHub only gives access to the first 1000 (10 * page size of 100) search results
    • Validate the message, it has two fields: date and query
  • We would like to make our first search and data collection from GitHub.
    • For this, create a trigger.js file in the scripts folder. It should be a simple run-once Node script which publishes a message to the trigger channel with the query passed in as an environment variable (TRIGGER_QUERY), then exits. Like the migrate-db script, it should accept the --local (-L) flag, here for setting the REDIS_URI.
    • Add a trigger field to the scripts in package.json that calls your trigger.js script.
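
Two of the pure pieces of these handlers can be sketched without Redis or the database. The sketch assumes the shape of GitHub's contributor statistics response (an author object plus weekly a additions and d deletions), uses plain Array functions instead of lodash to stay self-contained, and assumes pages are numbered from 1; both function names are hypothetical.

```javascript
'use strict'

// Sums additions minus deletions over all weeks for each contributor,
// which approximates the lines currently in the repository per user.
function countLinesPerUser(contributors) {
  return contributors.map(({ author, weeks }) => ({
    user: author,
    line_count: weeks.reduce((lines, week) => lines + week.a - week.d, 0)
  }))
}

// Builds the messages the trigger handler would publish to the repository
// channel: one per page, 10 pages of 100 results covering the first 1000.
function buildRepositoryMessages({ date, query }, pages = 10) {
  return Array.from({ length: pages }, (unused, index) => ({
    date,
    query,
    page: index + 1
  }))
}
```

Keeping these parts free of I/O makes them easy to unit test; the handlers themselves then only validate the message, call the models and publish.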

Readings:

6. Implement a REST API

In this step you will add a few routes to the existing web application to trigger a data crawl and to expose the collected data.

Tasks:

  • The database requirements changed in the meantime. Create a new migration (call it 4-add-indexes.js) that adds indexes to user.login and repository.full_name (use knex.schema.alterTable)
  • Implement the POST /api/v1/trigger route. The body contains an object with a string query field; use this query to send a message to the corresponding Redis channel. Return 201 when it was successful
  • Implement the GET /api/v1/repository/:id and GET /api/v1/repository/:owner/:name endpoints
  • Implement the GET /api/v1/repository/:id/contributions and GET /api/v1/repository/:owner/:name/contributions endpoints
  • Create a middleware (requestLogger({ level = 'silly' })) that logs the following, and add it to your server:
    • The method and original url of the request
    • Request headers (except authorization and cookie) and body
    • The request duration in ms
    • Response headers (except authorization and cookie) and body
    • Response status code (the log level depends on it: error for server errors, warn for client errors)
  • Document your API using Apiary's Blueprint format (edit the API_DOCUMENTATION.apib).
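
A request logger along these lines could look like the sketch below. It is not the project's implementation: the logger option is an assumption (any object with a log(level, message, meta) method, such as a winston logger), the helper filterHeaders is hypothetical, and ctx.request.body is only populated when a body parser middleware runs before it.

```javascript
'use strict'

const HIDDEN_HEADERS = ['authorization', 'cookie']

// Returns a copy of the headers without the sensitive fields
function filterHeaders(headers = {}) {
  return Object.keys(headers)
    .filter((name) => !HIDDEN_HEADERS.includes(name.toLowerCase()))
    .reduce((safe, name) => Object.assign(safe, { [name]: headers[name] }), {})
}

function requestLogger({ level = 'silly', logger } = {}) {
  return async function logRequest(ctx, next) {
    const start = Date.now()

    await next()

    // Raise the log level for client and server errors
    let logLevel = level
    if (ctx.status >= 500) {
      logLevel = 'error'
    } else if (ctx.status >= 400) {
      logLevel = 'warn'
    }

    logger.log(logLevel, `${ctx.method} ${ctx.originalUrl}`, {
      duration: Date.now() - start, // in ms
      status: ctx.status,
      request: { headers: filterHeaders(ctx.request.headers), body: ctx.request.body },
      response: { headers: filterHeaders(ctx.response.headers), body: ctx.body }
    })
  }
}
```

Note that this sketch logs nothing when a downstream middleware throws; handling that case depends on where the error handler sits in the middleware chain.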

Notes:

  • Make use of koa-compose and the validator middleware
    compose([
      middleware.validator({
        params: paramsSchema,
        query: querySchema,
        body: bodySchema
      }),
      // additional middleware
    ])

Readings:

7. Prepare your service for production

In this step you will add the features required to run your application in a production environment.

Tasks:

  • Listen for the SIGTERM signal in web/index.js.
    • Create a function called gracefulShutdown
    • Use koa's .callback() function to create an HTTP server (look for http.createServer) and convert server.close with util.promisify
    • Close the server and destroy the database and redis connections (use the destroy function of the redis model, which calls disconnect on both redis clients and returns a Promise)
    • Log the error and exit the process with code 1 if something fails
    • Exit the process with code 0 if everything is closed successfully
  • Implement the same for the worker process
  • Add a health check endpoint for the web server
    • Add a healthCheck function for the database model, use the PG_HEALTH_CHECK_TIMEOUT environment variable to set the query timeout (set default to 2000 ms)
    • Add a healthCheck function to the redis model
    • Implement the GET /healthz endpoint: return 200 with JSON body { "status": "ok" } when everything is healthy, 500 if the database or redis connection is not healthy, and 503 if the process got a SIGTERM signal
  • Create an HTTP server and add a similar health check endpoint for the worker process
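
The shutdown sequence and the status code rules above can be sketched as follows. The factory createGracefulShutdown and the function healthStatus are hypothetical names; db and redis are assumed to expose a destroy() function returning a Promise, and exit is injectable only to make the sketch testable.

```javascript
'use strict'

const util = require('util')

// Maps the state of the process and its connections to the /healthz status
function healthStatus({ shuttingDown, dbHealthy, redisHealthy }) {
  if (shuttingDown) {
    return 503
  }
  return dbHealthy && redisHealthy ? 200 : 500
}

// Hypothetical factory returning the SIGTERM handler
function createGracefulShutdown({ server, db, redis, logger = console, exit = process.exit }) {
  // server.close takes a Node-style callback, so it can be promisified
  const closeServer = util.promisify(server.close).bind(server)

  return async function gracefulShutdown() {
    try {
      // Stop accepting connections first, then release the connections
      await closeServer()
      await Promise.all([db.destroy(), redis.destroy()])
      exit(0)
    } catch (err) {
      logger.error('Error during graceful shutdown', err)
      exit(1)
    }
  }
}

// Usage sketch (app is the Koa application, db and redis are the models):
//   const server = require('http').createServer(app.callback())
//   process.on('SIGTERM', createGracefulShutdown({ server, db, redis }))
```

The SIGTERM handler would also need to set the flag that healthStatus receives as shuttingDown, so that /healthz starts returning 503 while the server drains.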

Readings: