Add some common doc (amundsen-io#1)

vazmeee · May 15, 2019 · d681537
1 parent ca246af
Showing 12 changed files with 154 additions and 2 deletions.
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
@@ -0,0 +1,2 @@
+This project is governed by [Lyft's code of conduct](https://github.com/lyft/code-of-conduct).
+All contributors and participants agree to abide by its terms.
diff --git a/NOTICE b/NOTICE
@@ -0,0 +1,4 @@
+amundsen
+Copyright 2018-2019 Lyft Inc.
+
+This product includes software developed at Lyft Inc.
diff --git a/README.md b/README.md
@@ -1,2 +1,82 @@
-# amundsen
-Repository for the Amundsen project
+# Amundsen
+
+[![License](http://img.shields.io/:license-Apache%202-blue.svg)](LICENSE)
+[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)
+[![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://bit.ly/2FVq37z)
+
+Amundsen is a metadata-driven application for improving the productivity of data analysts, data scientists, and engineers when interacting with data. It does that today by indexing data resources (tables, dashboards, streams, etc.) and powering a PageRank-style search based on usage patterns (e.g. highly queried tables show up earlier than less queried tables). Think of it as Google search for data. The project is named after the Norwegian explorer [Roald Amundsen](https://en.wikipedia.org/wiki/Roald_Amundsen), the first person to reach the South Pole.
+
+It includes three microservices and a data ingestion library.
+- [amundsenfrontendlibrary](https://github.com/lyft/amundsenfrontendlibrary): Frontend service, a Flask application with a React frontend.
+- [amundsensearchlibrary](https://github.com/lyft/amundsensearchlibrary): Search service, which leverages Elasticsearch to power the frontend's metadata search.
+- [amundsenmetadatalibrary](https://github.com/lyft/amundsenmetadatalibrary): Metadata service, which uses Neo4j or Apache Atlas as the persistence layer to provide various metadata.
+- [amundsendatabuilder](https://github.com/lyft/amundsendatabuilder): Data ingestion library for building the metadata graph and search index.
+Users can load the data either with [a Python script](https://github.com/lyft/amundsendatabuilder/blob/master/example/scripts/sample_data_loader.py) that uses the library
+or with an [Airflow DAG](https://github.com/lyft/amundsendatabuilder/blob/master/example/dags/sample_dag.py) that imports the library.
+
+
+## Requirements
+- Python >= 3.4
+- Node = v8.x.x or v10.x.x (v11.x.x has compatibility issues)
+- npm >= 6.x.x
+
+## User Interface
+
+Please note that the mock images are for demonstration purposes only.
+
+- **Landing Page**: The landing page for Amundsen, including (1) search bars and (2) popular tables.
+
+    ![](docs/img/landing_page.png)
+
+- **Table Detail Page**: Visualization of a Hive / Redshift table
+
+    ![](docs/img/table_detail_page.png)
+
+- **Column Detail**: Visualization of the columns of a Hive / Redshift table, including an optional stats display
+
+    ![](docs/img/column_details.png)
+
+- **Data Preview Page**: Preview of table data, which can integrate with [Apache Superset](https://github.com/apache/incubator-superset)
+
+    ![](docs/img/data_preview.png)
+
+## Get Involved in the Community
+
+Want help or want to help?
+Use the button in our [header](https://github.com/lyft/amundsen#amundsen) to join our Slack channel. Please join our [mailing list](https://groups.google.com/forum/#!forum/amundsen-dev) as well.
+
+## Getting started
+
+Please visit the Amundsen documentation for help with [installing Amundsen](https://github.com/lyft/amundsenfrontendlibrary/blob/master/docs/installation.md#install-standalone-application-directly-from-the-source)
+and getting a [quick start](https://github.com/lyft/amundsenfrontendlibrary/blob/master/docs/installation.md#bootstrap-a-default-version-of-amundsen-using-docker) with dummy data
+or an [overview of the architecture](docs/architecture.md).
+
+## Architecture Overview
+
+Please visit [Architecture](docs/architecture.md) for an overview of the Amundsen architecture.
+
+## Installation
+
+Please visit the [Installation guide](docs/installation.md) to learn how to install Amundsen.
+
+## Configuration
+
+Please visit the [Configuration doc](docs/configuration.md) to learn how to configure Amundsen's various environment settings (local vs. production).
+
+## Developer Guidelines
+
+Please visit [Developer guidelines](docs/developer_guide.md) if you want to build Amundsen in your local environment.
+
+## Roadmap
+
+Please visit [Roadmap](docs/roadmap.md) if you are interested in Amundsen's upcoming roadmap items.
+
+## Publications
+- [Disrupting Data Discovery](https://www.slideshare.net/taofung/strata-sf-amundsen-presentation) (Strata SF 2019)
+- [Amundsen - Lyft's data discovery & metadata engine](https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9) (Lyft engineering blog)
+- [Amundsen: A Data Discovery Platform from Lyft](https://www.slideshare.net/taofung/data-council-sf-amundsen-presentation) (Data Council SF 2019)
+- [Software Engineering Daily podcast on Amundsen](https://softwareengineeringdaily.com/2019/04/16/lyft-data-discovery-with-tao-feng-and-mark-grover/) (April 2019)
+- [Disrupting Data Discovery](https://www.slideshare.net/markgrover/disrupting-data-discovery) (Strata London 2019)
+
+## License
+[Apache 2.0 License.](/LICENSE)
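
The README above describes search results as ordered by usage, so that highly queried tables show up earlier. A minimal sketch of that idea follows. It is not Amundsen's implementation (the real search is powered by Elasticsearch); `TableHit`, `rank_by_usage`, the table names, and the query counts are all hypothetical and exist only for illustration.

```python
from collections import namedtuple

# Hypothetical search hit: a table name plus how often the table is queried.
TableHit = namedtuple("TableHit", ["name", "query_count"])


def rank_by_usage(hits, term):
    """Keep hits whose name contains `term`, ordered most-queried first."""
    needle = term.lower()
    matching = [hit for hit in hits if needle in hit.name.lower()]
    # Sort by descending usage; break ties by name so the order is deterministic.
    return sorted(matching, key=lambda hit: (-hit.query_count, hit.name))


# Made-up usage numbers, purely for demonstration.
hits = [
    TableHit("test_schema.rides_raw", 12),
    TableHit("test_schema.rides", 940),
    TableHit("test_schema.drivers", 300),
]
ranked = rank_by_usage(hits, "rides")
print([hit.name for hit in ranked])
```

In this sketch the text match only filters and usage alone decides the order; a production search engine would typically blend text relevance with a usage signal instead.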
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -0,0 +1,19 @@
+# Architecture
+
+The following diagram shows the overall architecture for Amundsen.
+![](img/Amundsen_Architecture.png)
+
+The frontend service serves as the web UI portal for user interaction.
+It is a Flask-based web app whose presentation layer is built with React, Redux, Bootstrap, Webpack, and Babel.
+
+The search service leverages Elasticsearch's search functionality and 
+provides a RESTful API to serve search requests from the frontend service. 
+Currently only [table resources](https://github.com/lyft/amundsendatabuilder/blob/master/databuilder/models/elasticsearch_document.py) are indexed and searchable.
+The search index is built with the [elasticsearch publisher](https://github.com/lyft/amundsendatabuilder/blob/master/databuilder/publisher/elasticsearch_publisher.py).
+
+The metadata service currently uses a Neo4j proxy to interact with the Neo4j graph database and serves metadata to the frontend service.
+The metadata is represented as a graph model:
+![](img/graph_model.png)
+The above diagram shows how metadata is modeled in Amundsen. 
+Amundsen provides a [data ingestion library](https://github.com/lyft/amundsendatabuilder) for building the metadata. At Lyft, we build the metadata once a day
+using an Airflow DAG ([example](https://github.com/lyft/amundsendatabuilder/blob/master/example/dags/sample_dag.py)).
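
To make the phrase "metadata is represented as a graph model" more concrete, here is a small in-memory sketch of a labeled property graph. It does not reproduce Amundsen's actual schema or its Neo4j proxy: the `MetadataGraph` class, the `Column` label, the `COLUMN` relationship, and the column keys are assumptions made for illustration. Only the `Table` label and the table name `hive.test_schema.test_table1` appear in the installation doc.

```python
class MetadataGraph:
    """Tiny in-memory property graph: labeled nodes plus typed, directed edges."""

    def __init__(self):
        self.nodes = {}  # node key -> (label, properties dict)
        self.edges = {}  # source key -> list of (relation, target key)

    def add_node(self, key, label, **properties):
        self.nodes[key] = (label, properties)

    def add_edge(self, source, relation, target):
        # Refuse dangling edges so the graph stays consistent.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before linking them")
        self.edges.setdefault(source, []).append((relation, target))

    def neighbors(self, source, relation):
        """Return keys reachable from `source` over edges of type `relation`."""
        return [t for (r, t) in self.edges.get(source, []) if r == relation]

    def by_label(self, label):
        """Return the sorted keys of all nodes carrying `label`."""
        return sorted(k for k, (lbl, _) in self.nodes.items() if lbl == label)


graph = MetadataGraph()
table = "hive.test_schema.test_table1"
graph.add_node(table, "Table", description="hypothetical example table")
graph.add_node(table + "/col1", "Column", type="bigint")  # hypothetical column
graph.add_node(table + "/col2", "Column", type="string")  # hypothetical column
graph.add_edge(table, "COLUMN", table + "/col1")
graph.add_edge(table, "COLUMN", table + "/col2")

print(graph.by_label("Table"))
print(graph.neighbors(table, "COLUMN"))
```

`by_label("Table")` plays roughly the role that `MATCH (n:Table) RETURN n` plays in Neo4j: it selects nodes by label, and relationships are then followed from those nodes.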
diff --git a/docs/img/Amundsen_Architecture.png b/docs/img/Amundsen_Architecture.png
diff --git a/docs/img/column_details.png b/docs/img/column_details.png
diff --git a/docs/img/data_preview.png b/docs/img/data_preview.png
diff --git a/docs/img/graph_model.png b/docs/img/graph_model.png
diff --git a/docs/img/landing_page.png b/docs/img/landing_page.png
diff --git a/docs/img/table_detail_page.png b/docs/img/table_detail_page.png
diff --git a/docs/installation.md b/docs/installation.md
@@ -0,0 +1,44 @@
+# Installation
+
+## Bootstrap a default version of Amundsen using Docker
+The following instructions are for setting up a version of Amundsen using Docker. At the moment, we only support a bootstrap for connecting the Amundsen application to an example metadata service.
+
+1. Install `docker`, `docker-compose`, and `docker-machine`.
+2. Install `virtualbox` and `virtualenv`.
+3. Start a managed docker virtual host using the following command:
+```bash
+# in our examples our machine is named 'default'
+$ docker-machine create -d virtualbox default
+```
+4. Check your docker daemon locally using:
+```bash
+$ docker-machine ls
+```
+  You should see the `default` machine listed, running on virtualbox with no errors listed.
+5. Set up the docker environment using:
+```bash
+$ eval $(docker-machine env default)
+```
+6. Set up your local environment.
+  * Clone [amundsenfrontendlibrary](https://github.com/lyft/amundsenfrontendlibrary), [amundsenmetadatalibrary](https://github.com/lyft/amundsenmetadatalibrary), and [amundsensearchlibrary](https://github.com/lyft/amundsensearchlibrary).
+  * In your local versions of each library, update the `LOCAL_HOST` in the `LocalConfig` with the IP used for the `default` docker machine. You can see the IP in the `URL` column of the `docker-machine ls` output.
+7. Start all of the services using:
+```bash
+# in ~/<your-path-to-cloned-repo>/amundsenfrontendlibrary
+$ docker-compose -f docker-amundsen.yml up
+```
+8. Ingest dummy data into Neo4j by doing the following:
+  * Clone [amundsendatabuilder](https://github.com/lyft/amundsendatabuilder).
+  * Update the `NEO4J_ENDPOINT` and `Elasticsearch host` in [sample_data_loader.py](https://github.com/lyft/amundsendatabuilder/blob/master/example/scripts/sample_data_loader.py), replacing `localhost` with the IP used for the `default` docker machine. You can see the IP in the `URL` column of the `docker-machine ls` output.
+  * Run the following commands:
+    ```bash
+    # in ~/<your-path-to-cloned-repo>/amundsendatabuilder
+    $ virtualenv -p python3 venv3
+    $ source venv3/bin/activate
+    $ pip3 install -r requirements.txt
+    $ python setup.py install
+    $ python example/scripts/sample_data_loader.py
+    ```
+9. Verify that the dummy data has been ingested into Neo4j by visiting `http://YOUR-DOCKER-HOST-IP:7474/browser/` and running `MATCH (n:Table) RETURN n LIMIT 25` in the query box. You should see two tables: `hive.test_schema.test_table1` and `dynamo.test_schema.test_table2`.
+10. View the table detail pages at `http://YOUR-DOCKER-HOST-IP:5000/table_detail/gold/hive/test_schema/test_table1` or `/table_detail/gold/dynamo/test_schema/test_table2`.
+11. View the UI at `http://YOUR-DOCKER-HOST-IP:5000` and search for `test`; it should return some results.
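
Steps 9 to 11 all depend on substituting the docker machine IP into a handful of URLs. The helper below is a small convenience sketch, not part of Amundsen: the function name `verification_urls` and the dictionary keys are made up, and the IP in the usage line is only a placeholder. The ports and paths are copied from the steps above.

```python
def verification_urls(host_ip):
    """Return the URLs from steps 9-11 for the given docker machine IP."""
    frontend = "http://{}:5000".format(host_ip)
    return {
        # Step 9: Neo4j browser, where the MATCH query is run.
        "neo4j_browser": "http://{}:7474/browser/".format(host_ip),
        # Step 10: table detail pages for the two sample tables.
        "table_detail_hive": frontend + "/table_detail/gold/hive/test_schema/test_table1",
        "table_detail_dynamo": frontend + "/table_detail/gold/dynamo/test_schema/test_table2",
        # Step 11: landing page, where you can search for `test`.
        "search_ui": frontend,
    }


# 192.168.99.100 is only a placeholder; use the IP shown by `docker-machine ls`.
for name, url in sorted(verification_urls("192.168.99.100").items()):
    print(name, url)
```

Opening each printed URL in a browser covers the three manual checks; the sketch deliberately makes no network calls.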
diff --git a/docs/roadmap.md b/docs/roadmap.md
@@ -0,0 +1,3 @@
+# Roadmap
+
+**TODO: add ongoing roadmap**