diff --git a/docs_website/docs/configurations/general_config.md b/docs_website/docs/configurations/general_config.md
index 6ef04350b..0ef8d3c76 100644
--- a/docs_website/docs/configurations/general_config.md
+++ b/docs_website/docs/configurations/general_config.md
@@ -16,7 +16,7 @@ In the next section we will go over different things that can be configured in t
 Checkout [Sharing & Security](../overview/sharing_and_security.md) to learn how to configure access permission for these entities.
 :::
 
-#### Environment
+### Environment
 
 Environment ensures users on Querybook are only allowed to access to information/query they have permission to. All DataDocs, Query Engines are attached to some environments.
 
@@ -44,6 +44,12 @@ Once a metastore is created, you can configure the auto sync schedule, manually
 
 Query engine configures the endpoints that users can query. Each query engine needs to be attached to an environment for security measures. They can also attach a metastore to allow users to see table information while writing queries. All available query engine executors are grouped by language and each of them have different configuration values that needs to be set.
 
+![](/img/documentation/Querybook_concepts.png)
+
+- A query engine can be associated with a metastore.
+- An environment can contain multiple query engines.
+- A user can be added to one or many environments, depending on the data source(s) they are granted access to and the environment(s) that have access.
+
 ### Announcement
 
 Querybook Admins can use the announcement feature to send quick updates to users on Querybook.The announcement will appear as a top banner on Querybook's main site. Querybook actively polls the announcement end point five minutes so any change to the announcements are quickly reflected.
diff --git a/docs_website/docs/setup_guide/connect_to_a_query_engine.md b/docs_website/docs/setup_guide/connect_to_a_query_engine.md
new file mode 100644
index 000000000..0de985f33
--- /dev/null
+++ b/docs_website/docs/setup_guide/connect_to_a_query_engine.md
@@ -0,0 +1,111 @@
+---
+id: connect_to_a_query_engine
+title: Connect to a Query Engine
+sidebar_label: Connect to a Query Engine
+---
+
+## Prerequisites
+
+- Have the Querybook repository cloned. See [Quick Setup](./quick_setup.md).
+- Have a PostgreSQL database ready to connect. It could be either on your localhost or on a remote server.
+
+## General Process
+
+1. Create a query engine for query execution.
+2. Add the query engine to an environment. Create one first if needed.
+3. **[Optional but highly recommended]** Create a new metastore to associate with the query engine.
+
+:::info
+If you are not familiar with the concepts of **query engine**, **environment** and **metastore**, please refer to the [general configuration guide](../configurations/general_config#environment).
+:::
+
+## Step by Step
+
+Here we'll guide you through the process of adding a query engine for **PostgreSQL** as an example.
+
+1. Create a `local.txt` file under the `requirements/` folder in the project's root directory.
+
+```bash
+touch requirements/local.txt
+```
+
+2. Check the [engine list](https://www.querybook.org/docs/setup_guide/connect_to_query_engines#all-query-engines) and find the package it depends on.
+3. If the required package is not included by default, add it to the `local.txt` file. For `PostgreSQL`, no additional package is needed. Here is an example for `Amazon Redshift`:
+
+```bash
+echo -e "sqlalchemy-redshift\nredshift_connector" > requirements/local.txt
+```
+
+4. Start the container:
+
+```bash
+make
+```
+
+5. Open [http://localhost:10001](http://localhost:10001)
+6. Sign up as a new user and use the demo setup. The first user to sign up will be added as the admin.
+7. 
Open the admin tool [http://localhost:10001/admin](http://localhost:10001/admin)
+8. Click `Query Engine` to add a new query engine
+
+    - Provide a name for the query engine.
+    - Select `Postgresql` as the language.
+    - Select `sqlalchemy` as the executor.
+    - Input the connection string, which should look like
+        ```
+        postgresql://<username>:<password>@<host>:<port>/<database>
+        ```
+        Please refer to the SqlAlchemy [documentation](https://docs.sqlalchemy.org/en/20/core/engines.html#postgresql) for the connection string format.
+    - Select `SelectOneChecker` as the status checker
+
+    :::caution About localhost
+
+    If Querybook and PostgreSQL are both running on the same machine, you'll need a few extra changes.
+
+    **Mac**
+
+    Please use `host.docker.internal` instead of `localhost` as the server address, e.g. `postgresql://<username>:<password>@host.docker.internal:5432/<database>`
+
+    **Linux**
+
+    Before running `make` in step 4:
+
+    - update `docker-compose.yml` to add `network_mode: host` for the services below
+        - web
+        - worker
+        - scheduler
+    - update `containers/bundled_querybook_config.yaml` to use `localhost` instead of service names
+
+    ```yaml
+    DATABASE_CONN: mysql+pymysql://test:passw0rd@localhost:3306/querybook2?charset=utf8mb4
+    REDIS_URL: redis://localhost:6379/0
+    ELASTICSEARCH_HOST: localhost:9200
+    ```
+
+    Then keep using `localhost` as the server host in the connection string.
+    :::
+
+9. Click `Test Connection` to see if it can connect to the database correctly. If it fails, go back and check the settings above and ensure that the database server is ready for connection. You can use command-line tools like `psql` to try to connect with the same connection settings.
+10. Click `Save` to create the engine.
+11. Go to the `Environment` tab and select `demo_environment`. You can also create a new environment if you like.
+12. For `Query Engines`, select `postgresql` from the dropdown list, and click `Add Query Engine`.
+13. Open [http://localhost:10001/demo_environment/adhoc/](http://localhost:10001/demo_environment/adhoc/). 
Switch to the new environment if you created a new one in step 11.
+14. Try to write a test query, select `postgresql`, and run it.
+
+**That's it 🎉. Keep reading if you'd like to know how to add a metastore.**
+
+15. Open [http://localhost:10001/admin/metastore/](http://localhost:10001/admin/metastore/)
+16. Create a new metastore.
+    - Provide a name for the metastore.
+    - Select `SqlAlchemyMetastoreLoader` as the loader.
+    - Input the same connection string as the query engine.
+    :::info Connection String
+    For PostgreSQL, the metastore is the same as the database, so we're using the same connection string for both the metastore and query engine. However, this may not be the case for other engines, such as Hive Metastore + Presto.
+    :::
+17. Click `Create` to create the metastore.
+18. On the same page, you will see a section called `Update Schedule`. Click the button `Create Task`. This scheduled task is used for syncing the metadata from the metastore to Querybook periodically.
+19. Click `Run Task` and wait until it completes.
+20. Go to the `Query Engine` tab and select the new query engine `postgresql`.
+21. Select `postgres_metastore` from the dropdown list for the `Metastore` field and click `Save`.
+22. Go to the `Tables` tab on page [http://localhost:10001/demo_environment/](http://localhost:10001/demo_environment/). Select `postgres_metastore` from the dropdown list. You'll see the tables synced from the metastore. If you don't see any tables, go back to step 16 and check if the connection string is correct.
+
+Congratulations! You have successfully set up Querybook with a query engine and a metastore. You can now start exploring and analyzing your data with ease.
diff --git a/docs_website/docs/setup_guide/helm_guide.md b/docs_website/docs/setup_guide/helm_guide.md
index 554935ed7..dbc61d83b 100644
--- a/docs_website/docs/setup_guide/helm_guide.md
+++ b/docs_website/docs/setup_guide/helm_guide.md
@@ -1,5 +1,5 @@
 ---
-id: helm_deployment_guide
+id: helm_guide
 title: Helm Deployment Guide
 sidebar_label: Helm Deployment Guide
 ---
diff --git a/docs_website/docs/setup_guide/connect_to_query_engines.md b/docs_website/docs/setup_guide/query_engines.md
similarity index 76%
rename from docs_website/docs/setup_guide/connect_to_query_engines.md
rename to docs_website/docs/setup_guide/query_engines.md
index 22b93f739..15044fb48 100644
--- a/docs_website/docs/setup_guide/connect_to_query_engines.md
+++ b/docs_website/docs/setup_guide/query_engines.md
@@ -1,7 +1,7 @@
 ---
-id: connect_to_query_engines
-title: Connect to Query Engines
-sidebar_label: Connect to Query Engines
+id: query_engines
+title: Query Engines
+sidebar_label: Query Engines
 ---
 
 ## Overview
@@ -30,48 +30,11 @@ If you have tried any of the tier 3 databases and confirmed it works, please upd
 
 ## Query Engine Support
 
-Querybook only supports a few of the Tier 1 & 2 databases by default. When Querybook is launched, it checks with SqlAlchemy to see if any of the databases below are available. If so, the query engine would be automatically available to set up in the Admin UI. Please see the [step by step guide](#step-by-step-guide) below to see an working example.
-
-## Step by step guide
-
-In this guide, we will go through adding Amazon Redshift query engine to Querybook. This serves as an example to adding all sqlalchemy-compatible query engines.
-
-1. Clone and download the repo
-
-```sh
-git clone git@github.com:pinterest/querybook.git
-cd querybook
-```
-
-2. Create a `local.txt` under `requirements/` folder in the project's root directory
-
-```sh
-touch requirements/local.txt
-```
-
-3. 
Add the required packages
-
-```sh
-echo -e "sqlalchemy-redshift\nredshift_connector" > requirements/local.txt
-```
-
-4. Start the container
-
-```sh
-make
-```
-
-5. Register as a new user and use the demo setup.
-6. Visit [https://localhost:10001/admin/query_engine/](https://localhost:10001/admin/query_engine/) and create a new query engine. Put `redshift` as the language and `generic-sqlalchemy` as the executor. In the `Executor Params`, put the connection string (as specified by SqlAlchemy) in the `Connection_string` field.
-7. Go to [https://localhost:10001/admin/environment/1/](https://localhost:10001/admin/environment/1/) and add the Redshift engine under the demo_environment.
-8. Now you can run queries against the new Redshift engine in [https://localhost:10001/demo_environment/adhoc/](https://localhost:10001/demo_environment/adhoc/).
-9. To include table metadata and autocompletion, you would need to add a metastore. Visit [https://localhost:10001/admin/metastore/](https://localhost:10001/admin/metastore/) and create a new metastore. Use SqlAlchemyMetastoreLoader with the exact connection string used for the query engine. Click on `Save` -> `CREATE SCHEDULE` -> `Create Task`. Now click on `Run Task` to sync. You can view the progress in the `History` tab. Wait until it is completed (Should be done in seconds if the number of tables is small).
-10. Go to your query engine page on [https://localhost:10001/admin/query_engine/](https://localhost:10001/admin/query_engine/), in the Metastore field, choose the metastore you just created and click `Save`.
-11. Visit [https://localhost:10001/demo_environment/adhoc/](https://localhost:10001/demo_environment/adhoc/) again and the auto complete feature should be available. You can also view all tables by clicking on the `Tables` button on the left sidebar and select the specific metastore.
+Querybook only supports a few of the Tier 1 & 2 databases by default. 
When Querybook is launched, it checks with SqlAlchemy to see if any of the databases below are available. If so, the query engine would be automatically available to set up in the Admin UI.
 
 ## All Query Engines
 
-**Note**: If the query engine is not included below, but it does have a Sqlalchemy integration, you can still use it in Querybook. Follow the [step by step guide](#step-by-step-guide) with 1 additional step before step 4. Visit `/querybook/server/lib/query_executor/sqlalchemy.py` and add the query engine to the list variable `SQLALCHEMY_SUPPORTED_DIALECTS`, and continue to step 4. If it works, please contribute to Querybook by submitting a PR of your changes.
+**Note**: If the query engine is not included below, but it does have a Sqlalchemy integration, you can still use it in Querybook. Follow the [Connect to a Query Engine](./connect_to_a_query_engine) guide with one additional step before step 4: visit `/querybook/server/lib/query_executor/sqlalchemy.py` and add the query engine to the list variable `SQLALCHEMY_SUPPORTED_DIALECTS`, then continue to step 4. If it works, please contribute to Querybook by submitting a PR of your changes.
 
 | Query Engine | Tier | Package |
 | -------------------- | ---- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
diff --git a/docs_website/sidebars.json b/docs_website/sidebars.json
index e3954fa96..3618dd62c 100755
--- a/docs_website/sidebars.json
+++ b/docs_website/sidebars.json
@@ -9,7 +9,8 @@
         "setup_guide/setup_overview",
         "setup_guide/quick_setup",
         "setup_guide/troubleshoot",
-        "setup_guide/connect_to_query_engines",
+        "setup_guide/connect_to_a_query_engine",
+        "setup_guide/query_engines",
         "setup_guide/prod_setup",
         "setup_guide/deployment_guide",
         "setup_guide/helm_guide",
diff --git a/docs_website/static/img/documentation/Querybook_concepts.png b/docs_website/static/img/documentation/Querybook_concepts.png
new file mode 100644
index 000000000..de692cb54
Binary files /dev/null and b/docs_website/static/img/documentation/Querybook_concepts.png differ
diff --git a/requirements/base.txt b/requirements/base.txt
index 9830be5f5..6ce4607db 100644
--- a/requirements/base.txt
+++ b/requirements/base.txt
@@ -39,3 +39,6 @@ markdown2
 pandas==1.3.5
 typing-extensions==3.10.0.0
 setuptools>=65.5.1 # not directly required, pinned by Snyk to avoid a vulnerability
+
+# Query engine - PostgreSQL
+psycopg2==2.9.5
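
Reviewer note on the connection string used in the new guide: passwords containing characters such as `@` or `/` need to be URL-encoded before they go into a SQLAlchemy-style URL, and on a Mac the guide swaps `localhost` for `host.docker.internal`. The sketch below illustrates both points using only the Python standard library. The helper names `build_postgres_url` and `use_docker_host` and the sample credentials are hypothetical, chosen for illustration; they are not part of Querybook or of this change.

```python
from urllib.parse import quote_plus, urlsplit, urlunsplit


def build_postgres_url(username: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a postgresql:// URL, percent-encoding the credentials.

    Hypothetical helper, not a Querybook API.
    """
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"postgresql://{user}:{pwd}@{host}:{port}/{database}"


def use_docker_host(url: str) -> str:
    """Swap a local host name for host.docker.internal (the Mac case in the guide).

    URLs that point at any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname not in ("localhost", "127.0.0.1"):
        return url
    # Split "user:pwd@host:port" into credentials and host/port pieces.
    userinfo, _, hostport = parts.netloc.rpartition("@")
    _, sep, port = hostport.partition(":")
    prefix = f"{userinfo}@" if userinfo else ""
    return urlunsplit(parts._replace(netloc=f"{prefix}host.docker.internal{sep}{port}"))


# Made-up credentials, for illustration only.
url = build_postgres_url("analyst", "p@ss/word", "localhost", 5432, "sales")
print(url)
print(use_docker_host(url))
```

The resulting string is what would be pasted into the connection string field of the query engine and, for PostgreSQL, the metastore.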