Python Druid API for use in notebooks #13787

Merged
merged 23 commits into from
Mar 5, 2023
Changes from 19 commits
1 change: 1 addition & 0 deletions .gitignore
@@ -29,3 +29,4 @@ integration-tests/gen-scripts/
/bin/
*.hprof
**/.ipynb_checkpoints/
*.pyc
91 changes: 63 additions & 28 deletions docs/tutorials/tutorial-jupyter-index.md
@@ -22,51 +22,86 @@ title: "Jupyter Notebook tutorials"
~ under the License.
-->

<!-- tutorial-jupyter-index.md and examples/quickstart/jupyter-notebooks/README.md share a lot of the same content. If you make a change in one place, update the other too. -->
<!-- tutorial-jupyter-index.md and examples/quickstart/jupyter-notebooks/README.md
share a lot of the same content. If you make a change in one place, update the other
too. -->

You can try out the Druid APIs using the Jupyter Notebook-based tutorials. These tutorials provide snippets of Python code that you can use to run calls against the Druid API to complete the tutorial.
You can try out the Druid APIs using the Jupyter Notebook-based tutorials. These
tutorials provide snippets of Python code that you can use to run calls against
the Druid API to complete the tutorial.

## Prerequisites
## Prerequisites

Make sure you meet the following requirements before starting the Jupyter-based tutorials:

- Python 3
- Python 3.7 or later

- The `requests` package for Python. For example, you can install it with the following command:

- The `requests` package for Python. For example, you can install it with the following command:

```bash
pip3 install requests
```

- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid and Jupyter both try to use port `8888,` so start Jupyter on a different port.
- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid
and Jupyter both try to use port `8888`, so start Jupyter on a different port.

- Install JupyterLab or Notebook:

```bash
# Install JupyterLab
pip3 install jupyterlab
# Install Jupyter Notebook
pip3 install notebook
```
- Start JupyterLab


```bash
# Install JupyterLab
pip3 install jupyterlab
# Install Jupyter Notebook
pip3 install notebook
```
- Start Jupyter using either JupyterLab
```bash
# Start JupyterLab on port 3001
jupyter lab --port 3001
```

Or using Jupyter Notebook
```bash
# Start JupyterLab on port 3001
jupyter lab --port 3001
```
- Alternatively, start Jupyter Notebook
```bash
# Start Jupyter Notebook on port 3001
jupyter notebook --port 3001
```
# Start Jupyter Notebook on port 3001
jupyter notebook --port 3001
```

- An available Druid instance. You can use the [Quickstart (local)](./index.md) instance. The tutorials
assume that you are using the quickstart, so no authentication or authorization
is expected unless explicitly mentioned.

If you contribute to Druid and work with Druid integration tests, you can use a test cluster.
Assume you have an environment variable, `DRUID_DEV`, which identifies your Druid source repo.

```bash
cd $DRUID_DEV
./it.sh build
./it.sh image
./it.sh up <category>
```

Replace `<category>` with one of the available integration test categories. See the integration
test `README.md` for details.
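
Before opening a notebook, it can help to confirm the two prerequisites that involve ports. The following is a minimal sketch, not part of the tutorials: it assumes the quickstart is reachable at `http://localhost:8888` and relies on Druid's `/status` endpoint. The function names `port_is_free`, `status_url`, and `druid_is_up` are made up for illustration. It uses only the standard library, so it runs even before `requests` is installed.

```python
import socket
import urllib.request


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def status_url(base_url):
    """Build the URL of the /status endpoint from a base URL."""
    return base_url.rstrip("/") + "/status"


def druid_is_up(base_url="http://localhost:8888", timeout=5.0):
    """Return True if GET <base_url>/status answers with HTTP 200."""
    try:
        with urllib.request.urlopen(status_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, timeouts, and refused connections are OSError subclasses.
        return False
```

For example, `druid_is_up()` checks the assumed quickstart address, and `port_is_free(3001)` checks the port used in the Jupyter start commands above.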

## Simple Druid API

- An available Druid instance. You can use the [Quickstart (local)](./index.md) instance. The tutorials assume that you are using the quickstart, so no authentication or authorization is expected unless explicitly mentioned.
One of the notebooks shows how to use the Druid REST API. The others focus on other
topics and use a simple set of Python wrappers around the underlying REST API. The
wrappers reside in the `druidapi` package within the notebooks directory. While the package

Review comment (Contributor): Wonder if it'd make sense to pull the python package outside the context of the jupyter notebook so it can be reused for other things?

Reply (Contributor Author): That's a good longer-term goal. For now, we're putting our toe in the water by including the code here. From there, we can see if there is broader interest besides just as a training tool.

can be used in any Python program, the key purpose, at present, is to support these
notebooks. See the [Introduction to the Druid Python API](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/python-api-tutorial.ipynb)
for an overview of the Python API.
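
To make the idea of "a simple set of Python wrappers around the underlying REST API" concrete, here is a hedged sketch of what such a wrapper can look like. It is not the `druidapi` package and does not reflect its interface: the class `TinyDruidClient` and its methods are hypothetical. The sketch assumes the quickstart address `http://localhost:8888` and posts to Druid's SQL endpoint, `/druid/v2/sql`, using only the standard library.

```python
import json
import urllib.request


class TinyDruidClient:
    """Illustrative wrapper only; NOT the interface of the druidapi package."""

    def __init__(self, base_url="http://localhost:8888"):
        # Assumed quickstart address; adjust for your own cluster.
        self.base_url = base_url.rstrip("/")

    def build_sql_request(self, query):
        """Create (but do not send) a POST request for the SQL endpoint."""
        body = json.dumps({"query": query}).encode("utf-8")
        return urllib.request.Request(
            self.base_url + "/druid/v2/sql",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    def sql(self, query, timeout=30.0):
        """Send the query and return the decoded JSON response."""
        request = self.build_sql_request(query)
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
```

Separating `build_sql_request` from `sql` keeps the request construction inspectable without a running cluster, which is the main thing a thin wrapper buys over hand-written REST calls.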

## Tutorials

The notebooks are located in the [apache/druid repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/). You can either clone the repo or download the notebooks you want individually.
The notebooks are located in the [apache/druid repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/). You can either clone the repo or download the notebooks you want individually.

The links that follow are the raw GitHub URLs, so you can use them to download the notebook directly, such as with `wget`, or manually through your web browser. Note that if you save the file from your web browser, make sure to remove the `.txt` extension.

- [Introduction to the Druid API](https://raw.githubusercontent.com/apache/druid/master/examples/quickstart/jupyter-notebooks/api-tutorial.ipynb) walks you through some of the basics related to the Druid API and several endpoints.
- [Introduction to Druid SQL](https://raw.githubusercontent.com/apache/druid/master/examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb) covers the basics of Druid SQL.
- [Introduction to the Druid REST API](
https://raw.githubusercontent.com/apache/druid/master/examples/quickstart/jupyter-notebooks/api-tutorial.ipynb)
walks you through some of the basics related to the Druid REST API and several endpoints.
- [Introduction to the Druid Python API](
https://raw.githubusercontent.com/apache/druid/master/examples/quickstart/jupyter-notebooks/Python_API_Tutorial.ipynb)
walks you through some of the basics related to the Druid API using the Python wrapper API.
- [Introduction to Druid SQL](https://raw.githubusercontent.com/apache/druid/master/examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb) covers the basics of Druid SQL.
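
The raw URLs listed above can also be fetched from Python instead of `wget`. The following is a minimal sketch under that assumption; the helper names `notebook_filename` and `download_notebook` are hypothetical. It also drops a trailing `.txt`, mirroring the note about browser downloads.

```python
import os
import posixpath
import urllib.parse
import urllib.request


def notebook_filename(url):
    """Derive a local file name from a notebook URL, dropping a stray .txt."""
    name = posixpath.basename(urllib.parse.urlparse(url).path)
    if name.endswith(".txt"):
        name = name[: -len(".txt")]
    return name


def download_notebook(url, dest_dir="."):
    """Fetch the notebook at url into dest_dir and return the saved path."""
    target = os.path.join(dest_dir, notebook_filename(url))
    with urllib.request.urlopen(url, timeout=30) as resp, open(target, "wb") as out:
        out.write(resp.read())
    return target
```

Calling `download_notebook` with one of the raw links saves the file under its original `.ipynb` name in the current directory.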
156 changes: 156 additions & 0 deletions examples/quickstart/jupyter-notebooks/-START HERE-.ipynb
@@ -0,0 +1,156 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "e415d732",
"metadata": {},
"source": [
"# Jupyter Notebook tutorials for Druid\n",
"\n",
"<!-- This README and the tutorial-jupyter-index.md file in docs/tutorials share a lot of the same content.\n",
"If you make a change in one place, update the other too. -->\n",
"\n",
"<!--\n",
" ~ Licensed to the Apache Software Foundation (ASF) under one\n",
" ~ or more contributor license agreements. See the NOTICE file\n",
" ~ distributed with this work for additional information\n",
" ~ regarding copyright ownership. The ASF licenses this file\n",
" ~ to you under the Apache License, Version 2.0 (the\n",
" ~ \"License\"); you may not use this file except in compliance\n",
" ~ with the License. You may obtain a copy of the License at\n",
" ~\n",
" ~ http://www.apache.org/licenses/LICENSE-2.0\n",
" ~\n",
" ~ Unless required by applicable law or agreed to in writing,\n",
" ~ software distributed under the License is distributed on an\n",
" ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
" ~ KIND, either express or implied. See the License for the\n",
" ~ specific language governing permissions and limitations\n",
" ~ under the License.\n",
" -->\n",
"\n",
"You can try out the Druid APIs using the Jupyter Notebook-based tutorials. These\n",
"tutorials provide snippets of Python code that you can use to run calls against\n",
"the Druid API to complete the tutorial."
]
},
{
"cell_type": "markdown",
"id": "60015702",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"To get this far, you've installed Python 3 and Jupyter Notebook. Make sure you meet the following requirements before starting the Jupyter-based tutorials:\n",
"\n",
"- The `requests` package for Python. For example, you can install it with the following command:\n",
"\n",
" ```bash\n",
" pip3 install requests\n",
"   ```\n",
"\n",
"- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid\n",
" and Jupyter both try to use port `8888`, so start Jupyter on a different port.\n",
"\n",
"- An available Druid instance. You can use the local quickstart configuration\n",
" described in [Quickstart](https://druid.apache.org/docs/latest/tutorials/index.html).\n",
" The tutorials assume that you are using the quickstart, so no authentication or authorization\n",
" is expected unless explicitly mentioned.\n",
"\n",
"## Simple Druid API\n",
"\n",
"One of the notebooks shows how to use the Druid REST API. The others focus on other\n",
"topics and use a simple set of Python wrappers around the underlying REST API. The\n",
"wrappers reside in the `druidapi` package within this directory. While the package\n",
"can be used in any Python program, the key purpose, at present, is to support these\n",
"notebooks. See the [Introduction to the Druid Python API](Python_API_Tutorial.ipynb)\n",
"for an overview of the Python API."
]
},
{
"cell_type": "markdown",
"id": "d9e18342",
"metadata": {},
"source": [
"## Tutorials\n",
"\n",
"The notebooks are located in the [apache/druid repo](\n",
"https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
"You can either clone the repo or download the notebooks you want individually.\n",
"\n",
"The links that follow are the raw GitHub URLs, so you can use them to download the\n",
"notebook directly, such as with `wget`, or manually through your web browser. Note\n",
"that if you save the file from your web browser, make sure to remove the `.txt` extension.\n",
"\n",
"- [Introduction to the Druid REST API](api-tutorial.ipynb) walks you through some of the\n",
" basics related to the Druid REST API and several endpoints.\n",
"- [Introduction to the Druid Python API](Python_API_Tutorial.ipynb) walks you through some of the\n",
" basics related to the Druid API using the Python wrapper API.\n",
"- [Learn the basics of Druid SQL](sql-tutorial.ipynb) introduces you to the unique aspects of Druid SQL with the primary focus on the SELECT statement. "
]
},
{
"cell_type": "markdown",
"id": "1a4b986a",
"metadata": {},
"source": [
"## Contributing\n",
"\n",
"If you build a Jupyter tutorial, you need to do a few things to add it to the docs\n",
"in addition to saving the notebook in this directory. The process requires two PRs to the repo.\n",
"\n",
"For the first PR, do the following:\n",
"\n",
"1. Depending on the goal of the notebook, you may want to clear the outputs from your notebook\n",
" before you make the PR. You can use the following command:\n",
"\n",
" ```bash\n",
" jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace ./path/to/notebook/notebookName.ipynb\n",
" ```\n",
" \n",
" This can also be done in Jupyter Notebook itself: `Kernel` &rarr; `Restart & Clear Output`\n",
"\n",
"2. Create the PR as you normally would. Make sure to note that this PR is the one that\n",
" contains only the Jupyter notebook and that there will be a subsequent PR that updates\n",
" related pages.\n",
"\n",
"3. After this first PR is merged, grab the \"raw\" URL for the file from GitHub. For example,\n",
" navigate to the file in the GitHub web UI and select **Raw**. Use the URL for this in the\n",
" second PR as the download link.\n",
"\n",
"For the second PR, do the following:\n",
"\n",
"1. Update the list of [Tutorials](#tutorials) on this page and in the\n",
" [Jupyter tutorial index page](../../../docs/tutorials/tutorial-jupyter-index.md#tutorials)\n",
" in the `docs/tutorials` directory.\n",
"\n",
"2. Update `tutorial-jupyter-index.md` and provide the URL to the raw version of the file\n",
" that becomes available after the first PR is merged.\n",
"\n",
"Note that you can skip the second PR if you copy the URL prefix from one of the\n",
"existing notebook links when doing your first PR."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
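
The Contributing cell in the notebook above suggests clearing outputs with `jupyter nbconvert` before opening a PR. As a rough illustration of what that step amounts to, the sketch below resets code-cell outputs by editing the notebook JSON directly. It assumes the nbformat 4 layout shown in this file (`cells`, `cell_type`, `outputs`, `execution_count`); the function names are hypothetical, and the real `nbconvert` preprocessor may handle additional details such as cell metadata.

```python
import json


def clear_outputs(notebook):
    """Clear code-cell outputs in place and return the same dict."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_outputs_in_file(path):
    """Rewrite the .ipynb file at path with code-cell outputs removed."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    clear_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Markdown cells, like the ones that make up this notebook, are left untouched.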