New doc fixes #6156

Merged (2 commits) on Aug 13, 2018
1 change: 1 addition & 0 deletions docs/content/configuration/index.md
@@ -7,6 +7,7 @@ layout: doc_page
This page documents all of the configuration properties for each Druid service type.

## Table of Contents
* [Recommended Configuration File Organization](#recommended-configuration-file-organization)
* [Common configurations](#common-configurations)
* [JVM Configuration Best Practices](#jvm-configuration-best-practices)
* [Extensions](#extensions)
9 changes: 5 additions & 4 deletions docs/content/toc.md
@@ -17,13 +17,14 @@ layout: toc
* [Tutorial: Loading a file using Hadoop](/docs/VERSION/tutorials/tutorial-batch-hadoop.html)
* [Tutorial: Loading stream data using HTTP push](/docs/VERSION/tutorials/tutorial-tranquility.html)
* [Tutorial: Querying data](/docs/VERSION/tutorials/tutorial-query.html)
* [Further tutorials](/docs/VERSION/tutorials/advanced.html)
* [Tutorial: Rollup](/docs/VERSION/tutorials/rollup.html)
* Further tutorials
* [Tutorial: Rollup](/docs/VERSION/tutorials/tutorial-rollup.html)
* [Tutorial: Configuring retention](/docs/VERSION/tutorials/tutorial-retention.html)
* [Tutorial: Updating existing data](/docs/VERSION/tutorials/tutorial-update-data.html)
* [Tutorial: Compacting segments](/docs/VERSION/tutorials/tutorial-compaction.html)
* [Tutorial: Deleting data](/docs/VERSION/tutorials/tutorial-delete-data.html)
* [Tutorial: Writing your own ingestion specs](/docs/VERSION/tutorials/tutorial-ingestion-spec.html)
* [Tutorial: Transforming input data](/docs/VERSION/tutorials/tutorial-transform-spec.html)
* [Clustering](/docs/VERSION/tutorials/cluster.html)

## Data Ingestion
@@ -33,8 +34,8 @@ layout: toc
* [Schema Design](/docs/VERSION/ingestion/schema-design.html)
* [Schema Changes](/docs/VERSION/ingestion/schema-changes.html)
* [Batch File Ingestion](/docs/VERSION/ingestion/batch-ingestion.html)
* [Native Batch Ingestion](docs/VERSION/ingestion/native_tasks.html)
* [Hadoop Batch Ingestion](docs/VERSION/ingestion/hadoop.html)
* [Native Batch Ingestion](/docs/VERSION/ingestion/native_tasks.html)
* [Hadoop Batch Ingestion](/docs/VERSION/ingestion/hadoop.html)
* [Stream Ingestion](/docs/VERSION/ingestion/stream-ingestion.html)
* [Stream Push](/docs/VERSION/ingestion/stream-push.html)
* [Stream Pull](/docs/VERSION/ingestion/stream-pull.html)
12 changes: 6 additions & 6 deletions docs/content/tutorials/index.md
@@ -72,7 +72,7 @@ bin/supervise -c quickstart/tutorial/conf/tutorial-cluster.conf

This will bring up instances of Zookeeper and the Druid services, all running on the local machine, e.g.:

```
```bash
bin/supervise -c quickstart/tutorial/conf/tutorial-cluster.conf
[Thu Jul 26 12:16:23 2018] Running command[zk], logging to[/stage/druid-#{DRUIDVERSION}/var/sv/zk.log]: bin/run-zk quickstart/tutorial/conf
[Thu Jul 26 12:16:23 2018] Running command[coordinator], logging to[/stage/druid-#{DRUIDVERSION}/var/sv/coordinator.log]: bin/run-druid coordinator quickstart/tutorial/conf
@@ -121,7 +121,7 @@ The sample data has the following columns, and an example event is shown below:
* regionName
* user

```
```json
{
"timestamp":"2015-09-12T20:03:45.018Z",
"channel":"#en.wikipedia",
@@ -151,18 +151,18 @@ The following tutorials demonstrate various methods of loading data into Druid,

This tutorial demonstrates how to perform a batch file load, using Druid's native batch ingestion.

### [Tutorial: Loading stream data from Kafka](../tutorial-kafka.html)
### [Tutorial: Loading stream data from Kafka](./tutorial-kafka.html)

This tutorial demonstrates how to load streaming data from a Kafka topic.

### [Tutorial: Loading a file using Hadoop](../tutorial-batch-hadoop.html)
### [Tutorial: Loading a file using Hadoop](./tutorial-batch-hadoop.html)

This tutorial demonstrates how to perform a batch file load, using a remote Hadoop cluster.

### [Tutorial: Loading data using Tranquility](../tutorial-tranquility.html)
### [Tutorial: Loading data using Tranquility](./tutorial-tranquility.html)

This tutorial demonstrates how to load streaming data by pushing events to Druid using the Tranquility service.

### [Tutorial: Writing your own ingestion spec](../tutorial-ingestion-spec.html)
### [Tutorial: Writing your own ingestion spec](./tutorial-ingestion-spec.html)

This tutorial demonstrates how to write a new ingestion spec and use it to load data.
26 changes: 13 additions & 13 deletions docs/content/tutorials/tutorial-batch-hadoop.md
@@ -20,9 +20,9 @@ For this tutorial, we've provided a Dockerfile for a Hadoop 2.8.3 cluster, which

This Dockerfile and related files are located at `quickstart/tutorial/hadoop/docker`.

From the druid-${DRUIDVERSION} package root, run the following commands to build a Docker image named "druid-hadoop-demo" with version tag "2.8.3":
From the druid-#{DRUIDVERSION} package root, run the following commands to build a Docker image named "druid-hadoop-demo" with version tag "2.8.3":

```
```bash
cd quickstart/tutorial/hadoop/docker
docker build -t druid-hadoop-demo:2.8.3 .
```
@@ -37,7 +37,7 @@ We'll need a shared folder between the host and the Hadoop container for transfe

Let's create some folders under `/tmp`; we will use these later when starting the Hadoop container:

```
```bash
mkdir -p /tmp/shared
mkdir -p /tmp/shared/hadoop_xml
```
@@ -54,13 +54,13 @@ On the host machine, add the following entry to `/etc/hosts`:
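The entry itself falls outside this diff hunk. As a sketch, it maps the Hadoop container's hostname (assumed here to be `druid-hadoop-demo`, the hostname passed to `docker run` below) to the loopback address:

```bash
# Hypothetical /etc/hosts line, shown for illustration only; the exact entry is
# collapsed out of this diff.
127.0.0.1 druid-hadoop-demo
```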

Once the `/tmp/shared` folder has been created and the `/etc/hosts` entry has been added, run the following command to start the Hadoop container.

```
```bash
docker run -it -h druid-hadoop-demo -p 50010:50010 -p 50020:50020 -p 50075:50075 -p 50090:50090 -p 8020:8020 -p 10020:10020 -p 19888:19888 -p 8030:8030 -p 8031:8031 -p 8032:8032 -p 8033:8033 -p 8040:8040 -p 8042:8042 -p 8088:8088 -p 8443:8443 -p 2049:2049 -p 9000:9000 -p 49707:49707 -p 2122:2122 -p 34455:34455 -v /tmp/shared:/shared druid-hadoop-demo:2.8.3 /etc/bootstrap.sh -bash
```

Once the container is started, your terminal will attach to a bash shell running inside the container:

```
```bash
Starting sshd: [ OK ]
18/07/26 17:27:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [druid-hadoop-demo]
@@ -80,17 +80,17 @@ The `Unable to load native-hadoop library for your platform... using builtin-jav

### Copy input data to the Hadoop container

From the druid-${DRUIDVERSION} package root on the host, copy the `quickstart/wikiticker-2015-09-12-sampled.json.gz` sample data to the shared folder:
From the druid-#{DRUIDVERSION} package root on the host, copy the `quickstart/wikiticker-2015-09-12-sampled.json.gz` sample data to the shared folder:

```
```bash
cp quickstart/wikiticker-2015-09-12-sampled.json.gz /tmp/shared/wikiticker-2015-09-12-sampled.json.gz
```

### Setup HDFS directories

In the Hadoop container's shell, run the following commands to set up the HDFS directories needed by this tutorial and copy the input data to HDFS.

```
```bash
cd /usr/local/hadoop/bin
./hadoop fs -mkdir /druid
./hadoop fs -mkdir /druid/segments
@@ -113,13 +113,13 @@ Some additional steps are needed to configure the Druid cluster for Hadoop batch

From the Hadoop container's shell, run the following command to copy the Hadoop .xml configuration files to the shared folder:

```
```bash
cp /usr/local/hadoop/etc/hadoop/*.xml /shared/hadoop_xml
```

From the host machine, run the following, where {PATH_TO_DRUID} is replaced by the path to the Druid package.

```
```bash
mkdir -p {PATH_TO_DRUID}/quickstart/tutorial/conf/druid/_common/hadoop-xml
cp /tmp/shared/hadoop_xml/*.xml {PATH_TO_DRUID}/quickstart/tutorial/conf/druid/_common/hadoop-xml/
```
@@ -177,17 +177,17 @@ a task that loads the `wikiticker-2015-09-12-sampled.json.gz` file included in t

Let's submit the `wikipedia-index-hadoop.json` task:

```
```bash
bin/post-index-task --file quickstart/tutorial/wikipedia-index-hadoop.json
```

## Querying your data

After the data load is complete, please follow the [query tutorial](../tutorial/tutorial-query.html) to run some example queries on the newly loaded data.
After the data load is complete, please follow the [query tutorial](../tutorials/tutorial-query.html) to run some example queries on the newly loaded data.

## Cleanup

This tutorial is only meant to be used together with the [query tutorial](../tutorial/tutorial-query.html).
This tutorial is only meant to be used together with the [query tutorial](../tutorials/tutorial-query.html).

If you wish to go through any of the other tutorials, you will need to:
* Shut down the cluster and reset the cluster state by removing the contents of the `var` directory under the Druid package (a shell sketch follows below).
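A shell sketch of that reset, assuming the cluster was started with the supervise script from the package root (note that the full cleanup list continues beyond this hunk):

```bash
# Stop the supervised cluster (Ctrl-C in the terminal running bin/supervise), then,
# from the Druid package root, clear local state so the other tutorials start fresh.
# "var" is the standard quickstart state directory; adjust the path if you moved it.
rm -rf var
```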
8 changes: 4 additions & 4 deletions docs/content/tutorials/tutorial-batch.md
@@ -19,7 +19,7 @@ A data load is initiated by submitting an *ingestion task* spec to the Druid ove
The Druid package includes the following sample native batch ingestion task spec at `quickstart/tutorial/wikipedia-index.json`, shown here for convenience,
which has been configured to read the `quickstart/wikiticker-2015-09-12-sampled.json.gz` input file:

```
```json
{
"type" : "index",
"spec" : {
@@ -101,13 +101,13 @@ This script will POST an ingestion task to the Druid overlord and poll Druid unt

Run the following command from Druid package root:

```
```bash
bin/post-index-task --file quickstart/tutorial/wikipedia-index.json
```

You should see output like the following:

```
```bash
Beginning indexing data for wikipedia
Task started: index_wikipedia_2018-07-27T06:37:44.323Z
Task log: http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2018-07-27T06:37:44.323Z/log
@@ -121,7 +121,7 @@ wikipedia loading complete! You may now query your data

## Querying your data

Once the data is loaded, please follow the [query tutorial](../tutorial/tutorial-query.html) to run some example queries on the newly loaded data.
Once the data is loaded, please follow the [query tutorial](../tutorials/tutorial-query.html) to run some example queries on the newly loaded data.
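As a quick sanity check (a minimal sketch, assuming the spec above loads into a datasource named `wikipedia` and that the broker is listening on its default port), you can also run a row count from the bundled SQL client:

```bash
# From the Druid package root: start the SQL client, then issue a count query.
# The datasource name and default broker address (localhost:8082) are assumptions.
bin/dsql
dsql> SELECT COUNT(*) FROM "wikipedia";
```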

## Cleanup

12 changes: 6 additions & 6 deletions docs/content/tutorials/tutorial-compaction.md
@@ -11,15 +11,15 @@ Because there is some per-segment memory and processing overhead, it can sometim
For this tutorial, we'll assume you've already downloaded Druid as described in
the [single-machine quickstart](index.html) and have it running on your local machine.

It will also be helpful to have finished [Tutorial: Loading a file](/docs/VERSION/tutorials/tutorial-batch.html) and [Tutorial: Querying data](/docs/VERSION/tutorials/tutorial-query.html).
It will also be helpful to have finished [Tutorial: Loading a file](../tutorials/tutorial-batch.html) and [Tutorial: Querying data](../tutorials/tutorial-query.html).

## Load the initial data

For this tutorial, we'll be using the Wikipedia edits sample data, with an ingestion task spec that will create a separate segment for each hour in the input data.

The ingestion spec can be found at `quickstart/tutorial/compaction-init-index.json`. Let's submit that spec, which will create a datasource called `compaction-tutorial`:

```
```bash
bin/post-index-task --file quickstart/tutorial/compaction-init-index.json
```

@@ -31,7 +31,7 @@ There will be 24 segments for this datasource, one segment per hour in the input
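If you prefer the command line to the coordinator console, one way to confirm the segment count is to ask the coordinator API directly (an illustrative sketch; it assumes the default coordinator port 8081 and that `jq` is installed):

```bash
# Count the segments the coordinator reports for this datasource.
curl -s http://localhost:8081/druid/coordinator/v1/datasources/compaction-tutorial/segments | jq 'length'
```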

Running a COUNT(*) query on this datasource shows that there are 39,244 rows:

```
```bash
[Inline review comment thread on this change]

Member: Maybe using sql here would be better.

Contributor (author): hm, it's not purely SQL, and it shows the output from a shell, so I'll keep this as bash for now

Member: Okay, this also makes sense. 😅

Contributor (author): yeah, I think there's some ambiguity in how to classify it, in the end I decided to go with "bash" because I can say that all parts of that snippet are "shell text" but I can't say that all parts are "sql".

Member: Indeed, thanks for explaining this.

dsql> select count(*) from "compaction-tutorial";
┌────────┐
│ EXPR$0 │
@@ -47,7 +47,7 @@ Let's now combine these 24 segments into one segment.

We have included a compaction task spec for this tutorial datasource at `quickstart/tutorial/compaction-final-index.json`:

```
```json
{
"type": "compact",
"dataSource": "compaction-tutorial",
@@ -69,7 +69,7 @@ In this tutorial example, only one compacted segment will be created, as the 392

Let's submit this task now:

```
```bash
bin/post-index-task --file quickstart/tutorial/compaction-final-index.json
```

@@ -85,7 +85,7 @@ The new compacted segment has a more recent version than the original segments,

Let's try running a COUNT(*) on `compaction-tutorial` again, where the row count should still be 39,244:

```
```bash
dsql> select count(*) from "compaction-tutorial";
┌────────┐
│ EXPR$0 │
16 changes: 8 additions & 8 deletions docs/content/tutorials/tutorial-delete-data.md
@@ -9,15 +9,15 @@ This tutorial demonstrates how to delete existing data.
For this tutorial, we'll assume you've already downloaded Druid as described in
the [single-machine quickstart](index.html) and have it running on your local machine.

Completing [Tutorial: Configuring retention](/docs/VERSION/tutorials/tutorial-retention.html) first is highly recommended, as we will be using retention rules in this tutorial.
Completing [Tutorial: Configuring retention](../tutorials/tutorial-retention.html) first is highly recommended, as we will be using retention rules in this tutorial.

## Load initial data

In this tutorial, we will use the Wikipedia edits data, with an indexing spec that creates hourly segments. This spec is located at `quickstart/tutorial/deletion-index.json`, and it creates a datasource called `deletion-tutorial`.

Let's load this initial data:

```
```bash
bin/post-index-task --file quickstart/tutorial/deletion-index.json
```

@@ -48,9 +48,9 @@ In the `rule #2` box at the bottom, click `Drop` and `Forever`.

This will cause the first 12 segments of `deletion-tutorial` to be dropped. However, these dropped segments are not removed from deep storage.

You can see that all 24 segments are still present in deep storage by listing the contents of `druid-{DRUIDVERSION}/var/druid/segments/deletion-tutorial`:
You can see that all 24 segments are still present in deep storage by listing the contents of `druid-#{DRUIDVERSION}/var/druid/segments/deletion-tutorial`:

```
```bash
$ ls -l1 var/druid/segments/deletion-tutorial/
2015-09-12T00:00:00.000Z_2015-09-12T01:00:00.000Z
2015-09-12T01:00:00.000Z_2015-09-12T02:00:00.000Z
@@ -90,7 +90,7 @@ The top of the info box shows the full segment ID, e.g. `deletion-tutorial_2016-

Let's disable the hour 14 segment by sending the following DELETE request to the coordinator, where {SEGMENT-ID} is the full segment ID shown in the info box:

```
```bash
curl -XDELETE http://localhost:8081/druid/coordinator/v1/datasources/deletion-tutorial/segments/{SEGMENT-ID}
```
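If you would rather not copy the ID out of the console, you can also list the full segment IDs for the datasource through the coordinator API (a sketch, assuming the default coordinator port 8081):

```bash
# Returns the full segment IDs known for deletion-tutorial; pick the hour-14 ID
# from this list for the DELETE request above.
curl http://localhost:8081/druid/coordinator/v1/datasources/deletion-tutorial/segments
```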

@@ -100,7 +100,7 @@ After that command completes, you should see that the segment for hour 14 has be

Note that the hour 14 segment is still in deep storage:

```
```bash
$ ls -l1 var/druid/segments/deletion-tutorial/
2015-09-12T00:00:00.000Z_2015-09-12T01:00:00.000Z
2015-09-12T01:00:00.000Z_2015-09-12T02:00:00.000Z
@@ -134,13 +134,13 @@ Now that we have disabled some segments, we can submit a Kill Task, which will d

A Kill Task spec has been provided at `quickstart/tutorial/deletion-kill.json`. Submit this task to the Overlord with the following command:

```
```bash
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/deletion-kill.json http://localhost:8090/druid/indexer/v1/task
```
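For reference, a kill task is a small JSON document. A minimal sketch of its general shape is below; the field values are assumptions for illustration and may differ from the shipped file:

```bash
# Print an example kill task spec (illustrative values only; compare with the
# actual quickstart/tutorial/deletion-kill.json).
cat <<'EOF'
{
  "type": "kill",
  "dataSource": "deletion-tutorial",
  "interval": "2015-09-12/2015-09-13"
}
EOF
```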

After this task completes, you can see that the disabled segments have now been removed from deep storage:

```
```bash
$ ls -l1 var/druid/segments/deletion-tutorial/
2015-09-12T12:00:00.000Z_2015-09-12T13:00:00.000Z
2015-09-12T13:00:00.000Z_2015-09-12T14:00:00.000Z