This repository has been archived by the owner on Mar 30, 2021. It is now read-only.

Building the Druid Index for the TPCH Dataset using the Local Indexing Service.

sdesikan6 edited this page Jun 10, 2016 · 5 revisions

This assumes that you have set up Druid and have created the denormalized dataset.

This page describes how to use the Druid Indexing Service in local mode. Use this procedure when indexing a small dataset in a development environment. For a production environment, use the HadoopDruidIndexer.

Ensure that the Druid overlord service is running. Then issue a POST like the following:

curl -X 'POST' -H 'Content-Type:application/json' \
-d @/Users/hbutani/sparkline/tpch-spark-druid/druid/tpch_index_task.json \
localhost:8090/druid/indexer/v1/task

The overlord listens on port 8090, and indexing commands can be posted to it. The index JSON in this case points to the TPCH datascale1 denormalized dataset.
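The overlord replies to the POST with a small JSON body that carries the id of the new task, and that id can be used to ask the overlord for the task's status over HTTP. The following is a minimal sketch of that round trip, not part of the original setup: the function names (`extract_task_id`, `submit_task`, `task_status`) and the `OVERLORD` variable are hypothetical, and the `sed` pattern assumes the reply has the form `{"task":"<id>"}`.

```shell
#!/bin/sh
# Assumed overlord address; 8090 is the port used on this page.
OVERLORD="${OVERLORD:-localhost:8090}"

# Pull the task id out of the overlord's reply.
# Assumes a reply of the form {"task":"<id>"}.
extract_task_id() {
  printf '%s\n' "$1" |
    sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Submit an index task JSON file and print the id of the created task.
submit_task() {
  reply=$(curl -s -X POST -H 'Content-Type: application/json' \
    -d @"$1" "http://$OVERLORD/druid/indexer/v1/task")
  extract_task_id "$reply"
}

# Print the overlord's status JSON for the given task id.
task_status() {
  curl -s "http://$OVERLORD/druid/indexer/v1/task/$1/status"
}

# Example usage (requires a running overlord):
#   id=$(submit_task /path/to/tpch_index_task.json)
#   task_status "$id"
```

Keeping the id extraction in its own function makes it possible to check the parsing without a running overlord.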

The status of the indexing task can be viewed at the overlord console. Note that the local indexing service takes several hours to index even the datascale1 TPCH dataset. For development purposes, consider indexing only a small sample or subset of the datascale1 dataset.
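One simple way to get such a subset is to copy the first rows of the denormalized file into a smaller file and point the index JSON at that file instead. The sketch below assumes the dataset is a single delimited text file with one record per line and no header row; the function name `make_sample`, the file paths, and the default row count are hypothetical.

```shell
#!/bin/sh
# Write the first N rows of a text file to a new file.
# Assumes one record per line and no header row.
# Usage: make_sample <source-file> <sample-file> [rows]
make_sample() {
  src=$1
  dest=$2
  rows=${3:-100000}   # hypothetical default sample size
  head -n "$rows" "$src" > "$dest"
}

# Example usage (paths are placeholders):
#   make_sample /path/to/denormalized.csv /path/to/denormalized_sample.csv 50000
```

Taking the first rows is the cheapest option but is not a random sample, so the subset may cover only part of the value ranges in the full dataset.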
