Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Cascading, Apache Hive, Apache Pig, Apache Spark and Apache Storm.
See the project page and documentation for detailed information.
An Elasticsearch cluster (1.x or higher; 2.x is highly recommended) accessible through REST. That's it! Significant effort has been invested to create a small, self-contained jar that can be downloaded and put to use without any additional dependencies. Simply make it available on your job classpath and you're set. For library-specific requirements, see the dedicated chapter.
ES-Hadoop 6.x and higher are compatible with Elasticsearch 1.X, 2.X, 5.X, and 6.X
ES-Hadoop 5.x and higher are compatible with Elasticsearch 1.X, 2.X and 5.X
ES-Hadoop 2.2.x and higher are compatible with Elasticsearch 1.X and 2.X
ES-Hadoop 2.0.x and 2.1.x are compatible with Elasticsearch 1.X only
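As a quick illustration, the four compatibility rows above can be written as a small lookup. This is only a sketch: the class and method names are hypothetical and not part of ES-Hadoop, each "and higher" row is assumed to apply until the next listed row takes over, and ES-Hadoop versions below 2.0 are treated as not covered because the list does not mention them.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: encodes the compatibility rows listed above.
public class EsHadoopCompatibility {

    // Returns the Elasticsearch major versions listed as compatible
    // with the given ES-Hadoop major.minor version.
    public static List<Integer> supportedEsMajors(int major, int minor) {
        if (major >= 6) {
            return Arrays.asList(1, 2, 5, 6);   // "6.x and higher"
        }
        if (major == 5) {
            return Arrays.asList(1, 2, 5);      // "5.x and higher"
        }
        if (major < 2) {
            return Collections.<Integer>emptyList(); // assumption: not covered by the list
        }
        if (major > 2 || minor >= 2) {
            return Arrays.asList(1, 2);         // "2.2.x and higher"
        }
        return Arrays.asList(1);                // 2.0.x and 2.1.x
    }

    public static void main(String[] args) {
        System.out.println(supportedEsMajors(5, 5)); // prints [1, 2, 5]
    }
}
```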
Available through any Maven-compatible tool:
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop-cascading</artifactId>
  <version>7.2.0</version>
</dependency>
or as a stand-alone ZIP.
Grab the latest nightly build from the repository, again through Maven:
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop-cascading</artifactId>
  <version>wip-7.2-nnn</version>
</dependency>
or build the project yourself.
We do build and test the code on each commit.
Running against Hadoop 1.x is deprecated as of 5.5 and will no longer be tested in 6.0. ES-Hadoop is developed for and tested against Hadoop 2.x and YARN. More information in this section.
We're interested in your feedback! You can find us on the User mailing list; please append [Hadoop] to the post subject so it can be filtered. For more details, see the community page.
The latest reference documentation is available online on the project home page. The rest of this README contains basic usage instructions at a glance.
All configuration properties start with the es prefix. Note that the es.internal namespace is reserved for the library's internal use and should not be used by the user at any point.
The properties are read mainly from the Hadoop configuration but the user can specify (some of) them directly depending on the library used.
es.resource=<ES resource location, relative to the host/port given by es.nodes and es.port>
es.query=<uri or query dsl query> # defaults to {"query":{"match_all":{}}}
es.nodes=<ES host address> # defaults to localhost
es.port=<ES REST port> # defaults to 9200
The full list is available here
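To make the property names concrete, here is a minimal sketch that collects the four settings above in a plain java.util.Properties object and guards against the reserved es.internal namespace. The class and method names are hypothetical, the resource value "radio/artists" is borrowed from the Cascading example in this README, and the namespace check is an illustration rather than something ES-Hadoop provides. How the properties actually reach the library (Hadoop configuration or directly) depends on the integration used.

```java
import java.util.Properties;

// Hypothetical helper: gathers the basic es.* settings described above.
public class EsSettingsSketch {

    // Builds settings for the given resource, spelling out the documented
    // defaults for es.query, es.nodes and es.port explicitly.
    public static Properties settings(String resource) {
        Properties props = new Properties();
        props.setProperty("es.resource", resource);
        props.setProperty("es.query", "{\"query\":{\"match_all\":{}}}");
        props.setProperty("es.nodes", "localhost");
        props.setProperty("es.port", "9200");
        return props;
    }

    // Rejects any key in the es.internal namespace, which is reserved
    // for the library's internal use.
    public static void checkNoReservedKeys(Properties props) {
        for (String key : props.stringPropertyNames()) {
            if (key.equals("es.internal") || key.startsWith("es.internal.")) {
                throw new IllegalArgumentException("Reserved property: " + key);
            }
        }
    }

    public static void main(String[] args) {
        Properties props = settings("radio/artists");
        checkNoReservedKeys(props);
        System.out.println(props.getProperty("es.nodes") + ":" + props.getProperty("es.port"));
    }
}
```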
ES-Hadoop offers a dedicated Elasticsearch Tap, EsTap, that can be used as either a sink or a source. Note that EsTap can be used in both local (LocalFlowConnector) and Hadoop (HadoopFlowConnector) flows:
// Reading: query Elasticsearch and print the results in a local flow
Tap in = new EsTap("radio/artists", "?q=me*");
Tap out = new StdOut(new TextLine());
new LocalFlowConnector().connect(in, out, new Pipe("read-from-ES")).complete();

// Writing: index a delimited file into Elasticsearch in a Hadoop flow
Tap in = new Lfs(new TextDelimited(new Fields("id", "name", "url", "picture")), "src/test/resources/artists.dat");
Tap out = new EsTap("radio/artists", new Fields("name", "url", "picture"));
new HadoopFlowConnector().connect(in, out, new Pipe("write-to-ES")).complete();
Elasticsearch Hadoop uses Gradle for its build system, and you do not need to have Gradle installed on your machine. By default (gradlew), it automatically builds the package and runs the unit tests. For integration testing, use the integrationTests task. See gradlew tasks for more information.
To create a distributable zip, run gradlew distZip from the command line; once completed you will find the jar in build/libs.
To build the project, JVM 8 or higher is required (the Oracle JVM is recommended).
This project is released under version 2.0 of the Apache License.
Licensed to Elasticsearch under one or more contributor
license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright
ownership. Elasticsearch licenses this file to you under
the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.