GH-34223: [Java] Java Substrait Consumer JNI call to ACERO C++ #34227
Merged
53 commits
All commits authored by davisusanibar:
a0aac46 feat: consume Substrait Plan
0d91f09 fix: solving maven-dependency-plugin
0599dc2 feat: add support for execution of Substrait binary plans also
c794ae5 Upgrade to Java 11 to be able to consume Isthmus library
8cc5443 fix: profile to Java test with JDK11 (be able to consume Isthmus libr…
e5594f8 fix: solve error to call Isthmus by Dataset that use JDK8
223ddef fix: detected both log4j-over-slf4j.jar AND bound slf4j-reload4j.jar …
795e619 fix: rollback changes on orc
3bd18f1 Merge branch 'main' into poc-substrait
088a101 fix: able to compile main source with jdk8 and test with jdk11
ba23e44 fix: able to compile main source with jdk8 and test with jdk11
8655815 fix: JAVA_HOME_11_X64: command not found
d22d6b1 fix: partial comments fix
f0d8a25 Update java/dataset/src/main/cpp/jni_util.h
632f90d Update java/dataset/src/main/java/org/apache/arrow/dataset/substrait/…
9437f4e fix: comments
61d6ee7 fix: comments
64c7607 fix: comments
721fe01 fix: hash boost_1_81_0 does not match expected value
b3c2e1e fix: maven-shade-plugin:jar:3.1.1 -> org.ow2.asm:asm:jar:6.0: Failed …
f5596c9 Merge branch 'main' into poc-substrait
388446b Merge branch 'main' into poc-substrait
ead80a8 fix: clean unit test, fix comments
0446453 fix: clean substrait method to get plan
8c57c16 fix: clean sout
766b383 fix: rollback maven-shade-plugin
5e8b887 fix: failures test
7f59fbd fix: delete methods not needed, create files of substrait plan
0d2bcf8 fix: npe read resources
4380932 fix: add resources files for nosuchfile error
9bbe4fb fix: add resources files for nosuchfile error
5351ee1 fix: update rst documentation
e966d32 Apply suggestions from code review
cfe4061 fix: code review
2419896 Merge branch 'main' into poc-substrait
8811bc6 Merge branch 'main' into poc-substrait
ead4784 fix: rebase and changes to consider new arrow acero
9bfa15c fix: solving PR comments
8a0eae6 Merge branch 'main' into poc-substrait
87e75eb fix: solving PR comments
812921f Merge branch 'main' into poc-substrait
89060eb fix: rebase
33c634f Update java/dataset/src/main/java/org/apache/arrow/dataset/substrait/…
34979a5 fix: comment on code review
1a6f0e5 fix: comment on code review
e388be5 fix: validate input on arrow Table associated with a given table name
8eb3e40 fix: code review
ce7800b Merge branch 'main' into poc-substrait
fdd042b Merge branch 'main' into poc-substrait
3dddea0 fix: solve code review comments
6bdae18 fix: solve code review comments
2d9fc84 fix: solve code review comments
9b5f0cb fix: solve code review comments
@@ -0,0 +1,107 @@

.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

=========
Substrait
=========

The ``arrow-dataset`` module can execute Substrait_ plans via the :doc:`Acero <../cpp/streaming_execution>`
query engine.

Executing Substrait Plans
=========================

Plans can reference data in files via URIs, or "named tables" that must be provided along with the plan.

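The two kinds of input can be illustrated with a small, dependency-free sketch. For the file case it assumes a local file system path and builds a ``file://`` URI string with ``java.nio``. For the named-table case it uses a hypothetical helper, ``missingNamedTables`` (not part of Arrow), that checks whether every table name a plan refers to has an entry in the map supplied with the plan. ``Object`` stands in for ``ArrowReader`` so the sketch compiles without Arrow on the classpath. Note that the keys must match the names used inside the plan; the example below likely uses ``NATION`` in upper case because Calcite, which Isthmus uses to parse SQL, upper-cases unquoted identifiers by default.

```java
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PlanInputsSketch {
  // File input: turn a local path into a file:// URI string, the form the
  // example passes to FileSystemDatasetFactory.
  static String toFileUri(String localPath) {
    return Paths.get(localPath).toAbsolutePath().toUri().toString();
  }

  // Named-table input (hypothetical helper, not an Arrow API): report the table
  // names a plan refers to that have no entry in the named-table map.
  static Set<String> missingNamedTables(List<String> referencedTables, Map<String, ?> namedTables) {
    Set<String> missing = new TreeSet<>();
    for (String name : referencedTables) {
      if (!namedTables.containsKey(name)) {
        missing.add(name);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    System.out.println(toFileUri("/data/tpch_parquet/nation.parquet"));

    // Object stands in for ArrowReader so this compiles without Arrow.
    Map<String, Object> namedTables = new HashMap<>();
    namedTables.put("NATION", new Object());
    System.out.println(missingNamedTables(Arrays.asList("NATION", "REGION"), namedTables));
  }
}
```

On a Unix-like system the first line should read ``file:///data/tpch_parquet/nation.parquet``; the second prints ``[REGION]``, flagging the table that would be missing if a plan also referenced ``REGION``.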
Here is an example of a Java program that queries a Parquet file. It uses the `Substrait Java`_ project
to compile a SQL query into a Substrait plan, then executes that plan with Acero, supplying the Parquet
data as a named table:

.. code-block:: Java

    import com.google.common.collect.ImmutableList;
    import io.substrait.isthmus.SqlToSubstrait;
    import io.substrait.proto.Plan;
    import org.apache.arrow.dataset.file.FileFormat;
    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
    import org.apache.arrow.dataset.jni.NativeMemoryPool;
    import org.apache.arrow.dataset.scanner.ScanOptions;
    import org.apache.arrow.dataset.scanner.Scanner;
    import org.apache.arrow.dataset.source.Dataset;
    import org.apache.arrow.dataset.source.DatasetFactory;
    import org.apache.arrow.dataset.substrait.AceroSubstraitConsumer;
    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.ipc.ArrowReader;
    import org.apache.calcite.sql.parser.SqlParseException;

    import java.nio.ByteBuffer;
    import java.util.HashMap;
    import java.util.Map;

    public class ClientSubstrait {
        public static void main(String[] args) {
            String uri = "file:///data/tpch_parquet/nation.parquet";
            ScanOptions options = new ScanOptions(/*batchSize*/ 32768);
            try (
                BufferAllocator allocator = new RootAllocator();
                DatasetFactory datasetFactory = new FileSystemDatasetFactory(allocator, NativeMemoryPool.getDefault(),
                        FileFormat.PARQUET, uri);
                Dataset dataset = datasetFactory.finish();
                Scanner scanner = dataset.newScan(options);
                ArrowReader reader = scanner.scanBatches()
            ) {
                // Map the table name used in the plan to the reader that supplies its data
                Map<String, ArrowReader> mapTableToArrowReader = new HashMap<>();
                mapTableToArrowReader.put("NATION", reader);
                // Serialize the plan once and copy the bytes into a direct ByteBuffer
                Plan plan = getPlan();
                byte[] planBytes = plan.toByteArray();
                ByteBuffer substraitPlan = ByteBuffer.allocateDirect(planBytes.length);
                substraitPlan.put(planBytes);
                // Run the query with Acero and print each result batch
                try (ArrowReader arrowReader = new AceroSubstraitConsumer(allocator).runQuery(
                    substraitPlan,
                    mapTableToArrowReader
                )) {
                    while (arrowReader.loadNextBatch()) {
                        System.out.println(arrowReader.getVectorSchemaRoot().contentToTSVString());
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        // Compile a SQL query into a Substrait plan with Isthmus, given the table's DDL
        static Plan getPlan() throws SqlParseException {
            String sql = "SELECT * from nation";
            String nation = "CREATE TABLE NATION (N_NATIONKEY BIGINT NOT NULL, N_NAME CHAR(25), " +
                    "N_REGIONKEY BIGINT NOT NULL, N_COMMENT VARCHAR(152))";
            SqlToSubstrait sqlToSubstrait = new SqlToSubstrait();
            Plan plan = sqlToSubstrait.execute(sql, ImmutableList.of(nation));
            return plan;
        }
    }

.. code-block:: text

    // Results example:
    FieldPath(0) FieldPath(1) FieldPath(2) FieldPath(3)
    0 ALGERIA 0 haggle. carefully final deposits detect slyly agai
    1 ARGENTINA 1 al foxes promise slyly according to the regular accounts. bold requests alon

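The example serializes the plan with ``plan.toByteArray()`` and copies the bytes into a direct (off-heap) ``ByteBuffer`` before calling ``runQuery``. The stand-alone sketch below isolates that step using only ``java.nio``. The placeholder byte array stands in for a real serialized ``Plan``, and the ``flip()`` call is an addition for Java-side readers of the buffer; the example above does not call it, and this sketch makes no assumption about how the native side reads the buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DirectPlanBufferSketch {
  // Copy serialized plan bytes into a direct (off-heap) buffer, as the example
  // does before handing the plan to AceroSubstraitConsumer.runQuery.
  static ByteBuffer toDirectBuffer(byte[] serializedPlan) {
    ByteBuffer buffer = ByteBuffer.allocateDirect(serializedPlan.length);
    buffer.put(serializedPlan);
    // put() leaves the position at the end of the data; flip() rewinds it so
    // Java code reading the buffer sees all of the bytes.
    buffer.flip();
    return buffer;
  }

  public static void main(String[] args) {
    // Placeholder bytes; in the example these come from plan.toByteArray().
    byte[] planBytes = "placeholder plan bytes".getBytes(StandardCharsets.UTF_8);
    ByteBuffer buffer = toDirectBuffer(planBytes);

    // Read the bytes back through a duplicate to show the copy is complete.
    byte[] copy = new byte[buffer.remaining()];
    buffer.duplicate().get(copy);
    System.out.println(buffer.isDirect() + " " + buffer.capacity() + " "
        + new String(copy, StandardCharsets.UTF_8));
  }
}
```

Running it prints ``true 22 placeholder plan bytes``. Serializing once into a local ``byte[]`` avoids encoding the plan twice, which calling ``toByteArray()`` separately for the length and for the copy would do.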
.. _`Substrait`: https://substrait.io/
.. _`Substrait Java`: https://github.com/substrait-io/substrait-java
.. _`Acero`: https://arrow.apache.org/docs/cpp/streaming_execution.html