Remove unnecessary outputCommitter setting #465

Merged

asalamon74
Contributor

Description

I tried to use the Hive integration with Hive on Tez, but got the following error:

java.lang.RuntimeException: java.lang.RuntimeException: class org.opensearch.hadoop.mr.OpenSearchOutputFormat$OpenSearchOutputCommitter not org.apache.hadoop.mapred.OutputCommitter
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2734)
        at org.apache.tez.mapreduce.committer.MROutputCommitter.getOutputCommitter(MROutputCommitter.java:143)
        at org.apache.tez.mapreduce.committer.MROutputCommitter.initialize(MROutputCommitter.java:82)
        at org.apache.tez.dag.app.dag.impl.VertexImpl$2.run(VertexImpl.java:2452)
        at org.apache.tez.dag.app.dag.impl.VertexImpl$2.run(VertexImpl.java:2431)
        at java.security.AccessController.doPrivileged(Native Method)

The Tez code assumes that we want to use the old API rather than the new one: https://github.infra.cloudera.com/CDH/tez/blob/cdpd-master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/committer/MROutputCommitter.java#L117-L124
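To see why the trace says `... not org.apache.hadoop.mapred.OutputCommitter`, here is a simplified, self-contained sketch of a bounded class lookup in the spirit of Hadoop's `Configuration.getClass(name, defaultValue, xface)`: the class named in the configuration must be a subtype of the expected type, otherwise a `RuntimeException` is thrown and then wrapped again, which would explain the nested `RuntimeException: RuntimeException` in the trace. This is not the Hadoop implementation. The configuration is modeled as a plain `Map`, and all class names (`OldApiCommitter`, `DefaultOldApiCommitter`, `NewApiCommitter`, `NewApiOnlyCommitter`, `BoundedClassLookup`) are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins, NOT the real Hadoop or OpenSearch classes:
// OldApiCommitter        ~ org.apache.hadoop.mapred.OutputCommitter
// DefaultOldApiCommitter ~ a default old-API committer
// NewApiCommitter        ~ org.apache.hadoop.mapreduce.OutputCommitter
// NewApiOnlyCommitter    ~ a committer that only extends the new-API base class
abstract class OldApiCommitter {}
class DefaultOldApiCommitter extends OldApiCommitter {}
abstract class NewApiCommitter {}
class NewApiOnlyCommitter extends NewApiCommitter {}

class BoundedClassLookup {
    // Simplified sketch: resolve the class named under `key` (or `defaultValue`
    // when the key is unset) and reject it unless it is a subtype of `xface`.
    // `defaultValue` must be non-null in this sketch.
    static <U> Class<? extends U> getBoundedClass(Map<String, String> conf, String key,
                                                  Class<? extends U> defaultValue,
                                                  Class<U> xface) {
        try {
            String name = conf.get(key);
            Class<?> theClass = (name != null) ? Class.forName(name) : defaultValue;
            if (!xface.isAssignableFrom(theClass)) {
                // Same shape as the message in the trace: "<class> not <expected type>".
                throw new RuntimeException(theClass + " not " + xface.getName());
            }
            return theClass.asSubclass(xface);
        } catch (Exception e) {
            // Re-wrapping produces the nested "RuntimeException: RuntimeException" form.
            throw new RuntimeException(e);
        }
    }
}
```

In this sketch, looking up `NewApiOnlyCommitter` against `OldApiCommitter` fails the same way the trace does, while the same class is accepted when the expected type is the new-API base class; that is the old/new API mismatch in miniature.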

I then tried to specify that I want to use the new API, but only got a different error:

ERROR : Failed to execute tez graph.
java.lang.RuntimeException: java.lang.RuntimeException: class org.opensearch.hadoop.mr.OpenSearchOutputFormat$OpenSearchOutputCommitter not org.apache.hadoop.mapred.OutputCommitter
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2734) 
        at org.apache.hadoop.mapred.JobConf.getOutputCommitter(JobConf.java:725) 
        at java.util.Optional.map(Optional.java:215)
        at org.apache.hadoop.hive.ql.exec.tez.TezTask.collectCommitInformation(TezTask.java:381) 

I think Hive requires the old API at this point: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java#L366

Later I realised that the problem is here: https://github.com/opensearch-project/opensearch-hadoop/blob/main/hive/src/main/java/org/opensearch/hadoop/hive/OpenSearchStorageHandler.java#L120

Configuration cfg = getConf();
// NB: we can't just merge the table properties in, we need to save them per input/output otherwise clashes occur which confuse Hive

Settings settings = HadoopSettingsManager.loadFrom(cfg);
//settings.setProperty((read ? HiveConstants.INPUT_TBL_PROPERTIES : HiveConstants.OUTPUT_TBL_PROPERTIES), IOUtils.propsToString(tableDesc.getProperties()));
if (read) {
    // no generic setting
}
else {
    // replace the default committer when using the old API
    HadoopCfgUtils.setOutputCommitterClass(cfg, OpenSearchOutputFormat.OpenSearchOutputCommitter.class.getName());
}

We explicitly set the output committer class, but this is unnecessary because it is already set implicitly. The comment is also misleading: it says the default committer is replaced when using the old API, but OpenSearchOutputCommitter is a new API committer. I also don't see why the read flag is checked here; it has nothing to do with the old/new API.
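The two failures and the effect of removing the setting can be modeled in one small sketch. It is a simplification, not the actual Tez or Hive code, and it assumes that (a) Tez's old-API path and Hive's `TezTask.collectCommitInformation` both resolve the committer through an old-API bounded lookup of the key `mapred.output.committer.class` (the key name is an assumption here), and (b) Tez's new-API path asks the output format for its committer instead of reading that key. All class names (`MapredCommitter`, `DefaultMapredCommitter`, `MapreduceCommitter`, `OpenSearchLikeCommitter`, `CommitterResolution`) are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins, NOT the real Hadoop, Tez, Hive or OpenSearch classes.
abstract class MapredCommitter {}                           // role of org.apache.hadoop.mapred.OutputCommitter
class DefaultMapredCommitter extends MapredCommitter {}     // role of the default old-API committer
abstract class MapreduceCommitter {}                        // role of org.apache.hadoop.mapreduce.OutputCommitter
class OpenSearchLikeCommitter extends MapreduceCommitter {} // role of OpenSearchOutputCommitter (new API)

class CommitterResolution {
    // Assumption: the configuration key that old-API committer lookups read.
    static final String COMMITTER_KEY = "mapred.output.committer.class";

    // Old-API lookup: whatever is configured must be an old-API committer.
    static Class<? extends MapredCommitter> oldApiLookup(Map<String, String> conf)
            throws ClassNotFoundException {
        String name = conf.get(COMMITTER_KEY);
        Class<?> c = (name != null) ? Class.forName(name) : DefaultMapredCommitter.class;
        if (!MapredCommitter.class.isAssignableFrom(c)) {
            throw new IllegalStateException(c + " not " + MapredCommitter.class.getName());
        }
        return c.asSubclass(MapredCommitter.class);
    }

    // Tez-like committer setup (simplified): the new-API path takes the committer
    // from the output format and never reads COMMITTER_KEY; the old-API path does.
    static Class<?> tezCommitter(Map<String, String> conf, boolean useNewApi)
            throws ClassNotFoundException {
        return useNewApi ? OpenSearchLikeCommitter.class : oldApiLookup(conf);
    }

    // Hive-like commit-information step (simplified): always the old-API lookup.
    static Class<?> hiveCommitInfo(Map<String, String> conf) throws ClassNotFoundException {
        return oldApiLookup(conf);
    }

    // True when both steps accept the configuration.
    static boolean jobSucceeds(Map<String, String> conf, boolean useNewApi) {
        try {
            tezCommitter(conf, useNewApi);
            hiveCommitInfo(conf);
            return true;
        } catch (IllegalStateException | ClassNotFoundException e) {
            return false;
        }
    }
}
```

In this model, `jobSucceeds` returns `false` for both values of `useNewApi` while the key names `OpenSearchLikeCommitter` (matching the first and second error above, respectively), and `true` for both once the key is left unset, which is what dropping the `setOutputCommitterClass` call amounts to.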

Issues Resolved

Removing this code solved the problem for me.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Andras Salamon <andras.salamon@melda.info>
asalamon74 force-pushed the outputcommitter_cleanup branch from 640352b to 9c93884 on May 22, 2024 07:08
harshavamsi merged commit fbd9d31 into opensearch-project:main on Sep 5, 2024
14 checks passed
dgoldenberg-ias pushed a commit to dgoldenberg-ias/opensearch-hadoop that referenced this pull request Sep 11, 2024