Skip to content

Conversation

@wangyum
Copy link
Member

@wangyum wangyum commented Feb 24, 2022

This is a backport of #34362 to branch 3.2.

What changes were proposed in this pull request?

This PR ported HIVE-21498, HIVE-25098 and upgraded libthrift to 0.16.0.

The CHANGES list for libthrift 0.16.0 is available at: https://github.com/apache/thrift/blob/v0.16.0/CHANGES.md

Why are the changes needed?

To address CVE-2020-13949.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing test.

14:53:54.715 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "HiveServer2-Handler-Pool: Thread-164" java.lang.NoClassDefFoundError: org/apache/thrift/transport/TFramedTransport
  | => hat java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.getClass(MetaStoreUtils.java:1708)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:131)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3607)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3659)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3639)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3901)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
        at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
        at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:395)
        at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:339)
        at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:319)
        at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
@srowen
Copy link
Member

srowen commented Feb 25, 2022

Weird, is this running tests? I don't see the test workflow executed here.

@wangyum
Copy link
Member Author

wangyum commented Feb 26, 2022

Triggered the test.

@srowen
Copy link
Member

srowen commented Feb 26, 2022

Merged to 3.2

srowen pushed a commit that referenced this pull request Feb 26, 2022
…ty vulnerabilities

This is a backport of #34362 to branch 3.2.

### What changes were proposed in this pull request?

This PR ported HIVE-21498, HIVE-25098 and upgraded libthrift to 0.16.0.

The CHANGES list for libthrift 0.16.0 is available at: https://github.com/apache/thrift/blob/v0.16.0/CHANGES.md

### Why are the changes needed?

To address [CVE-2020-13949](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test.

Closes #35646 from wangyum/SPARK-37090-branch-3.2.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
@wangyum wangyum deleted the SPARK-37090-branch-3.2 branch February 27, 2022 00:21
@sunchao
Copy link
Member

sunchao commented Mar 2, 2022

Hi @wangyum @srowen , after picking up this change and deploy it internally, we found an issue that seems to be related:

Receiver class org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport does not define or inherit an implementation of the resolved method 'abstract void checkReadBytesAvailable(long)' of abstract class org.apache.thrift.transport.TTransport.
org.apache.thrift.protocol.TBinaryProtocol.checkStringReadLength(TBinaryProtocol.java:444)
org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:415)
org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:411)
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:251)
org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1514)
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1500)
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1370)
org.apache.iceberg.hive.HiveTableOperations.lambda$doRefresh$0(HiveTableOperations.java:182)
org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:76)
org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:182)
org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:94)

still investigating what exactly caused the error though. The TUGIAssumingTransport class is from Hive client side and seems it's used when talking to a secure HMS via either delegation token or kerberos.

@srowen
Copy link
Member

srowen commented Mar 2, 2022

This seems like a Hive version issue - what are you using?

@sunchao
Copy link
Member

sunchao commented Mar 2, 2022

We are using the same Hive version 2.3.9 as in upstream Spark. I checked TUGIAssumingTransport there and it doesn't implement checkReadBytesAvailable which is from Thrift 0.14 and up.

@wangyum
Copy link
Member Author

wangyum commented Mar 2, 2022

@sunchao Could we backport https://issues.apache.org/jira/browse/HIVE-21498 and https://issues.apache.org/jira/browse/HIVE-25098 to branch-2.3 and release a new version?

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Mar 2, 2022

Hi, @wangyum . Thanks, but, before releasing a new Hive, we need to revert libthrift 0.16.0 from master/3.2/3.1.

Apache Spark 3.3 branch cut is planned on March 15th. We have only two weeks.

Could you revert SPARK-37090 from master/3.2/3.1 as a committer and author, please?

@srowen
Copy link
Member

srowen commented Mar 2, 2022

(Could someone briefly explain the issue - what's different about where it fails than what the tests run? not something that can be just fixed-forward?)

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Mar 2, 2022

Hi, @srowen .

  • As of today, Apache Hive 2.3.x is built with libthrift 0.9.3 and it turned out that it is incompatible with libthrift 0.14.1+.
<libthrift.version>0.9.3</libthrift.version>

Given the size of Hive patches, I don't think we can afford these in Apache Spark 3.3/3.2/3.1.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Mar 2, 2022

To be clear, I didn't take a look at those two huge patches, but I'm not sure those patches are able to land at Apache Hive 2.3.10.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Mar 2, 2022

@srowen . For your questions, @sunchao found that libthrift 0.14.1+ requires an implementation of abstract method, checkReadBytesAvailable, and Apache Hive branch-2.3 didn't implemented it.

org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport does not define or inherit an implementation of the resolved method 'abstract void checkReadBytesAvailable(long)' of abstract class org.apache.thrift.transport.TTransport.
org.apache.thrift.protocol.TBinaryProtocol.checkStringReadLength(TBinaryProtocol.java:444)

@sunchao
Copy link
Member

sunchao commented Mar 2, 2022

Could we backport https://issues.apache.org/jira/browse/HIVE-21498 and https://issues.apache.org/jira/browse/HIVE-25098 to branch-2.3 and release a new version?

@wangyum I can give it a try but it could be challenging given the amount of changes in these two. It'll take some time and most likely won't be ready before Spark 3.3 release.

... what's different about where it fails than what the tests run?

@srowen I think this scenario is not covered in any test in Spark - it requires a remote secure HMS but Spark Hive tests only use embedded HMS through Derby.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Mar 2, 2022

Ya, @sunchao is right.

To @srowen and @wangyum . SPARK-37090 is not released yet. I'm going to revert this from all branches.

We can land this back. However, before that, at least, we need to make it sure that the following three new abstract methods (at apache/thrift@63213c1 ) are handled correctly.

  public abstract TConfiguration getConfiguration();

  public abstract void updateKnownMessageSize(long size) throws TTransportException;

  public abstract void checkReadBytesAvailable(long numBytes) throws TTransportException;

@dongjoon-hyun
Copy link
Member

In addition, I added those two Hive 4.0 JIRA links as a blocker for SPARK-37090.

@wangyum
Copy link
Member Author

wangyum commented Mar 2, 2022

Thank you @dongjoon-hyun This PR(apache/hive#3066) tries to backport HIVE-21498 and HIVE-25098 to branch-2.3.

@dongjoon-hyun
Copy link
Member

Thank you, @wangyum !

@dongjoon-hyun
Copy link
Member

I added my comment on your Hive PR, too.

Just a question: What about branch-3.1 and branch-3.0? Are these patches released officially in any Apache Hive artifacts after vote?

kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
…ty vulnerabilities

This is a backport of apache#34362 to branch 3.2.

This PR ported HIVE-21498, HIVE-25098 and upgraded libthrift to 0.16.0.

The CHANGES list for libthrift 0.16.0 is available at: https://github.com/apache/thrift/blob/v0.16.0/CHANGES.md

To address [CVE-2020-13949](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-13949).

No.

Existing test.

Closes apache#35646 from wangyum/SPARK-37090-branch-3.2.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants