
Conversation

Contributor

@rickyma rickyma commented Mar 7, 2024

What changes were proposed in this pull request?

When we release the shaded client jar for Spark 2.x, the class org.apache.spark.network.util.NettyUtils should not be included in the jar.

Why are the changes needed?

Fix #1567. It is also a follow-up PR for #727.

When running on Spark 2.4, we encounter an exception like the one below:

24/03/07 16:34:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(tdwadmin); groups with view permissions: Set(); users  with modify permissions: Set(tdwadmin); groups with modify permissions: Set()
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.network.util.NettyUtils.createEventLoop(Lorg/apache/spark/network/util/IOMode;ILjava/lang/String;)Lio/netty/channel/EventLoopGroup;
	at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:104)
	at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:89)
	at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:70)
	at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:449)
	at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:56)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:264)
	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:271)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:474)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster.<init>(SQLApplicationMaster.scala:96)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster.<init>(SQLApplicationMaster.scala:53)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster$$anonfun$main$1.apply$mcV$sp(SQLApplicationMaster.scala:544)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2286)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster$.main(SQLApplicationMaster.scala:543)
	at org.apache.spark.deploy.yarn.SQLApplicationMaster.main(SQLApplicationMaster.scala)

This is because the return value of createEventLoop in Uniffle's NettyUtils is org.apache.uniffle.io.netty.channel.EventLoopGroup (the shaded type), while the return value of createEventLoop in Spark's own NettyUtils is io.netty.channel.EventLoopGroup. When a Spark application runs, the driver loads NettyUtils from the rss-client jar, so the method's return type no longer matches what Spark was compiled against, ultimately leading to the NoSuchMethodError.
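For illustration, here is a minimal sketch of the clash. The class and method names are simplified stand-ins, not the actual Spark or Uniffle sources, and it assumes both the original and the relocated Netty packages are on the classpath:

```java
import org.apache.spark.network.util.IOMode;

// Two simplified stand-ins for the conflicting createEventLoop signatures.
class NettyUtilsClashSketch {

  // What Spark's TransportClientFactory was compiled against (unshaded Netty):
  static io.netty.channel.EventLoopGroup sparkStyleCreateEventLoop(
      IOMode mode, int numThreads, String threadPrefix) {
    return new io.netty.channel.nio.NioEventLoopGroup(numThreads);
  }

  // What the copy shipped in the shaded rss-client jar exposes after Netty relocation:
  static org.apache.uniffle.io.netty.channel.EventLoopGroup shadedStyleCreateEventLoop(
      IOMode mode, int numThreads, String threadPrefix) {
    return new org.apache.uniffle.io.netty.channel.nio.NioEventLoopGroup(numThreads);
  }
}
// The return type is part of the JVM method descriptor, so a call site expecting
// io.netty.channel.EventLoopGroup cannot link against the relocated variant and
// fails with NoSuchMethodError at runtime.
```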

We should let Spark use its own NettyUtils instead of ours.

However, if we simply remove the org.apache.spark.network.util.NettyUtils file from the code repository, the integration tests fail with errors like the following:

java.lang.RuntimeException: java.lang.NoSuchFieldException: DEFAULT_TINY_CACHE_SIZE
	at org.apache.spark.network.util.NettyUtils.getPrivateStaticField(NettyUtils.java:131)
	at org.apache.spark.network.util.NettyUtils.createPooledByteBufAllocator(NettyUtils.java:118)
	at org.apache.spark.network.server.TransportServer.init(TransportServer.java:94)
	at org.apache.spark.network.server.TransportServer.<init>(TransportServer.java:73)
	at org.apache.spark.network.TransportContext.createServer(TransportContext.java:114)
	at org.apache.spark.rpc.netty.NettyRpcEnv.startServer(NettyRpcEnv.scala:119)
	at org.apache.spark.rpc.netty.NettyRpcEnvFactory$$anonfun$4.apply(NettyRpcEnv.scala:465)
	at org.apache.spark.rpc.netty.NettyRpcEnvFactory$$anonfun$4.apply(NettyRpcEnv.scala:464)
	at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2275)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
	at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2267)
	at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:469)
	at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:249)
	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:256)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:423)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925)
	at org.apache.uniffle.test.SparkIntegrationTestBase.runSparkApp(SparkIntegrationTestBase.java:92)
	at org.apache.uniffle.test.SparkIntegrationTestBase.run(SparkIntegrationTestBase.java:53)
	at org.apache.uniffle.test.RSSStageResubmitTest.testRSSStageResubmit(RSSStageResubmitTest.java:86)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)

This is because the root pom.xml's dependencyManagement pins the Netty version globally, which upgrades Spark 2's own lower Netty version to a higher one. The resulting Netty version incompatibility means certain fields can no longer be found. This issue does not occur in production, because Spark ships its own NettyUtils and never needs the one we provide. Retaining org.apache.spark.network.util.NettyUtils is therefore somewhat of a workaround to keep the integration tests passing, but given that Spark 2 is no longer updated frequently, maintaining a static copy of NettyUtils should not pose a significant problem.
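For context, a minimal sketch of the reflective lookup that breaks, assuming Spark 2.x's NettyUtils behaves as the stack trace above suggests (the class and helper names here are illustrative, not Spark's actual code):

```java
import java.lang.reflect.Field;

// Spark 2.x's NettyUtils reads private static Netty constants reflectively when it
// builds a PooledByteBufAllocator. Newer Netty releases no longer declare
// DEFAULT_TINY_CACHE_SIZE, so once dependencyManagement forces the upgrade the
// lookup throws NoSuchFieldException, which surfaces as the RuntimeException above.
class TinyCacheFieldProbe {

  static int readPrivateStaticInt(String className, String fieldName) throws Exception {
    Field field = Class.forName(className).getDeclaredField(fieldName); // fails on newer Netty
    field.setAccessible(true);
    return field.getInt(null);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(
        readPrivateStaticInt("io.netty.buffer.PooledByteBufAllocator", "DEFAULT_TINY_CACHE_SIZE"));
  }
}
```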

Of course, the optimal approach would be to shade our own Netty during integration testing, allowing Spark to continue using its own Netty dependency and effectively separating the two. This would provide the most accurate testing, as any change in Spark 2's Netty version could be verified through unit tests. However, it would also mean that a large amount of integration-test code would need the org.apache.uniffle prefix on the import statements wherever Netty is used, as sketched below. Ultimately, this could introduce significant redundancy in the code and confuse people writing code in the future.
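A hypothetical illustration of that import churn (the relocated package names simply follow the org.apache.uniffle prefix described above; this is not code from the repository):

```java
// Every Netty reference in the integration tests would have to switch from the
// original packages to the relocated ones, e.g.
//   io.netty.buffer.ByteBuf          ->  org.apache.uniffle.io.netty.buffer.ByteBuf
//   io.netty.channel.EventLoopGroup  ->  org.apache.uniffle.io.netty.channel.EventLoopGroup
import org.apache.uniffle.io.netty.buffer.ByteBuf;
import org.apache.uniffle.io.netty.buffer.Unpooled;

class ShadedNettyImportSketch {
  ByteBuf newBuffer() {
    return Unpooled.buffer(16); // now resolves against the relocated Netty classes
  }
}
```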

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing UTs.


codecov-commenter commented Mar 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.66%. Comparing base (d994b27) to head (dfe4beb).

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1565      +/-   ##
============================================
+ Coverage     53.72%   54.66%   +0.94%     
- Complexity     2838     2840       +2     
============================================
  Files           437      417      -20     
  Lines         24663    22302    -2361     
  Branches       2094     2094              
============================================
- Hits          13250    12192    -1058     
+ Misses        10581     9347    -1234     
+ Partials        832      763      -69     

☔ View full report in Codecov by Sentry.


github-actions bot commented Mar 7, 2024

Test Results

 2 326 files  ±0  2 326 suites  ±0   4h 29m 48s ⏱️ +22s
   851 tests ±0    849 ✅ ±0   2 💤 ±0  0 ❌ ±0 
10 018 runs  ±0  9 995 ✅ ±0  23 💤 ±0  0 ❌ ±0 

Results for commit dfe4beb. ± Comparison against base commit d994b27.

♻️ This comment has been updated with latest results.

@rickyma rickyma force-pushed the followup-for-pr-727 branch 2 times, most recently from 55668e2 to b7dab8d Compare March 7, 2024 11:38
@rickyma rickyma marked this pull request as draft March 7, 2024 11:40
@rickyma rickyma force-pushed the followup-for-pr-727 branch 8 times, most recently from b2aac37 to 28686e6 Compare March 7, 2024 15:35
@rickyma rickyma changed the title [#133][FOLLOWUP] fix(netty): Let Spark use its own NettyUtils [#133][FOLLOWUP] fix(netty): Let Spark2 use its own NettyUtils Mar 7, 2024
@rickyma rickyma marked this pull request as ready for review March 7, 2024 15:56
Contributor Author

rickyma commented Mar 7, 2024

@jerqi @zuston PTAL.

@rickyma rickyma changed the title [#133][FOLLOWUP] fix(netty): Let Spark2 use its own NettyUtils [#133][FOLLOWUP] fix(spark): Let Spark2 use its own NettyUtils Mar 7, 2024
@rickyma rickyma force-pushed the followup-for-pr-727 branch 2 times, most recently from 4af531b to 1ced5e7 Compare March 7, 2024 16:22
@rickyma rickyma closed this Mar 7, 2024
@rickyma rickyma reopened this Mar 7, 2024
@rickyma rickyma marked this pull request as draft March 8, 2024 03:54
@rickyma rickyma force-pushed the followup-for-pr-727 branch from 1ced5e7 to 2fa1a11 Compare March 8, 2024 03:58
@rickyma rickyma marked this pull request as ready for review March 8, 2024 03:58
to="lib${rss.shade.native.packageName}_netty_transport_native_kqueue_aarch_64.jnilib"
type="glob"></mapper>
</move>
<!-- Delete NettyUtils, let Spark use its own NettyUtils -->
Contributor

@jerqi jerqi Mar 8, 2024


Add the comment to explain why we do this.

Contributor Author


Done.

@rickyma rickyma requested a review from jerqi March 8, 2024 05:17
Contributor

@jerqi jerqi left a comment


LGTM, cc @zuston

@rickyma rickyma changed the title [#133][FOLLOWUP] fix(spark): Let Spark2 use its own NettyUtils [#1567] fix(spark): Let Spark use its own NettyUtils Mar 8, 2024
@jerqi jerqi merged commit 51b8581 into apache:master Mar 11, 2024
@rickyma rickyma deleted the followup-for-pr-727 branch May 5, 2024 08:33


Development

Successfully merging this pull request may close these issues.

[Bug] NoSuchMethodError will be encountered when using Spark 2.x