[VL][UNIFFLE] DRA does not work in gluten with uniffle #7559

Closed
wForget opened this issue Oct 16, 2024 · 0 comments · Fixed by #7560
Labels: bug, triage

wForget commented Oct 16, 2024

Backend

VL (Velox)

Bug description

Dynamic resource allocation (DRA) does not work in Gluten with Uniffle: SparkContext initialization fails with the error below.

Spark confs:

spark.shuffle.manager=org.apache.spark.shuffle.gluten.uniffle.UniffleShuffleManager;
spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.RssShuffleDataIo;
spark.dynamicAllocation.shuffleTracking.enabled=false;
spark.dynamicAllocation.enabled=true;

error:

24/10/16 15:47:47 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Dynamic allocation of executors requires one of the following conditions: 1) enabling external shuffle service through spark.shuffle.service.enabled. 2) enabling shuffle tracking through spark.dynamicAllocation.shuffleTracking.enabled. 3) enabling shuffle blocks decommission through spark.decommission.enabled and spark.storage.decommission.shuffleBlocks.enabled. 4) (Experimental) configuring spark.shuffle.sort.io.plugin.class to use a custom ShuffleDataIO who's ShuffleDriverComponents supports reliable storage.
	at org.apache.spark.ExecutorAllocationManager.validateSettings(ExecutorAllocationManager.scala:221)
	at org.apache.spark.ExecutorAllocationManager.<init>(ExecutorAllocationManager.scala:136)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:660)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2888)
	at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1099)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1093)
	at org.apache.kyuubi.engine.spark.SparkSQLEngine$.createSpark(SparkSQLEngine.scala:303)
	at org.apache.kyuubi.engine.spark.SparkSQLEngine$.main(SparkSQLEngine.scala:377)
	at org.apache.kyuubi.engine.spark.SparkSQLEngine.main(SparkSQLEngine.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:738)

We should set RSS_ENABLED to true in UniffleShuffleManager, because Uniffle uses the RSS_ENABLED conf and the shuffle manager class to determine whether it supports reliable storage:

https://github.com/apache/incubator-uniffle/blob/a36261296b05d72e4a774d9c9555cc12b922be97/client-spark/spark3/src/main/java/org/apache/spark/shuffle/RssShuffleDriverComponents.java#L37-L42
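
To illustrate why the check fails with Gluten's shuffle manager, here is a minimal, self-contained sketch of the decision described above. It is not the Uniffle source: the conf is modeled as a plain `Map`, and the key `spark.rss.enabled` (for RSS_ENABLED) and the class name `org.apache.spark.shuffle.RssShuffleManager` are assumptions that should be verified against the linked file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the reliable-storage check; not Uniffle code.
public class ReliableStorageCheckSketch {
  // Assumption: the conf key behind RSS_ENABLED.
  static final String RSS_ENABLED_KEY = "spark.rss.enabled";
  static final String SHUFFLE_MANAGER_KEY = "spark.shuffle.manager";
  // Assumption: class name of Uniffle's own shuffle manager.
  static final String UNIFFLE_MANAGER = "org.apache.spark.shuffle.RssShuffleManager";
  // Shuffle manager class taken from the confs in this issue.
  static final String GLUTEN_MANAGER =
      "org.apache.spark.shuffle.gluten.uniffle.UniffleShuffleManager";

  // Reliable storage is reported if RSS is explicitly enabled
  // or the configured shuffle manager is Uniffle's own class.
  static boolean supportsReliableStorage(Map<String, String> conf) {
    boolean rssEnabled =
        Boolean.parseBoolean(conf.getOrDefault(RSS_ENABLED_KEY, "false"));
    return rssEnabled || UNIFFLE_MANAGER.equals(conf.get(SHUFFLE_MANAGER_KEY));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(SHUFFLE_MANAGER_KEY, GLUTEN_MANAGER);
    // Neither condition holds, so DRA validation would be rejected.
    System.out.println(supportsReliableStorage(conf)); // false

    // What the proposed fix amounts to: the manager sets RSS_ENABLED itself.
    conf.put(RSS_ENABLED_KEY, "true");
    System.out.println(supportsReliableStorage(conf)); // true
  }
}
```

Under these assumptions, the Gluten class name never matches Uniffle's manager, so the result depends entirely on RSS_ENABLED, which is why setting it inside UniffleShuffleManager is proposed.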

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response
