How to use spark-excel_2.11-0.13.5.jar to set up an Azure Synapse Spark Pool batch job? #282
Comments
Hi Yang, I haven't worked with Azure Synapse, so I can't provide great insights...
because |
@nightscape Thank you for your reply. I want to know the Main Class of "spark-excel_2.11-0.13.5.jar". How should I check? |
@nightscape java -jar "C:\Users\Administrator\Downloads\spark-excel_2.11-0.13.5.jar" At "C:\Users\Administrator\Downloads\spark-excel_2.11-0.13.5\META-INF\MANIFEST.MF" has not
|
Hi @yang-jiayi, spark-excel does not have a |
@nightscape |
Hi @yang-jiayi, you shouldn't have to rebuild spark-excel as standalone JAR with main class. |
Hi @nightscape |
Hi @yang-jiayi, you'd have to search for instructions specific to your programming language and build tool. |
Hi @nightscape |
@yang-jiayi |
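On the question above of how to check a JAR's Main Class: a JAR is a ZIP archive, and an entry point, if one exists, is declared as a `Main-Class` attribute in `META-INF/MANIFEST.MF`. The following is a minimal sketch, assuming Python 3 is available locally; the function name `read_main_class` is made up for illustration and is not part of spark-excel or Synapse.

```python
import zipfile
from typing import Optional


def read_main_class(jar_path: str) -> Optional[str]:
    """Return the Main-Class attribute from a JAR's manifest, or None if absent."""
    with zipfile.ZipFile(jar_path) as jar:
        try:
            raw = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:
            return None  # the archive has no manifest at all

    # Long manifest lines are wrapped; a continuation line starts with one space.
    unfolded = raw.replace("\r\n", "\n").replace("\n ", "")

    for line in unfolded.split("\n"):
        if not line.strip():
            break  # a blank line ends the main section of the manifest
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "main-class":
            return value.strip()
    return None
```

Calling `read_main_class` with the path to the downloaded JAR returns `None` when the manifest declares no entry point, which is the usual situation for a library JAR that is meant to be used from another application.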
Hi.
Azure Synapse Spark Pool does support importing third-party packages.
Reference URL:
https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-job-definitions#create-an-apache-spark-job-definition-for-apache-sparkscala
This article (https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries)
states that ".jar based packages can be added at the Spark job definition level,"
so I believe the JAR can be added as an extension to the existing Spark Pool.
I downloaded spark-excel_2.11-0.13.5.jar from this URL (https://mvnrepository.com/artifact/com.crealytics/spark-excel_2.11/0.13.5).
In Azure Synapse Studio, I created a Spark job against the Spark Pool.
For the Main definition file value, I entered the ADLS Gen2 address (abfss://rawdata@xyz.dfs.core.windows.net/SparkExcelLibrary/spark-excel_2.11-0.13.5.jar).
For the Main class name value, I entered com.crealytics.spark.excel.
Unfortunately, I got an error.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.99.201-15911041/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.99.201-15911041/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/08/16 16:30:37 INFO SignalUtils: Registered signal handler for TERM
20/08/16 16:30:37 INFO SignalUtils: Registered signal handler for HUP
20/08/16 16:30:37 INFO SignalUtils: Registered signal handler for INT
20/08/16 16:30:37 INFO SecurityManager: Changing view acls to: trusted-service-user
20/08/16 16:30:37 INFO SecurityManager: Changing modify acls to: trusted-service-user
20/08/16 16:30:37 INFO SecurityManager: Changing view acls groups to:
20/08/16 16:30:37 INFO SecurityManager: Changing modify acls groups to:
20/08/16 16:30:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(trusted-service-user); groups with view permissions: Set(); users with modify permissions: Set(trusted-service-user); groups with modify permissions: Set()
20/08/16 16:30:37 INFO ApplicationMaster: Preparing Local resources
20/08/16 16:30:38 INFO MetricsConfig: loaded properties from hadoop-metrics2.properties
20/08/16 16:30:38 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
20/08/16 16:30:38 INFO MetricsSystemImpl: azure-file-system metrics system started
20/08/16 16:30:38 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1597593360079_0004_000001
20/08/16 16:30:39 INFO ApplicationMaster: Starting the user application in a separate Thread
20/08/16 16:30:39 ERROR ApplicationMaster: Uncaught exception:
java.lang.ClassNotFoundException: com.crealytics.spark.excel
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:674)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:461)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
20/08/16 16:30:39 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.lang.ClassNotFoundException: com.crealytics.spark.excel
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:674)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:461)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
)
20/08/16 16:30:39 INFO ShutdownHookManager: Shutdown hook called
20/08/16 16:30:39 INFO MetricsSystemImpl: Stopping azure-file-system metrics system...
20/08/16 16:30:39 INFO MetricsSystemImpl: azure-file-system metrics system stopped.
20/08/16 16:30:39 INFO MetricsSystemImpl: azure-file-system metrics system shutdown complete.
End of LogType:stderr
Question:
How can I correctly add spark-excel_2.11-0.13.5.jar to the Azure Synapse Spark Pool?
I look forward to your advice and reply.
Thanks.
Best Regards,
Yang
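The stack trace above shows the YARN ApplicationMaster calling `loadClass` on the value entered as the Main class name, so that value has to be a class that exists on the classpath. `com.crealytics.spark.excel` appears to be a package name (it is the string normally passed to `spark.read.format(...)` from your own application), not a class with a `main` method. The following is a minimal sketch for checking what a dotted name refers to inside a JAR, assuming Python 3 locally; the function name `describe_name` is hypothetical.

```python
import zipfile


def describe_name(jar_path: str, dotted_name: str) -> str:
    """Classify a dotted name as 'class', 'package', or 'missing' within a JAR."""
    # Class files are stored under the package path, e.g. a/b/C.class for a.b.C.
    path = dotted_name.replace(".", "/")
    with zipfile.ZipFile(jar_path) as jar:
        names = jar.namelist()

    if path + ".class" in names:
        return "class"    # loadable by name (it may still lack a main method)
    if any(name.startswith(path + "/") for name in names):
        return "package"  # a directory of classes, which cannot be loaded as a class
    return "missing"
```

If the name is reported as `"package"`, a class loader cannot load it, which would be consistent with the `ClassNotFoundException` in the log. In that case the Main class name would need to point at a class with a `main` method in your own application JAR, with spark-excel supplied as a dependency.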