issue with spark-excel jar #344

tufanrakshit · 2021-02-11T18:33:32Z

Hi,
I am using the format below to read a multi-tab .xlsx file.
I am using Spark 3.0.1, Scala 2.11, and spark-excel_2.11-0.8.2.jar.
Previously I tried with 2.13.1, but that failed as well.

Snippet:
val df_cmdb = spark
  .read
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'CMDB'!B3:C35") // Optional, default: "A1"
  .option("header", "true") // Required
  .option("treatEmptyValuesAsNulls", "false") // Optional, default: true
  .option("usePlainNumberFormat", "false") // Optional, default: false, If true, format the cells without rounding and scientific notations
  .option("inferSchema", "false") // Optional, default: false
  .option("addColorColumns", "true") // Optional, default: false
  .option("timestampFormat", "MM-dd-yyyy HH:mm:ss") // Optional, default: yyyy-mm-dd hh:mm:ss[.fffffffff]
  .option("maxRowsInMemory", 20) // Optional, default None. If set, uses a streaming reader which can help with big files
  .option("excerptSize", 10) // Optional, default: 10. If set and if schema inferred, number of rows to infer schema from
  //.option("workbookPassword", "pass") // Optional, default None. Requires unlimited strength JCE for older JVMs
  //.schema(myCustomSchema) // Optional, default: Either inferred schema, or all columns are Strings
  .load("C:\Project\MSF_RP_SCES_Backup_Report_PWC_20210202.xlsx")

Error : Exception in thread "main" java.lang.IllegalArgumentException: Parameter "location" is missing in options.

Which Scala version and which jar file should I use?
Can you please also provide the location of the jar?

Many thanks

nightscape · 2021-02-11T19:06:26Z

You definitely need double backslashes \\ in the path. Not sure if that is the problem though.
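
To illustrate the point about backslashes, here is a minimal sketch of ways to write a Windows path in Scala. The file name `report.xlsx` and the object name `PathEscaping` are hypothetical, chosen only for this example. In a regular string literal a backslash starts an escape sequence, so it has to be doubled; a triple-quoted string takes backslashes literally, and forward slashes are usually accepted on Windows as well.

```scala
object PathEscaping {
  // Hypothetical path, for illustration only.
  // Regular string literal: every backslash must be doubled.
  val escaped: String = "C:\\Project\\report.xlsx"

  // Triple-quoted string: backslashes are taken literally.
  val tripleQuoted: String = """C:\Project\report.xlsx"""

  // Forward slashes avoid the escaping question entirely.
  val forwardSlashes: String = "C:/Project/report.xlsx"

  def main(args: Array[String]): Unit = {
    // Both spellings produce the same string at runtime.
    println(escaped == tripleQuoted)
    println(forwardSlashes)
  }
}
```

Any of these strings could then be passed to `.load(...)` in place of the single-backslash path.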

tufanrakshit · 2021-02-11T20:26:27Z

What is the latest version of the jar available on Maven Central, please? Also, is it compatible with Spark 3.0.1, and which Scala version should I use?

nightscape · 2021-02-11T20:33:51Z

https://github.com/crealytics/spark-excel#scala-212

tufanrakshit · 2021-02-12T14:43:01Z

You were right, it was my mistake in the path.
I have changed a few things: I now have Spark 3.0.1, Hadoop 3.2, Scala 2.12.13, and the jar I am using is spark-excel_2.12-0.13.6.
The jar is added to my module.

I am getting this stack trace:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/xmlbeans/XmlException
at shadeio.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:97)
at shadeio.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:147)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:256)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:221)
at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:49)
at scala.Option.fold(Option.scala:251)
at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:49)
at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:14)
at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:13)
at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:45)
at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:31)
at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:31)
at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:102)
at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:101)
at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:163)
at scala.Option.getOrElse(Option.scala:189)
at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:162)
at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:35)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:35)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:232)
at Data_Import$.main(Data_Import.scala:35)
at Data_Import.main(Data_Import.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.xmlbeans.XmlException
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 29 more

nightscape · 2021-02-12T23:43:17Z

Have you manually added the jar to the project?
I would not recommend that. Use a build tool like SBT, Gradle or Maven to resolve the dependencies properly.
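
As a rough sketch of what that could look like with SBT, the `build.sbt` fragment below declares the versions mentioned earlier in this thread (Scala 2.12.13, Spark 3.0.1, spark-excel 0.13.6). The choice of `spark-sql` as the Spark module is an assumption about the project, not something stated in the thread. Declaring the library this way lets the build tool pull in its transitive dependencies instead of relying on a single manually added jar.

```scala
// build.sbt (sketch; adjust versions to your setup)
scalaVersion := "2.12.13"

libraryDependencies ++= Seq(
  // Assumption: the project only needs the Spark SQL module.
  "org.apache.spark" %% "spark-sql"   % "3.0.1",
  // %% appends the Scala binary version, giving spark-excel_2.12.
  "com.crealytics"   %% "spark-excel" % "0.13.6"
)
```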

quanghgx · 2021-08-24T15:43:01Z

Hi @tufanrakshit,

It seems the latest issue is about the dependency jars.
There are solutions from @jakeatmsft and @fwani in #133, and we have also put this into a wiki page: https://github.com/crealytics/spark-excel/wiki#dependencies

Please take a look, and feel free to reopen this ticket if you run into further issues.
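
One quick way to confirm whether a missing dependency is behind the `NoClassDefFoundError` is to ask the JVM for the class by name. The sketch below is only a diagnostic aid; `ClasspathCheck` and `isOnClasspath` are hypothetical names introduced for this example, and the class name is the one from the stack trace above.

```scala
import scala.util.Try

object ClasspathCheck {
  // True if the class can be loaded by the current class loader.
  // Class.forName throws ClassNotFoundException when it is absent.
  def isOnClasspath(className: String): Boolean =
    Try(Class.forName(className)).isSuccess

  def main(args: Array[String]): Unit = {
    val name = "org.apache.xmlbeans.XmlException"
    println(s"$name available: ${isOnClasspath(name)}")
  }
}
```

If the check reports the class as unavailable, the corresponding jar is not on the runtime classpath.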

tufanrakshit · 2021-08-25T06:35:06Z

thanks


quanghgx closed this as completed Aug 24, 2021
