Error while reading mounted xlsx: Could not initialize class shadeio.poi.xssf.model.SharedStringsTable #438

ghost · 2021-10-15T15:58:58Z

I am using Azure Databricks and I am trying to read an Excel file (xlsx) from a Storage account (ADLS Gen2). Because I get an 'Anonymous access' error when I connect to the file using the wasbs path, I mounted the storage and tried to read the Excel file from there. This is my code:

    # 1) CSV from the mount: works
    df = spark.read \
        .format("csv") \
        .option("header", "true") \
        .option("delimiter", ";") \
        .load("/mnt/mountPoint/Budget.csv")

    # 2) Legacy .xls workbook: works
    df = spark.read \
        .format("com.crealytics.spark.excel") \
        .option("header", "true") \
        .option("sheetName", "Sheet1") \
        .load("/mnt/mountPoint/Budget.xls")

    # 3) .xlsx workbook: fails with NoClassDefFoundError
    df = spark.read \
        .format("com.crealytics.spark.excel") \
        .option("header", "true") \
        .option("sheetName", "Sheet1") \
        .load("/mnt/mountPoint/Budget.xlsx")

The first command succeeds and I get the headers from the file. A df.show() will show me the content. The second command (using the xls) succeeds as well and I get the schema and content. The third command fails with this error:
java.lang.NoClassDefFoundError: Could not initialize class shadeio.poi.xssf.model.SharedStringsTable

I am using Databricks runtime 8.3 with Apache Spark 3.1.1 and Scala 2.12. What I have tried so far (all with the same error):

- Different versions of the crealytics library. I tried 14.0, 13.7 and 13.6, all of them for Scala 2.12.
- The above code is in Python; I also tried it in Scala.
- I copied the content of the file (just the cells with data) to a new file and stored it as xlsx and xls.
- Used different sheet names. The file has just one sheet named 'Sheet1'.

This is the full stack trace. Any help is very much appreciated!
`---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
in
11 .load("/mnt/mountPoint/Budget.xls")
12
---> 13 df = spark.read
14 .format("com.crealytics.spark.excel")
15 .option("header", "true") \

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
202 self.options(**options)
203 if isinstance(path, str):
--> 204 return self._df(self._jreader.load(path))
205 elif path is not None:
206 if type(path) != list:

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
115 def deco(*a, **kw):
116 try:
--> 117 return f(*a, **kw)
118 except py4j.protocol.Py4JJavaError as e:
119 converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o714.load.
: java.lang.NoClassDefFoundError: Could not initialize class shadeio.poi.xssf.model.SharedStringsTable
at shadeio.poi.ooxml.POIXMLFactory.createDocumentPart(POIXMLFactory.java:61)
at shadeio.poi.ooxml.POIXMLDocumentPart.read(POIXMLDocumentPart.java:684)
at shadeio.poi.ooxml.POIXMLDocument.load(POIXMLDocument.java:180)
at shadeio.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:288)
at shadeio.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:97)
at shadeio.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:147)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:256)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:221)
at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:49)
at scala.Option.fold(Option.scala:251)
at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:49)
at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:14)
at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:13)
at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:45)
at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:31)
at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:31)
at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:102)
at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:101)
at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:163)
at scala.Option.getOrElse(Option.scala:189)
at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:162)
at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:35)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:35)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:390)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:432)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:399)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:399)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
at sun.reflect.GeneratedMethodAccessor274.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)`
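
A note on reading this trace: on the JVM, `NoClassDefFoundError: Could not initialize class X` generally means that the static initializer of `X` already failed on an earlier attempt in the same JVM. That first attempt usually surfaces as an `ExceptionInInitializerError` carrying the underlying cause, so the driver log from before this call is worth searching. The sketch below pulls the class name out of a trace and lists any such earlier lines. The function names are illustrative, and the sample text is copied from the trace above.

```python
import re

# Matches the "already failed to initialize" form of NoClassDefFoundError.
INIT_FAILURE = re.compile(
    r"NoClassDefFoundError: Could not initialize class (?P<cls>[\w.$]+)"
)


def failed_class(trace):
    """Return the class whose static initialization failed, or None."""
    match = INIT_FAILURE.search(trace)
    return match.group("cls") if match else None


def earlier_initializer_errors(log_text):
    """Return log lines that mention ExceptionInInitializerError."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if "ExceptionInInitializerError" in line
    ]


SAMPLE = """\
Py4JJavaError: An error occurred while calling o714.load.
: java.lang.NoClassDefFoundError: Could not initialize class shadeio.poi.xssf.model.SharedStringsTable
at shadeio.poi.ooxml.POIXMLFactory.createDocumentPart(POIXMLFactory.java:61)
"""

print(failed_class(SAMPLE))
print(earlier_initializer_errors(SAMPLE))
```

The trace posted here contains no `ExceptionInInitializerError` line, which fits the idea that the root cause was logged on an earlier run of the cell.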

udossa · 2021-11-18T06:37:55Z

Hi guys, any update on this error? I have the same issue

quanghgx · 2021-11-18T12:33:59Z

Hi @thijsnijhuis and @udossa

Could you please try again, changing the format from "com.crealytics.spark.excel" to "excel"?

    .format("excel")
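
In the context of the reader call from the original post, the change could look like the sketch below. The wrapper `read_excel` is a hypothetical helper, not part of spark-excel. It only passes the `header` option from the original code; whether the other options carry over to the "excel" data source is not confirmed in this thread. `spark` is expected to be the session that a Databricks notebook already provides.

```python
def read_excel(spark, path, data_source="excel", header=True):
    """Load a workbook through the given data source name.

    `spark` is an existing SparkSession; `data_source` is either the short
    name "excel" or the long name "com.crealytics.spark.excel".
    """
    reader = spark.read.format(data_source).option("header", str(header).lower())
    return reader.load(path)


# In a notebook (path taken from the original post):
# df = read_excel(spark, "/mnt/mountPoint/Budget.xlsx")
# df = read_excel(spark, "/mnt/mountPoint/Budget.xlsx",
#                 data_source="com.crealytics.spark.excel")
```

Keeping the data source name as a parameter makes it easy to compare both names against the same file.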

Also, please take a look at the list of dependencies that spark-excel needs in order to work. This wiki might have some useful ideas.

Credit to #133 (Apache commons dependency issue) by @jakeatmsft and to the solution by @fwani.

ghost · 2021-12-10T17:03:54Z

@quanghgx , thanks for your reply.
I have changed it, but now I simply get this error:
java.lang.ClassNotFoundException: Failed to find data source: excel. Please find packages at http://spark.apache.org/third-party-projects.html

I will need to take a look at the wiki link later on. Thanks!

fwani · 2021-12-13T05:47:58Z

@thijsnijhuis
I think you should first add the spark-excel dependency, com.crealytics:spark-excel_2.12, with a specific version.
(because the error is java.lang.ClassNotFoundException: Failed to find data source: excel)
https://github.com/crealytics/spark-excel#linking
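
As a small illustration of what "with a specific version" means: a Maven coordinate has the form `group:artifact:version`, and the `_2.12` suffix of the artifact has to match the Scala version of the cluster (2.12 in the original post). The helpers below are hypothetical, and `<version>` is a placeholder to be replaced with a real release listed in the linking section.

```python
def spark_excel_coordinate(scala_binary, version):
    """Build a Maven coordinate of the form group:artifact_scala:version."""
    return f"com.crealytics:spark-excel_{scala_binary}:{version}"


def scala_suffix(coordinate):
    """Return the Scala binary version encoded in the artifact name, or None."""
    parts = coordinate.split(":")
    if len(parts) != 3 or "_" not in parts[1]:
        return None
    return parts[1].rsplit("_", 1)[1]


# "<version>" is a placeholder, not a real release number.
coordinate = spark_excel_coordinate("2.12", "<version>")
print(coordinate)
print(scala_suffix(coordinate) == "2.12")
```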

abhisrphoenix · 2022-06-10T16:27:00Z

Please try changing the library installation to Maven; that resolved my issue.
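
A plausible reason this helps, though not confirmed in this thread: a Maven install resolves the library's transitive dependencies, while an uploaded JAR brings only itself. Besides the cluster's Libraries UI, a Maven install can, as far as I know, also be requested through the Databricks Libraries API (`POST /api/2.0/libraries/install`). The sketch below only builds the request body; the cluster id and the version are placeholders, and the function name is made up.

```python
import json


def maven_install_request(cluster_id, coordinate):
    """Build the JSON body for a Maven library install on one cluster."""
    body = {
        "cluster_id": cluster_id,
        "libraries": [{"maven": {"coordinates": coordinate}}],
    }
    return json.dumps(body, indent=2)


# Both values are placeholders for illustration.
payload = maven_install_request(
    "<cluster-id>", "com.crealytics:spark-excel_2.12:<version>"
)
print(payload)
```

After installing, the cluster usually needs a restart before a previously failed class can initialize again.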

quanghgx added the "cloud" label (Usage of spark-excel on cloud storage & platform) on Nov 18, 2021
