I am using Azure Databricks and I am trying to read an Excel file (xlsx) from a storage account (ADLS Gen2). Because I get an 'Anonymous access' error when I connect to the file using the wasbs path, I mounted the storage and tried to read the Excel file from there. This is my code:
The first command succeeds and I get the headers from the file. A df.show() will show me the content. The second command (using the xls) succeeds as well and I get the schema and content. The third command fails with this error:
java.lang.NoClassDefFoundError: Could not initialize class shadeio.poi.xssf.model.SharedStringsTable
I am using Databricks runtime 8.3 with Apache Spark 3.1.1 and Scala 2.12. What I have tried so far (all with the same error):
- Different versions of the crealytics library. I tried 14.0, 13.7 and 13.6, all of them for Scala 2.12.
- The above code is in Python; I also tried it in Scala.
- I copied the content of the file (just the cells with data) to a new file and stored it as xlsx and xls.
- Used different sheet names. The file has just one sheet named 'Sheet1'.
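Since the xls read succeeds and only the xlsx read fails, one way to rule out the file itself is to check which container format it really has. The sketch below relies on two facts: a legacy .xls workbook is an OLE2 compound file, and an .xlsx workbook is a ZIP package whose shared strings normally live in `xl/sharedStrings.xml`. The function name `describe_workbook` and the returned dictionary keys are hypothetical, chosen only for this illustration.

```python
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls container
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx is a ZIP package


def describe_workbook(data: bytes) -> dict:
    """Report the container type and, for ZIP packages, whether the
    shared-strings part (xl/sharedStrings.xml) is present."""
    if data.startswith(OLE2_MAGIC):
        return {"container": "ole2", "shared_strings": None}
    if data.startswith(ZIP_MAGIC):
        with zipfile.ZipFile(io.BytesIO(data)) as package:
            names = set(package.namelist())
        return {"container": "zip",
                "shared_strings": "xl/sharedStrings.xml" in names}
    return {"container": "unknown", "shared_strings": None}
```

On a cluster, the bytes could be read from the mounted file (for example through the local `/dbfs/mnt/...` path, assuming that path is available on the driver) and passed to the function. A result of `"zip"` with the shared-strings part present would suggest the file is a normal xlsx and that the problem lies on the library side rather than in the workbook.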
This is the full stack trace. Any help is very much appreciated!
`---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
in
11 .load("/mnt/mountPoint/Budget.xls")
12
---> 13 df = spark.read
14 .format("com.crealytics.spark.excel")
15 .option("header", "true") \
/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
202 self.options(**options)
203 if isinstance(path, str):
--> 204 return self._df(self._jreader.load(path))
205 elif path is not None:
206 if type(path) != list:
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o714.load.
: java.lang.NoClassDefFoundError: Could not initialize class shadeio.poi.xssf.model.SharedStringsTable
at shadeio.poi.ooxml.POIXMLFactory.createDocumentPart(POIXMLFactory.java:61)
at shadeio.poi.ooxml.POIXMLDocumentPart.read(POIXMLDocumentPart.java:684)
at shadeio.poi.ooxml.POIXMLDocument.load(POIXMLDocument.java:180)
at shadeio.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:288)
at shadeio.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:97)
at shadeio.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:147)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:256)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:221)
at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:49)
at scala.Option.fold(Option.scala:251)
at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:49)
at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:14)
at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:13)
at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:45)
at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:31)
at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:31)
at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:102)
at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:101)
at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:163)
at scala.Option.getOrElse(Option.scala:189)
at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:162)
at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:35)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:35)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:390)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:432)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:399)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:399)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
at sun.reflect.GeneratedMethodAccessor274.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)`
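A note on reading this trace: the JVM uses the wording "Could not initialize class X" when the class was found but an earlier attempt to run its static initializer failed. A class that is simply absent from the classpath is reported with just the class name. The original cause is then usually an `ExceptionInInitializerError` raised on the first failed attempt, which may appear only in earlier driver output. The sketch below tells the two forms apart; the function name `classify_no_class_def` and the labels it returns are hypothetical.

```python
import re

# "Could not initialize class X": X was located, but its static
# initializer failed on an earlier attempt.
_INIT_FAILED = re.compile(
    r"NoClassDefFoundError: Could not initialize class (?P<cls>[\w.$]+)")
# Plain form: the class definition could not be found at all.
_MISSING = re.compile(r"NoClassDefFoundError: (?P<cls>[\w/.$]+)")


def classify_no_class_def(trace: str):
    """Return (kind, class_name) for a NoClassDefFoundError in the trace,
    preferring the 'Could not initialize class' form, or None if absent."""
    match = _INIT_FAILED.search(trace)
    if match:
        return ("static-init-failed", match.group("cls"))
    match = _MISSING.search(trace)
    if match:
        return ("class-missing", match.group("cls").replace("/", "."))
    return None
```

Applied to the error above, this yields the `static-init-failed` case for `shadeio.poi.xssf.model.SharedStringsTable`. That would point at a problem while initializing the class (for instance a conflicting or missing transitive dependency) rather than at the spark-excel jar being absent, though the trace alone does not confirm which.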
@thijsnijhuis
I think you should first add the dependency for the Excel data source, com.crealytics:spark-excel_2.12, with a specific version (because the error is java.lang.ClassNotFoundException: Failed to find data source: excel). See https://github.com/crealytics/spark-excel#linking
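When attaching the library, the Scala suffix in the Maven coordinate has to match the cluster's Scala version (2.12 in the setup described in this issue). A minimal sketch of that check follows. The names `Coordinate`, `parse_coordinate` and `matches_cluster` are hypothetical helpers, and the version string `0.13.7` in the usage lines is only a placeholder, not a recommendation.

```python
from typing import NamedTuple


class Coordinate(NamedTuple):
    group: str
    artifact: str
    scala_binary: str
    version: str


def parse_coordinate(coordinate: str) -> Coordinate:
    """Split 'group:artifact_scalaVersion:version' into its parts."""
    group, artifact_with_suffix, version = coordinate.split(":")
    artifact, _, scala_binary = artifact_with_suffix.rpartition("_")
    if not artifact:
        raise ValueError(f"no Scala suffix in artifact: {artifact_with_suffix!r}")
    return Coordinate(group, artifact, scala_binary, version)


def matches_cluster(coordinate: str, cluster_scala: str) -> bool:
    """True when the coordinate's Scala suffix equals the cluster's Scala version."""
    return parse_coordinate(coordinate).scala_binary == cluster_scala


# Usage with a placeholder version string:
print(parse_coordinate("com.crealytics:spark-excel_2.12:0.13.7"))
print(matches_cluster("com.crealytics:spark-excel_2.12:0.13.7", "2.12"))
```

This only validates the coordinate string. It does not tell whether the library is actually installed on the cluster, and a matching suffix does not by itself explain the `NoClassDefFoundError` reported above.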
I am using Azure Databricks and I am trying to read an Excel file (xlsx) from a storage account (ADLS Gen2). Because I get an 'Anonymous access' error when I connect to the file using the wasbs path, I mounted the storage and tried to read the Excel file from there. This is my code:
`# 1. CSV export of the same data: succeeds
df = spark.read \
    .format("csv") \
    .option("header", "true") \
    .option("delimiter", ";") \
    .load("/mnt/mountPoint/Budget.csv")

# 2. Legacy .xls workbook: succeeds
df = spark.read \
    .format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .option("sheetName", "Sheet1") \
    .load("/mnt/mountPoint/Budget.xls")

# 3. .xlsx workbook: fails with NoClassDefFoundError
df = spark.read \
    .format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .option("sheetName", "Sheet1") \
    .load("/mnt/mountPoint/Budget.xlsx")`