
com.crealytics.spark.excel doesn't read directly from ADL #125

Open
Mathyaku opened this issue May 29, 2019 · 5 comments
Labels
bug cloud Usage of spark-excel on cloud storage & platform

Comments

@Mathyaku

I'm getting the following error:

[2019-05-29 18:25:21,894] {init.py:1580} ERROR - An error occurred while calling o77.load.
: java.io.IOException: Password fs.adl.oauth2.client.id not found
at org.apache.hadoop.fs.adl.AdlFileSystem.getPasswordString(AdlFileSystem.java:950)
at org.apache.hadoop.fs.adl.AdlFileSystem.getConfCredentialBasedTokenProvider(AdlFileSystem.java:289)

ex1 - DOESN'T WORK:

spark = sparkSession....
df = (spark.read.format("com.crealytics.spark.excel")
    .option("useHeader", "false")
    .option("skipFirstRows", "15")
    .load("adl://test.azuredatalakestore.net/teste.xls"))

PS:

If I first read any other file from my ADL with the same sparkSession and then read the .xls, everything works.

ex2 - WORKS:

spark = sparkSession....
(spark.read.format("csv")
    .option("useHeader", "false")
    .option("skipFirstRows", "15")
    .load("adl://test.azuredatalakestore.net/teste2.csv"))

df = (spark.read.format("com.crealytics.spark.excel")
    .option("useHeader", "false")
    .option("skipFirstRows", "15")
    .load("adl://test.azuredatalakestore.net/teste.xls"))
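For completeness, the ex2 workaround can be sketched end to end. This is a hypothetical reconstruction: the `fs.adl.oauth2.*` keys are the standard Hadoop ADLS Gen1 (ClientCredential OAuth) settings, and all credential values and paths are placeholders, not taken from the original report.

```python
# Sketch of the ex2 workaround: a first read through a built-in source (CSV)
# initializes the adl:// filesystem with the credentials, after which the
# spark-excel read of the same store succeeds.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("adl-excel-workaround")
         # Placeholder ADLS Gen1 OAuth settings (ClientCredential flow)
         .config("spark.hadoop.fs.adl.oauth2.access.token.provider.type",
                 "ClientCredential")
         .config("spark.hadoop.fs.adl.oauth2.client.id", "<client-id>")
         .config("spark.hadoop.fs.adl.oauth2.credential", "<client-secret>")
         .config("spark.hadoop.fs.adl.oauth2.refresh.url",
                 "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
         .getOrCreate())

# Step 1: read any file with a built-in reader to warm up the ADL filesystem.
spark.read.format("csv").load("adl://test.azuredatalakestore.net/teste2.csv")

# Step 2: the Excel read now works.
df = (spark.read.format("com.crealytics.spark.excel")
      .option("useHeader", "false")
      .option("skipFirstRows", "15")
      .load("adl://test.azuredatalakestore.net/teste.xls"))
```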

@nightscape
Owner

Hmm, I'm quite clueless about what we'd have to do to support ADL properly. Would you be willing to contribute a PR or dig out the corresponding documentation?
We don't have this use case and can't spend much time on this...

@aravish

aravish commented Jul 10, 2019

We are trying the below from Databricks; per them, this is the update:

This happens because the Spark reader used to load the Excel file does not honor the configs given as Hadoop configuration, so it does not pick them up.

Repro code: Azure Data Lake Store (ADL) is an Azure storage platform. The problem occurs only when you reference a full path like below; when you mount the storage as a mount point on Databricks, the problem does not occur.

dayreportfullpath = spark.read.format("com.crealytics.spark.excel").option("useHeader", "true").load("adl://aravishdatalake.azuredatalakestore.net/external/Test.xlsx")

IllegalArgumentException: 'No value for dfs.adls.oauth2.access.token.provider found in conf file.'

IllegalArgumentException Traceback (most recent call last)
in ()
----> 1 dayreportfullpath = spark.read.format("com.crealytics.spark.excel").option("useHeader", "true").load("adl://aravishdatalake.azuredatalakestore.net/external/Test.xlsx")

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
164 self.options(**options)
165 if isinstance(path, basestring):
--> 166 return self._df(self._jreader.load(path))
167 elif path is not None:
168 if type(path) != list:

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in call(self, *args)
1255 answer = self.gateway_client.send_command(command)
1256 return_value = get_return_value(
-> 1257 answer, self.gateway_client, self.target_id, self.name)
1258
1259 for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
77 raise QueryExecutionException(s.split(': ', 1)[1], stackTrace)
78 if s.startswith('java.lang.IllegalArgumentException: '):
---> 79 raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
80 raise
81 return deco

IllegalArgumentException: 'No value for dfs.adls.oauth2.access.token.provider found in conf file.'

@brickfrog

brickfrog commented Oct 18, 2019

For anyone else having this issue: you need to use the RDD context (the Hadoop configuration on the SparkContext). You can also mount the store, but in some cases you may be averse to mounting (as in my use case).

spark.sparkContext.hadoopConfiguration.set(...

This is what worked for me earlier today. Was able to read from ADLS without mounting.
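The comment above shows the Scala form; a minimal PySpark equivalent might look like the following. This is a sketch under assumptions: ADLS Gen1 with the ClientCredential OAuth flow, the standard `fs.adl.oauth2.*` Hadoop key names, and placeholder credential values.

```python
# Set the credentials on the SparkContext's Hadoop configuration directly
# (the "RDD context"), so spark-excel's direct filesystem access sees them
# without mounting the store.
hconf = spark.sparkContext._jsc.hadoopConfiguration()
hconf.set("fs.adl.oauth2.access.token.provider.type", "ClientCredential")
hconf.set("fs.adl.oauth2.client.id", "<client-id>")
hconf.set("fs.adl.oauth2.credential", "<client-secret>")
hconf.set("fs.adl.oauth2.refresh.url",
          "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# The Excel read now resolves the adl:// path with those credentials.
df = (spark.read.format("com.crealytics.spark.excel")
      .option("useHeader", "false")
      .load("adl://test.azuredatalakestore.net/teste.xls"))
```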

@axen22

axen22 commented Oct 23, 2019

@brickfrog mounting does not work for me; I still get the error mentioned above. Can you provide your code as an example?
Also, what do you mean by "use the RDD context"? Can you provide an example?

@quanghgx quanghgx added bug cloud Usage of spark-excel on cloud storage & platform labels Oct 3, 2021
@divyavanmahajan

For someone who comes here looking for a PySpark solution:
Spark 3.1.2
Cannot read an abfss:// URL with spark-excel.

Use
com.crealytics:spark-excel_2.12:0.13.7
and set the Azure OAuth parameters with
spark._jsc.hadoopConfiguration().set(key, value)
in addition to
spark.conf.set(key, value)

@brickfrog - thanks for pointing us in the right direction.
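Putting the above together for ADLS Gen2 (abfss://) on Spark 3.x, a hypothetical sketch: the `fs.azure.*` keys are the standard ABFS OAuth settings, and the account, tenant, container, and secret values are placeholders. Setting each key on both the session conf and the JVM Hadoop configuration covers readers that consult either one.

```python
# Standard ABFS OAuth (ClientCredsTokenProvider) settings for ADLS Gen2;
# all account/credential values are placeholders.
account = "<storage-account>.dfs.core.windows.net"
adls_conf = {
    f"fs.azure.account.auth.type.{account}": "OAuth",
    f"fs.azure.account.oauth.provider.type.{account}":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    f"fs.azure.account.oauth2.client.id.{account}": "<client-id>",
    f"fs.azure.account.oauth2.client.secret.{account}": "<client-secret>",
    f"fs.azure.account.oauth2.client.endpoint.{account}":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Set each key both ways: spark.conf for Spark's own readers, and the
# JVM Hadoop configuration for spark-excel's direct filesystem access.
for key, value in adls_conf.items():
    spark.conf.set(key, value)
    spark._jsc.hadoopConfiguration().set(key, value)

df = (spark.read.format("com.crealytics.spark.excel")
      .option("header", "true")  # option name used by spark-excel 0.13.x
      .load(f"abfss://<container>@{account}/Test.xlsx"))
```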

7 participants