com.crealytics.spark.excel doesn't read directly from ADL #125
Comments
Hmm, I'm quite clueless about what we'd have to do to support ADL properly. Would you be willing to contribute a PR or dig out the corresponding documentation?
We are trying the code below from Databricks. Per them, this is the update:
This happens because the Spark reader used to load the Excel file does not honor the settings supplied through the Hadoop configuration and does not load them.
Repro code: Data Lake Store (ADL) is an Azure storage platform. The problem occurs only when you reference a full path like the one below. When you mount the storage as a mount point on Databricks, the problem does not occur.
dayreportfullpath = spark.read.format("com.crealytics.spark.excel").option("useHeader", "true").load("adl://aravishdatalake.azuredatalakestore.net/external/Test.xlsx")
IllegalArgumentException: 'No value for dfs.adls.oauth2.access.token.provider found in conf file.'
IllegalArgumentException Traceback (most recent call last)
/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
IllegalArgumentException: 'No value for dfs.adls.oauth2.access.token.provider found in conf file.'
For anyone else having this issue: you need to set the configuration on the SparkContext's Hadoop configuration (the RDD-level context). You can also mount the store, but in some cases you may be averse to mounting (as in my use case).
spark.sparkContext.hadoopConfiguration.set(...
This is what worked for me earlier today. I was able to read from ADLS without mounting.
@brickfrog mounting does not work for me; I still get the error mentioned above. Can you provide your code as an example?
For anyone who comes here looking for a PySpark solution: use @brickfrog's approach. Thanks for pointing us in the right direction.
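A minimal PySpark sketch of the Hadoop-configuration approach mentioned above, not the exact code anyone in this thread posted. It assumes service-principal (client credential) authentication. The helper names `adl_oauth_settings` and `apply_to_hadoop_conf` are hypothetical. The `dfs.adls.oauth2` prefix matches the Databricks error above; the other stack trace in this thread uses `fs.adl.oauth2`, so the prefix is a parameter. The four key suffixes are the ones the Hadoop ADL connector documents for client credentials, but check them against your runtime.

```python
def adl_oauth_settings(client_id, credential, refresh_url, prefix="dfs.adls.oauth2"):
    """Build ADL service-principal settings as a plain dict (no Spark needed)."""
    return {
        prefix + ".access.token.provider.type": "ClientCredential",
        prefix + ".client.id": client_id,
        prefix + ".credential": credential,
        prefix + ".refresh.url": refresh_url,
    }


def apply_to_hadoop_conf(spark, settings):
    """Copy settings into the JVM Hadoop configuration behind a SparkSession."""
    # _jsc is PySpark's handle to the JavaSparkContext. It is an internal
    # attribute, so treat this as a workaround rather than a stable API.
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    for key, value in settings.items():
        hadoop_conf.set(key, value)
```

Call `apply_to_hadoop_conf(spark, adl_oauth_settings(...))` with your own application id, secret, and token endpoint before the `.load("adl://...")` call, so the Excel reader finds the values in the Hadoop configuration.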
I'm getting the following error:
[2019-05-29 18:25:21,894] {__init__.py:1580} ERROR - An error occurred while calling o77.load.
: java.io.IOException: Password fs.adl.oauth2.client.id not found
    at org.apache.hadoop.fs.adl.AdlFileSystem.getPasswordString(AdlFileSystem.java:950)
    at org.apache.hadoop.fs.adl.AdlFileSystem.getConfCredentialBasedTokenProvider(AdlFileSystem.java:289)
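The trace says the ADL filesystem could not find `fs.adl.oauth2.client.id` in the Hadoop configuration it was handed. A small diagnostic sketch, assuming client-credential authentication and the `fs.adl.oauth2` prefix shown in the trace, can list which of the expected keys are absent. The helper name `missing_adl_keys` and the list of suffixes are assumptions for illustration; `hadoop_conf` is any object with a Hadoop-style `get(key)` that returns `None` for unset keys.

```python
# Suffixes assumed to be needed for client-credential auth.
REQUIRED_SUFFIXES = (
    "access.token.provider.type",
    "client.id",
    "credential",
    "refresh.url",
)


def missing_adl_keys(hadoop_conf, prefix="fs.adl.oauth2"):
    """Return the full names of required ADL keys that are unset or empty."""
    missing = []
    for suffix in REQUIRED_SUFFIXES:
        key = prefix + "." + suffix
        # Hadoop's Configuration.get returns null (None through py4j) if unset.
        if not hadoop_conf.get(key):
            missing.append(key)
    return missing
```

From PySpark this could be called as `missing_adl_keys(spark.sparkContext._jsc.hadoopConfiguration())` just before the failing `load`, to see whether the keys are really missing or only invisible to the Excel reader.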
ex1 - DOESN'T WORK:
spark = sparkSession....
(spark.read.format("com.crealytics.spark.excel")
    .option("useHeader", "false")
    .option("skipFirstRows", "15")
    .load("adl://test.azuredatalakestore.net/teste.xls"))
PS: If I first read any other file from my ADL with that SparkSession and then read the .xls, everything works.
ex2 - WORKS:
spark = sparkSession....
(spark.read.format("csv")
    .option("useHeader", "false")
    .option("skipFirstRows", "15")
    .load("adl://test.azuredatalakestore.net/teste2.csv"))
(spark.read.format("com.crealytics.spark.excel")
    .option("useHeader", "false")
    .option("skipFirstRows", "15")
    .load("adl://test.azuredatalakestore.net/teste.xls"))
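The ex2 pattern can be wrapped in a small helper. This is only a sketch of the workaround as described in the comment above, and it does not explain why the first read makes the second one succeed. The function name `read_excel_after_warmup` and its parameters are hypothetical; it uses only `format`, `options`, and `load` on PySpark's `DataFrameReader`, and passes the Excel options through unchanged.

```python
EXCEL_FORMAT = "com.crealytics.spark.excel"


def read_excel_after_warmup(spark, warmup_path, excel_path,
                            warmup_format="csv", **excel_options):
    """Read any file from the store first, then read the Excel workbook."""
    # Step 1: touch the store through a built-in reader, as in ex2.
    # The resulting DataFrame is discarded.
    spark.read.format(warmup_format).load(warmup_path)
    # Step 2: read the workbook with whatever options the caller supplies.
    return spark.read.format(EXCEL_FORMAT).options(**excel_options).load(excel_path)
```

For the paths in ex2 this would be `read_excel_after_warmup(spark, "adl://test.azuredatalakestore.net/teste2.csv", "adl://test.azuredatalakestore.net/teste.xls", useHeader="false", skipFirstRows="15")`. Setting the credentials on the Hadoop configuration directly is likely the more robust fix; the warm-up read only mirrors what was observed to work here.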