Query data using spark from Azure blob storage. #261

Open
nagashreepoojari07 opened this issue Dec 12, 2024 · 0 comments

I'm trying to read data from Azure Blob Storage using Spark in Databricks.

Code:
RELEASE = '2024-09-18.0'
theme = 'base'
type = 'water'
subtype = 'water'
classes = ['water', 'tidal']

# Build the SQL IN list, e.g. ('water','tidal')
classes_str = "('{}')".format("','".join(classes))
# The path must be wrapped in backticks when used as a table identifier in Spark SQL
query = """SELECT * FROM parquet.`https://overturemapswestus2.blob.core.windows.net/{}/theme={}/type={}/` WHERE subType='{}' AND class IN {};""".format(RELEASE, theme, type, subtype, classes_str)

result_df = spark.sql(query)
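
For reference, here is a minimal sketch of what the formatting above produces. The helper name `build_query` and its `base_url` parameter are hypothetical, introduced only for illustration. The path is wrapped in backticks, which Spark SQL needs when a file path is used as a table identifier.

```python
def build_query(base_url, release, theme, type_, subtype, classes):
    """Assemble the Spark SQL string used above (hypothetical helper)."""
    # ['water', 'tidal'] -> "('water','tidal')"
    classes_str = "('{}')".format("','".join(classes))
    # Backticks mark the whole path as one identifier for the parquet data source.
    return (
        "SELECT * FROM parquet.`{}/{}/theme={}/type={}/` "
        "WHERE subtype = '{}' AND class IN {}"
    ).format(base_url, release, theme, type_, subtype, classes_str)


query = build_query(
    "https://overturemapswestus2.blob.core.windows.net",
    "2024-09-18.0", "base", "water", "water", ["water", "tidal"],
)
print(query)
```

Printing the string before passing it to `spark.sql` makes it easy to check the exact path Spark will be asked to open.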

I'm getting this error:
[[DELTA_INVALID_FORMAT](https://docs.microsoft.com/azure/databricks/error-messages/error-classes#delta_invalid_format)] Incompatible format detected.

A transaction log for Delta was found at https://overturemapswestus2.blob.core.windows.net/release/2024-09-18.0/theme=base/type=water/part-00000-284b06bd-9385-4936-a4bb-71a4a6df08ac-c000.zstd.parquet/_delta_log,
but you are trying to read from https://overturemapswestus2.blob.core.windows.net/release/2024-09-18.0/theme=base/type=water/part-00000-284b06bd-9385-4936-a4bb-71a4a6df08ac-c000.zstd.parquet using format("parquet"). You must use
'format("delta")' when reading and writing to a delta table.

Querying from Amazon S3 storage works:
RELEASE = '2024-09-18.0'
theme = 'base'
type = 'water'
subtype = 'water'
classes = ['water', 'tidal']

classes_str = "('{}')".format("','".join(classes))
query = """SELECT * FROM parquet.`https://overturemapswestus2.blob.core.windows.net/{}/theme={}/type={}/` WHERE subType='{}' AND class IN {};""".format(RELEASE, theme, type, subtype, classes_str)

result_df = spark.sql(query)
display(result_df)

What is the right way to query data using Spark from Azure Blob Storage?
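
One thing that may be worth trying, as an untested sketch rather than a confirmed fix: pointing Spark at the container through the `wasbs://` scheme of the Hadoop Azure connector instead of the `https://` URL, and using the DataFrame reader with an explicit `parquet` format. The container name `release` is taken from the URL in the error message. The helper names `wasbs_path` and `read_overture` are hypothetical, and it is an assumption that the cluster can read this public container anonymously.

```python
def wasbs_path(release, theme, type_,
               account="overturemapswestus2", container="release"):
    """Build a URI of the form wasbs://<container>@<account>.blob.core.windows.net/<path>."""
    return "wasbs://{}@{}.blob.core.windows.net/{}/theme={}/type={}/".format(
        container, account, release, theme, type_)


def read_overture(spark, release, theme, type_, subtype, classes):
    """Read one Overture type as Parquet and filter it (assumes an active SparkSession)."""
    df = spark.read.format("parquet").load(wasbs_path(release, theme, type_))
    # Column names follow the query above; Spark resolves them
    # case-insensitively by default.
    return df.where(df["subtype"] == subtype).where(df["class"].isin(classes))


print(wasbs_path("2024-09-18.0", "base", "water"))
```

In a Databricks notebook, where `spark` is already defined, this would be called as `display(read_overture(spark, '2024-09-18.0', 'base', 'water', 'water', ['water', 'tidal']))`.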

@jenningsanderson transferred this issue from OvertureMaps/docs on Dec 13, 2024