manifest is not a Parquet file. expected magic number #736

Closed
michael-j-thomas opened this issue Aug 5, 2021 · 4 comments

@michael-j-thomas

Hi, I have a delta table integrated with the AWS Glue data catalog.

When I run a query against the table via Spark, I get the following error:

FileReadException: Error while reading file s3://.. _symlink_format_manifest/manifest

Caused by: RuntimeException: .../manifest is not a Parquet file. expected magic number at tail

And the integrated Spectrum table returns the following error:

S3ServiceException:The specified key does not exist.,Status 404,Error NoSuchKey

Furthermore, after I generate the manifests, not all partitions are added, even though those partitions contain files.

Related issue: #365

I ran the Spark SQL on a Databricks cluster with runtime 8.1 (Apache Spark 3.1.1, Delta 1.0.0).

Please let me know if I can supply more details.

@m-credera

m-credera commented Feb 28, 2022

Hey @michael-j-thomas - what was the solution here?

Edit: I just read #706. The metastore entries in Hive that are set up for manifests are expected to work with Presto, Athena, etc., but not with Spark.
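Why Spark rejects the manifest table: the file under `_symlink_format_manifest/` is a plain-text list of data-file paths, while every Parquet file begins and ends with the 4-byte magic `PAR1`. Spark appears to read the files at the table's `LOCATION` directly as Parquet, so its reader checks the manifest's tail for that magic and fails, which matches the error above. The Python sketch below imitates that check; `looks_like_parquet` is a hypothetical helper, not Spark's actual reader code, and the manifest paths are made up.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"  # 4-byte magic at the start and end of every Parquet file


def looks_like_parquet(path: str) -> bool:
    """Hypothetical helper: mimic the header/footer magic check a Parquet reader does."""
    size = os.path.getsize(path)
    # Smallest possible layout: header magic + 4-byte footer length + footer magic.
    if size < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


# A symlink manifest is just text: one data-file path per line (example paths are made up).
manifest_text = (
    "s3://bucket/table/part-00000-aaaa.snappy.parquet\n"
    "s3://bucket/table/part-00001-bbbb.snappy.parquet\n"
)

with tempfile.TemporaryDirectory() as tmp:
    manifest_path = os.path.join(tmp, "manifest")
    with open(manifest_path, "w") as f:
        f.write(manifest_text)
    # The manifest ends in "uet\n", not "PAR1", so the check fails.
    print(looks_like_parquet(manifest_path))  # False
```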

@sarahotoole

sarahotoole commented Nov 3, 2022

@m-credera @michael-j-thomas Did either of you find a solution for this? I am also trying to use the Glue Catalog (to be able to query those tables using Spark SQL), but I've been experiencing the same issue since switching to Delta/Parquet.

@deepakmohanakrishnan07

Does this issue still exist? We are facing the same problem: using Spark 3.4, we cannot read an Athena table created on Delta Lake files.

@tdas
Contributor

tdas commented Feb 13, 2024

Are you creating the Delta table entry in the Glue Catalog with the right table type? Reading a Delta table via manifests requires setting up the table in a very specific way: such a table works only via the manifest, not as a native Delta table in engines that understand Delta natively. See more details here: https://docs.delta.io/latest/presto-integration.html#step-2-configure-presto-trino-or-athena-to-read-the-generated-manifests
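As a sketch of the manifest-only setup described in the linked docs, the Python helper below just builds SQL strings; it does not execute anything. `manifest_table_ddl`, the table name, the bucket path, and the columns are hypothetical. The statement shapes (`GENERATE symlink_format_manifest`, an external table using `SymlinkTextInputFormat` whose `LOCATION` is the `_symlink_format_manifest/` directory, and `MSCK REPAIR TABLE` for partitioned tables) follow my reading of the Delta Presto/Athena integration guide, so check them against the current docs before use.

```python
from typing import List, Sequence, Tuple

Column = Tuple[str, str]  # (column name, Hive/Athena type)


def manifest_table_ddl(
    table: str,
    delta_path: str,
    columns: Sequence[Column],
    partitions: Sequence[Column] = (),
) -> List[str]:
    """Hypothetical helper: SQL for a manifest-based Athena/Presto table over a Delta path."""
    stmts = [
        # Run in Spark with Delta: writes <delta_path>/_symlink_format_manifest/
        f"GENERATE symlink_format_manifest FOR TABLE delta.`{delta_path}`",
    ]
    cols = ", ".join(f"{name} {typ}" for name, typ in columns)
    ddl = f"CREATE EXTERNAL TABLE {table} ({cols})"
    if partitions:
        # Hive-style DDL: partition columns are listed here, not in the column list.
        parts = ", ".join(f"{name} {typ}" for name, typ in partitions)
        ddl += f"\nPARTITIONED BY ({parts})"
    ddl += (
        "\nROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'"
        "\nSTORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'"
        "\nOUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'"
        f"\nLOCATION '{delta_path.rstrip('/')}/_symlink_format_manifest/'"
    )
    stmts.append(ddl)
    if partitions:
        # Make the metastore discover partition directories under the manifest location.
        stmts.append(f"MSCK REPAIR TABLE {table}")
    return stmts


if __name__ == "__main__":
    # Hypothetical table, path, and schema for illustration only.
    for stmt in manifest_table_ddl(
        "events_manifest",
        "s3://my-bucket/delta/events",
        [("id", "bigint"), ("payload", "string")],
        [("dt", "string")],
    ):
        print(stmt + ";\n")
```

The `MSCK REPAIR TABLE` step may be relevant to the missing-partitions symptom in the original report, since a partitioned manifest table does not pick up new partitions on its own. For Spark, the usual approach is a separate, native Delta entry (for example `CREATE TABLE events USING DELTA LOCATION 's3://my-bucket/delta/events'`) rather than pointing Spark at the manifest table.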
