Unable to read a single parquet file from a Delta table with LastUpdated=2023-10-08T20%3A54%3A33.250Z in the path #7877
Comments
Thank you for the report @rspears74 -- it certainly seems like a bug. I wonder if something about the
@alamb I've been able to do a bit more debugging, and I've figured out what seems to be the root of the problem. I changed the partition file names when I posted here; the bottom-level partition folders are actually timestamps, so the folder names are something like LastUpdated=2023-10-08T20%3A54%3A33.250Z.
After discovering this, I tried to simply read the file with
Side note: I also found that if I tried to
which I assume means an empty DF.
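To illustrate why a folder name like LastUpdated=2023-10-08T20%3A54%3A33.250Z is fragile: if any layer treats the path as a URL and percent-decodes it, %3A becomes ':' and the resulting name no longer matches the folder on disk. The sketch below is a std-only illustration of that decoding step. `percent_decode` and `hex_val` are hypothetical helpers written for this illustration, not DataFusion's code, and it is only an assumption that a decode step like this is involved in the failure.

```rust
/// Hypothetical helper: value of a single ASCII hex digit, if it is one.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical helper: decode `%XX` escapes in a path segment.
/// Malformed escapes (e.g. a trailing `%`) are copied through unchanged.
fn percent_decode(segment: &str) -> String {
    let bytes = segment.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // A valid escape needs two more bytes after the '%'.
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}
```

For example, `percent_decode("LastUpdated=2023-10-08T20%3A54%3A33.250Z")` gives `LastUpdated=2023-10-08T20:54:33.250Z`, which is a different name from the folder that actually exists on disk.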
Thanks @rspears74 - I agree it sounds like an issue with
Thanks again for the report
Minimum example:

use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.read_csv(
        "/home/jeffrey/tmp%123/test.csv",
        CsvReadOptions::new(),
    )
    .await?
    .show()
    .await?;
    Ok(())
}

Running this throws an error:
This can be avoided by specifying the file:// scheme explicitly:

use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.read_csv(
        "file:///home/jeffrey/tmp%123/test.csv", // <-- here
        CsvReadOptions::new(),
    )
    .await?
    .show()
    .await?;
    Ok(())
}

(This can be reproduced for parquet as well.)
Cause seems to be here:
Specifically, if
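The workaround above, passing the path with an explicit file:// scheme, could be wrapped in a small helper until the underlying issue is fixed. This is only a sketch: `with_file_scheme` is a hypothetical name, it handles only absolute Unix-style paths, and it simply reproduces the string form used in the second example rather than anything DataFusion provides.

```rust
/// Hypothetical helper: prefix an absolute local path with `file://`.
/// Strings that already carry a scheme, and relative paths, are returned unchanged.
fn with_file_scheme(path: &str) -> String {
    if path.contains("://") {
        // Already looks like a URL (file://, s3://, ...): leave it alone.
        path.to_string()
    } else if path.starts_with('/') {
        // Absolute Unix-style path: "/a/b" becomes "file:///a/b".
        format!("file://{path}")
    } else {
        // Relative path: out of scope for this sketch.
        path.to_string()
    }
}
```

The result can then be passed to `ctx.read_csv(...)` exactly as in the second example. Whether the same trick helps for a timestamp folder such as the one in the issue title has not been checked here.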
Describe the bug
I am trying to read parquet files from a Delta table. The parquet files are snappy compressed. My Delta table has 3 partition columns, so the folder structure of the Delta table looks something like this:
I was originally trying to read a list of files (I need to be able to read an arbitrary list of files), but debugging my issue has brought me to trying to read a single parquet file. My code for this is as follows:
When I run this, I get:
HOWEVER, if I run this same code, but instead of passing the full path to the parquet file, I pass only the directory the file is in ("/Users/me/Downloads/table/Col1=ABC/Col2=123/Col3=abc"), I get no such error and I'm able to successfully read the parquet file.
I'm not sure if I'm doing something wrong, or if this is some kind of bug.
To Reproduce
Try to read a single parquet file from a local, partitioned Delta table, using SessionContext::read_parquet.
Expected behavior
I expect the file to be read into DataFusion as a DataFrame.
Additional context
No response