This file extension is treated in a special way: there's an isCompressed method, and depending on its result readCSV wraps the InputStream. But this doesn't work for *.zip, because the InputStream is wrapped in a GZIPInputStream. Apparently wrapping the InputStream is not enough either: ZIP has a more complex structure, so you need to call ZipInputStream methods to position the stream at an entry before reading:
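import java.io.File
import java.util.zip.ZipInputStream
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readCSV

val zipInputStream = ZipInputStream(
    File("data.csv.zip").inputStream(),
    Charsets.UTF_8
)
// Position the stream at the first entry; without this there is nothing to read
zipInputStream.nextEntry
val df1 = DataFrame.readCSV(zipInputStream)
zipInputStream.closeEntry()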
Another issue is that a file ending in *.gz can actually be a *.tar.gz, which we cannot read properly without special handling. So I suggest we either support it or at least throw an exception with a message saying that the file should be a plain compressed file and not a *.tar archive.
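A minimal sketch of the "at least provide an exception message" option, using only the JDK. The function name `openGzipRejectingTar` and the wording of the message are hypothetical, not part of DataFrame. It assumes JDK 11+ (for `readNBytes`) and only recognizes POSIX/GNU tar archives, which carry the ASCII magic `ustar` at offset 257 of the first header block; very old tar variants without that magic would slip through.

```kotlin
import java.io.BufferedInputStream
import java.io.InputStream
import java.util.zip.GZIPInputStream

// Offset of the "ustar" magic inside the first 512-byte tar header block
private const val TAR_MAGIC_OFFSET = 257
private const val TAR_MAGIC = "ustar"
private const val TAR_MAGIC_END = TAR_MAGIC_OFFSET + 5

// Hypothetical helper: decompresses a gzip stream and fails early
// if the decompressed data looks like a tar archive.
fun openGzipRejectingTar(raw: InputStream): InputStream {
    val decompressed = BufferedInputStream(GZIPInputStream(raw))

    // Peek at the start of the decompressed data, then rewind
    decompressed.mark(TAR_MAGIC_END)
    val header = decompressed.readNBytes(TAR_MAGIC_END)
    decompressed.reset()

    val isTar = header.size == TAR_MAGIC_END &&
        String(header, TAR_MAGIC_OFFSET, TAR_MAGIC.length, Charsets.US_ASCII) == TAR_MAGIC

    require(!isTar) {
        "The *.gz file contains a tar archive; expected a single gzip-compressed file, not a *.tar.gz"
    }
    return decompressed
}
```

The returned stream is rewound to the start of the decompressed data, so it could be handed to a reader such as `DataFrame.readCSV` as-is.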
There are actually a lot of places where DataFrame assumes a type based on the file extension, but we should avoid that, since a file's extension can be changed while its contents stay the same.
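One way to avoid relying on the extension is to look at the first bytes of the content instead. The sketch below is an illustration only, not DataFrame's implementation: `SniffedFormat` and `sniffCompression` are hypothetical names. It checks the gzip magic bytes (`1F 8B`) and the ZIP local file header signature (`50 4B 03 04`, i.e. `PK\x03\x04`); an empty ZIP archive starts with a different signature and would be reported as `UNKNOWN`. It assumes JDK 11+ for `readNBytes`.

```kotlin
import java.io.BufferedInputStream

// Hypothetical result type for content-based detection
enum class SniffedFormat { GZIP, ZIP, UNKNOWN }

// Hypothetical helper: inspects the first bytes without consuming them
fun sniffCompression(stream: BufferedInputStream): SniffedFormat {
    stream.mark(4)
    val head = stream.readNBytes(4)
    stream.reset() // rewind so the caller can read from the beginning

    // Bytes are signed in Kotlin/JVM; convert to 0..255 before comparing
    fun u(i: Int) = head[i].toInt() and 0xFF

    return when {
        head.size >= 2 && u(0) == 0x1F && u(1) == 0x8B ->
            SniffedFormat.GZIP
        head.size >= 4 && u(0) == 0x50 && u(1) == 0x4B && u(2) == 0x03 && u(3) == 0x04 ->
            SniffedFormat.ZIP
        else ->
            SniffedFormat.UNKNOWN
    }
}
```

A caller could then pick `GZIPInputStream`, `ZipInputStream`, or the raw stream based on the result, regardless of what the file is named.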
Will be solved in the new CSV implementation: "dataframe-csv". I will probably also migrate its new Compression class to the :core module in the future to solve reading zips from other read functions too.