Read and deserialize parquet file as RowGroups #27

Closed
srigumm opened this issue Jan 19, 2020 · 0 comments
srigumm commented Jan 19, 2020

Version: Parquet.Net

Runtime Version: .Net Core

OS: Windows/Linux/MacOSX

Expected behavior

Although the current "Deserialize" API is very helpful for working with Parquet files, it lacks an API for reading a Parquet file one row group at a time. My project team needs such an API because we work with very large files (around 10 million records). Deserializing an entire file of that size and loading the whole collection of business objects into memory does not work for us because of memory constraints.

It would be very helpful to development teams to have these two APIs:

int GetRowGroupCount();
IEnumerable<T> ReadRowGroup(int i);
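
To illustrate how a caller might use the two requested methods, here is a minimal sketch. Everything in it other than the two method signatures is an assumption made for illustration: the `IRowGroupReader<T>` interface, the `InMemoryRowGroupReader<T>` stand-in, and the `RowGroupProcessing.CountRecords` helper are hypothetical names and are not part of Parquet.Net. The stand-in keeps its groups in memory; a real reader would presumably deserialize only the requested row group from the file.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface mirroring the two methods requested in this issue.
public interface IRowGroupReader<T>
{
    int GetRowGroupCount();
    IEnumerable<T> ReadRowGroup(int i);
}

// Hypothetical in-memory stand-in, for illustration only; it does not read Parquet.
public sealed class InMemoryRowGroupReader<T> : IRowGroupReader<T>
{
    private readonly IReadOnlyList<IReadOnlyList<T>> _groups;

    public InMemoryRowGroupReader(IReadOnlyList<IReadOnlyList<T>> groups)
    {
        _groups = groups ?? throw new ArgumentNullException(nameof(groups));
    }

    public int GetRowGroupCount() => _groups.Count;

    public IEnumerable<T> ReadRowGroup(int i)
    {
        // Validate eagerly so a bad index fails at the call site.
        if (i < 0 || i >= _groups.Count)
            throw new ArgumentOutOfRangeException(nameof(i));
        return _groups[i];
    }
}

public static class RowGroupProcessing
{
    // Walks the row groups one at a time; the caller never asks for
    // the whole file as a single collection.
    public static long CountRecords<T>(IRowGroupReader<T> reader)
    {
        long total = 0;
        for (int i = 0; i < reader.GetRowGroupCount(); i++)
        {
            foreach (T row in reader.ReadRowGroup(i))
                total++;
        }
        return total;
    }
}
```

With an API of this shape, peak memory use would be bounded by the largest row group rather than by the whole file, which is the constraint described above.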

Actual behavior

No such API exists.

Steps to reproduce the behavior

  1. Deserialize a parquet file with 10 million records.

Code snippet reproducing the behavior

@srigumm srigumm changed the title Read and Deserializing parquet file as RowGroups Read and deserialize parquet file as RowGroups Jan 19, 2020
aloneguid pushed a commit that referenced this issue Jan 22, 2020
…oup index (#32)

- Add ability to read and deserialize records by row group index

  #27