[DOC] low-memory reader options not very discoverable #16443

Closed
wence- opened this issue Jul 31, 2024 · 3 comments · Fixed by #17314
Labels
doc Documentation Python Affects Python cuDF API.

Comments

@wence-
Contributor

wence- commented Jul 31, 2024

Recently, we added chunked (low-memory) readers in cudf-python for the Parquet and JSON formats.

The only place these features are documented is in the option values that globally select whether to use the chunked reader. These options are, respectively, io.parquet.low_memory and io.json.low_memory.

These are shown (in an unformatted manner) as the output of describe_option in the user documentation, as part of the description of options: https://docs.rapids.ai/api/cudf/nightly/user_guide/api_docs/options/#api-options

If I were looking for information on how to control I/O memory usage, I do not think I would look here.

I would suggest that:

  • chunked reader control be mentioned in the relevant read_parquet and read_json docstrings. This is especially important because there is no keyword argument to control the behaviour; it is controlled only through the option.
  • these settings be mentioned in the I/O overview documentation (somewhere here: https://docs.rapids.ai/api/cudf/nightly/user_guide/io/)
wence- added the doc Documentation label Jul 31, 2024
@bdice
Contributor

bdice commented Jul 31, 2024

@galipremsagar Could you take this on?

@galipremsagar
Contributor

Sure

galipremsagar self-assigned this Jul 31, 2024
@vyasr
Contributor

vyasr commented Aug 16, 2024

It might also be good to have a high-level user guide explaining how to use cudf in low-memory situations. That would cover the I/O options as well as things like switching to a managed memory allocator, or tips and tricks for cleaning up intermediate objects to reduce how many allocations stick around.

vyasr added the Python Affects Python cuDF API. label Oct 10, 2024
GPUtester moved this from Todo to In Progress in cuDF Python Nov 13, 2024
rapids-bot bot pushed a commit that referenced this issue Nov 14, 2024
github-project-automation bot moved this from In Progress to Done in cuDF Python Nov 14, 2024
Projects
Status: Done
5 participants