Add filter_fn to all generic dataset classes and builders API #1789

Merged (23 commits) on Oct 10, 2024


krammnic (Contributor) commented Oct 9, 2024

Context

What is the purpose of this PR? It adds filter_fn to all required generic dataset classes and dataset builder APIs.

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Please link to any issues this PR addresses.

Changelog

What are the changes made in this PR?

  1. Add filter_fn to all required generic dataset classes, as requested in issue #1768 ("Add filter_fn to all dataset builders").
  2. Update the API of all specific dataset builders (usually based on SFTDataset) to accept filter_fn.
  3. Update the docstrings accordingly.
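As a rough illustration of the change described above, here is a minimal sketch that assumes a plain Python list of dict rows stands in for the loaded dataset. ToySFTDataset and toy_alpaca_builder are hypothetical names, not torchtune's API (the real generic classes appear to delegate to the Hugging Face datasets filter method instead). It shows the two points of the changelog: the generic class applies filter_fn to raw rows before any transform, and a specific builder merely forwards the argument.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class ToySFTDataset:
    """Hypothetical stand-in for a generic dataset class such as SFTDataset."""

    def __init__(
        self,
        rows: List[Row],
        transform: Callable[[Row], Row],
        filter_fn: Optional[Callable[[Row], bool]] = None,
    ) -> None:
        # Filter the raw rows first, before any pre-processing touches them.
        if filter_fn is not None:
            rows = [row for row in rows if filter_fn(row)]
        self._data = list(rows)
        self._transform = transform

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, idx: int) -> Row:
        # The transform only ever sees rows that survived filter_fn.
        return self._transform(self._data[idx])


def toy_alpaca_builder(
    rows: List[Row],
    filter_fn: Optional[Callable[[Row], bool]] = None,
) -> ToySFTDataset:
    """Hypothetical specific builder: forwards filter_fn unchanged."""

    def to_prompt(row: Row) -> Row:
        return {"text": f"{row['instruction']}\n{row['output']}"}

    return ToySFTDataset(rows, transform=to_prompt, filter_fn=filter_fn)


rows = [
    {"instruction": "Say hi", "output": "hi"},
    {"instruction": "Say nothing", "output": ""},
]
# Drop rows with an empty output before they are turned into prompts.
ds = toy_alpaca_builder(rows, filter_fn=lambda row: len(row["output"]) > 0)
print(len(ds))  # 1
print(ds[0])    # {'text': 'Say hi\nhi'}
```

Filtering before the transform also means the transform does no work on rows that will be dropped anyway.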

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
  • include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example

  • I did not change any public API
  • I have added an example to docs or docstrings
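A dummy example of the user experience, in the spirit of the UX section: the user writes an ordinary predicate over raw rows and passes it as filter_fn. The helpers non_empty, max_chars and all_of are hypothetical conveniences, and the builder call in the comment assumes a builder that exposes a filter_fn keyword as this PR describes; the sketch applies the predicate to a plain list so it runs on its own.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def non_empty(field: str) -> Predicate:
    # Keep rows whose `field` is present and truthy (e.g. a non-empty string).
    return lambda row: bool(row.get(field))


def max_chars(field: str, limit: int) -> Predicate:
    # Keep rows whose `field` has at most `limit` characters.
    return lambda row: len(row.get(field, "")) <= limit


def all_of(*preds: Predicate) -> Predicate:
    # Combine predicates: a row is kept only if every predicate accepts it.
    return lambda row: all(pred(row) for pred in preds)


keep = all_of(non_empty("output"), max_chars("instruction", 20))

# Hypothetical call shape (builder name and arguments are placeholders):
#   ds = some_dataset_builder(tokenizer, filter_fn=keep)

rows: List[Row] = [
    {"instruction": "Say hi", "output": "hi"},
    {"instruction": "Say nothing", "output": ""},
    {"instruction": "Write a very long essay about datasets", "output": "..."},
]
kept = [row for row in rows if keep(row)]
print(kept)  # [{'instruction': 'Say hi', 'output': 'hi'}]
```

Because filter_fn receives raw rows, a predicate should only reference columns that exist in the source data, not fields produced later by the transform.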

I'm not confident about all of the changes, so a careful review would be very welcome.

I might have missed something, so I added [WIP] to the title.

pytorch-bot bot commented Oct 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1789

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8ddc7bd with merge base 57ab583:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label Oct 9, 2024
krammnic (Contributor Author) commented Oct 9, 2024

@RdoubleA @joecummings Requesting a careful review.

RdoubleA (Contributor) commented Oct 9, 2024

This looks great. Could you also add it to the multimodal datasets?

krammnic (Contributor Author) commented Oct 9, 2024

Done.

krammnic (Contributor Author) commented Oct 9, 2024

Revised.

krammnic (Contributor Author)

@RdoubleA It should be fine now (I fixed the linting, and the docs were already building on previous revisions). I still have 3 PRs open, but let's start with this one.

krammnic (Contributor Author)

@RdoubleA Finally!

RdoubleA changed the title from "[WIP] Add filter_fn to all generic dataset classes and builders API" to "Add filter_fn to all generic dataset classes and builders API" on Oct 10, 2024
RdoubleA merged commit 5de5001 into pytorch:main on Oct 10, 2024
17 checks passed
mori360 pushed a commit to mori360/torchtune that referenced this pull request Oct 14, 2024
Labels: CLA Signed
3 participants