Create preprocess CLI #785

Merged: 10 commits, Oct 26, 2023

@casper-hansen (Collaborator) commented Oct 25, 2023

  • Remove the --prepare_ds_only flag.
  • Preprocess with python -m axolotl.cli.preprocess your_config.yml instead.
  • With --debug, print the raw prompt template when the prompter supports it.
  • Update the README to document the new CLI and revise the multi-GPU section.
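
For anyone scripting the new entry point, here is a minimal sketch of assembling the command from the list above. The helper name build_preprocess_command is hypothetical and not part of axolotl. Placing --debug after the config path is an assumption, not something this PR states.

```python
import sys
from typing import List


def build_preprocess_command(config_path: str, debug: bool = False) -> List[str]:
    """Assemble the argv for the preprocess CLI described in this PR.

    Hypothetical helper: only the module path and the --debug flag come
    from the PR description; the argument order is an assumption.
    """
    cmd = [sys.executable, "-m", "axolotl.cli.preprocess", config_path]
    if debug:
        # Assumed position of the flag: after the config file.
        cmd.append("--debug")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where axolotl is not installed.
    print(" ".join(build_preprocess_command("your_config.yml", debug=True)))
```

The resulting list can be passed to subprocess.run(cmd, check=True) in an environment where axolotl is installed.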

@winglian (Collaborator) commented

I noticed that running it the first time prints the prompter,
[Screenshot 2023-10-26 at 8 46 20 AM]
but running it again right after doesn't.
[Screenshot 2023-10-26 at 8 46 46 AM]

@winglian (Collaborator) left a comment

Other than the prompter printing, lgtm. thanks!

@casper-hansen (Collaborator, Author) replied

Quoting @winglian: "I noticed that running it the first time prints the prompter, but running it again right after doesn't."

That’s because we can’t print a pre-tokenized dataset, which is what the second run loads. Supporting that would take a bit of work and is out of scope for this PR.
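
To illustrate the behaviour described above, here is a toy sketch of why the raw prompt is only available on the first run. It is not axolotl's implementation. The names prepare and toy_tokenize, the in-memory cache, and the prompt template are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

TokenRows = List[List[int]]


def toy_tokenize(text: str) -> List[int]:
    # Stand-in for a real tokenizer: one integer per character.
    return [ord(ch) for ch in text]


def prepare(
    key: str,
    rows: List[dict],
    template: str,
    cache: Dict[str, TokenRows],
) -> Tuple[TokenRows, Optional[str]]:
    """Return (tokenized rows, raw prompt to print in debug mode or None)."""
    if key in cache:
        # Cached run: only token ids were stored, so there is no raw
        # prompt text left to print.
        return cache[key], None
    # First run: prompts are rendered from the template before being
    # tokenized, so the raw text is still on hand.
    prompts = [template.format(**row) for row in rows]
    cache[key] = [toy_tokenize(p) for p in prompts]
    return cache[key], (prompts[0] if prompts else None)


if __name__ == "__main__":
    cache: Dict[str, TokenRows] = {}
    rows = [{"instruction": "Say hi"}]
    template = "### Instruction:\n{instruction}\n"
    for run in (1, 2):
        _, raw = prepare("demo", rows, template, cache)
        print(f"run {run}: raw prompt available = {raw is not None}")
```

Printing on the cached path would require either storing the rendered prompts alongside the tokens or decoding the token ids back to text, which is the extra work referred to above.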

@winglian winglian merged commit e50ab07 into main Oct 26, 2023
4 checks passed
@winglian winglian deleted the preprocess_cli branch October 26, 2023 13:35
mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request Dec 15, 2023
* Create preprocess CLI

* Print prompt template if debugging

* Add print for unsupported prompters

* Formatting

* Formatting

* Refactor variables

* Formatting

* Formatting

* Formatting

* Formatting