chore: merge set of changes for v2.3.0 #428
Merged
Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Code to perform dataset sampling via sampling probabilities in data Signed-off-by: Dushyant Behl <dushyantbehl@in.ibm.com>
* Expose additional data handlers as an argument to the train function. Signed-off-by: Dushyant Behl <dushyantbehl@in.ibm.com>
#399)
* fix: set legacy behavior to false, enable new behavior
  Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: Resolve push_to_hub_token warning
  Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: Remove max_seq_length and dataset_text_field from SFTTrainer
  Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fmt
  Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: Resolve tokenizer.padding_side warning
  Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* nit: restructure warning fixes
  Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: Add packing directly to SFTConfig
  Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fmt
  Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* Removed dataset_kwargs from SFTTrainer
  Removed the argument dataset_kwargs from the invocation of SFTTrainer() because it will be deprecated in V1.0.0. Instead, dataset_kwargs has been added as a key to the training_args variable. This follows the example provided by HF: https://huggingface.co/docs/trl/en/sft_trainer#training-the-vision-language-model
  Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>
* fix: Added max_seq_length back to SFTConfig()
  Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>
* Removed legacy and padding_side args
  Removed these args as they were based on changes from @willmj that haven't been approved yet.
  Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>
* Moved all args to additional_args
  Following @kmehant's suggestion.
  Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>
* Removed packing and max_seq_length
  Removed packing and max_seq_length variables from additional_args.
  Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>
* Removed check is_pretokenized_dataset
  Co-authored-by: Mehant Kammakomati <kmehant@gmail.com>
  Signed-off-by: Luka-D <56648891+Luka-D@users.noreply.github.com>
* Removed max_seq_length from additional_args
  Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>
* Removed error.log
  Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>
* fix: move packing to SFTConfig as well
  Co-authored-by: Luka-D <56648891+Luka-D@users.noreply.github.com>
  Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
---------
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>
Signed-off-by: Luka-D <56648891+Luka-D@users.noreply.github.com>
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: Will Johnson <mwjohnson728@gmail.com>
Co-authored-by: Mehant Kammakomati <kmehant@gmail.com>
Co-authored-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
…les (#418) Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
…ts (#412)
* test: Add unit tests to test multiple files in single/multiple datasets
  Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
* e2e testing unit test for multiple datasets with multiple files
  Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
* test: multiple datasets with multiple datafiles column names
  Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* PR changes
  Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
* PR Changes
  Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
* fix: fmt
  Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
* Merge test_process_dataconfig_multiple_files_varied_data_formats
  Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
---------
Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Co-authored-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Dushyant Behl <dushyantbehl@in.ibm.com>
Also add mlflow docs and add mlflow to docker file and as optional requirement Signed-off-by: Dushyant Behl <dushyantbehl@in.ibm.com>
feat: Integrate MLflow tracker
…atterns, HF Dataset and combination (#424) Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
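For readers skimming the commit list, the idea behind the "dataset sampling via sampling probabilities" commit can be illustrated roughly as follows. This is a minimal plain-Python sketch of probability-weighted sampling across datasets in general, not the code merged in this PR; the function name `sample_by_probability` and all of its parameters are hypothetical.

```python
import random
from typing import Dict, Iterator, Sequence


def sample_by_probability(
    datasets: Dict[str, Sequence],
    probabilities: Dict[str, float],
    num_samples: int,
    seed: int = 0,
) -> Iterator:
    """Yield num_samples items, choosing the source dataset for each draw
    according to its sampling probability (hypothetical helper)."""
    if set(datasets) != set(probabilities):
        raise ValueError("each dataset needs exactly one sampling probability")
    if abs(sum(probabilities.values()) - 1.0) > 1e-6:
        raise ValueError("sampling probabilities must sum to 1")
    if any(len(data) == 0 for data in datasets.values()):
        raise ValueError("datasets must not be empty")

    rng = random.Random(seed)  # seeded so the mix is reproducible
    names = list(datasets)
    weights = [probabilities[name] for name in names]
    cursors = {name: 0 for name in names}  # next index to read per dataset

    for _ in range(num_samples):
        # Pick which dataset this sample comes from.
        name = rng.choices(names, weights=weights, k=1)[0]
        data = datasets[name]
        # Read items in order and wrap around when a dataset is exhausted.
        yield data[cursors[name] % len(data)]
        cursors[name] += 1
```

With probabilities such as `{"a": 0.7, "b": 0.3}`, roughly 70% of the yielded items would come from dataset `a` over a long run; wrapping around on exhaustion is one design choice among several (stopping at the first exhausted dataset is another).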
aluu317 requested review from anhuong, Ssukriti, fabianlim and kmehant as code owners (December 23, 2024 14:54)
Thanks for making a pull request! 😃
aluu317 changed the title from "release: merge set of changes for v2.3.0" to "chore: merge set of changes for v2.3.0" (Dec 23, 2024)
The commits look good to me. Once this one additional PR is added, this looks good to merge.
Signed-off-by: Dushyant Behl <dushyantbehl@in.ibm.com> Signed-off-by: Will Johnson <mwjohnson728@gmail.com> Signed-off-by: Abhishek <maurya.abhishek@ibm.com> Co-authored-by: Will Johnson <mwjohnson728@gmail.com> Co-authored-by: Abhishek <maurya.abhishek@ibm.com>
Description of the change
Related issue number
How to verify the PR
Was the PR tested