Tried training a model #5
Comments
@b-albar is in charge of pre-training and model optimization, but he's on vacation this week, so he won't be able to answer your questions about the tokenizer until next week. For my part, I'm in charge of finetuning the resulting models to assess their performance. In the absence of a concrete answer on that point, here are a few comments based on what I've seen. I hope to give you a more precise answer next week once Boris is back and we've discussed it. We'll get back to you.
I'll try to provide a full training script for minipile. The diverging loss after 10k steps is not something I've encountered before; I'll have a look. Converting from HF to FAT5 is not so trivial because the masking is different: I had to finetune the converted models for a few steps so they would learn to deal with padding tokens. I haven't tried converting from FAT5 to HF. I feel that finetuning may not be required, but the proper way is probably to include the modeling code in the HF repo of the model, similar to this example: https://huggingface.co/CATIE-AQ/FAT5-large-flan-en
First off thank you for building this repo!
I tried training a T5-base model on minipile and got some interesting results.
https://wandb.ai/amazingvince/flasht5-pretrain/runs/ch4a9y51?nw=nwuseramazingvince
I had to modify code in the UL2 objective to accept the dataset I tokenized. I am worried my modifications might have broken something.
Could you provide the script you used to tokenize your dataset, or a link to it?
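For reference, my own tokenization was roughly the sketch below; the dataset id, base tokenizer, and max length are just what I picked, not anything taken from your setup.

```python
# Rough sketch of my minipile tokenization. The dataset id, base tokenizer
# and max_length are my own choices here, not taken from this repo.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")
raw = load_dataset("JeanKaddour/minipile", split="train")

def tokenize(batch):
    # No padding here; I let the UL2 data collator handle batching and masking.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
tokenized.save_to_disk("minipile-tokenized")
```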
After training, the model performed much worse than we expected, generating gibberish. I wrote a script based on your HF-to-FlashT5 conversion to reverse the process and go from FlashT5 to HF, and I might have messed up something in that conversion.
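The reverse conversion was essentially a state-dict copy by parameter name, along the lines of the sketch below. The name remapping is the part I'm least sure about, so it's only a placeholder here and is exactly where a mistake could be hiding.

```python
# Very rough outline of my FlashT5 -> HF conversion: load the FlashT5 checkpoint,
# build a fresh HF T5 skeleton, then copy tensors across by (remapped) name.
from transformers import AutoModelForSeq2SeqLM, T5Config, T5ForConditionalGeneration

flash_model = AutoModelForSeq2SeqLM.from_pretrained(
    "path/to/flasht5-checkpoint", trust_remote_code=True
)
hf_model = T5ForConditionalGeneration(T5Config.from_pretrained("t5-base"))

flash_state = flash_model.state_dict()
hf_state = hf_model.state_dict()

converted = {}
for hf_name, hf_tensor in hf_state.items():
    flash_name = hf_name  # placeholder: the real script remaps FlashT5 parameter names here
    if flash_name in flash_state and flash_state[flash_name].shape == hf_tensor.shape:
        converted[hf_name] = flash_state[flash_name]

result = hf_model.load_state_dict(converted, strict=False)
print("missing keys:", result.missing_keys)
hf_model.save_pretrained("converted-hf-model")
```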
Do you have an example of running inference with the models without converting them? I tried copying all the files from a repo on y'all's Hugging Face and loading with `trust_remote_code=True`, but the model still generates junk.
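This is roughly how I'm loading and generating; the local path and prompt are just examples.

```python
# Loading the FAT5 checkpoint directly (files copied from the HF repo) and generating.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

path = "path/to/local/fat5-files"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForSeq2SeqLM.from_pretrained(path, trust_remote_code=True)

inputs = tokenizer("summarize: The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```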
I also tried finetuning the model on samsum, and this did not result in a model that seemed to have learned anything. Curious whether the model needs some additional tuning step before it can be used.
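The samsum finetune was a standard Seq2SeqTrainer run, roughly like this; the exact hyperparameters are in the wandb runs below, and the values here are only illustrative.

```python
# Rough shape of my samsum finetuning run; hyperparameters shown are illustrative,
# the exact ones are in the wandb links below.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_path = "path/to/converted-or-fat5-model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path, trust_remote_code=True)

samsum = load_dataset("samsum")

def preprocess(batch):
    inputs = tokenizer(["summarize: " + d for d in batch["dialogue"]],
                       truncation=True, max_length=512)
    labels = tokenizer(text_target=batch["summary"], truncation=True, max_length=128)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = samsum.map(preprocess, batched=True,
                       remove_columns=samsum["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="samsum-test",
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    num_train_epochs=3,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```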
Some interesting results here; this was using my converted-to-Hugging Face model:
https://wandb.ai/amazingvince/t5-summarization-test/runs/3yyem3wy?nw=nwuseramazingvince
Same hyperparameters, but using the FlashT5 model loaded with `trust_remote_code=True`. The loss increases over the training run.
https://wandb.ai/amazingvince/t5-summarization-test/runs/cudb9wm2?nw=nwuseramazingvince
Curious whether you have seen any of these issues or have ideas about what I am doing wrong. Thanks again for building this repo and making it public. I have been annoyed at the lack of resources and attention T5 has gotten over the last few years.