Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Bloom preprocessor #1424

Merged
merged 7 commits into from
Feb 10, 2024
Merged

Conversation

abuelnasr0
Copy link
Contributor

No description provided.

@abuelnasr0
Copy link
Contributor Author

@mattdangerw The preset files needs to be updated on kaggle, because the sequence length of the model is 2048 not 512.I set default to 512 instead of 2048. I am sorry for that.
It is not used by the model and it will not affect anything, but as we are saving it in the config, it must be the correct value that the original preset was pretrained with.

here is a link to the variation : https://www.kaggle.com/models/mohamedabuelnasr/bloom/frameworks/keras/variations/bloom_560m_multi

@mattdangerw mattdangerw added the kokoro:force-run Runs Tests on GPU label Feb 9, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Feb 9, 2024
@mattdangerw
Copy link
Member

@mattdangerw The preset files needs to be updated on kaggle, because the sequence length of the model is 2048 not 512.I set default to 512 instead of 2048. I am sorry for that.

@abuelnasr0 all good! definitely expected to keep working out kinks as we keep landing a new model.

Sounds like we just need to copy version 2 here -> https://www.kaggle.com/models/mohamedabuelnasr/bloom/frameworks/keras/variations/bloom_560m_multi into the keras team bloom https://www.kaggle.com/models/keras/bloom, is that right?

Copy link
Member

@mattdangerw mattdangerw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good to me.

Let me know if I got the Kaggle update right and I will go ahead and copy that over.

Also checking out the jax failure but I believe it is driver issues and unrelated.

`y` and `sample_weight` are both optional, can have any format, and will be
passed through unaltered.

`BloomPreprocessor` forces the input to have only one segment, as BLOOM is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just delete this paragraph, it's not very helpful (I can delete elsewhere too). People are doing all sort of "multi segment" stuff with decoder models.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just deleting this? or enabling the multisegment packing by replacing StartEndPacker with MultiSegmentPacker?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh just delete this comment for now!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted it but I left the ValueError that checks that the number of segments is only one. because otherwise the packer will raise an error.

@abuelnasr0
Copy link
Contributor Author

Sounds like we just need to copy version 2 here -> https://www.kaggle.com/models/mohamedabuelnasr/bloom/frameworks/keras/variations/bloom_560m_multi into the keras team bloom https://www.kaggle.com/models/keras/bloom, is that right?

@mattdangerw Yes that is right. The only changed file is config.json. where max_sequence_length is updated to 2048 instead of 512.

@mattdangerw
Copy link
Member

updated https://www.kaggle.com/models/keras/bloom, take a look!

will merge this whenever the that last nit is done, no rush!

@mattdangerw mattdangerw merged commit 2cfd900 into keras-team:master Feb 10, 2024
6 checks passed
@abuelnasr0 abuelnasr0 deleted the bloompreprocessor branch March 8, 2024 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants