
Add XLNetBackbone #928

Closed
wants to merge 6 commits

Conversation

shivance
Collaborator

@shivance shivance commented Mar 26, 2023

@shivance shivance mentioned this pull request Mar 26, 2023
@shivance
Collaborator Author

shivance commented Mar 26, 2023

Still WIP.

@shivance shivance marked this pull request as draft March 27, 2023 14:34
@shivance
Collaborator Author

Hey @mattdangerw & @chenmoneygithub, it turns out that adding XLNet requires MultiHeadRelativeAttention and TwoStreamRelativeAttention, in addition to the relative positional encoding and Transformer-XL layers.

Should I break this PR up into multiple ones? I'm not able to make the entire model work correctly just yet.
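For context, the Transformer-XL style relative attention mentioned above relies on a "relative shift" trick to realign position-indexed attention scores into relative-position scores. A minimal NumPy sketch of the idea (the function name and shapes are illustrative, not code from this PR):

```python
import numpy as np

def rel_shift(x):
    """Realign a [qlen, klen] matrix of raw position scores so that
    each row is shifted into relative-position alignment, as in
    Transformer-XL style attention.

    Positions shifted past the boundary become zero here; a real
    implementation masks those entries.
    """
    qlen, klen = x.shape
    # Pad one zero column, reshape so rows slide by one step each,
    # drop the first row, and reshape back.
    padded = np.concatenate([np.zeros((qlen, 1)), x], axis=1)  # [qlen, klen+1]
    shifted = padded.reshape(klen + 1, qlen)[1:].reshape(qlen, klen)
    return shifted

# Worked example:
x = np.arange(1.0, 5.0).reshape(2, 2)  # [[1, 2], [3, 4]]
rel_shift(x)                           # → [[2, 0], [3, 4]]
```

This is the same pad-reshape-slice pattern used by most Transformer-XL/XLNet implementations to avoid materializing a full relative-position score tensor.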

@mattdangerw
Member

@shivance timely question! We actually have a contribution guide for models that's about to land here: #820

Could you try following that and give us feedback?

Re the overall structure, let's keep everything in a models/xlnet directory for now (don't contribute directly to keras_nlp.layers just yet, though we can follow up on that), and while looking at Hugging Face or Model Garden for inspiration is totally fine, try to make sure we stick to the "local" style of KerasNLP wherever possible when writing layers, models, etc.

We should also make sure the full forward pass matches correctly before we get too deep into review on style, etc. For starters, you can usually make a Colab that aligns the forward pass with a reference implementation and share it here!
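The forward-pass alignment described above usually boils down to running identical inputs through both implementations with the same weights and comparing outputs numerically. A framework-free toy sketch of the idea (the two "implementations" below are stand-ins, not real XLNet code):

```python
import numpy as np

# Stand-in "reference" and "ported" implementations of the same layer.
# In practice these would be, e.g., a Hugging Face model and the
# KerasNLP reimplementation with the weights copied over.
def reference_forward(x, w, b):
    return np.tanh(x @ w + b)

def ported_forward(x, w, b):
    # Written differently on purpose: a correct port should still
    # produce numerically matching output given the same weights.
    return np.tanh(np.matmul(x, w) + b)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))
w = rng.normal(size=(4, 3))
b = rng.normal(size=(3,))

ref = reference_forward(x, w, b)
port = ported_forward(x, w, b)

# The check used when validating a port: outputs must agree to
# within floating-point tolerance.
np.testing.assert_allclose(ref, port, atol=1e-6)
```

With real models the same pattern applies, just with tokenized inputs and weight-mapping code in between.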

@abheesht17
Collaborator

@shivance, let me know when this is ready for review!

@shivance
Collaborator Author

shivance commented Apr 6, 2023

> @shivance, let me know when this is ready for review!

Really sorry for slacking off here; I haven't been able to find time because of personal reasons. Will follow up soon.

@abheesht17
Collaborator

No hurry, take your time! Take care!

@shivance shivance closed this Jun 4, 2023
@susnato
Contributor

susnato commented Jun 4, 2023

Hi @shivance, can I please continue this PR?

@shivance
Collaborator Author

shivance commented Jun 4, 2023

@susnato I have already done the basic groundwork, but the PR is bug-prone. I'm not planning to work on it anytime soon, as I've been busy with #1052. So you could fork my branch and continue from the commits I've made so far. What do you say?

@susnato
Contributor

susnato commented Jun 4, 2023

Hi @shivance, thanks for the quick reply. I would love to continue your work. How about I fork your branch and push commits to your repo (to shivance/keras-nlp_branch...), and you merge them? That way we both share credit. Or, if you'd prefer, I can copy the existing code and open a new PR. What do you say?

@shivance
Collaborator Author

shivance commented Jun 4, 2023 via email

@susnato
Contributor

susnato commented Jun 5, 2023

Then please add me as a collaborator, @shivance.

@shivance shivance reopened this Jun 6, 2023
@shivance
Collaborator Author

shivance commented Jun 6, 2023

@susnato just gave you access!

@susnato
Contributor

susnato commented Jun 10, 2023

Hi @mattdangerw, I am currently working on this integration. I have one small doubt. As you said here:

> "We also should make sure we have the full forward pass matching correctly before we get too deep into review on style"

I am using the Hugging Face implementation and weights as the reference for the forward pass, since it is TensorFlow 2. Is that OK, or should I switch to another implementation?

@susnato
Contributor

susnato commented Jun 16, 2023

Hi @mattdangerw @abheesht17, the XLNet implementation has some optional arguments, such as `mems` and `perm_mask`, which are mostly used during pretraining; during inference we usually leave them as `None`. But in KerasNLP we don't define a `call` method for backbones (as I noticed in all models); instead we define all the `Input` layers in `__init__`. That complicates things: if we define an `Input` layer for `mems`, we can't leave it as `None` during inference, otherwise it will raise an error.

Should I define a `call` method in the backbone (which would make this backbone an exception), or should I use a workaround (e.g., passing `-1e+9` instead of `None`)?
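To make the trade-off concrete, here is a plain-Python sketch of the `call`-method option (this is not the actual KerasNLP `Backbone` API; the class and argument names are hypothetical). A functional-style graph must declare every input up front, so an optional `mems` input cannot simply be omitted, whereas an explicit `call` can substitute a harmless default:

```python
import numpy as np

class ToyXLNetBlock:
    """Toy stand-in for an XLNet block with an optional memory input."""

    def __init__(self, hidden_dim):
        self.hidden_dim = hidden_dim

    def call(self, hidden_states, mems=None):
        # Workaround sketched here: treat a missing memory as a
        # zero-length memory, instead of forcing callers to pass a
        # placeholder tensor for an Input layer.
        if mems is None:
            mems = np.zeros((0, self.hidden_dim))
        # Transformer-XL style: attend over [mems; hidden_states].
        context = np.concatenate([mems, hidden_states], axis=0)
        return context

block = ToyXLNetBlock(hidden_dim=8)
h = np.ones((4, 8))

out_no_mems = block.call(h)                          # inference: mems omitted
out_with_mems = block.call(h, mems=np.ones((2, 8)))  # pretraining: mems given
```

The same zero-length-default idea also works without overriding `call` (by concatenating a constant empty tensor in the functional graph), which is one way to keep the backbone consistent with the other models.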

@susnato
Contributor

susnato commented Jun 17, 2023

Hi @shivance, I'm sorry, but I think it's better to open a new PR. Since you created this one, I would need to tag you every time to change the description, to move it from draft to ready for review, or to request reviews from maintainers, and you might get bothered by being tagged all the time. So I'm making a new PR; please close this one. And please don't worry, I'm going to continue from your commits.

@susnato susnato mentioned this pull request Jun 17, 2023
@shivance shivance closed this Jun 27, 2023
@shivance shivance deleted the xlnet-backbone branch June 27, 2023 08:52
Successfully merging this pull request may close these issues.

Add XLNetBackbone to models/
4 participants