
Add "full" gelu without approximation #209

Draft · wants to merge 3 commits into master
Conversation

se-schmitt

  • Adds the activation function `gelu_noapprox`, which computes the full GELU (the PyTorch default case, see here).

Context: I have implemented a thermodynamic model based on a ChemBERTa model here. To obtain the same results (within numerical accuracy) as the original model (implemented in PyTorch, see here), the "full" gelu activation function is needed (tested locally). I would therefore be happy to have this feature added to Transformers.jl, so that I can submit the corresponding changes to Clapeyron.jl.
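For concreteness, the difference between the exact GELU and the usual tanh approximation looks roughly like this (a minimal sketch, not the PR's actual diff; the name `gelu_noapprox` follows the description above, and `erf` is assumed to come from SpecialFunctions.jl):

```julia
# Sketch only (assumption: not the PR's diff; `gelu_noapprox` follows the PR
# description, and SpecialFunctions.jl supplies `erf`).
using SpecialFunctions: erf

# Exact GELU: x * Φ(x), where Φ is the standard normal CDF written via erf.
gelu_noapprox(x) = x * (1 + erf(x / sqrt(2))) / 2

# For comparison, the tanh-based approximation (the form used by NNlib's `gelu`):
gelu_tanh(x) = x * (1 + tanh(sqrt(2 / π) * (x + 0.044715 * x^3))) / 2
```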

@se-schmitt
Author

@chengchingwen have you had a chance to look at this?

@chengchingwen
Owner

If the gelu without approximation is the default in PyTorch, we should also think about having it implemented in NNlib.jl. The issue here is that there is no "gelu_noapprox" in huggingface/transformers, so it wouldn't be loaded automatically unless you manually update the config.

So I would suggest:

  1. Open an issue/PR in NNlib.jl with gelu implemented via erf.
  2. Instead of adding "gelu_noapprox" to ACT2FN, directly replace the gelu for [gelu, gelu_python].
  3. It would be better to have a small test checking the gelu values and gradients (see the sketch after this list).
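A small value-and-gradient test along the lines of point 3 could look roughly like this (a hedged sketch, not part of the PR: it assumes `gelu_noapprox` as defined in the earlier snippet and uses Zygote.jl for the gradient check; the analytic derivative of x·Φ(x) is Φ(x) + x·φ(x)):

```julia
# Sketch of a value/gradient test (assumptions: `gelu_noapprox` from the snippet
# above, Zygote.jl for automatic differentiation; not the PR's actual test file).
using Test, Zygote
using SpecialFunctions: erf

Φ(x) = (1 + erf(x / sqrt(2))) / 2    # standard normal CDF
φ(x) = exp(-x^2 / 2) / sqrt(2π)      # standard normal PDF

@testset "exact gelu" begin
    @test gelu_noapprox(0.0) == 0.0
    for x in (-3.0, -0.5, 0.5, 3.0)
        # identity of the exact GELU: GELU(x) - GELU(-x) == x
        @test gelu_noapprox(x) - gelu_noapprox(-x) ≈ x
        # gradient: d/dx [x * Φ(x)] = Φ(x) + x * φ(x)
        @test Zygote.gradient(gelu_noapprox, x)[1] ≈ Φ(x) + x * φ(x)
    end
end
```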
