Questions Regarding the model #8

Open
BradKML opened this issue Sep 20, 2022 · 0 comments

BradKML commented Sep 20, 2022

  1. Intent accuracy for the XL models is ~1% away from mBERT in general, and in some of the language subcategories. Extrapolating from the comparison between the L and XL models, a hypothetical XXL model looks like it could reach parity with mBERT, but is there a possibility that further scaling would show diminishing returns?
  2. Are there any ways of demonstrating the interpretability of Transformer-based models (e.g. BERT and GPT-likes), and are there similar mechanisms for the Mixer (since MLP-Mixer visualizations exist)? https://jalammar.github.io/illustrated-transformer/ https://medium.com/ml-summaries/mlp-mixer-an-all-mlp-architecture-for-vision-paper-summary-e50fa915e04d
  3. On a speculative note, when could the model be scaled to reach parity with GPT-Neo or its commercial counterparts?