Questions Regarding the model #8

BradKML · 2022-09-20T08:13:04Z

The XL model's intent accuracy is within ~1% of mBERT overall, and in some of the language subcategories. On the surface, a hypothetical XXL model would reach parity with mBERT (extrapolating from the gain between the L and XL models), but is there a possibility of diminishing returns?
Is there any way of demonstrating interpretability, as is done for Transformer-based models (e.g. BERT and GPT-likes)? Are there similar mechanisms for the Mixer (since MLP-Mixer visualizations exist)? https://jalammar.github.io/illustrated-transformer/ https://medium.com/ml-summaries/mlp-mixer-an-all-mlp-architecture-for-vision-paper-summary-e50fa915e04d
On a speculative note, when could the model be scaled to reach parity with GPT-Neo or its commercial counterparts?
