You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
many thanks for open sourcing the repo for ViDeBERTa!
I'm very interested in the v3 pretraining of a DeBERTa model. In the current version of the pretraining code, I can see that the normal DeBERTa package is called:
However, the publicly available DeBERTa code does not yet include the support of Gradient Disentangled Embedding Sharing (GDES), see e.g.: microsoft/DeBERTa#93.
Did you modify the code to add support for GDES? I would highly be interested in that implementation.
Many thanks and cheers,
Stefan
The text was updated successfully, but these errors were encountered:
Thank @stefan-it for your interest in the v3 pretraining of DeBERTa.
In this work, we have modified the code of DeBERTa to add GDES in pretraining, following the DeBERTaV3 paper. If you are interested in that implementation, you can take a look on the latest v3 pretraining code at the original source: https://github.com/microsoft/DeBERTa.
Hi @DaoTranbk and @HyTruongSon,
many thanks for open sourcing the repo for ViDeBERTa!
I'm very interested in the v3 pretraining of a DeBERTa model. In the current version of the pretraining code, I can see that the normal DeBERTa package is called:
ViDeBERTa/pre-training/bash/pre-train_model.sh
Line 13 in 8270cce
However, the publicly available DeBERTa code does not yet include the support of Gradient Disentangled Embedding Sharing (GDES), see e.g.: microsoft/DeBERTa#93.
Did you modify the code to add support for GDES? I would highly be interested in that implementation.
Many thanks and cheers,
Stefan
The text was updated successfully, but these errors were encountered: