The pretraining data of LUKE is constructed with text from Wikipedia, which is already annotated with ground-truth entities.
So, the ambiguity of entity mentions will not be an issue in pretraining.
In your example, if the annotated entity is not in the entity vocabulary, the target entity for [MASK] will be [UNK], or such entities will be ignored, depending on the setting.
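As a rough illustration only (this is not the actual LUKE preprocessing code), the label lookup described above could be sketched like this. The toy vocabulary, its ids, the function name `target_entity_id`, and the `skip_unknown` flag are all hypothetical, and the sketch assumes the hyperlink on "Guarani" points to the page titled "Guarani FC".

```python
from typing import Optional

# Toy entity vocabulary with made-up ids; the real entity_vocab.json is far larger.
ENTITY_VOCAB = {
    "[PAD]": 0,
    "[UNK]": 1,
    "[MASK]": 2,
    "Guarani FC": 3,
    "Guarani language": 4,
}


def target_entity_id(linked_title: str, skip_unknown: bool = False) -> Optional[int]:
    """Return the label id for a masked entity.

    linked_title is the Wikipedia page title taken from the hyperlink
    annotation, not the surface text of the mention.
    """
    if linked_title in ENTITY_VOCAB:
        return ENTITY_VOCAB[linked_title]
    if skip_unknown:
        # Hypothetical setting: drop entities that are missing from the vocabulary.
        return None
    # Otherwise fall back to the [UNK] entity.
    return ENTITY_VOCAB["[UNK]"]


# The mention text is "Guarani", but the annotation carries the page title.
print(target_entity_id("Guarani FC"))
print(target_entity_id("Some Unlisted Club"))
print(target_entity_id("Some Unlisted Club", skip_unknown=True))
```

The point of the sketch is that the label comes from the linked page title, so the mention string "Guarani" never has to be matched against the vocabulary.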
Hi, first of all, thank you for the nice work.
Consider the following input example.
"Everaldo has played for Guarani and Santa Cruz in the Campeonato Brasileiro, before moving to Mexico where he played for Chiapas and Necaxa." , entity: Guarani .
When the model is trained on this input, a [MASK] token is added to mask the Guarani entity.
The model is then trained to predict Guarani for [MASK] using a cross-entropy loss.
However, entity_vocab.json contains no "Guarani" entry.
It only has "Guarani language", "Guarani FC", "Tupi\u2013Guarani languages", and "Guarani mythology".
In that example, I believe that Guarani means Guarani FC.
Therefore, is the model trained to predict [MASK] as Guarani FC?
If yes, we need to let the model know Guarani means Guarani FC.
And, I guess that we need to match Guarani with Guarani FC.
Does the preprocessing described in https://github.com/studio-ousia/luke/blob/master/pretraining.md deal with such issues?
Thank you.