Tweaking ED pipeline #121
Very interesting project, and nice failure analysis! My suspicion is that the REL data is too old for your more contemporary data. E.g., the Shadi Hamid WP page was only filled with some text in 2020, and I think Lauren Bonner does not have a Wikipedia page. End 2013, the Charlamagne Tha God page only had one occurrence of "Comedy"; the rest is really about works and shows, so it is harder to match. The Dave Smith page was created in 2021, so it is not in our data. (PS: Did you use the 2014 or the 2019 data?) I think you need a newer Wikipedia target (and newer embeddings) to get better results for these cases. I'll ask the team if anyone is working on this anyway, or can help out.
@arjenpdevries: Thanks for the prompt response. Looking forward to the newer WP target and embeddings, if they are available. Thanks!
The golden route would be to link to a different KG that actually specializes in your domain - but we have not really experimented with other KGs either, yet. Or make a Wikipedia page for Lauren Bonner ;-) I assume BLINK has the same problem in this specific case; it also links to WP as a target, doesn't it?
Indeed a nice analysis, thanks for sharing it with us. I also recommend using REL with a new Wikipedia version. A version of REL trained on the Wikipedia dump 2021-03 is made available by Giovanni Sorice from the University of Pisa on the SoBigData platform. Note that the SoBigData API of REL is made by a third party, and the results may not necessarily be in line with what we report and confirm in the paper. Updating REL to new Wikipedia versions is also on our todo list, but it might take a while until it is available (other features of REL have priority now). If you want to deploy REL on another Wikipedia version, you can follow this tutorial: https://rel.readthedocs.io/en/latest/tutorials/deploy_REL_new_wiki/
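Once a new Wikipedia version has been generated with that tutorial, the local Python pipeline can be pointed at the new folder. The sketch below follows the end-to-end example in the REL docs; the base_url path, the wiki_2021 folder name, and the model path are placeholders for whatever the tutorial produced on your machine, not files shipped with REL.

```python
# Minimal sketch: run the local REL pipeline against a newly generated Wikipedia version.
# Placeholder assumptions: base_url points at your REL data folder and "wiki_2021"
# is the name of the folder produced by the deploy_REL_new_wiki tutorial.
from REL.mention_detection import MentionDetection
from REL.ner import load_flair_ner
from REL.entity_disambiguation import EntityDisambiguation
from REL.utils import process_results

base_url = "/path/to/rel_data"   # placeholder
wiki_version = "wiki_2021"       # placeholder

# REL expects {doc_id: (text, spans)}; empty spans means REL detects mentions itself.
input_text = {"episode_1": ("Podcast description\nEpisode description", [])}

mention_detection = MentionDetection(base_url, wiki_version)
tagger_ner = load_flair_ner("ner-fast")
mentions, n_mentions = mention_detection.find_mentions(input_text, tagger_ner)

config = {
    "mode": "eval",
    "model_path": f"{base_url}/{wiki_version}/generated/model",  # placeholder path to the trained ED model
}
model = EntityDisambiguation(base_url, wiki_version, config)
predictions, timing = model.predict(mentions)

# Each result: (start, length, mention, entity, ED confidence, MD confidence, tag)
results = process_results(mentions, predictions, input_text)
print(results)
```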
@hasibi: There seem to be two scores - one for NER and the other for ED. It seems I was looking at the NER score. Can you provide more details about the ED score? Is lower better? Would you recommend a threshold that achieves a good trade-off between precision and recall? Concatenating the podcast and episode descriptions (episode link), I get the following result:
[
[
0,
19,
"Charlamagne Tha God",
"Charlamagne_tha_God",
0.3872783780234141,
0.7140538295110067,
"PER"
],
[
540,
13,
"Michael Cohen",
"Michael_Cohen_(lawyer)",
0.45666060472584635,
0.9998109936714172,
"PER"
],
[
564,
10,
"Jim Norton",
"Jim_Norton_(comedian)",
0.4219384342889336,
0.9998306930065155,
"PER"
],
[
650,
19,
"Charlamagne Tha God",
"Charlamagne_tha_God",
0.3872783780234141,
0.9017136096954346,
"PER"
],
[
677,
16,
"Stephen A. Smith",
"Stephen_A._Smith",
0.3872783780234141,
0.9961599707603455,
"PER"
]
]
I was talking about ED scores. The higher, the better. We do not recommend a threshold, as we optimize for F1. If you want higher precision, at the expense of lower recall, you can choose your own threshold. This threshold is mainly introduced for downstream tasks, where the confidence of the entity linker is needed.
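To make that concrete, here is a minimal sketch of filtering the output above with a user-chosen ED-confidence threshold. It assumes the tuple layout (start, length, mention, entity, ED confidence, MD/NER confidence, tag) seen in the results above; the 0.4 cut-off is purely illustrative, not a recommended value.

```python
# Minimal sketch: keep only links whose ED confidence clears a user-chosen threshold.
# Assumed tuple layout: (start, length, mention, entity, ED confidence, MD/NER confidence, tag).
ED_THRESHOLD = 0.4  # illustrative cut-off; tune it on your own precision/recall trade-off

results = [
    [0, 19, "Charlamagne Tha God", "Charlamagne_tha_God", 0.3872783780234141, 0.7140538295110067, "PER"],
    [540, 13, "Michael Cohen", "Michael_Cohen_(lawyer)", 0.45666060472584635, 0.9998109936714172, "PER"],
]

confident_links = [
    (mention, entity, ed_conf)
    for start, length, mention, entity, ed_conf, md_conf, tag in results
    if ed_conf >= ED_THRESHOLD
]
print(confident_links)  # keeps Michael Cohen, drops the 0.387 Charlamagne link
```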
Hi,
Thanks again for the great work!
I am currently evaluating REL for ED purposes and comparing it against other ED techniques, chiefly against BLINK from Facebook AI Research. They both take into account the context in which a mention occurs, are two-staged, and use neural approaches. BLINK does well, but can be slow and requires a GPU to run, which is a limitation for me.
Although REL is fast and lightweight, I find that it often misses a few obvious cases. I am looking for some guidance as to how I can tweak the internal workings of REL to achieve accurate results.
The following results were obtained by running REL on a podcast description and a particular episode description, separated by a newline - i.e., in the code, the two descriptions are concatenated into a single input text (one way to make such a call is sketched below).
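A minimal sketch of such a call against REL's public web API, following the example in the REL README; the example strings and document handling below are illustrative, not the actual podcast data.

```python
# Minimal sketch: link entities in a podcast description concatenated with an
# episode description via REL's public API. The texts below are illustrative.
import requests

API_URL = "https://rel.cs.ru.nl/api"

podcast_description = "Charlamagne Tha God hosts comedians, outspoken celebrities, and thought-leaders."
episode_description = "This week: Michael Cohen and Jim Norton join the show."

response = requests.post(
    API_URL,
    json={
        "text": podcast_description + "\n" + episode_description,
        "spans": [],  # empty spans: REL performs mention detection itself
    },
)
# Each item: [start, length, mention, entity, ED confidence, MD confidence, tag]
print(response.json())
```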
- For this episode, mention Shadi Hamid is identified as Brookings_Institution with score 0.9991938769817352 and NER tag PER. This is particularly egregious: Shadi Hamid's Wikipedia page is not being returned as the 1st candidate.
- For this episode, mention Lauren Bonner from the podcast description is identified as Lauren_Samuels with score 0.9993583559989929 even though the last names are quite different, while mention Ray J is (correctly) identified as Ray_J, albeit with a lower score of 0.8136761486530304.
- For this episode, mention Charlamagne Tha God from the podcast description gets a score of only 0.7140538295110067 even though words like "comedians, outspoken celebrities, and thought-leaders" appear in the context, which should make it easy to match his embedding learned from his Wikipedia profile, which contains similar words.
- For this episode, mention Dave Smith is always identified as Dave_Smith_(engineer) with very high confidence, even though Dave_Smith_(comedian), the correct answer, appears in the candidate set, and words such as "government, foreign policy, and all things Libertarian" even appear in the context, which should have given a greater match with his description on Wikipedia.

The last point is particularly important since Dave Smith is quite a common name and there are at least 4 Dave Smiths on Wikipedia - but with very differing descriptions (a sketch for probing this case in ED-only mode follows below).

Thanks!
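Since the disputed mention is known in advance, one way to probe the disambiguation step in isolation is REL's ED mode, where the mention span is supplied explicitly. Below is a minimal sketch against the public API, using an illustrative sentence built from the context words quoted above; the span offsets apply only to this example text.

```python
# Minimal sketch: ED-only query for the ambiguous "Dave Smith" mention. Supplying
# the span explicitly makes REL skip mention detection and only disambiguate.
# The sentence is illustrative; (0, 10) covers "Dave Smith" in this text only.
import requests

API_URL = "https://rel.cs.ru.nl/api"
text_doc = "Dave Smith discusses government, foreign policy, and all things Libertarian."

ed_result = requests.post(
    API_URL,
    json={
        "text": text_doc,
        "spans": [(0, 10)],  # (start, length) of the mention to disambiguate
    },
).json()
print(ed_result)  # check which Dave_Smith_* page is returned and its ED confidence
```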