Missing middle names in ingestion materials #3985
Thanks for posting this. This is a recurring issue, and it is cropping up again with the EMNLP ingestion. I'll post a summary of the history of this issue later today.
One question I have for you, @mbollmann: does the new Python module make it easy to get a list of full names, so that we could match against them? I will likely write something to handle this later today, and it would be nice to use the new module instead of the old code. I'll look at the module first, but I'll have to prioritize speed, which favors the codebase I currently understand.
What exactly are you trying to achieve? I suspect there's a better way to do this than starting from a list of full names. FWIW, you could have a look at the documentation (e.g. https://acl-anthology-py.readthedocs.io/en/stable/guide/accessing-authors/) and see if that helps; if not, and you decide to fall back on the old codebase, I'm also happy to take a look at the script afterwards and suggest how I'd port it. But I'll be away from a computer for the next ~24 hours, so I can't try things out in the meantime.
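For the "match against a list of full names" idea raised in this exchange, here is a library-agnostic sketch: given a first and last name from ingestion, find known names that share both but carry extra tokens in between (a likely middle name). `find_fuller_variants` and `known_names` are hypothetical; how such a list would be obtained from acl-anthology-py is not shown, and, as noted above, a different approach may well be better.

```python
def find_fuller_variants(first: str, last: str, known_names: list[str]) -> list[str]:
    """Return known names that begin with `first`, end with `last`, and
    contain at least one extra token in between (a likely middle name)."""
    first_tokens = first.split()
    last_tokens = last.split()
    if not first_tokens or not last_tokens:
        return []  # nothing sensible to match on
    n_first, n_last = len(first_tokens), len(last_tokens)
    matches = []
    for full in known_names:
        tokens = full.split()
        if (
            len(tokens) > n_first + n_last          # something sits in between
            and tokens[:n_first] == first_tokens    # same first name
            and tokens[-n_last:] == last_tokens     # same last name
        ):
            matches.append(full)
    return matches


# Toy data with made-up names, not real Anthology records.
known = ["Jane Q. Doe", "Jane Doe", "Jane Quinn Doe", "John Doe"]
print(find_fuller_variants("Jane", "Doe", known))
# ['Jane Q. Doe', 'Jane Quinn Doe']
```

The comparison is exact token matching, so it ignores case, diacritics, and hyphenation differences; a real script would need to normalize names first, and multiple candidates (as in the toy output) would still call for manual review.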
For the record, my argument for changing that decision is that, currently, authors seem to have no control at all over getting their names right in the metadata. Conversely, if middle names that authors don't want were to appear in the metadata, authors could change that themselves in their OpenReview profiles. I therefore think the latter is much preferable.
Here's the promised background:
Here is an example from the aclpub2 export:
- dblp_id: https://dblp.org/pid/96/4410
  emails: wcampbell@ll.mit.edu
  first_name: William
  last_name: Campbell
  middle_name: M.
  name: William M. Campbell
  username: ~William_M._Campbell1
We can see the middle name here has been dropped, likely heuristically. We also have information that could help resolve this user. Note that this user currently has two author pages, but that the version with the middle initial is correct. There is also a third variant here.
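To illustrate the fields in this entry, here is a minimal Python sketch that rebuilds a name from one aclpub2 author entry (assumed to be already parsed into a dict, e.g. by a YAML loader) while keeping `middle_name`. Folding the middle name into the first name is an assumption modelled on a two-part first/last name split, and both helper names are hypothetical, not part of any existing ingestion script.

```python
from typing import Mapping, Optional


def anthology_name(author: Mapping[str, Optional[str]]) -> tuple[str, str]:
    """Return (first, last), folding any middle name into the first name."""
    first = (author.get("first_name") or "").strip()
    middle = (author.get("middle_name") or "").strip()  # may be missing or None
    last = (author.get("last_name") or "").strip()
    given = " ".join(part for part in (first, middle) if part)
    return given, last


def middle_name_dropped(author: Mapping[str, Optional[str]]) -> bool:
    """Flag entries where first + last alone would not reproduce `name`."""
    first = (author.get("first_name") or "").strip()
    last = (author.get("last_name") or "").strip()
    short = " ".join(part for part in (first, last) if part)
    return (author.get("name") or "").strip() != short


# The fields from the export entry above.
entry = {
    "first_name": "William",
    "middle_name": "M.",
    "last_name": "Campbell",
    "name": "William M. Campbell",
}
print(anthology_name(entry))       # ('William M.', 'Campbell')
print(middle_name_dropped(entry))  # True
```

Checking `name` against first + last gives a cheap signal for entries where relying on `first_name` and `last_name` alone would silently lose a middle name.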
Note that in #4024 I restored the use of the middle name that is parsed out from aclpub2. This affected 888 name instances (743 individuals) in EMNLP 2024 and its workshops, which gives a sense of the magnitude of the decision here. I do agree, though, that using the name as provided is the best approach, since it gives authors full control over how their name is presented.
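One rough way to arrive at counts like "name instances" versus "individuals": count every author entry with a non-empty `middle_name` as an instance, and count distinct people by their OpenReview `username`. This is a sketch only; the function name, the toy data, and the fallback to `name` when `username` is missing are assumptions, not the code used in #4024.

```python
from typing import Iterable, Mapping, Optional


def count_middle_name_impact(
    authors: Iterable[Mapping[str, Optional[str]]],
) -> tuple[int, int]:
    """Return (name instances, distinct individuals) with a non-empty middle name."""
    instances = 0
    people: set[str] = set()
    for author in authors:
        if (author.get("middle_name") or "").strip():
            instances += 1
            # Key individuals by username; fall back to the full name (assumption).
            people.add(author.get("username") or author.get("name") or "")
    return instances, len(people)


# Toy data: the same person appearing on two papers, plus one author without a middle name.
papers = [
    [{"username": "~William_M._Campbell1", "middle_name": "M."}],
    [{"username": "~William_M._Campbell1", "middle_name": "M."},
     {"username": "~Jane_Doe1", "middle_name": ""}],
]
print(count_middle_name_impact(a for paper in papers for a in paper))  # (2, 1)
```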
Thanks for the background, Matt! I think it’s good to have this documented in one place.
That would happen in the ingestion script, I assume? Maybe I can take a look at that one first after #3996 is finished.
Maybe I'm a bit old-school here, but I would probably work with GROBID instead of going for LLMs. 😄
For quite some time now, it has been a recurring issue that authors' middle names do not appear in the ingestion material for the Anthology. This irritates authors, who keep having to file corrections despite having entered their names fully and correctly on OpenReview, and it creates a lot of work for the Anthology. I could easily find dozens of issues related to missing middle names within a few minutes of searching:
#3218
#3353
#3408
#3467
#3492
#3532
#3559
#3577
#3588
#3898
#3735
#3984
I remember hearing at some point that there was a conscious decision to automatically remove middle names from author names when preparing ingestion materials (@mjpost?), but I couldn't quickly find the source of that information or any other discussion around it, so I'm opening this issue instead. I'm not even sure whether the cause lies in our own ingestion scripts or in ACLPUB2 (in which case the issue would be better opened there).
In any case, if there is a step in the process that automatically filters out middle names, I would strongly question that decision and suggest that it be changed.