added some medical suffixes #88

Open: 4 commits to be merged into base: master
4 changes: 4 additions & 0 deletions nameparser/config/prefixes.py
@@ -31,14 +31,18 @@
'do',
'dos',
'du',
'el',
'ibn',
'la',
'le',
'mc',
'mac',
'san',
'santa',
'st',
'ste',
'van',
'vel',
'van',
Owner commented:

Van is sometimes a first name, so including it in the prefixes would break parsing for all the Vans of the world. Skimming the US birth names database, there do appear to be people named Van, e.g. 183 people born in 1983.

% python tests.py "Van middle last"
<HumanName : [
	title: '' 
	first: 'Van middle last' 
	middle: '' 
	last: '' 
	suffix: ''
	nickname: ''
]>

Similar comment with Mac. I went to school with a guy named Mac.

Mc is fine because it has no vowel, so it can't be a first name. I guess it could be a title abbreviation (Master of Ceremonies), though, and I'm not sure how that would play out.

El is an article in Spanish, so I'd kinda like to know how it is used in a name. Is it used as the Spanish article in a title like el senator, or as a prefix like del?
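
To make the Van problem concrete, here is a minimal toy sketch of greedy prefix joining. It is not nameparser's actual implementation: `PREFIXES`, `join_prefixes` and `toy_parse` are hypothetical names made up for illustration, and the rule "a prefix absorbs every token after it" is an assumption chosen only because it reproduces the output shown above.

```python
# Toy model only: NOT nameparser's real code. All names here are hypothetical.
PREFIXES = {"do", "dos", "du", "la", "le", "von"}  # small subset of the list in the diff


def join_prefixes(tokens, prefixes):
    """Merge the first prefix token with everything after it into one piece."""
    pieces = []
    for i, token in enumerate(tokens):
        if token.lower() in prefixes:
            # Assumed rule: a prefix swallows all remaining tokens.
            pieces.append(" ".join(tokens[i:]))
            break
        pieces.append(token)
    return pieces


def toy_parse(full_name, prefixes=PREFIXES):
    """Assign first/middle/last from the pieces left after prefix joining."""
    pieces = join_prefixes(full_name.split(), prefixes)
    first = pieces[0] if pieces else ""
    last = pieces[-1] if len(pieces) > 1 else ""
    middle = " ".join(pieces[1:-1])
    return {"first": first, "middle": middle, "last": last}


# A prefix in the usual position joins onto the last name.
print(toy_parse("Charles du Pont"))
# {'first': 'Charles', 'middle': '', 'last': 'du Pont'}

# Without 'van' as a prefix, Van is parsed as a first name.
print(toy_parse("Van middle last"))
# {'first': 'Van', 'middle': 'middle', 'last': 'last'}

# With 'van' added, the whole name collapses into one piece.
print(toy_parse("Van middle last", PREFIXES | {"van"}))
# {'first': 'Van middle last', 'middle': '', 'last': ''}
```

Under this assumed rule, any prefix that can also be a first name (Van, Mac) fails the same way whenever it is the first token.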

'von',
])
3 changes: 3 additions & 0 deletions nameparser/config/suffixes.py
@@ -30,6 +30,7 @@
'arrc',
'bart',
'bem',
'bn',
'bt',
'cb',
'cbe',
@@ -97,6 +98,7 @@
'msc'
'msm',
'mvo',
'np',
'obe',
'obi',
'om',
@@ -109,6 +111,7 @@
'qgm',
'qpm',
'rd',
'rn',
'rrc',
'rvm',
'sgm',
2 changes: 0 additions & 2 deletions nameparser/config/titles.py
@@ -7,12 +7,10 @@
'brother',
'dame',
'father',
'king',
Owner commented:

Both king and queen are included in the set of titles that indicate first names when placed before a single name, e.g. King David and Queen Mary, so this pull request will break some tests. In 2005 there were 148 people born in the US named King, so maybe that is a more useful case to handle than the title. I know people have used this parser on datasets that include kings and queens before, but I guess we can let them customize the titles constant to pick them up.

We should update the test cases that include "king" to use one of the other titles in that set.
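
The trade-off described in this comment can be sketched with a toy model. This is not nameparser's real code or API: `DEFAULT_TITLES` and `toy_parse` are hypothetical names, and the title set is just a few entries copied from the diff, assumed to be what remains once king and queen are removed.

```python
# Toy model only: NOT nameparser's real code. All names here are hypothetical.
DEFAULT_TITLES = {"dame", "father", "pope", "sir", "uncle"}  # assumed post-PR subset


def toy_parse(full_name, titles=DEFAULT_TITLES):
    """Strip a leading title; a title before a single name makes it a first name."""
    tokens = full_name.split()
    title = ""
    if tokens and tokens[0].lower() in titles:
        title = tokens.pop(0)
    if title and len(tokens) == 1:
        # "Sir David": the single remaining name is a first name, not a last name.
        return {"title": title, "first": tokens[0], "middle": "", "last": ""}
    first = tokens[0] if tokens else ""
    last = tokens[-1] if len(tokens) > 1 else ""
    middle = " ".join(tokens[1:-1])
    return {"title": title, "first": first, "middle": middle, "last": last}


# With 'king' out of the titles, people named King parse as expected.
print(toy_parse("King Smith"))
# {'title': '', 'first': 'King', 'middle': '', 'last': 'Smith'}

# The cost: "King David" is no longer read as title + first name.
print(toy_parse("King David"))
# {'title': '', 'first': 'King', 'middle': '', 'last': 'David'}

# Users with royalty in their data can opt back in with a custom title set.
print(toy_parse("King David", DEFAULT_TITLES | {"king"}))
# {'title': 'King', 'first': 'David', 'middle': '', 'last': ''}

# A test case rewritten to use another title from the same set still passes.
print(toy_parse("Sir David"))
# {'title': 'Sir', 'first': 'David', 'middle': '', 'last': ''}
```

The last call shows the kind of substitution suggested for the existing test cases: swap "king" for a title that stays in the set.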

'maid',
'master',
'mother',
'pope',
'queen',
'sir',
'sister',
'uncle',