fix(person): Improvements to Dutch name generation, in particular regarding affixes #1778
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## next #1778 +/- ##
==========================================
- Coverage 99.64% 99.63% -0.02%
==========================================
Files 2347 2346 -1
Lines 235657 235735 +78
Branches 1145 1142 -3
==========================================
+ Hits 234811 234863 +52
- Misses 824 850 +26
Partials 22 22
|
#1637 has been merged. Please rebase. |
Rebased onto next |
Please put the last name list in alphabetical order
I've sorted the list of names ASCIIbetically. NB this is not the Dutch sorting order. |
Thank you! @faker-js/maintainers Should we sort the names ASCIIbetically or the Dutch way? |
Oh, and I misspoke - it's not ASCIIbetical (because that would imply case-sensitive sorting - Z before a). It's currently sorted with regular English case-insensitive sorting. So, in a nice example of "be careful what you wish for" we have the following sorting options:
The one that is implemented with the latest commit is #2. So, which sorting method should I use? Personally, I don't mind either way. |
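To make the difference between these orderings concrete, here is a minimal sketch. The sample names, the `TUSSENVOEGSELS` list, and the helper names are hypothetical and not taken from this PR. The "Dutch" variant assumes the convention of sorting on the main part of the surname and ignoring a leading tussenvoegsel, which may not be exactly the option meant above.

```typescript
// Hypothetical sample data; not the list from this PR.
const sampleNames: string[] = ["de Vries", "Zwart", "van Dijk", "Bakker", "Aalbers"];

// Hypothetical, incomplete prefix list; longer prefixes come first so they match first.
const TUSSENVOEGSELS: string[] = ["van der", "van den", "van de", "van", "de", "den", "der", "ten", "ter", "te"];

// Plain code-unit comparison of two strings.
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Remove a leading tussenvoegsel (if any) and lower-case the rest.
function mainPart(name: string): string {
  const lower = name.toLowerCase();
  for (const prefix of TUSSENVOEGSELS) {
    if (lower.startsWith(prefix + " ")) {
      return lower.slice(prefix.length + 1);
    }
  }
  return lower;
}

// 1. ASCIIbetical: the default sort compares code units, so "Z" sorts before "d".
const asciibetical: string[] = [...sampleNames].sort();

// 2. Case-insensitive: compare the lower-cased names (an approximation of English collation).
const caseInsensitive: string[] = [...sampleNames].sort((a, b) =>
  compareStrings(a.toLowerCase(), b.toLowerCase())
);

// 3. Dutch-style (assumption): compare on the main part, ignoring the tussenvoegsel.
const dutchStyle: string[] = [...sampleNames].sort((a, b) =>
  compareStrings(mainPart(a), mainPart(b))
);

console.log(asciibetical);    // Aalbers, Bakker, Zwart, de Vries, van Dijk
console.log(caseInsensitive); // Aalbers, Bakker, de Vries, van Dijk, Zwart
console.log(dutchStyle);      // Aalbers, Bakker, van Dijk, de Vries, Zwart
```

With the case-insensitive ordering, all names sharing a tussenvoegsel cluster together; with the Dutch-style ordering, `van Dijk` lands under D and `de Vries` under V.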
One of the reasons why we want the sorted lists is to aid in the review process to detect similar or identical entries faster. |
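As a rough illustration of that review goal, the check for identical entries can also be automated. This is only a sketch: `findDuplicates` is a hypothetical helper, not part of faker, and it treats entries as identical when they match after trimming whitespace and lower-casing.

```typescript
// Hypothetical helper: returns every entry that repeats an earlier one,
// comparing names after trimming whitespace and lower-casing.
function findDuplicates(names: readonly string[]): string[] {
  const seen = new Set<string>();
  const duplicates: string[] = [];
  for (const name of names) {
    const key = name.trim().toLowerCase();
    if (seen.has(key)) {
      duplicates.push(name);
    } else {
      seen.add(key);
    }
  }
  return duplicates;
}

// Hypothetical sample input; the second "bakker " repeats "Bakker".
console.log(findDuplicates(["Bakker", "Visser", "bakker ", "Smit"]));
```

A check like this finds exact repeats regardless of order; spotting merely similar entries (spelling variants, for example) still benefits from a sorted list and a human reviewer.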
I've added two more sources for the family names. Since the purpose of sorting is detecting duplicates, I think the current English sorting is fine and we can leave it at that. |
Well, you added more links, but you didn't add any new names, so it is basically still the same list from the first source in the list. |
I don't get what you mean by legal implications. It's quite obvious that Wikipedia doesn't own this list, and that the original source is in fact https://www.meertens.knaw.nl/nfb/documenten/top100.pdf, as stated in the article. A list of the "top 100 most occurring family names in 2007" consists of facts, and facts are not copyrightable. |
However, if it helps I've made a small modification to the list so it's no longer exactly the same as the source. |
IANAL: The facts themselves might not be, but the work to compile them may be, or at least some claim it to be: https://blog.polco.us/copyright-apply-survey
Thanks for your effort to address this, but a 1% difference might not be enough. |
Maybe go back to the original |
All right, I made a major modification to the list. It's now a fully original compilation of names, a random selection from various sources that I put together personally. Any resemblance to any other list you might find is purely coincidental; of course, lists of common names are bound to overlap.
Funnily enough, after adding the appropriate tussenvoegsels, nearly every name on the original list is also in the top 100 list. I only detected two differences: 'Klein', which I added, and 'Stichting', which isn't a family name. |
Nice work guys, I appreciate your attention to detail! |
Addresses #1777, plus some minor improvements to Dutch name generation.
I've based this on PR #1637 (fullname name patterns) assuming that will be approved as well.