Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(internet): userName, email and slugify return only ascii #1554

Merged

Conversation

matthewmayer
Copy link
Contributor

@matthewmayer matthewmayer commented Nov 13, 2022

tentative fix for #1105 and #1437

before this PR is applied:

Object.keys(faker.locales).forEach(locale=>{faker.setLocale(locale); console.log(`${locale}: ${faker.internet.email()}`)})
af_ZA: Harvey_Ferreira60@gmail.com
ar: .@yahoo.com
az: Kellie_Hansen@yahoo.com
cz: Krytof9@atlas.cz
de: Lisann_Tsamonikian@yahoo.com
de_AT: Lenja2@gmail.com
de_CH: Marlies29@hotmail.com
el: .@gmail.com
en: Isobel40@yahoo.com
en_AU: Eliza_Edwards@yahoo.com
en_AU_ocker: Oliver46@gmail.com
en_BORK: Vita.Buckridge78@yahoo.com
en_CA: Fausto18@gmail.com
en_GB: Adrienne.Konopelski@yahoo.com
en_GH: person.female_first_name.Kusi@hotmail.com
en_IE: Oswaldo.Dietrich@hotmail.com
en_IN: Baalaaditya15@yahoo.co.in
en_NG: Titi.Christian94@yahoo.com
en_US: Erik83@hotmail.com
en_ZA: Amelia_Connelly33@yahoo.com
es: Esteban93@gmail.com
es_MX: Mayte.Ruiz@nearbpo.com
fa: 72@yahoo.com
fi: Oskari.Hmlinen@hotmail.com
fr: Flavie_Nguyen@hotmail.fr
fr_BE: Freda7@advalvas.be
fr_CA: Daija_Osinski@yahoo.ca
fr_CH: Arion13@hotmail.com
ge: _@posta.ge
he: 14@gmail.com
hr: David.Zdelar48@gmail.com
hu: Dina63@outlook.com
hy: .@gmail.com
id_ID: Paul_OKeefe@yahoo.co.id
it: Igor24@libero.it
ja: 太一.中村73@gmail.com
ko: 71@yahoo.co.kr
lv: Grover_Kshlerin@apollo.lv
mk: 44@hotmail.com
nb_NO: Herman_Strand@yahoo.com
ne: Raju26@gmail.com
nl: Nick.Janssen33@gmail.com
nl_BE: Amy44@gmail.com
pl: Gerald.Urbanowicz44@yahoo.com
pt_BR: Lvia98@yahoo.com
pt_PT: Edgar60@mail.pt
ro: Trenton46@hotmail.com
ru: Seamus.Carter@yahoo.com
sk: Bethany.Parisian@zoznam.sk
sv: Monica.Axelsson@gmail.com
tr: Brbars1@yahoo.com
uk: Hellen_Price34@ukr.net
ur: .@gmail.com
vi: VinhDiu.Mai@yahoo.com
zh_CN: 鑫鹏_宋@gmail.com
zh_TW: 樂駒76@hotmail.com
zu_ZA: Maphikelela.Mabhida@hotmail.com

after this PR is applied:

af_ZA: Margaret_Human@hotmail.com
ar: da8294716@yahoo.com
az: Tobin.Frami@yahoo.com
cz: Nepomuk_Fiedler70@atlas.cz
de: Mina_Markowski@gmail.com
de_AT: Lucy_Dietzsch@yahoo.com
de_CH: Richard.Huber@hotmail.com
dv: bg650967@yahoo.com
el: bm4452630@hotmail.com
en: Berenice.Swaniawski@gmail.com
en_AU: Jasmine_Greenholt49@yahoo.com
en_AU_ocker: Amelia_Nguyen7@yahoo.com
en_BORK: Blanca.Roob@gmail.com
en_CA: Marge_Goyette@hotmail.com
en_GB: Cierra45@hotmail.com
en_GH: person.female_first_name_Acheampong76@hotmail.com
en_IE: Gianni40@gmail.com
en_IN: Tejas14@gmail.com
en_NG: Fatima52@yahoo.com
en_US: Monserrat83@yahoo.com
en_ZA: Stephen_Hearne72@gmail.com
es: Luz_Collazo@gmail.com
es_MX: Ivanna.Llamas@hotmail.com
fa: bl1424060@yahoo.com
fi: Mika11@gmail.com
fr: Philippine28@yahoo.fr
fr_BE: Salvador98@netbel.be
fr_CA: Josue.Sauer@gmail.com
fr_CH: Eline.Sutermeister94@hotmail.com
ge: md9854879@gmail.com
he: gl1440764@hotmail.com
hr: Slavica29@gmail.com
hu: Lilina.Nmeth47@hotamil.com
hy: an7859723@yahoo.com
id_ID: Anita.Gottlieb@yahoo.co.id
it: Erminia_Laezza@email.it
ja: oq961736@gmail.com
ko: ix4305140@hanmail.net
lv: Amira_Renner@yahoo.com
mk: jn6356977@yahoo.com
nb_NO: Kasper_Pettersen91@yahoo.com
ne: Sanjay.Maharjan94@worldlink.com.np
nl: Anne_Klein@gmail.com
nl_BE: Arne_Goossens82@hotmail.com
pl: Mateusz_Knapik9@hotmail.com
pt_BR: Mariana36@hotmail.com
pt_PT: Beatriz55@hotmail.com
ro: Lorenza58@gmail.com
ru: Gilberto_Murphy@mail.ru
sk: Wyman_Bashirian83@azet.sk
sv: Cecilia92@gmail.com
tr: Bulgak_Erberk86@yahoo.com
uk: Krista_Powlowski@ex.ua
ur: zq6319489@gmail.com
vi: DimTrinh43@yahoo.com
zh_CN: fd7075767@yahoo.com
zh_TW: jn4619221@gmail.com
zu_ZA: Ayize.Mehloluhlaza62@yahoo.com

@codecov
Copy link

codecov bot commented Nov 13, 2022

Codecov Report

Merging #1554 (d461e30) into next (5e51335) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             next    #1554      +/-   ##
==========================================
- Coverage   99.64%   99.63%   -0.01%     
==========================================
  Files        2221     2222       +1     
  Lines      239460   239816     +356     
  Branches     1047     1056       +9     
==========================================
+ Hits       238604   238950     +346     
- Misses        835      845      +10     
  Partials       21       21              
Impacted Files Coverage Δ
src/modules/helpers/index.ts 98.55% <100.00%> (+<0.01%) ⬆️
src/modules/internet/char-mappings.ts 100.00% <100.00%> (ø)
src/modules/internet/index.ts 100.00% <100.00%> (ø)
src/modules/internet/user-agent.ts 90.54% <0.00%> (-2.71%) ⬇️

@ST-DDT ST-DDT added c: bug Something isn't working c: feature Request for new feature p: 1-normal Nothing urgent s: needs decision Needs team/maintainer decision labels Nov 13, 2022
@ST-DDT ST-DDT added this to the v8.0 - Module Re-Shuffling milestone Nov 13, 2022
src/modules/internet/index.ts Outdated Show resolved Hide resolved
test/internet.spec.ts Show resolved Hide resolved
…locales, revert slugify to ascii only - add test
…locales, revert slugify to ascii only - fix test
@ST-DDT ST-DDT added the s: on hold Blocked by something or frozen to avoid conflicts label Nov 14, 2022
Matt Mayer and others added 2 commits November 17, 2022 16:05
@ST-DDT ST-DDT linked an issue Nov 20, 2022 that may be closed by this pull request
…locales, revert slugify to ascii only - allow stripping diacritics
@matthewmayer
Copy link
Contributor Author

matthewmayer commented Nov 20, 2022

I added a further refinement to slugify which strips simple diacritics. This helps in languages like French or Vietnamese, where you have names like Adélaïde or Ngân Hà. Previously those accented letters would get removed completed, leading to email addresses like:

adlade23@gmail.com
ngn.h23@gmail.com

now instead you'd get
adelaide23@gmail.com
ngan.ha23@gmail.com

which seems nicer. I also added some additional tests for slugify.

@Shinigami92
Copy link
Member

Team Decision:

  • We wont allow adding a dependency to faker to convert the given parameters to related outputs
    This can be done by the user in front of the method and then just pass the converted args to the method
  • We will convert non-ascii chars to ascii chars, in a way that the input is related to the outcome, but not in a literal linguistic way
    How that is done is an implementation detail and in the hand of the PR author
  • The method internet.userName should only return ascii characters
  • A new method internet.displayName should have (kinda) the same implementation as internet.userName has right now, but we can improve e.g. the options/parameter as we are already making breaking changes in v8.0

@ST-DDT ST-DDT removed s: needs decision Needs team/maintainer decision s: on hold Blocked by something or frozen to avoid conflicts labels Nov 24, 2022
@matthewmayer
Copy link
Contributor Author

matthewmayer commented Nov 25, 2022

"We will convert non-ascii chars to ascii chars, in a way that the input is related to the outcome, but not in a literal linguistic way. How that is done is an implementation detail and in the hand of the PR author"

So I tried a new approach in the latest commit, basically reimplementing the userName function with a few strategies:

Ascii names work as before

faker.internet.userName("Matt","Mayer")
'Matt_Mayer22'

Simple accents are stripped using Unicode NFD

faker.internet.userName("Hélene","Müller")
'Helene_Muller11'

A simple mapping can be use to transliterate alphabetical languages like Cyrillic (could also add say Greek and Thai).

faker.internet.userName("Фёдор","Достоевский")
'Fedor.Dostoevskii50'

As a final fallback, just concatenate the hex unicode char codes of each character.

faker.internet.userName("大羽","陳")
'59277fbd33'

This works tolerably well but leads to some very long usernames in the final fallback scenario

af_ZA: Theuns.Visser7@gmail.com
ar: 64662864a644_62764462e64464a64164a@hotmail.com
az: Kolby51@yahoo.com
cz: Pravdomil99@atlas.cz
de: Malte.Scherz@yahoo.com
de_AT: Defne.Hentschke60@yahoo.com
de_CH: Roland_Walter@yahoo.com
dv: 7907a77837a7.7847a67a07b07997a77877aa83@yahoo.com
el: 3a33c93c43b73c13b93bf3c2.3943b13b33ba3bb3b73c2@hotmail.com
en: Darryl.Dare22@hotmail.com
en_AU: Max.Hill3@yahoo.com
en_AU_ocker: Charlotte_Brown@yahoo.com
en_BORK: Asa.Abshire@hotmail.com
en_CA: Rosanna_Lesch43@hotmail.com
en_GB: Freida_Schaden@gmail.com
en_GH: {{person.male_first_name}}_Yirenkyi78@hotmail.com
en_IE: Margaretta94@gmail.com
en_IN: Bankim99@gmail.com
en_NG: Gbeminiyi.Atanda@yahoo.com
en_US: Alan65@hotmail.com
en_ZA: Jamie.Boucher@yahoo.com
es: Rosalia.Casarez39@hotmail.com
es_MX: Sergio_Barragan@corpfolder.com
fa: 6866276446276a973@yahoo.com
fi: Tarja23@hotmail.com
fr: Savin31@gmail.com
fr_BE: Samuel_Ratke@skynet.be
fr_CA: Elias1@hotmail.com
fr_CH: Iliana_Chappuis@outlook.com
ge: 10dc10dd10d310d010e0_10d110d010e010d310d010d510d410da10d810eb10d4@posta.ge
he: 5d25d55df.5d25d15d95e97@yahoo.com
hr: Gradimir_Uzanicki@gmail.com
hu: Benjamin.Orban@outlook.com
hy: 53f561580565576_53f56158056157a56557f57556157644@gmail.com
id_ID: Lindsay.Cummerata28@yahoo.co.id
it: Benigno71@yahoo.com
ja: 7d508863_677e672c31@yahoo.com
ko: 110b116f11ab1112116711bc.1103116111bc@gmail.com
lv: Hollis_Bogan52@mail.lv
mk: Mirche84@gmail.com
nb_NO: Synne42@hotmail.com
ne: Bibek_Limbu@worldlink.com.np
nl: Bas87@gmail.com
nl_BE: Oona24@gmail.com
pl: Apollo85@hotmail.com
pt_BR: Tertuliano_Moreira@hotmail.com
pt_PT: Pilar51@sapo.pt
ro: Gust19@hotmail.com
ru: Brooklyn58@hotmail.com
sk: Jovany58@gmail.com
sv: Emilia.Lind@yahoo.com
tr: Atalan49@gmail.com
uk: Anne.White39@gmail.com
ur: 64563364862f78@hotmail.com
vi: AnhVu.110ang@yahoo.com
zh_CN: 806a506580@hotmail.com
zh_TW: 70ab660e_4faf@yahoo.com
zu_ZA: Maphikelela61@gmail.com

Copy link
Member

@ST-DDT ST-DDT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impl wise this looks good to me.
Could you add an example to the jsdocs that uses non ascii input?
Also I think it might be better if you move the char mapping to separate file and import it. Not sure whether it should be ts or json. But that is optional for now IMO.

We could do a substring if it gets too long but that's a different issue.

@Shinigami92
Copy link
Member

We also discusses that we want displayName with (for now) the old behavior.
Can we please do this in this PR as well? Otherwise I'm afraid of missing it out.
If you explicitly not want to do this in this PR, please leave a reason and also open a new issue so we can track it and not miss it before v8.0 release.

Matt Mayer and others added 5 commits November 27, 2022 10:56
ST-DDT
ST-DDT previously approved these changes Nov 27, 2022
Co-authored-by: Eric Cheng <ericcheng9316@gmail.com>
import-brain
import-brain previously approved these changes Nov 30, 2022
ST-DDT
ST-DDT previously approved these changes Nov 30, 2022
src/modules/internet/char-mappings.ts Outdated Show resolved Hide resolved
src/modules/internet/index.ts Show resolved Hide resolved
src/modules/internet/index.ts Outdated Show resolved Hide resolved
src/modules/internet/index.ts Outdated Show resolved Hide resolved
src/modules/internet/index.ts Outdated Show resolved Hide resolved
@matthewmayer matthewmayer dismissed stale reviews from ST-DDT and import-brain via c9caafc December 1, 2022 09:41
@ST-DDT ST-DDT changed the title fix(internet): make faker.internet.userName, faker.internet.email and faker.helpers.slugify return only ascii, add faker.internet.displayName fix(internet): make userName(), email() and helpers.slugify() return only ascii, add displayName() for non-ascii Dec 2, 2022
@ST-DDT ST-DDT requested review from Shinigami92 and a team December 2, 2022 15:29
@ST-DDT ST-DDT changed the title fix(internet): make userName(), email() and helpers.slugify() return only ascii, add displayName() for non-ascii fix(internet): userName, email and slugify return only ascii Dec 3, 2022
@ST-DDT ST-DDT merged commit 4ed45fa into faker-js:next Dec 3, 2022
@ST-DDT ST-DDT added the s: accepted Accepted feature / Confirmed bug label Dec 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: bug Something isn't working c: feature Request for new feature p: 1-normal Nothing urgent s: accepted Accepted feature / Confirmed bug
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Weird email and username in Chinese locale package
4 participants