fix(internet): userName, email and slugify return only ascii #1554

matthewmayer · 2022-11-13T10:10:23Z

tentative fix for #1105 and #1437

before this PR is applied:

Object.keys(faker.locales).forEach(locale=>{faker.setLocale(locale); console.log(`${locale}: ${faker.internet.email()}`)})

af_ZA: Harvey_Ferreira60@gmail.com
ar: .@yahoo.com
az: Kellie_Hansen@yahoo.com
cz: Krytof9@atlas.cz
de: Lisann_Tsamonikian@yahoo.com
de_AT: Lenja2@gmail.com
de_CH: Marlies29@hotmail.com
el: .@gmail.com
en: Isobel40@yahoo.com
en_AU: Eliza_Edwards@yahoo.com
en_AU_ocker: Oliver46@gmail.com
en_BORK: Vita.Buckridge78@yahoo.com
en_CA: Fausto18@gmail.com
en_GB: Adrienne.Konopelski@yahoo.com
en_GH: person.female_first_name.Kusi@hotmail.com
en_IE: Oswaldo.Dietrich@hotmail.com
en_IN: Baalaaditya15@yahoo.co.in
en_NG: Titi.Christian94@yahoo.com
en_US: Erik83@hotmail.com
en_ZA: Amelia_Connelly33@yahoo.com
es: Esteban93@gmail.com
es_MX: Mayte.Ruiz@nearbpo.com
fa: 72@yahoo.com
fi: Oskari.Hmlinen@hotmail.com
fr: Flavie_Nguyen@hotmail.fr
fr_BE: Freda7@advalvas.be
fr_CA: Daija_Osinski@yahoo.ca
fr_CH: Arion13@hotmail.com
ge: _@posta.ge
he: 14@gmail.com
hr: David.Zdelar48@gmail.com
hu: Dina63@outlook.com
hy: .@gmail.com
id_ID: Paul_OKeefe@yahoo.co.id
it: Igor24@libero.it
ja: 太一.中村73@gmail.com
ko: 71@yahoo.co.kr
lv: Grover_Kshlerin@apollo.lv
mk: 44@hotmail.com
nb_NO: Herman_Strand@yahoo.com
ne: Raju26@gmail.com
nl: Nick.Janssen33@gmail.com
nl_BE: Amy44@gmail.com
pl: Gerald.Urbanowicz44@yahoo.com
pt_BR: Lvia98@yahoo.com
pt_PT: Edgar60@mail.pt
ro: Trenton46@hotmail.com
ru: Seamus.Carter@yahoo.com
sk: Bethany.Parisian@zoznam.sk
sv: Monica.Axelsson@gmail.com
tr: Brbars1@yahoo.com
uk: Hellen_Price34@ukr.net
ur: .@gmail.com
vi: VinhDiu.Mai@yahoo.com
zh_CN: 鑫鹏_宋@gmail.com
zh_TW: 樂駒76@hotmail.com
zu_ZA: Maphikelela.Mabhida@hotmail.com

after this PR is applied:

af_ZA: Margaret_Human@hotmail.com
ar: da8294716@yahoo.com
az: Tobin.Frami@yahoo.com
cz: Nepomuk_Fiedler70@atlas.cz
de: Mina_Markowski@gmail.com
de_AT: Lucy_Dietzsch@yahoo.com
de_CH: Richard.Huber@hotmail.com
dv: bg650967@yahoo.com
el: bm4452630@hotmail.com
en: Berenice.Swaniawski@gmail.com
en_AU: Jasmine_Greenholt49@yahoo.com
en_AU_ocker: Amelia_Nguyen7@yahoo.com
en_BORK: Blanca.Roob@gmail.com
en_CA: Marge_Goyette@hotmail.com
en_GB: Cierra45@hotmail.com
en_GH: person.female_first_name_Acheampong76@hotmail.com
en_IE: Gianni40@gmail.com
en_IN: Tejas14@gmail.com
en_NG: Fatima52@yahoo.com
en_US: Monserrat83@yahoo.com
en_ZA: Stephen_Hearne72@gmail.com
es: Luz_Collazo@gmail.com
es_MX: Ivanna.Llamas@hotmail.com
fa: bl1424060@yahoo.com
fi: Mika11@gmail.com
fr: Philippine28@yahoo.fr
fr_BE: Salvador98@netbel.be
fr_CA: Josue.Sauer@gmail.com
fr_CH: Eline.Sutermeister94@hotmail.com
ge: md9854879@gmail.com
he: gl1440764@hotmail.com
hr: Slavica29@gmail.com
hu: Lilina.Nmeth47@hotamil.com
hy: an7859723@yahoo.com
id_ID: Anita.Gottlieb@yahoo.co.id
it: Erminia_Laezza@email.it
ja: oq961736@gmail.com
ko: ix4305140@hanmail.net
lv: Amira_Renner@yahoo.com
mk: jn6356977@yahoo.com
nb_NO: Kasper_Pettersen91@yahoo.com
ne: Sanjay.Maharjan94@worldlink.com.np
nl: Anne_Klein@gmail.com
nl_BE: Arne_Goossens82@hotmail.com
pl: Mateusz_Knapik9@hotmail.com
pt_BR: Mariana36@hotmail.com
pt_PT: Beatriz55@hotmail.com
ro: Lorenza58@gmail.com
ru: Gilberto_Murphy@mail.ru
sk: Wyman_Bashirian83@azet.sk
sv: Cecilia92@gmail.com
tr: Bulgak_Erberk86@yahoo.com
uk: Krista_Powlowski@ex.ua
ur: zq6319489@gmail.com
vi: DimTrinh43@yahoo.com
zh_CN: fd7075767@yahoo.com
zh_TW: jn4619221@gmail.com
zu_ZA: Ayize.Mehloluhlaza62@yahoo.com

…locales, revert slugify to ascii only

codecov · 2022-11-13T10:38:17Z

Codecov Report

Merging #1554 (d461e30) into next (5e51335) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files

@@            Coverage Diff             @@
##             next    #1554      +/-   ##
==========================================
- Coverage   99.64%   99.63%   -0.01%     
==========================================
  Files        2221     2222       +1     
  Lines      239460   239816     +356     
  Branches     1047     1056       +9     
==========================================
+ Hits       238604   238950     +346     
- Misses        835      845      +10     
  Partials       21       21

Impacted Files	Coverage Δ
src/modules/helpers/index.ts	`98.55% <100.00%> (+<0.01%)`	⬆️
src/modules/internet/char-mappings.ts	`100.00% <100.00%> (ø)`
src/modules/internet/index.ts	`100.00% <100.00%> (ø)`
src/modules/internet/user-agent.ts	`90.54% <0.00%> (-2.71%)`	⬇️

src/modules/internet/index.ts

test/internet.spec.ts

…locales, revert slugify to ascii only - add test

…locales, revert slugify to ascii only - fix test

Co-authored-by: Shinigami <chrissi92@hotmail.de>

…locales, revert slugify to ascii only - allow stripping diacritics

matthewmayer · 2022-11-20T16:38:05Z

I added a further refinement to slugify which strips simple diacritics. This helps in languages like French or Vietnamese, where you have names like Adélaïde or Ngân Hà. Previously those accented letters would get removed completed, leading to email addresses like:

adlade23@gmail.com
ngn.h23@gmail.com

now instead you'd get
adelaide23@gmail.com
ngan.ha23@gmail.com

which seems nicer. I also added some additional tests for slugify.

Shinigami92 · 2022-11-24T17:49:13Z

Team Decision:

We wont allow adding a dependency to faker to convert the given parameters to related outputs
This can be done by the user in front of the method and then just pass the converted args to the method
We will convert non-ascii chars to ascii chars, in a way that the input is related to the outcome, but not in a literal linguistic way
How that is done is an implementation detail and in the hand of the PR author
The method internet.userName should only return ascii characters
A new method internet.displayName should have (kinda) the same implementation as internet.userName has right now, but we can improve e.g. the options/parameter as we are already making breaking changes in v8.0

matthewmayer · 2022-11-25T06:16:50Z

"We will convert non-ascii chars to ascii chars, in a way that the input is related to the outcome, but not in a literal linguistic way. How that is done is an implementation detail and in the hand of the PR author"

So I tried a new approach in the latest commit, basically reimplementing the userName function with a few strategies:

Ascii names work as before

faker.internet.userName("Matt","Mayer")
'Matt_Mayer22'

Simple accents are stripped using Unicode NFD

faker.internet.userName("Hélene","Müller")
'Helene_Muller11'

A simple mapping can be use to transliterate alphabetical languages like Cyrillic (could also add say Greek and Thai).

faker.internet.userName("Фёдор","Достоевский")
'Fedor.Dostoevskii50'

As a final fallback, just concatenate the hex unicode char codes of each character.

faker.internet.userName("大羽","陳")
'59277fbd33'

This works tolerably well but leads to some very long usernames in the final fallback scenario

af_ZA: Theuns.Visser7@gmail.com
ar: 64662864a644_62764462e64464a64164a@hotmail.com
az: Kolby51@yahoo.com
cz: Pravdomil99@atlas.cz
de: Malte.Scherz@yahoo.com
de_AT: Defne.Hentschke60@yahoo.com
de_CH: Roland_Walter@yahoo.com
dv: 7907a77837a7.7847a67a07b07997a77877aa83@yahoo.com
el: 3a33c93c43b73c13b93bf3c2.3943b13b33ba3bb3b73c2@hotmail.com
en: Darryl.Dare22@hotmail.com
en_AU: Max.Hill3@yahoo.com
en_AU_ocker: Charlotte_Brown@yahoo.com
en_BORK: Asa.Abshire@hotmail.com
en_CA: Rosanna_Lesch43@hotmail.com
en_GB: Freida_Schaden@gmail.com
en_GH: {{person.male_first_name}}_Yirenkyi78@hotmail.com
en_IE: Margaretta94@gmail.com
en_IN: Bankim99@gmail.com
en_NG: Gbeminiyi.Atanda@yahoo.com
en_US: Alan65@hotmail.com
en_ZA: Jamie.Boucher@yahoo.com
es: Rosalia.Casarez39@hotmail.com
es_MX: Sergio_Barragan@corpfolder.com
fa: 6866276446276a973@yahoo.com
fi: Tarja23@hotmail.com
fr: Savin31@gmail.com
fr_BE: Samuel_Ratke@skynet.be
fr_CA: Elias1@hotmail.com
fr_CH: Iliana_Chappuis@outlook.com
ge: 10dc10dd10d310d010e0_10d110d010e010d310d010d510d410da10d810eb10d4@posta.ge
he: 5d25d55df.5d25d15d95e97@yahoo.com
hr: Gradimir_Uzanicki@gmail.com
hu: Benjamin.Orban@outlook.com
hy: 53f561580565576_53f56158056157a56557f57556157644@gmail.com
id_ID: Lindsay.Cummerata28@yahoo.co.id
it: Benigno71@yahoo.com
ja: 7d508863_677e672c31@yahoo.com
ko: 110b116f11ab1112116711bc.1103116111bc@gmail.com
lv: Hollis_Bogan52@mail.lv
mk: Mirche84@gmail.com
nb_NO: Synne42@hotmail.com
ne: Bibek_Limbu@worldlink.com.np
nl: Bas87@gmail.com
nl_BE: Oona24@gmail.com
pl: Apollo85@hotmail.com
pt_BR: Tertuliano_Moreira@hotmail.com
pt_PT: Pilar51@sapo.pt
ro: Gust19@hotmail.com
ru: Brooklyn58@hotmail.com
sk: Jovany58@gmail.com
sv: Emilia.Lind@yahoo.com
tr: Atalan49@gmail.com
uk: Anne.White39@gmail.com
ur: 64563364862f78@hotmail.com
vi: AnhVu.110ang@yahoo.com
zh_CN: 806a506580@hotmail.com
zh_TW: 70ab660e_4faf@yahoo.com
zu_ZA: Maphikelela61@gmail.com

ST-DDT

Impl wise this looks good to me.
Could you add an example to the jsdocs that uses non ascii input?
Also I think it might be better if you move the char mapping to separate file and import it. Not sure whether it should be ts or json. But that is optional for now IMO.

We could do a substring if it gets too long but that's a different issue.

Shinigami92 · 2022-11-25T10:12:12Z

We also discusses that we want displayName with (for now) the old behavior.
Can we please do this in this PR as well? Otherwise I'm afraid of missing it out.
If you explicitly not want to do this in this PR, please leave a reason and also open a new issue so we can track it and not miss it before v8.0 release.

…se base36 as fallback

Co-authored-by: ST-DDT <ST-DDT@gmx.de>

src/modules/internet/char-mappings.ts

Co-authored-by: Eric Cheng <ericcheng9316@gmail.com>

src/modules/internet/char-mappings.ts

src/modules/internet/index.ts

fix(internet): use a ascii-only fallback for email addresses in some …

405b129

…locales, revert slugify to ascii only

matthewmayer requested a review from a team as a code owner November 13, 2022 10:10

matthewmayer mentioned this pull request Nov 13, 2022

Weird email and username in Chinese locale package #1105

Closed

fix(internet): use a ascii-only fallback for email addresses in some …

4b137b4

…locales, revert slugify to ascii only

ST-DDT reviewed Nov 13, 2022

View reviewed changes

src/modules/internet/index.ts Outdated Show resolved Hide resolved

ST-DDT assigned matthewmayer Nov 13, 2022

ST-DDT added c: bug Something isn't working c: feature Request for new feature p: 1-normal Nothing urgent s: needs decision Needs team/maintainer decision labels Nov 13, 2022

ST-DDT added this to the v8.0 - Module Re-Shuffling milestone Nov 13, 2022

Shinigami92 reviewed Nov 13, 2022

View reviewed changes

src/modules/internet/index.ts Outdated Show resolved Hide resolved

test/internet.spec.ts Show resolved Hide resolved

matthewmayer added 2 commits November 13, 2022 23:47

fix(internet): use a ascii-only fallback for email addresses in some …

bf18f4a

…locales, revert slugify to ascii only - add test

fix(internet): use a ascii-only fallback for email addresses in some …

ddbb19a

…locales, revert slugify to ascii only - fix test

ST-DDT added the s: on hold Blocked by something or frozen to avoid conflicts label Nov 14, 2022

Matt Mayer and others added 2 commits November 17, 2022 16:05

Update src/modules/internet/index.ts

219173d

Co-authored-by: Shinigami <chrissi92@hotmail.de>

Merge branch 'next' into chore/fallback-ascii-email-addresses

8560c5d

ST-DDT linked an issue Nov 20, 2022 that may be closed by this pull request

Weird email and username in Chinese locale package #1105

Closed

fix(internet): use a ascii-only fallback for email addresses in some …

c37ffcd

…locales, revert slugify to ascii only - allow stripping diacritics

matthewmayer mentioned this pull request Nov 24, 2022

fix: ASCII-fy domainWord() #1520

Merged

ST-DDT removed s: needs decision Needs team/maintainer decision s: on hold Blocked by something or frozen to avoid conflicts labels Nov 24, 2022

fix(internet): make faker.internet.userName return ASCII only

d4b958b

ST-DDT reviewed Nov 25, 2022

View reviewed changes

fix(internet): make faker.internet.userName return ASCII only - and u…

83cd519

…se base36 as fallback

Matt Mayer and others added 5 commits November 27, 2022 10:56

Update src/modules/internet/index.ts

b717385

Co-authored-by: ST-DDT <ST-DDT@gmx.de>

Update src/modules/internet/index.ts

4b0ee08

Co-authored-by: ST-DDT <ST-DDT@gmx.de>

Update test/internet.spec.ts

2e1b5f6

Co-authored-by: ST-DDT <ST-DDT@gmx.de>

fix(internet): improve tests and docs

5b2757a

fix(internet): improve tests and docs

4884f27

ST-DDT previously approved these changes Nov 27, 2022

View reviewed changes

Merge branch 'next' into chore/fallback-ascii-email-addresses

bc636c5

import-brain reviewed Nov 29, 2022

View reviewed changes

src/modules/internet/char-mappings.ts Outdated Show resolved Hide resolved

Update src/modules/internet/char-mappings.ts

3505045

Co-authored-by: Eric Cheng <ericcheng9316@gmail.com>

matthewmayer dismissed ST-DDT’s stale review via 3505045 November 29, 2022 02:51

import-brain previously approved these changes Nov 30, 2022

View reviewed changes

ST-DDT previously approved these changes Nov 30, 2022

View reviewed changes

Merge branch 'next' into chore/fallback-ascii-email-addresses

8657c47

Shinigami92 requested changes Dec 1, 2022

View reviewed changes

fix: comment fixes

c9caafc

matthewmayer dismissed stale reviews from ST-DDT and import-brain via c9caafc December 1, 2022 09:41

ST-DDT approved these changes Dec 1, 2022

View reviewed changes

ST-DDT added 2 commits December 1, 2022 12:24

Merge branch 'next' into chore/fallback-ascii-email-addresses

cdb5d4f

Merge branch 'next' into chore/fallback-ascii-email-addresses

d461e30

ST-DDT changed the title ~~fix(internet): make faker.internet.userName, faker.internet.email and faker.helpers.slugify return only ascii, add faker.internet.displayName~~ fix(internet): make userName(), email() and helpers.slugify() return only ascii, add displayName() for non-ascii Dec 2, 2022

ST-DDT requested review from Shinigami92 and a team December 2, 2022 15:29

Shinigami92 approved these changes Dec 2, 2022

View reviewed changes

ST-DDT changed the title ~~fix(internet): make userName(), email() and helpers.slugify() return only ascii, add displayName() for non-ascii~~ fix(internet): userName, email and slugify return only ascii Dec 3, 2022

ST-DDT merged commit 4ed45fa into faker-js:next Dec 3, 2022

ST-DDT added the s: accepted Accepted feature / Confirmed bug label Dec 3, 2022

matthewmayer mentioned this pull request Jan 17, 2023

Generated email addresses are sometimes invalid in certain locales #1739

Closed

xDivisionByZerox mentioned this pull request Apr 25, 2023

The format of the email address output in Japanese is not correct. #1437

Closed

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(internet): userName, email and slugify return only ascii #1554

fix(internet): userName, email and slugify return only ascii #1554

matthewmayer commented Nov 13, 2022 •

edited

Loading

codecov bot commented Nov 13, 2022 •

edited

Loading

matthewmayer commented Nov 20, 2022 •

edited

Loading

Shinigami92 commented Nov 24, 2022

matthewmayer commented Nov 25, 2022 •

edited

Loading

ST-DDT left a comment •

edited

Loading

Shinigami92 commented Nov 25, 2022

fix(internet): userName, email and slugify return only ascii #1554

fix(internet): userName, email and slugify return only ascii #1554

Conversation

matthewmayer commented Nov 13, 2022 • edited Loading

codecov bot commented Nov 13, 2022 • edited Loading

Codecov Report

matthewmayer commented Nov 20, 2022 • edited Loading

Shinigami92 commented Nov 24, 2022

matthewmayer commented Nov 25, 2022 • edited Loading

ST-DDT left a comment • edited Loading

Choose a reason for hiding this comment

Shinigami92 commented Nov 25, 2022

matthewmayer commented Nov 13, 2022 •

edited

Loading

codecov bot commented Nov 13, 2022 •

edited

Loading

matthewmayer commented Nov 20, 2022 •

edited

Loading

matthewmayer commented Nov 25, 2022 •

edited

Loading

ST-DDT left a comment •

edited

Loading