Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(locale): filter inappropriate words in the en locale #1633

Merged
merged 5 commits into from
Dec 24, 2022

Conversation

rcfox
Copy link
Contributor

@rcfox rcfox commented Dec 6, 2022

Addressing #1631

@ST-DDT ST-DDT added c: bug Something isn't working p: 1-normal Nothing urgent s: accepted Accepted feature / Confirmed bug c: locale Permutes locale definitions labels Dec 6, 2022
@ST-DDT ST-DDT added this to the v8.0 - Module Re-Shuffling milestone Dec 6, 2022
@ST-DDT
Copy link
Member

ST-DDT commented Dec 6, 2022

IMO the English locale is big enough that we can be quite thoroughly with the removals.
Some of the words don't feel offending by themselves, but I agree, that they might cause irritation even by unawareness of their (original) meaning. (Especially when mixed with random words e.g. any adjective+color.)

@ST-DDT
Copy link
Member

ST-DDT commented Dec 8, 2022

Team decision

We will check:

  • each of your suggested removals once your PR is complete, for some we aren't sure yet, but the general direction is good
  • the en locale for additional entries that needs to be removed
  • whether there are list's of words in the internet that can be considered offensive/unprofessional and use it for additional filtering

@rcfox rcfox changed the title WIP fix(locale): Filter inappropriate words in the en locale fix(locale): Filter inappropriate words in the en locale Dec 11, 2022
@rcfox rcfox marked this pull request as ready for review December 11, 2022 13:40
@rcfox rcfox requested a review from a team as a code owner December 11, 2022 13:40
@rcfox rcfox force-pushed the filter-inappropriate-words-en branch from 8301ae2 to 2779c33 Compare December 11, 2022 13:41
@rcfox
Copy link
Contributor Author

rcfox commented Dec 11, 2022

This PR is ready for review now. I do realize I probably went a little overboard on some of the words, so it's unlikely I'll have any arguments about words to reinstate. (With such long lists, I sort of had to turn part of my brain off and go by a gut feeling.)

I did initially try a using a list of bad words meant for moderation software, but felt they lacked nuance.

ST-DDT
ST-DDT previously approved these changes Dec 11, 2022
@ST-DDT
Copy link
Member

ST-DDT commented Dec 11, 2022

Please run pnpm run test -u to update the test snapshots.

@rcfox rcfox force-pushed the filter-inappropriate-words-en branch from 03d2a2f to 2266db7 Compare December 12, 2022 15:31
@rcfox
Copy link
Contributor Author

rcfox commented Dec 12, 2022

Rebased on latest 'next' branch. Updated test snapshots. Squashed the commits into a single commit.

ST-DDT
ST-DDT previously approved these changes Dec 12, 2022
@codecov
Copy link

codecov bot commented Dec 12, 2022

Codecov Report

Merging #1633 (80e33c3) into next (d7ccd70) will increase coverage by 0.00%.
The diff coverage is n/a.

Additional details and impacted files
@@           Coverage Diff            @@
##             next    #1633    +/-   ##
========================================
  Coverage   99.64%   99.65%            
========================================
  Files        2240     2240            
  Lines      240291   240106   -185     
  Branches     1072     1076     +4     
========================================
- Hits       239432   239271   -161     
+ Misses        838      814    -24     
  Partials       21       21            
Impacted Files Coverage Δ
src/locales/en/word/adjective.ts 100.00% <ø> (ø)
src/locales/en/word/adverb.ts 100.00% <ø> (ø)
src/locales/en/word/noun.ts 100.00% <ø> (ø)
src/locales/en/word/verb.ts 100.00% <ø> (ø)
src/modules/internet/user-agent.ts 91.08% <0.00%> (+6.48%) ⬆️

@ST-DDT ST-DDT requested a review from a team December 12, 2022 18:31
Copy link
Member

@Shinigami92 Shinigami92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is a merge conflict with test/__snapshots__/system.spec.ts.snap
Please rebase your PR and solve it


I would like to softly "enforce" a required approval from @xDivisionByZerox for this PR

@Shinigami92 Shinigami92 added the needs rebase There is a merge conflict label Dec 14, 2022
@rcfox
Copy link
Contributor Author

rcfox commented Dec 14, 2022

These merge conflicts are going to keep cropping up as other changes are merged in. I'm going to wait until this ticket is otherwise ready to be merged and handle conflicts then.

Copy link
Member

@Shinigami92 Shinigami92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I added some ":thinking:" to the first two files

I think we need to add a sentence / explanation to a file why some words are not allowed

Maybe the contrib guidelines, maybe a comment at the top of the file?

I'm not sure about the colors, we could say: no colors => use colors module for that and I'm fine
I'm not sure about some words like "lame" as yeah they might be like "boring" but I don't think they need to be totally omitted

"obese", "heterosexual", "homosexual" => I think it is more inclusive to have these words in our set as the otherway around

There are some alcoholic words 🤔 maybe we need to transfer these to the requested food module (like colors)

"dwarf" => it is just a fantasy word 🤔

I understand why to remove sexual affected words, but not sure where to draw the line between normal talk and "above-18" words

src/locales/en/word/adjective.ts Show resolved Hide resolved
src/locales/en/word/adjective.ts Show resolved Hide resolved
src/locales/en/word/adjective.ts Show resolved Hide resolved
src/locales/en/word/adverb.ts Show resolved Hide resolved
src/locales/en/word/noun.ts Show resolved Hide resolved
src/locales/en/word/noun.ts Show resolved Hide resolved
src/locales/en/word/noun.ts Show resolved Hide resolved
src/locales/en/word/noun.ts Show resolved Hide resolved
src/locales/en/word/noun.ts Show resolved Hide resolved
src/locales/en/word/noun.ts Show resolved Hide resolved
@ST-DDT
Copy link
Member

ST-DDT commented Dec 14, 2022

I think we need to add a sentence / explanation to a file why some words are not allowed

"obese", "heterosexual", "homosexual" => I think it is more inclusive to have these words in our set as the otherway around

I understand why to remove sexual affected words, but not sure where to draw the line between normal talk and "above-18" words

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

Since there is no need to include any words regarding

age, body size, disability, ethnicity, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation

we should avoid adding any of them, because we still have plenty of other words.
(Just to prevent accidentally offending someone by generating good ___ or bad ___.)
The list of words is supposed to free of any intent to avoid creating offense by joining valuing words to words with intent.
It is easier to remove words referring to specific persons, ... than removing words that contain value.

Maybe the contrib guidelines

Maybe, I just don't know how to exactly formulate that.

maybe a comment at the top of the file?

This is somewhat special to the word category in general and not a single file.

terryknowlton
terryknowlton previously approved these changes Dec 19, 2022
Copy link

@terryknowlton terryknowlton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 We appreciate the cleanup.

import-brain
import-brain previously approved these changes Dec 19, 2022
Shinigami92
Shinigami92 previously approved these changes Dec 23, 2022
Copy link
Member

@Shinigami92 Shinigami92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also discussed to remove entries from locales where a file contains more then 1k entries (like names or nouns) so files are not getting bigger and bigger
1k entries are more then enough for normal use cases

@ST-DDT ST-DDT dismissed stale reviews from Shinigami92, import-brain, terryknowlton, and themself via 67f96e7 December 24, 2022 01:13
@ST-DDT ST-DDT changed the title fix(locale): Filter inappropriate words in the en locale fix(locale): filter inappropriate words in the en locale Dec 24, 2022
@ST-DDT ST-DDT enabled auto-merge (squash) December 24, 2022 01:13
@ST-DDT ST-DDT removed the needs rebase There is a merge conflict label Dec 24, 2022
@ST-DDT ST-DDT requested review from Shinigami92 and a team December 24, 2022 01:15
Copy link
Member

@xDivisionByZerox xDivisionByZerox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll approve this for two reasons, even tho I'm not happy with some words being removed:

  1. It's reasonable to have this in the alpha release, as people are actually interested in this
  2. As @ST-DDT said: We have so many words in the en locale, it doesn't matter if we get rid of some

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: bug Something isn't working c: locale Permutes locale definitions p: 1-normal Nothing urgent s: accepted Accepted feature / Confirmed bug
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants