Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added 'augment-en' script to pull keywords from platform data #226

Merged
merged 4 commits into from
Sep 30, 2024

Conversation

JoshuaKGoldberg
Copy link
Collaborator

@JoshuaKGoldberg JoshuaKGoldberg commented Mar 30, 2024

Does not fix #194, but somewhat sidesteps the issue by adding in a lot of keywords.

This is mostly the same strategy as #194 (comment). scripts/augment-en.js pulls in all keywords and names from the Emojipedia/Unicode, GitHub, Fluent UI, and Twitter platforms via emoji-platform-data. It filters out duplicate and unnecessary keywords, then pushes any new ones to dist/emoji-*.json.

Note that this PR adds a couple more filters, to keep the change a little smaller:

  • Keywords that already appear in the emoji's title (e.g. "health" as a part of "health_worker")
  • Keywords that include _ (e.g. "big_grin" in 😆)
  • Keywords that don't have any letters (e.g. "><" in 😆)

In total, this augments 1728 emoji with a total of 5956 keywords.

Leaving this in draft for now as I haven't gotten approval for this strategy 😄 un-drafted, with the same note. I won't be upset at all if it's just blanket denied. But it was a ton of fun to get to this PR!

@JoshuaKGoldberg
Copy link
Collaborator Author

Un-drafting as an excuse to ping you @muan. 😁

@muan
Copy link
Owner

muan commented Jun 3, 2024

Sorry! I will try to take some time to look at this.

scripts/augment.js Outdated Show resolved Hide resolved
@muan
Copy link
Owner

muan commented Jun 16, 2024

Sorry it took me so long.

The other thing I'd note still is that I don't think this fixes #194. There is still a question for whether "penis" is an acceptable keyword for 🍆. I think that's worth of documenting. I think we have talked about this over the email for a bit a long time ago. I thought this PR was that actually, so I didn't read it sooner. Once again apologies for being absent-minded.

So, just that review comment, and if the fixes 194 keyword can be taken out that'd be great. Or you can simply reopen afterwards.

@muan
Copy link
Owner

muan commented Jun 16, 2024

Also this feels like a minor version bump. What do you think?

@JoshuaKGoldberg
Copy link
Collaborator Author

There is still a question for whether "penis" is an acceptable keyword for 🍆.

I was seeing this as answering that by defining emojilib as including:

  • All the keywords that it includes today
  • The keywords defined in known common platforms (emoji-platform-data)

...meaning, for "penis" and 🍆, that'd be a no from me.

Also this feels like a minor version bump. What do you think?

I think technically it would be... but I worry about adding in so much new data in a minor. Can I suggest a major just to play it safe?

@JoshuaKGoldberg JoshuaKGoldberg requested a review from muan June 16, 2024 14:58
@muan
Copy link
Owner

muan commented Jun 16, 2024

meaning, for "penis" and 🍆, that'd be a no from me.

I think this library should always accept user defined keyword which was how most of the existing keywords came from. That was what I think was missing from CLDR therefore the existence of this library.

Otherwise it seems like people can just bypass this library altogether by combining unicode emoji json with emoji platform data and get pretty much this, right?

@JoshuaKGoldberg
Copy link
Collaborator Author

Ah gotcha, thanks, that's helpful. I realize now that you're right, this really doesn't fix #194. It completely sidesteps it and just adds a bunch of new keywords 😄! Removed the auto-close note.

people can just bypass this library altogether by combining unicode emoji json with emoji platform data and get pretty much this

Yes, exactly. IMO it'd be a lot easier for maintenance to stick with just that exact list of keywords, so politically fraught decisions like "does searching for hummus give 🇮🇱 and/or 🇵🇸?" don't need to be decided by open source maintainers (e.g. you and me).

@JoshuaKGoldberg JoshuaKGoldberg changed the title Added 'augment' script to pull keywords from platform data Added 'augment-en' script to pull keywords from platform data Jun 17, 2024
@JoshuaKGoldberg
Copy link
Collaborator Author

😐 accidentally force-pushed a removal of branch history... and now GitHub isn't letting me re-open this PR. Re-made this PR at #227. Sorry for the noise.

@JoshuaKGoldberg
Copy link
Collaborator Author

JoshuaKGoldberg commented Jun 17, 2024

Edit: and now I can. Very confusing. Maybe it was from adding the merge from main commit...

"fast-xml-parser": "^4.3.6",
"prettier": "^1.18.2",
"promptly": "^3.2.0",
"tape": "^5.1.1",
"unicode-emoji-json": "0.6.0"
}
},
"node_modules/@graphql-typed-document-node/core": {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: I don't love that adding emoji-platform-data as a dev dependency also transitively adds in the packages emojipedia depends on for its Node.js API. Seems silly. Tracked in JoshuaKGoldberg/emoji-platform-data#200.

@JoshuaKGoldberg
Copy link
Collaborator Author

👋 @muan sorry, I think I lost track here - are you waiting on me for anything?

@muan
Copy link
Owner

muan commented Sep 30, 2024

I am so extremely sorry about the delay. I have been terrible at emails.

I agree with #226 (comment) and I'd rather if they aren't added at all. But uh, seems like it doesn't affect user experience and we can remove it at a latter date if that end up being annoying to maintain.

@muan muan merged commit e74c331 into muan:main Sep 30, 2024
@muan
Copy link
Owner

muan commented Sep 30, 2024

Also please feel free to merge in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

What constitutes an acceptable keyword?
2 participants