Improve keyword normalization #164

Merged 1 commit on Jan 23, 2025

hukkin (Contributor) commented Jan 22, 2025

This PR attempts to improve keyword normalization in a few ways:

  • Lowercase keywords in emojis.json. The data contains uppercase letters that are otherwise not searchable. With this change, the following works, for example:
    $ em -s åland
    Copied! 🇦🇽  flag_aland_islands
    The main branch currently returns no results because the letter "å" is uppercase in emojis.json.
  • Normalize dashes (-) in emojis.json. They are already normalized in user input. This makes keywords such as blue-square searchable. Dashes are almost exclusively used as word separators in keywords, so normalization makes sense.
  • Don't normalize dots (.) in user input. They aren't normalized in emojis.json, so they are not searchable on the main branch without this change. IMO this makes more sense than also normalizing them in emojis.json: unlike spaces, dashes and underscores, dots are not used as word separators in emojis.json, but only as part of abbreviations such as "mr.", "mrs.", "st.", "u.s.". This greatly improves searches such as:
    $ em -s st.
    🇧🇱  flag_st_barthelemy
    🇫🇷  flag_france
    🇰🇳  flag_st_kitts_nevis
    🇱🇨  flag_st_lucia
    🇲🇫  flag_st_martin
    🇵🇲  flag_st_pierre_miquelon
    🇸🇭  flag_st_helena
    🇻🇨  flag_st_vincent_grenadines
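
The three rules above can be illustrated with a minimal Python sketch. This is not em-keyboard's actual implementation: the `normalize` and `search` helpers, the choice of underscore as the canonical separator, and the `SAMPLE` keyword data are all assumptions made up for this example (only the keyword `blue-square` and the search terms come from the description above).

```python
import re

# Hypothetical rule set: spaces, dashes and underscores are treated as
# interchangeable word separators. Dots are deliberately NOT included,
# so abbreviations such as "st." keep their dot.
_SEPARATORS = re.compile(r"[ \-_]+")


def normalize(text: str) -> str:
    """Lowercase the text and collapse separator runs into one underscore."""
    return _SEPARATORS.sub("_", text.strip().lower())


def search(query: str, keywords_by_emoji: dict[str, list[str]]) -> list[str]:
    """Return emojis that have a keyword containing the normalized query."""
    needle = normalize(query)
    return [
        emoji
        for emoji, keywords in keywords_by_emoji.items()
        if any(needle in normalize(keyword) for keyword in keywords)
    ]


# Illustrative data only; these are not the real contents of emojis.json.
SAMPLE = {
    "🇦🇽": ["flag_aland_islands", "Åland Islands"],
    "🟦": ["blue-square"],
    "🇱🇨": ["flag_st_lucia", "St. Lucia"],
}

if __name__ == "__main__":
    for query in ("åland", "blue square", "st."):
        print(query, search(query, SAMPLE))
```

Applying the same `normalize` function to both the stored keywords and the user's query is what keeps the two sides comparable; in this sketch the query "st." matches only the keyword that actually contains the abbreviation, because the dot survives normalization on both sides.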


codecov bot commented Jan 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.56%. Comparing base (6eb4993) to head (59f52c4).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #164   +/-   ##
=======================================
  Coverage   95.56%   95.56%           
=======================================
  Files           2        2           
  Lines         203      203           
=======================================
  Hits          194      194           
  Misses          9        9           
Flag             Coverage Δ
macos-latest     92.61% <100.00%> (ø)
ubuntu-latest    93.10% <100.00%> (ø)
windows-latest   92.61% <100.00%> (ø)


hugovk added the "changelog: Changed" label (for changes in existing functionality) on Jan 23, 2025
hugovk (Owner) commented Jan 23, 2025

Thank you!

Do you think you'll have some more improvements or shall we do a release? No rush either way :)

hugovk merged commit d8334bd into hugovk:main on Jan 23, 2025
31 of 32 checks passed
hukkin deleted the improve-normalization branch on January 23, 2025 09:09
hukkin (Contributor, Author) commented Jan 23, 2025

Thanks for taking the time to review all of these. I don't have any more improvements planned right now.

A release sounds good, but also no rush!
