
Improve aspell dictionary #260


Merged
merged 3 commits into from
Mar 18, 2025



lumynou5
Contributor

This patch improves the personal dictionary that aspell uses to check commit messages by

  • Adding words,
  • Sorting them, and
  • Removing duplicate entries.
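A minimal sketch of the sorting and de-duplication steps, assuming the dictionary is a plain list with one word per entry (the real file's format and any aspell header line are not shown in this thread). `normalize_wordlist` is a hypothetical helper: it orders entries case-insensitively, breaks ties by code point, and drops only exact duplicates.

```python
def normalize_wordlist(words):
    """Return the words sorted case-insensitively, with exact duplicates removed.

    Hypothetical helper: entries that differ only in case (e.g. "Linux"
    and "linux") are kept as distinct entries.
    """
    # Strip surrounding whitespace, skip blank entries, and drop exact duplicates.
    cleaned = {w.strip() for w in words if w.strip()}
    # Primary key: the lowercase form. Tie-breaker: the original spelling,
    # so the result is deterministic when two entries differ only in case.
    return sorted(cleaned, key=lambda w: (w.lower(), w))


sample = ["adjtime", "acct", "AddressSanitizer", "acct", "aarch", ""]
print(normalize_wordlist(sample))  # ['aarch', 'acct', 'AddressSanitizer', 'adjtime']
```

Using a set for de-duplication discards the original order, which is harmless here because the list is re-sorted immediately afterwards.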

Discussion is welcome. If you find words that fail the check, please leave a comment. Suggested words will be squashed into a single commit that adds all of them, with "Co-authored-by" trailers crediting the people who suggested them.

Change-Id: I332fafc989700ea9d3dbc2fcddf746af0fa46a8d

@jserv
Contributor

jserv commented Mar 12, 2025

We should automate the sorting of words in the user-defined dictionary.

@lumynou5
Contributor Author

> We should automate the sorting of words in the user-defined dictionary.

I think we could check whether the words are sorted correctly, similar to the clang-format check in the pre-commit hook, so that both checks behave consistently.
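A sketch of what such an order check might look like, illustrating the proposal rather than the hook that was eventually merged. `find_misordered` and the script name in the usage comment are hypothetical, and the case-insensitive key is an assumption. The check walks adjacent pairs and reports every line whose entry is not strictly after the previous one, which flags both misordered entries and exact duplicates.

```python
import sys


def sort_key(word):
    # Assumed ordering: case-insensitive, ties broken by code point.
    return (word.lower(), word)


def find_misordered(lines):
    """Return 1-based line numbers whose entry is not strictly after the previous one."""
    words = [line.strip() for line in lines]
    bad = []
    for i in range(1, len(words)):
        # "<=" flags both out-of-order entries and exact duplicates.
        if sort_key(words[i]) <= sort_key(words[i - 1]):
            bad.append(i + 1)
    return bad


def main(path):
    with open(path, encoding="utf-8") as f:
        bad = find_misordered(f.read().splitlines())
    for lineno in bad:
        print(f"{path}:{lineno}: entry is out of order or duplicated", file=sys.stderr)
    # Non-zero status lets the calling pre-commit hook abort the commit.
    return 1 if bad else 0


# Usage (hypothetical script name): python3 check_wordlist_order.py <wordlist>
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Like a formatting check, this only reports problems and leaves fixing the file to the contributor, which matches the "check, don't rewrite" behavior discussed here.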

@lumynou5
Contributor Author

> We should automate the sorting of words in the user-defined dictionary.

What do you think of checking the order instead of automating the sorting?

@jserv
Contributor

jserv commented Mar 17, 2025

> Not to automate the sorting but check the order.

That sounds good. Please show the proposed changes.

@lumynou5
Contributor Author

Done. I also rebased the branch on master.

@jserv
Copy link
Contributor

jserv commented Mar 17, 2025

For sorting purposes, it is generally better to ensure all keys in a user-defined dictionary are lowercase in advance. This prevents potential sorting inconsistencies, since uppercase and lowercase letters have different ASCII/Unicode values: in ASCII, every uppercase letter (A to Z, codes 65 to 90) sorts before every lowercase letter (a to z, codes 97 to 122). Pre-converting all keys to lowercase creates a consistent basis for comparison and makes the sorting behavior more predictable and intuitive to users.
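To make the point concrete, a plain code-point sort (Python's default string ordering) places a capitalized entry ahead of all lowercase ones, whereas lowercasing the entries first gives alphabetical order. The words come from the file excerpt quoted in this thread; the function names are illustrative only.

```python
def codepoint_sort(words):
    # Default str ordering compares Unicode code points: "A" (65) < "a" (97).
    return sorted(words)


def lowercase_then_sort(words):
    # Convert every entry to lowercase first, as suggested, then sort.
    return sorted(w.lower() for w in words)


words = ["aarch", "acct", "AddressSanitizer", "adjtime"]
print(codepoint_sort(words))       # ['AddressSanitizer', 'aarch', 'acct', 'adjtime']
print(lowercase_then_sort(words))  # ['aarch', 'acct', 'addresssanitizer', 'adjtime']
```

The trade-off is that lowercasing rewrites the stored entries themselves. An alternative is to keep each entry's original spelling and lowercase only the sort key.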

@lumynou5
Copy link
Contributor Author

> This prevents potential sorting inconsistencies since uppercase and lowercase letters have different ASCII/Unicode values, which affects their sort order.

With -d/--dictionary-order, the entries aren't sorted by character code points, and case is ignored. You can check the file:

aarch
abbrev
abcdefghijklmnopqrstuvwxyz
acct
AddressSanitizer
adjtime

Entries starting with lowercase letters don't all come after those starting with uppercase letters, as they would in ASCII order; instead, "Ad" immediately follows "ac".
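For reference, GNU `sort`'s `-d` (`--dictionary-order`) only restricts the comparison to blanks and alphanumeric characters. Case-insensitive comparison comes from `-f` (`--ignore-case`) or from the collation rules of a non-C locale such as `en_US.UTF-8`. The sketch below approximates `LC_ALL=C sort -d -f` for ASCII input using a Python key function. `dictionary_key` is a hypothetical name, and using the original line as a tie-breaker is meant to mirror sort's last-resort whole-line comparison.

```python
def dictionary_key(line):
    """Approximate the ordering of `LC_ALL=C sort -d -f` for ASCII text.

    -d: only blanks and alphanumeric characters take part in the comparison.
    -f: letters compare case-insensitively.
    The original line breaks ties, like sort's last-resort comparison.
    """
    kept = "".join(c for c in line if c.isalnum() or c in " \t")
    return (kept.lower(), line)


entries = [
    "adjtime", "AddressSanitizer", "acct",
    "abcdefghijklmnopqrstuvwxyz", "abbrev", "aarch",
]
for word in sorted(entries, key=dictionary_key):
    print(word)
```

Sorting the six entries from the excerpt with this key reproduces the order shown above. Because `-d` discards punctuation, hyphenated or dotted entries are compared as if those characters were absent.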

Contributor

@jserv jserv left a comment


Please rebase onto the latest master and force-push.

Change-Id: If1289930411b52f2fa419423a00f59f8a6a40287
Change-Id: I38c4db8fe54347734bea02bac7cd82f1b674a0ff
Suggested-by: Jim Huang <jserv@ccns.ncku.edu.tw>
Change-Id: Iff9a9eaed081f340ecbec758e741a3fb66863753
@lumynou5 lumynou5 marked this pull request as ready for review March 18, 2025 01:53
@lumynou5
Contributor Author

I've marked the PR as ready for review, so feel free to merge it if you think it's OK.

@jserv jserv merged commit 2b54e1e into sysprog21:master Mar 18, 2025
1 of 2 checks passed
@jserv
Contributor

jserv commented Mar 18, 2025

Thanks to @lumynou5 for contributing!

@lumynou5 lumynou5 deleted the extend-dict branch March 18, 2025 02:43