
Don't assume good/badwords files are utf8-encoded #1267

Merged: windymilla merged 1 commit into DistributedProofreaders:master from gw-unicode on Oct 18, 2023

windymilla (Collaborator)

Apparently, these files can be encoded as Latin-1; previous work assumed UTF-8.
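For illustration only, here is a minimal Python sketch of the decoding policy described above: try UTF-8 first, and fall back to Latin-1 only when the bytes are not valid UTF-8. This is not the code merged in this PR; `decode_wordlist` and `read_wordlist` are hypothetical names, and the sketch assumes that any non-UTF-8 file is Latin-1 (as stated above) rather than some other 8-bit encoding.

```python
from __future__ import annotations

from pathlib import Path


def decode_wordlist(raw: bytes) -> str:
    """Decode good/bad words file contents, preferring UTF-8.

    Hypothetical helper (not Guiguts code). Strict UTF-8 decoding raises
    UnicodeDecodeError on bytes that are not valid UTF-8, e.g. a lone
    Latin-1 0xE9 for "é". In that case we ASSUME the file predates the
    UTF-8 conversion and decode it as Latin-1 instead.
    """
    try:
        # "utf-8-sig" decodes plain UTF-8 and also drops a leading BOM if present.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xFF to a code point, so this cannot fail.
        return raw.decode("latin-1")


def read_wordlist(path: str | Path) -> list[str]:
    """Return the non-empty words in a good/bad words file, one per line."""
    text = decode_wordlist(Path(path).read_bytes())
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Because Latin-1 assigns a character to every byte value, the fallback never raises; the trade-off is that a file in some other 8-bit encoding would be decoded without error but possibly with the wrong characters.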
windymilla requested review from cpeel and srjfoo on October 17, 2023 19:32
cpeel (Member) left a comment

When we rolled out Unicode, we converted all good and bad word files to UTF-8, and that's how Apache serves them as well.

I'd really like to know where these non-UTF-8 files are coming from, because there's something wrong somewhere and it isn't with Guiguts.

windymilla (Collaborator, Author)

Sharon just explained to me that the goodwords files were recreated, but the zip that the PPer downloads was not. This zip dates from 2018. I checked the goodwords file downloaded directly from the project page (last modified May 2020, presumably when it was converted to UTF-8), and it is UTF-8. So it's just the old zips that could contain non-UTF-8 goodwords files. GG used to cope with that before my recent assumptive "improvements".
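To check whether a particular downloaded zip is one of the old ones, a helper like the following could list the entries that fail strict UTF-8 decoding. It is a hypothetical Python sketch, not part of Guiguts or the DP site code, and it assumes the goodwords/badwords files are stored as ordinary entries in the zip.

```python
from __future__ import annotations

import zipfile
from typing import BinaryIO


def non_utf8_members(zip_source: str | BinaryIO) -> list[str]:
    """Return names of files in a zip whose contents are not valid UTF-8.

    Hypothetical diagnostic helper (not Guiguts code). Directory entries
    are skipped; every other entry is read fully and decoded strictly.
    """
    bad = []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            try:
                zf.read(info).decode("utf-8")
            except UnicodeDecodeError:
                # Likely an old Latin-1 (or other 8-bit) file from before the conversion.
                bad.append(info.filename)
    return bad
```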

cpeel (Member) left a comment

Got it - thank you for that clarification!

windymilla (Collaborator, Author)

It's thanks to @srjfoo, of course, who is very much on the ball! Thanks too, Casey, for following up on the explanation.

windymilla linked an issue on Oct 17, 2023 that may be closed by this pull request
srjfoo (Member) commented Oct 17, 2023

Looking at the project history, the project was checked out at the time of the conversion. It was only recently returned to the pool, whereupon Charlie checked it out. I don't remember for sure, but I don't think we regenerated files for projects that were checked out for PP, did we?

windymilla merged commit 0df3bab into DistributedProofreaders:master on Oct 18, 2023
1 check passed
windymilla deleted the gw-unicode branch on October 18, 2023 14:48
Development

Successfully merging this pull request may close these issues.

Bug or need a way to use the error messages shown below
3 participants