Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unicode support #51

Merged
merged 1 commit into from
Jul 6, 2023
Merged

Fix unicode support #51

merged 1 commit into from
Jul 6, 2023

Conversation

Bluesy1
Copy link
Collaborator

@Bluesy1 Bluesy1 commented Jul 6, 2023

Currently, any unicode characters get encoded into messy ascii escapes, this method of fixing it shouldn't break anything as the tests pass with no changes to them.

This was noticed in open-resources/instructor_datascience_bank#152, where some unicode box symbols were being converted to their ascii escape sequences, which do not look nice when shown to a user.

Resolves #8

@Bluesy1 Bluesy1 requested a review from firasm July 6, 2023 01:08
@Bluesy1 Bluesy1 changed the title Theoretically fix unicode support Fix unicode support Jul 6, 2023
@Bluesy1 Bluesy1 added the bug Something isn't working label Jul 6, 2023
@@ -87,11 +87,9 @@ def parse_body_part(pnum, md_text):

# Store the content of the level 2 header
try:
content = codecs.unicode_escape_decode(
Copy link
Contributor

@firasm firasm Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there was a reason why I had added this (it wasn't done this way initially)...let me try to figure out why this was done this way, I don't want to break other questions by making this change.

Copy link
Collaborator Author

@Bluesy1 Bluesy1 Jul 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff mode on GitHub said it was to support LateX, but that doesn't make too much sense to me, since the tests questions that use LaTeX seem to be fine with the changes here

Edit: see this commit: 131ce85#diff-ec4c87e6a250f788996aabbdcca263c77e266b61454503016d38f2e678fdd945R161

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - so basically we are now allowing unicode characters in questions. Hopefully that fixes questions like this one: https://ca.prairielearn.com/pl/course/38/question/237974/preview

@firasm firasm self-requested a review July 6, 2023 18:48
@firasm
Copy link
Contributor

firasm commented Jul 6, 2023

Fingers crossed ... hopefully this doesn't have downstream effects that will show up in Term 2 next year.

@firasm firasm merged commit 87021de into main Jul 6, 2023
@Bluesy1
Copy link
Collaborator Author

Bluesy1 commented Jul 6, 2023

Fingers crossed ... hopefully this doesn't have downstream effects that will show up in Term 2 next year.

Hopefully - I really don't see what point decoding unicode has, and it seems to have been added to target LaTeX support, if this causes an issue hopefully some of the questions added this summer will find it, since a decent variance of symbols should be used, and maybe there's some strange symbol that this will break on.

@borna471
Copy link

borna471 commented Jul 7, 2023

So do I just 'git pull' on the branch where I was having this issue so the unicode characters are supported and appear properly now?

@firasm
Copy link
Contributor

firasm commented Jul 7, 2023

I think I'll need to manually run the script to regenerate questions on opb000 and then it should fix.

Locally, you should update problem_bank_scripts to see if it fixes the issue.

@Bluesy1 Bluesy1 deleted the fix-unicode-in-problems branch July 7, 2023 00:27
@borna471
Copy link

borna471 commented Jul 7, 2023

Ok sounds good, updated those and now those questions are good to go, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Script doesn't handle unicode characters correctly
3 participants