Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non-ascii feedback #1384

Merged
merged 1 commit into from
Sep 20, 2021
Merged

Support non-ascii feedback #1384

merged 1 commit into from
Sep 20, 2021

Conversation

fanpu
Copy link
Contributor

@fanpu fanpu commented Sep 20, 2021

Description

See #975, issue recently resurfaced again with one of the courses in CMU

Motivation and Context

Confusing for students when no feedback is returned

How Has This Been Tested?

Test with the following file:

#include <stdio.h>
int main() {
  // non-ascii semicolon
  puts("hello");
}

This will cause the compiler to output non-ASCII output as an error. With this PR Autolab will be able to process and return the feedback successfully.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation, which is located at Autolab Docs
  • I have updated the documentation accordingly, included in this PR

Other issues / help required

If unsure, feel free to submit first and we'll help you along.

Copy link
Contributor

@xinyis991105 xinyis991105 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested by submitting hello.c with a non-ASCII character. Autograding didn't hang. The output is as expected, indicating the error. Thanks for the quick fix!

@fanpu fanpu merged commit 13bbbdb into master Sep 20, 2021
@fanpu fanpu deleted the feedback-file-nonascii-fix branch September 20, 2021 23:36
@cg2v
Copy link
Member

cg2v commented Sep 21, 2021

Why are both changes needed? It seems that either change alone would allow output that's encoded as UTF-8 to be written to the file. It's not obvious to me what the force_encoding change will do to output that has stray binary characters in it (I've seen this in proxylab logs)

@fanpu
Copy link
Contributor Author

fanpu commented Sep 21, 2021

With just the "wb" change, the following exception is still thrown from the write call:

Exception in autograde_done: Encoding::UndefinedConversionError ("\xEF" from ASCII-8BIT to UTF-8)
Completed 204 No Content in 159ms (ActiveRecord: 4.1ms)

You're right though that just the force_encoding change would be sufficient

@fanpu
Copy link
Contributor Author

fanpu commented Sep 21, 2021

It's not obvious to me what the force_encoding change will do to output that has stray binary characters in it (I've seen this in proxylab logs)

I'll keep that in mind, but since Autolab currently would not even be able to return anything, even if they do see gibberish UTF8 logs (I'm assuming this is due to bugs or logging binary content that their proxy is serving) I think that's fine.

@cg2v
Copy link
Member

cg2v commented Sep 24, 2021

I think I'm going to go with this instead:

    feedback.force_encoding("UTF-8")
    if not feedback.valid_encoding?
       feedback.force_encoding("Windows-1252")
       hexify = Proc.new {|c| "\\x" + c.bytes[0].to_s(16) }
       feedback = feedback.encode("UTF-8", invalid: :replace, fallback: hexify)

Which will replace nulls and other non-iso characters with \xNN, but not all 8 bit chars.

The question is do people prefer seeing

\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98
or
Œ\x8dŽ\x8f\x90‘’“”•–—˜

Different situations might want different results, but we can't really tell from the content

@fanpu
Copy link
Contributor Author

fanpu commented Sep 24, 2021

In most cases I would say the former, I'll submit another PR with your fix which I agree is better. Thanks so much!

@cg2v
Copy link
Member

cg2v commented Sep 24, 2021

if you want all hex, then change feedback.force_encoding("Windows-1252") to feedback.force_encoding("ASCII-8BIT")

@cg2v
Copy link
Member

cg2v commented Sep 24, 2021

Also note that I had python on the brain when I wrote that and it is missing an end

ugogon pushed a commit to ugogon/Autolab that referenced this pull request Oct 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants