Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve viewFeedback output parsing, error robustness #1913

Merged
merged 9 commits into from
Jun 14, 2023

Conversation

damianhxy
Copy link
Member

@damianhxy damianhxy commented Jun 2, 2023

Summary

Summary generated by Reviewpad on 02 Jun 23 11:14 UTC

This pull request modifies app/controllers/assessments_controller.rb and app/views/assessments directory. It adds a TODO comment in a method and refactors another method to use rstrip instead of chomp. Moreover, it modifies app/views/assessments/_results_panel.html.erb to handle score values that cannot be parsed and app/views/assessments/_semantic.html.erb to check if @scoreHash is nil before accessing it.

Description

  • Strip trailing lines before parsing score and feedback on viewFeedback page
  • Rename valid_json? to valid_json_hash?, add check that we parsed a hash
  • Cast scores to float, add error handling
  • Hide Score section for semantic feedback if score JSON could not be parsed

Motivation and Context

Issue 1: If the Autograder output has trailing newlines, semantic feedback options / score JSON are not parsed.
Consequence: Semantic feedback is not triggered when it should be, and results panel does not show all scores (it falls back to showing the current problem's score)

Issue 2: valid_json? does not check that we parsed a hash
Consequence: A scalar like 2 or true would pass the check, and we get an error when we try to perform hash operations on it afterwards.

Issue 3: Insufficient casting / error handling for scores
Consequence: Scores in strings are not opportunistically cast to floats. Scores of non-numeric types prevent the page from loading.

Fixes #1718
Fixes #1877
Incorporates cg2v@f112dba (with a bugfix)

How Has This Been Tested?

Import the following assessment: randomlab_20230602.tar.zip

For each of the following tests, you can submit any file, it simply assigns 6 random scores from 0 to 99.

For each test, the new output code will be provided. Update driver.sh and recompile autograde.tar accordingly (e.g. expand autograde.tar, edit random/driver.sh, run gtar -cvf autograde.tar random), and upload it to the assessment. gtar is gnu-tar, available via brew.

Test 1: Normal Case

Before + After
Screenshot 2023-06-02 at 18 32 38

Test 2: Semantic UI + Trailing new lines

echo "{\"_presentation\": \"semantic\"}"
echo "{\"scores\": {\"Problem 1\": $rand1, \"Problem 2\": $rand2, \"Problem 3\": $rand3, \"Problem 4\": $rand4, \"Problem 5\": $rand5, \"Problem 6\": $rand6}}"
echo ""

Before
Screenshot 2023-06-02 at 18 39 40
After
Screenshot 2023-06-02 at 18 40 23

Test 3: Scalars

echo "2"
echo "{\"scores\": {\"Problem 1\": $rand1, \"Problem 2\": $rand2, \"Problem 3\": $rand3, \"Problem 4\": $rand4, \"Problem 5\": $rand5, \"Problem 6\": $rand6}}"

Before
Screenshot 2023-06-02 at 18 42 24
After
Screenshot 2023-06-02 at 18 42 56

Test 4: Score JSON with strings

echo "{\"scores\": {\"Problem 1\": \"$rand1\", \"Problem 2\": $rand2, \"Problem 3\": $rand3, \"Problem 4\": $rand4, \"Problem 5\": $rand5, \"Problem 6\": $rand6}}"

Before
Screenshot 2023-06-02 at 18 44 45

After
Screenshot 2023-06-02 at 18 44 20

Test 5: Score JSON with non-numeric

echo "{\"scores\": {\"Problem 1\": true, \"Problem 2\": $rand2, \"Problem 3\": $rand3, \"Problem 4\": $rand4, \"Problem 5\": $rand5, \"Problem 6\": $rand6}}"

Before
Screenshot 2023-06-02 at 18 45 36
After
Screenshot 2023-06-02 at 18 47 22

Test 6: Invalid Score JSON

echo "{\"scores\": {\"Problem 1\": invalid, \"Problem 2\": $rand2, \"Problem 3\": $rand3, \"Problem 4\": $rand4, \"Problem 5\": $rand5, \"Problem 6\": $rand6}}"

Before + After
Screenshot 2023-06-02 at 18 49 54

Test 7: Score JSON with non-numeric + Semantic UI

echo "{\"_presentation\": \"semantic\"}"
echo "{\"scores\": {\"Problem 1\": true, \"Problem 2\": $rand2, \"Problem 3\": $rand3, \"Problem 4\": $rand4, \"Problem 5\": $rand5, \"Problem 6\": $rand6}}"

Before
Screenshot 2023-06-02 at 18 53 07
After
Screenshot 2023-06-02 at 18 53 30

Test 8: Invalid Score JSON + Semantic UI

echo "{\"_presentation\": \"semantic\"}"
echo "{\"scores\": {\"Problem 1\": invalid, \"Problem 2\": $rand2, \"Problem 3\": $rand3, \"Problem 4\": $rand4, \"Problem 5\": $rand5, \"Problem 6\": $rand6}}"

Before
Screenshot 2023-06-02 at 18 55 51
After
Screenshot 2023-06-02 at 18 55 28

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • I have run rubocop for style check. If you haven't, run overcommit --install && overcommit --sign to use pre-commit hook for linting
  • My change requires a change to the documentation, which is located at Autolab Docs
  • I have updated the documentation accordingly, included in this PR

Other issues / help required

Currently, the viewFeedback page re-parses the score JSON, which could display the wrong score if the parseAutoresult or modifySubmissionScores hooks were used by the lab author.

In another PR, we ought to make use of the scores directly, rather than parsing them again.

@reviewpad reviewpad bot added the small Pull request is small label Jun 2, 2023
@damianhxy damianhxy marked this pull request as ready for review June 2, 2023 10:59
@reviewpad reviewpad bot requested a review from evanyeyeye June 2, 2023 11:00
@damianhxy
Copy link
Member Author

damianhxy commented Jun 2, 2023

@cg2v This PR incorporates / addresses two of the patches you apply to CMU prod, I was wondering if you had any thoughts on this?

For your patch on JSON scalars, it seems like the check does not work since hash is not defined. Here, I check the return value of JSON.parse instead.

For your patch on not parsing semantic feedback unless long, it is no longer needed since this PR performs the rstrip beforehand, and no longer calls chomp after indexing into lines.

Also, if there are any other patches you would like to merge upstream, I'd be happy to take a look at them! Some of them look really useful, such as the one preventing the scheduler race condition, workaround for visual run, and filtering out of sensitive variables.

@cg2v
Copy link
Member

cg2v commented Jun 2, 2023

#1463 changed the context in valid_json? since I first wrote that and I didn't notice. I am surprised that it has not caused any exceptions. This change is fine.

The problem with the short or empty output wasn't that nil was being fed to chomp, but that the array offset lines.length-2 could be negative

@damianhxy
Copy link
Member Author

The problem with the short or empty output wasn't that nil was being fed to chomp, but that the array offset lines.length-2 could be negative

From my understanding, in ruby, nil is returned for an out of range access, so a negative offset should be fine

@damianhxy damianhxy requested a review from jlge June 5, 2023 17:20
@damianhxy damianhxy changed the title Improve viewFeedback page error handling Improve viewFeedback output parsing, error robustness Jun 8, 2023
Copy link
Contributor

@20wildmanj 20wildmanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through testing scheme, everything works according to guide, LGTM

Sidenote: seems like when true gets fed into autograding output, although the parse failure error happens, the score given for that problem is 1, not 0 (probably b/c true casts to 1?)

@damianhxy
Copy link
Member Author

Sidenote: seems like when true gets fed into autograding output, although the parse failure error happens, the score given for that problem is 1, not 0 (probably b/c true casts to 1?)

Discussed offline - basically a side effect of #1914. Since we are re-parsing scores instead of making use of the final problem score, this is due to differences between parseScore in assessments_controller (cast to float using to_f) and saveAutograde in assessment_autograde_core (saved directly to db, which apparently interprets it as 1).

@damianhxy damianhxy added this pull request to the merge queue Jun 14, 2023
Merged via the queue into master with commit 5e33614 Jun 14, 2023
@damianhxy damianhxy deleted the semantic-feedback-fixes branch June 14, 2023 17:53
@damianhxy damianhxy removed request for evanyeyeye and jlge June 15, 2023 09:29
NicholasMy pushed a commit to UB-CSE-IT/Autolab that referenced this pull request Jan 3, 2024
* rstrip feedback in pareScore and parseFeedback

* Validate that feedback is a hash

* Cast scores to floats, add error messages

* Hide scores if score JSON can not be parsed

* Handle booleans in score hash

* Add TODO

* Update comment

* Remove unnecessary TypeError

* Handle non-numeric score in result sidebar

(cherry picked from commit 5e33614)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
small Pull request is small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve score JSON parsing Chomp newlines for semantic feedback
3 participants