Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix data in fields where HTML characters were double escaped #6158

Closed

Conversation

ridz1208
Copy link
Collaborator

@ridz1208 ridz1208 commented Mar 13, 2020

Brief summary of changes

This fixes the issue where HTML special characters were escaped twice before being saved causing every save to re-escape the & of the previously escaped values.

The problematic characters are &><"

Example, if an instrument field contains a >:
after first save value becomes &amp;gt;
after second save &amp;amp;amp;gt;

Causing some values to look like this on projects running cronjobs nightly
&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;

Testing instructions (if applicable)

  1. before checking out this branch, go to an instrument and save any of the characters listed above.
  2. reload the page
  3. [optional] save multiple times (make sure to reload the page before re-saving because values do not get refetched from the database automatically in instruments)
  4. checkout this branch, and test out the script.
  5. end result should show normal characters on the front end. Database values should still be escaped but without duplication

Link(s) to related issue(s)

  • Resolves # (Reference the issue this fixes, if any.)

@ridz1208 ridz1208 added Category: Bug PR or issue that aims to report or fix a bug State: Discussion required PR or issue that requires the resolution of a discussion with the relevant parties to proceed Critical to release PR or issue is key for the release to which it has been assigned Priority: Projects labels Mar 13, 2020
@ridz1208
Copy link
Collaborator Author

@driusan discussion required for :

  1. where to fix the issue ? I just commented out the first escaping in the lorisform class. the sesond escaping occurs in the update() function of the database class. but i think that one should stay there
  2. which branch to submit this to.

@ridz1208
Copy link
Collaborator Author

@sruthymathew123 @nicolasbrossard can you check if any of the IBIS data contains these characters ?

running the tool without any arguments should find such fields

@ridz1208 ridz1208 force-pushed the 2020_03_15_double_escaped_fix branch from 0de2dfa to 5296092 Compare March 15, 2020 14:22
if (!is_array($newValue)) {
$newValue = htmlspecialchars($newValue);
}
// // Always sanitize user-controlled input
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this fix because it removes an important security measure. I understand though that the double-escaping bug has caused data loss which is a severe issue.

I'll see if I can come up with an alternate fix this week. If I can't figure it out then I think we'll have to go ahead with this and find another way to address XSS issues.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What data loss? Double escaped entries can be restored with a script. It's very unlikely that a patient entered something like &amp; on a form.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

due to the number of &amp; added several fields got truncated by mySQL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@driusan This was discussed in a group chat on Slack and you can view the details there. <, >, and & seem to be common characters used by clinicians when entering instruments by hand (!) and so there is a need to transcribe them literally into forms.

Since they can't be black-listed, we need to do better at displaying these characters to the front-end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ridz1208 The problem arose because of MySQL 5.1 though right? If CCNA can't/won't upgrade then this might be a project override rather than something that needs to go into core.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's mostly the " that are used by clinicians.

and @johnsaigle not sure I see what can be an override. everyone still has the issue, it's just that the other people's data wont get truncated if they are on more recent versions of mysql, they will still have the double escaping issue. and given that this bug existed since LORIS 17, I think we should cover a wide range of versions in our fix

Copy link
Contributor

@johnsaigle johnsaigle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After doing some research in earlier PRs and some basic testing, I think we can go ahead with this fix:

The htmlspecialchars() line was added in an earlier phase of LORIS. Right now, the extensive use of strict typing prevents many injection points that this previously addressed. Additionally, the use of ReactJS provides sanitization of output by default on many data points. Finally, HTMLEscapeArray in the DB class ensures that almost all data going into the database is escaped.

Removing this line may slightly increase the risk of XSS attacks but I think the risk is small and outweighed by the potential for data loss that it creates. It was always a bit of a hack and I think it's one that we can probably do without now.

Beyond that, I have some comments + questions.

Also, can you please run PHPCS against this? You would do so by adding a line in test/run-php-linter.sh.

@@ -0,0 +1,280 @@
<?php
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<?php
<?php declare(strict_types=1);

* which was previously scored but no longer meets the criteria above (i.e. if the
* administration was changed from `all` to `none`, the score will not be removed).
*
* @package behavioural
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @package behavioural

Not needed

$repair =true;
}
$useHistory =false;
if (in_array('use-history', $argv, true)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the logic for these if-statements you should probably throw an error in case too many arguments are passed, for example if someone tries to do both report and repair

} catch (Exception $e) {
printError(
"There was an error instantiating instrument $instrumentName.
This instrument will be skipped."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to print the stack trace of the exception

tools/single_use/fix_double_escape.php Outdated Show resolved Hide resolved
@johnsaigle johnsaigle added the Category: Security PR or issue that aims to improve security label Mar 17, 2020
Comment on lines 154 to 159
$query = "SELECT primaryVals as CommentID, col, tbl, new, changeDate
FROM history
WHERE primaryCols='CommentID' AND
primaryVals IN ($listCIDs) AND
tbl IN ($listTables) AND
col IN ($listFields)";
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure this query would work, lets say

  • commentID XYZ has an issue with instrument A.
  • commentID WXY has an issue with instrument B.

Then your results will include

  • CommentId XYZ instrument B
  • Commentid WXY instruemnt A

Both of which might not be problematic. I think this query needs to be exact and exhaustive.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the query be exact is very costly and I wasn't actually able to get results because the server kept disconnecting...
This way, the query is not as costly and we do most of the heavy lifting in the script itself by comparing to the array of confirmedTruncations

I know you mentioned it might be an issue with resourcing. Perhaps we can discuss offline and with IT and try to see if an exact query will work when more resources are placed on the history DB.

Comment on lines 166 to 168
// Reverse the array obtained from history data so that it is in anti-chronological order
// Does not preserve keys, assuming that the keys are indexes & not relevant info
$historyData = array_reverse($result, false);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jesscall there is no garantee that the array returned in in chronological order. you should instead use an order by in the query itself

@@ -152,22 +216,28 @@



foreach ($escapedEntries as $instrumentName=>$cids) {
printOut("Opening $instrumentName");
foreach ($escapedEntries as $tableName=>$cids) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jesscall replacing instrument_name with tablename might be problematic since you need the actual instrument name to instanciate the instrument

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, i see...
the reasoning for using table name in certain cases is very specific to how CCNA has set up Follow-Up instruments.
I can make adjustments so that tableName is used specifically for constructing the queries to information_schema and history. However, I don't think the logic for doing so is apparent in the Core-LORIS use case, will adding a comment that indicates a project-specific choice be sufficient ?

Copy link
Collaborator Author

@ridz1208 ridz1208 Mar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think there are project -specific stuff here. Every LORIS project might be using different instrument and table names. there is no reason here preventing you from associating each instruemnt to a table and use that mapping to instantiate them properly

its just a question of what to use where.

@johnsaigle johnsaigle removed the State: Discussion required PR or issue that requires the resolution of a discussion with the relevant parties to proceed label Mar 23, 2020
@jesscall jesscall added the State: Needs work PR awaiting additional work by the author to proceed label Mar 24, 2020
@jesscall
Copy link
Contributor

still working on this. trying to see what i can do about the query to history, might potentially be able to rework the logic of the script entirely, depending if my query works

@ridz1208 i'll let you know when its ready for testing

@jesscall jesscall removed the State: Needs work PR awaiting additional work by the author to proceed label Mar 24, 2020
@jesscall
Copy link
Contributor

@ridz1208 prod had 500 error because of the query that was being run... i think we should keep the script as is, even though the query is returning more data than needed. We can parse through it on PHP side and not put too much strain on the DB and server which has limited memory.

Can you do a prelim code review and let me know if there's anything else i should change or improve?

@ridz1208
Copy link
Collaborator Author

@sruthymathew123 @cmadjar

I have added to this PR a "lite" version of the tool designed to only report on issues. this script can do the checking using object instanciation or raw SQL queries. the raw SQL way obviously does not allow you to have a database table name different then the instrument name so it might omit some instruments in your database.

if you decide to use the SQL way to avoid object instanciation errors, you can use the instrument_double_escape_report.php script as follows

php instrument_double_escape_report.php use-database

let me know if it detects anything.

Copy link
Contributor

@PapillonMcGill PapillonMcGill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the code attentively, haven't run it but look Ok.
LGTM

Comment on lines +185 to +187
if (array_key_exists($field, $confirmedTruncations[$tbl][$cid]) &&
strlen($val) < $maxFieldLengths[$tbl][$field] &&
!isset($untruncatedValues[$tbl][$cid][$field])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (array_key_exists($field, $confirmedTruncations[$tbl][$cid]) &&
strlen($val) < $maxFieldLengths[$tbl][$field] &&
!isset($untruncatedValues[$tbl][$cid][$field])
if (isset($val) &&
isset($confirmedTruncations[$tbl][$cid][$field]) &&
strlen($val) < $maxFieldLengths[$tbl][$field]['CHARACTER_MAXIMUM_LENGTH'] &&
!isset($untruncatedValues[$tbl][$cid][$field])

Comment on lines +240 to +241
array_key_exists($field, $confirmedTruncations[$tableName][$cid]) &&
array_key_exists($field, $untruncatedValues[$tableName][$cid])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
array_key_exists($field, $confirmedTruncations[$tableName][$cid]) &&
array_key_exists($field, $untruncatedValues[$tableName][$cid])
isset($confirmedTruncations[$tableName][$cid][$field]) &&
isset($untruncatedValues[$tableName][$cid][$field])

}
if (!empty($set) && $repair) {
$instrumentInstance->_save($set);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
}
unset($set);

@ridz1208
Copy link
Collaborator Author

Bugfix sent to 22 with reported script
bugfix also sent to 23

PRs below

#6223
#6477

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Category: Bug PR or issue that aims to report or fix a bug Category: Security PR or issue that aims to improve security Critical to release PR or issue is key for the release to which it has been assigned
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants