Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix data in fields where HTML characters were double escaped #6158

Closed
Closed
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions php/libraries/LorisForm.class.inc
Original file line number Diff line number Diff line change
Expand Up @@ -693,10 +693,10 @@ class LorisForm
}
}
}
// Always sanitize user-controlled input
if (!is_array($newValue)) {
$newValue = htmlspecialchars($newValue);
}
// // Always sanitize user-controlled input
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this fix because it removes an important security measure. I understand though that the double-escaping bug has caused data loss which is a severe issue.

I'll see if I can come up with an alternate fix this week. If I can't figure it out then I think we'll have to go ahead with this and find another way to address XSS issues.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What data loss? Double escaped entries can be restored with a script. It's very unlikely that a patient entered something like & on a form.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

due to the number of & added several fields got truncated by mySQL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@driusan This was discussed in a group chat on Slack and you can view the details there. <, >, and & seem to be common characters used by clinicians when entering instruments by hand (!) and so there is a need to transcribe them literally into forms.

Since they can't be black-listed, we need to do better at displaying these characters to the front-end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ridz1208 The problem arose because of MySQL 5.1 though right? If CCNA can't/won't upgrade then this might be a project override rather than something that needs to go into core.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's mostly the " that are used by clinicians.

and @johnsaigle not sure I see what can be an override. everyone still has the issue, it's just that the other people's data wont get truncated if they are on more recent versions of mysql, they will still have the double escaping issue. and given that this bug existed since LORIS 17, I think we should cover a wide range of versions in our fix

// if (!is_array($newValue)) {
// $newValue = htmlspecialchars($newValue);
// }

return $newValue;
}
Expand Down
280 changes: 280 additions & 0 deletions tools/single_use/fix_double_escape.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,280 @@
<?php
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<?php
<?php declare(strict_types=1);

/**
* This tool scores any registered instrument that was built using the
* NDB_BVL_Instrument class and that has a working score() method.
* The command line arguments need to contain a valid test_name, a 'all' or 'one'
* option, a CandID and a SessionID.
*
* The 'all' option scores all existing records of an instrument if and only if
* - They belong to an active timepoint
* - The data-entry status of the timepoint is set to `complete`
* - The administration of the timepoint is NOT `none`
*
* Note: This tool can also be used to debug the scoring algorithms for development.
*
* Limitation: This tool does not reset or nullify the score of an instrument
* which was previously scored but no longer meets the criteria above (i.e. if the
* administration was changed from `all` to `none`, the score will not be removed).
*
* @package behavioural
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @package behavioural

Not needed

*/

require_once __DIR__."/../generic_includes.php";

// LOGGING
$dir = __DIR__ . "/../logs/";
if (!is_dir($dir)) {
mkdir($dir);
}
$today = getdate();
$date = strftime("%Y-%m-%d_%H:%M");
$logPath = "$dir/fix_double_escaped_fields_$date.log";
$logfp = fopen($logPath, 'a');

if (!$logfp) {
printError(
"No logs can be generated, path:$logPath ".
"does not exist or can not be written to.\n"
);
}
//PARSE ARGUMENTS
$actions = array(
'report',
'repair',
);
if (!isset($argv[1]) || $argv[1] === 'help' || in_array('-h', $argv, true) || !in_array($argv[1], $actions, true)) {
showHelp();
}
$report = false;
if (isset($argv[1]) && $argv[1] === 'report') {
$report =true;
}
$repair = false;
if (isset($argv[1]) && $argv[1] === 'repair') {
$repair =true;
}
$useHistory =false;
if (in_array('use-history', $argv, true)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the logic for these if-statements you should probably throw an error in case too many arguments are passed, for example if someone tries to do both report and repair

$useHistory =true;
}

// DEFINE VARIABLES
// All instruments looked at
$instrumentNames = $DB->pselectCol("SELECT Test_name FROM test_names", array());
// Array of all fields containing any escaped characters
$escapedEntries = array();
// Array of probable truncations based on preg_match ONLY
$probableTruncations = array();
// Array of confirmed truncations based on size of fields and content
$confirmedTruncations = array();
// Boolean flag for identify non-impacted databases and terminating.
$errorsDetected = false;


// FIRST loop just reporting all potential problematic fields
foreach($instrumentNames as $instrumentName) {
printOut("Checking $instrumentName");
try {
$instrument = \NDB_BVL_Instrument::factory($instrumentName);
} catch (Exception $e) {
printError(
"There was an error instantiating instrument $instrumentName.
This instrument will be skipped."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to print the stack trace of the exception

);
continue;
}
$instrumentCIDs = $DB->pselectCol(
"SELECT CommentID FROM flag WHERE Test_name=:tn",
array("tn" => $instrumentName)
);
foreach ($instrumentCIDs as $cid) {
$instrumentInstance = \NDB_BVL_Instrument::factory($instrumentName, $cid);

$instrumentData = \NDB_BVL_Instrument::loadInstanceData($instrumentInstance);
$set = array();

// Go through all fields and identify which have any escaped characters
foreach ($instrumentData as $field => $value) {
// regex detecting any escaped character in the database
if (preg_match('/&(amp;)+(gt;|lt;|quot;|amp;)/', $value)) {
$escapedEntries[$instrumentName][$cid][$field] = $value;
$errorsDetected = true;
}
if (preg_match('/&(amp;)*[a|g|l|q](m|t|u)?(p|o|;)?(t|;)?(;)?$/', $value)) {
// This checks for signs of truncation, it matches any string that
// ENDS with either a complete or incomplete escaped expression.
// THIS DOES NOT GUARANTEE that other fields are not truncated as
// truncation could occur even if field does not end with the escaped
// characters.
$probableTruncations[$instrumentName][$cid][$field] = $value;
}
}
}
}

// A list of all escaped characters has been built. Forloop through and start fixing.

//SECOND loop, depending on flags, report or fix values.
if ($errorsDetected) {
//TODO CODE FOR CHECKING FIELD MAX LENGTH AND COMPARING TO ACTUAL LENGTH IN ORDER TO BUILD THE CONFIRMEDTRUNCATIONS array defined above

if ($useHistory) {
// TODO CODE FOR CHECKING HISTORY TABLE AND BUILDING BETTER REPAIR SUGGESTIONS
}

if ($report) {
printOut(
"Below is a list of all entries in the database instruemnts which ".
"contain escaped characters"
);
print_r($escapedEntries);
print_r("\n\n");
printOut(
"Below is the list of truncations automatically detected. This ".
"list might not be exhaustive, truncation can occur without being ".
"automatically detected by this script."
);
print_r($confirmedTruncations);
printOut(
"Below are the suggested repairs automatically generated from ".
"latest values in the fields."
);

if ($useHistory) {
printOut(
"Due to possible truncation of data, some suggestions are ".
"derived from the history table of the database."
);
}
} elseif ($repair) {
printOut("Attempting repairs");
}



foreach ($escapedEntries as $instrumentName=>$cids) {
printOut("Opening $instrumentName");
foreach ($cids as $cid => $fields) {
try {
$instrumentInstance = \NDB_BVL_Instrument::factory($instrumentName, $cid);
} catch (Exception $e) {
printError(
"There was an error instantiating instrument $instrumentName for " .
"$cid.This instrument will be skipped."
);
continue;
}
foreach ($fields as $field => $value) {

if ($useHistory && array_key_exists($field, $confirmedTruncations[$instrumentName][$cid])) {
//TODO ADD CODE TO REPLACE THE VALUE BY THE DATA FROM THE HISTORY TABLE IF TRUNCATION IS CONFIRMED
johnsaigle marked this conversation as resolved.
Show resolved Hide resolved
}

// Each of the expressions below uniquely match each of the targeted
// characters indicated in the comment above the function.

// < : match any substring starting with `&`
// followed by 1 or more `amp;` and ending with `lt;`
$newValue = preg_replace('/&(amp;)+lt;/', '<', $value);
// > : match any substring starting with `&`
// followed by 1 or more `amp;` and ending with `gt;`
$newValue = preg_replace('/&(amp;)+gt;/', '>', $newValue);
// " : match any substring starting with `&`
// followed by 1 or more `amp;` and ending with `quot;`
$newValue = preg_replace('/&(amp;)+quot;/', '"', $newValue);
// & : match any substring starting with `&`
// followed by 2 or more `amp;` (because 1 is normal in the database
// since it is the escaped form of `&`) and
// NOT ending with `lt;` or `gt;` or `quot;` or `amp;`
// (the last one is to ensure we don't match subsequences from the
// case above).
$newValue = preg_replace('/&(amp;){2,}(?!(lt;|gt;|quot;|amp;))/', '&', $newValue);


if (!empty($value) && !empty($newValue) && $newValue != $value) {
printOut(
"CommentID: $cid - Value at $field will be modified. " .
"\n\tCurrent Value: $value" .
"\n\tWill be replaced by: $newValue\n"
);
$set[$field] = $newValue;
}
}
if (!empty($set) && $repair) {
$instrumentInstance->_save($set);
}
}
}

if (!$repair) {
printOut("\nRun tool again with `repair` argument to apply changes");
}
} else {
printOut("No errors have been detected. End !");
}

fclose($logfp);


foreach ($probableTruncations as $i=>$cids) {
foreach ($cids as $cid=>$fields) {
foreach ($fields as $field=>$value) {
//check history

}
}
}

/*
* Prints to log file
*/
function logMessage($message)
{
global $logfp;
if (!$logfp) {
//The log file could not be instantiated
//use print instead
print_r($message);
}
$now_string = strftime("%Y-%m-%d %H:%M:%S");
fwrite($logfp, "[$now_string] $message\n");

}

/*
* Prints to STDERR
*/
function printError($message)
{
logMessage($message);
fwrite(STDERR, "$message \n");
}

/*
* Prints to STDOUT
*/
function printOut($message)
{
logMessage($message);
print_r("$message\n");
}

function showHelp()
{
echo "\n\n*** Fix Double Escaped Fields ***\n\n";

echo "Usage:
fix_double_escape.php [help | -h] -> displays this message
fix_double_escape.php report [use-history] -> runs tool and reports erroneous data and potential fixes
without making any changes. (with use-history flag, proposes
data repairs by looking through the instrument's historical
entries in the LORIS history table)
fix_double_escape.php repair [use-history] -> runs tool and rectifies erroneous data when possible (with
use-history flag, repairs data by looking through the
instrument's historical entries in the LORIS history table)

Note: Using the `use-history` flag will considerably slow down the script.
\n\n";

die();
}