Add validation for all string input detecting unicode/XML "gremlins" #242
Comments
Do we need to put this into the Guidelines for Partners sending us XML files?
@zzgvh any ideas on this? Should we add validation or add documentation or both?
Currently there is a script, https://github.com/akvo/akvo-rsr/blob/develop/akvo/api/xml_char_check.py, that queries the API for all projects, organisations and updates in XML format and reports any objects that fail to render. I think it needs some small fixing to run, but it gets the job done. We could set it up to run once per X and send a message if it finds problems. But in the long run we should probably do something more, since we're going to output more and more XML over time. This raises a number of questions. When do we check? On input or on output? What do you do if you detect a gremlin? Delete it? Tell the user? Both?
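For illustration, an output-side check of this kind can be sketched as below. This is not the code in xml_char_check.py; it is a minimal stand-in that assumes the text is already in memory rather than fetched from the API, and the names `xml_round_trips` and `failing_records` are hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_round_trips(text):
    """Return True if text can be serialised to XML and parsed back."""
    element = ET.Element("value")
    element.text = text
    # The serialiser escapes &, < and > but passes control characters through.
    serialized = ET.tostring(element, encoding="unicode")
    try:
        ET.fromstring(serialized)
    except ET.ParseError:
        # The parser rejects characters that XML 1.0 does not allow.
        return False
    return True


def failing_records(records):
    """Return the keys of all records whose text breaks the XML output.

    `records` is a hypothetical mapping of object id to text.
    """
    return [key for key, text in records.items() if not xml_round_trips(text)]


if __name__ == "__main__":
    sample = {"project-1": "Clean water", "project-2": "Clean\x0bwater"}
    print(failing_records(sample))
```

A check like this only tells you which objects are broken after the fact, which is why the questions about when to check and what to do on detection still need answering.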
Ah, I think I understand this a lot more now. I was thinking this was input API, but it's input Admin and output API that causes this. I suggest that this should be added to input validation on save in the Admin. It makes sense to prevent invalid characters from entering the database. As we move to more API based services and solutions, having a standard and accepted character set will become increasingly important. Is this possible to add to the Admin Validation? (Moving this to the 2.3.6 Release as it's potentially a larger issue)
It should be possible and it might be fairly easy, but it will need more investigation to say for sure. I'm guessing the way to do it is to override the base validator for text fields so that every text field gets tested for gremlins. The same validator can then be used for text coming in through the API (which isn't a problem for XML coming that way but might be for JSON data).
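A rough sketch of what such a validator could look like, assuming the allowed set is the XML 1.0 `Char` production. The names `INVALID_XML_CHARS`, `GremlinError` and `validate_xml_chars` are made up for the example, and it uses plain Python so it runs on its own; in the admin it would presumably raise Django's `ValidationError` instead, and how it gets attached to every text field is left open here.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


class GremlinError(ValueError):
    """Hypothetical error type standing in for a form validation error."""


def validate_xml_chars(value):
    """Raise GremlinError if value contains a character XML 1.0 forbids."""
    match = INVALID_XML_CHARS.search(value)
    if match:
        raise GremlinError(
            "Invalid XML character U+{:04X} at position {}".format(
                ord(match.group()), match.start()
            )
        )
    return value


if __name__ == "__main__":
    validate_xml_chars("Tabs\tand newlines\nare fine, so is \U0001F61B")
    try:
        validate_xml_chars("bad\x0b")
    except GremlinError as error:
        print(error)
```

Rejecting on save answers the "tell the user" option: the error names the offending code point and its position, so the user can fix the pasted text.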
[#242] Added string validation for allowed XML characters
Merged in #531
Test plan
GIVEN a text field in the RSR admin
@KasperBrandt @zzgvh I could be wrong here (or it might be handled elsewhere) but aren't &, <, >, ", and ' all invalid XML characters? The admin seems to have no issue with saving these at present.
@rumca @adriancollier Not sure if we should disallow these characters, because they pose no problem. They are escaped as entities, e.g. '<' is stored as '&lt;', but in the actual HTML / XML it will display as '<'. However, other weird chars (such as �) cannot be represented in XML in any way and make the XML invalid. These are currently filtered out.
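To make the distinction concrete, here is a small standard-library sketch (the helper `parses` is hypothetical and not part of RSR). Markup characters become valid once escaped, while a control character stays invalid however it is escaped.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(fragment):
    """Return True if fragment is well-formed as the content of an element."""
    try:
        ET.fromstring("<t>" + fragment + "</t>")
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    markup = "Tom & Jerry <3"
    print(parses(markup))          # raw & and < break the document
    print(escape(markup))          # escape() rewrites &, < and >
    print(parses(escape(markup)))  # escaped markup parses fine

    gremlin = "vertical\x0btab"
    print(parses(escape(gremlin)))  # escaping does not help here
```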
OK, it's my opinion that we should only restrict characters that will cause a problem in the Export and Import functions to XML, so if this works then we should leave it.
Yep, just tested the XML Export and regular project page when inserting
There are a number of Unicode control characters that aren't allowed in XML. This creates a headache when text containing them is copied into the admin forms, for example into the text fields in Project and Organisation, where we've seen this problem occur (I strongly suspect M$ Word et al. in this case 😛), and the data is later requested through the API in XML format.
To fix this we need proper validation of all string input. The place for this is in the validators that are called when an admin form is saved and which can be called when inputting data through the API.
A couple of references:
http://en.wikipedia.org/wiki/Valid_characters_in_XML
http://stackoverflow.com/questions/397250/unicode-regex-invalid-xml-characters
Older issue covering the same ground: #189