-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSONObject: tabs should not be allowed #240
Comments
Page 5, Figure 5 of EMCA-404 indicates that a String can contain any valid code point except " (quote), \ (reverse solidus), or a control character. I believe that most systems do not consider \t tab to be a control character |
For what it's worth, http://JSONLint.com seems to agree that tabs should not appear in strings. |
well, I'll say at best the document is inconsistent. I'd think this would be a possible breaking change for existing users though. |
Neither spec is especially well-written, and this particular item is already prone to confusion because a tab can either be represented as an escape sequence or as the character itself. My reading is that tabs are allowed in strings only when they are represented by the char sequence "\t". However, I am not inclined to fix this since the bug seems unlikely to likely to break applications, but the fix might. Other opinions? |
I'm OK with that. Maybe just place an Errata in the README? |
Also, RFC 7159 page 8 specifies a string to be the same as the flowchart#5 indicates on page 5 of EMCA-404. So I'm OK with leaving it as it is. |
gah, nevermind again. I see this in RFC7159: All Unicode characters may be placed within the My reading skills need help today. Tab is U+0009, so it's defined as a control character in the spec. |
ok, in light that RFC7159 is more explicit and that we already output the tabs properly (https://github.com/stleary/JSON-java/blob/master/JSONObject.java#L1435), I think I may be in favor of throwing an exception when reading JSON with invalid control characters as defined in RFC7159. It will break backwards compatibility though. |
I think that even if the tab is not allowed based on the spec, being permissive when parsing although being strict when stringifying should be the best way. |
I'd go with @brunoais suggestion and just leave it as-is. Possibly make a note in the README/FAQ to indicate that reading is slightly more lax about control characters than writing. I believe a new-line (U+000A) is still not allowed for reading, but other control characters should be fine (even possibly U+0000). Perhaps also make some test cases to confirm behaviour. Also of note: |
Good plan, I will update the README and FAQ, probably this weekend (busy last week of sprint). If anyone gets to it before me, feel free to open a pull request. |
Love the activity! Thanks everyone for their feedback! |
see PR #242 |
also, for a full test case on what control characters are allowed, see stleary/JSON-Java-unit-test#50 in summary, all control characters are allowed except \0, \n, and \r. |
Please add future comments to the pull request #242. |
Tests for Issue stleary#240 in JSON-Java
If I'm understanding the JSON standards correctly, tabs are not allowed. I'm posting a json to my api (the whitespace at the end of the value is a tab character). This is what's happening in the background:
String string = {"key":"value "};
JSONObject json;
try {
json = new JSONObject(string);
} catch (JSONException e) {
}
the assignment above should not succeed, but it does.
you can use this sample project to see for yourself
https://github.com/mmoghadas/gs-rest-service/blob/master/complete/src/main/java/hello/GreetingController.java#L26
git clone git@github.com:mmoghadas/gs-rest-service.git
cd gs-rest-service/complete
./gradlew build
java -jar build/libs/gs-rest-service-0.1.0.jar
http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
Insignificant whitespace is allowed before or after any token. The whitespace characters are: character
tabulation (U+0009), line feed (U+000A), carriage return (U+000D), and space (U+0020). Whitespace is not allowed within any token, except that space is allowed in strings.
The text was updated successfully, but these errors were encountered: