Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSONObject: tabs should not be allowed #240

Closed
mmoghadas opened this issue Jun 14, 2016 · 15 comments
Closed

JSONObject: tabs should not be allowed #240

mmoghadas opened this issue Jun 14, 2016 · 15 comments

Comments

@mmoghadas
Copy link

mmoghadas commented Jun 14, 2016

If I'm understanding the JSON standards correctly, tabs are not allowed. I'm posting a json to my api (the whitespace at the end of the value is a tab character). This is what's happening in the background:

String string = {"key":"value "};
JSONObject json;
try {
json = new JSONObject(string);
} catch (JSONException e) {
}

the assignment above should not succeed, but it does.
you can use this sample project to see for yourself
https://github.com/mmoghadas/gs-rest-service/blob/master/complete/src/main/java/hello/GreetingController.java#L26

git clone git@github.com:mmoghadas/gs-rest-service.git
cd gs-rest-service/complete
./gradlew build
java -jar build/libs/gs-rest-service-0.1.0.jar

http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
Insignificant whitespace is allowed before or after any token. The whitespace characters are: character
tabulation (U+0009), line feed (U+000A), carriage return (U+000D), and space (U+0020). Whitespace is not allowed within any token, except that space is allowed in strings.

@johnjaylward
Copy link
Contributor

johnjaylward commented Jun 14, 2016

Page 5, Figure 5 of EMCA-404 indicates that a String can contain any valid code point except " (quote), \ (reverse solidus), or a control character. I believe that most systems do not consider \t tab to be a control character

@stleary
Copy link
Owner

stleary commented Jun 15, 2016

For what it's worth, http://JSONLint.com seems to agree that tabs should not appear in strings.

@johnjaylward
Copy link
Contributor

well, I'll say at best the document is inconsistent. I'd think this would be a possible breaking change for existing users though.

@stleary
Copy link
Owner

stleary commented Jun 15, 2016

Neither spec is especially well-written, and this particular item is already prone to confusion because a tab can either be represented as an escape sequence or as the character itself. My reading is that tabs are allowed in strings only when they are represented by the char sequence "\t". However, I am not inclined to fix this since the bug seems unlikely to likely to break applications, but the fix might. Other opinions?

@johnjaylward
Copy link
Contributor

I'm OK with that. Maybe just place an Errata in the README?

@johnjaylward
Copy link
Contributor

Also, RFC 7159 page 8 specifies a string to be the same as the flowchart#5 indicates on page 5 of EMCA-404.

So I'm OK with leaving it as it is.

@johnjaylward
Copy link
Contributor

johnjaylward commented Jun 15, 2016

gah, nevermind again. I see this in RFC7159:

All Unicode characters may be placed within the
quotation marks, except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).

My reading skills need help today. Tab is U+0009, so it's defined as a control character in the spec.

@johnjaylward
Copy link
Contributor

ok, in light that RFC7159 is more explicit and that we already output the tabs properly (https://github.com/stleary/JSON-java/blob/master/JSONObject.java#L1435), I think I may be in favor of throwing an exception when reading JSON with invalid control characters as defined in RFC7159.

It will break backwards compatibility though.

@brunoais
Copy link

brunoais commented Jun 15, 2016

I think that even if the tab is not allowed based on the spec, being permissive when parsing although being strict when stringifying should be the best way.
In other words, I believe that accepting the tab character when converting from JSON should be accepted but when JSON is generated from a string, it should escape according to the rules.
If I read it right, that's exactly how it is happening right now.

@johnjaylward
Copy link
Contributor

johnjaylward commented Jun 16, 2016

I'd go with @brunoais suggestion and just leave it as-is. Possibly make a note in the README/FAQ to indicate that reading is slightly more lax about control characters than writing. I believe a new-line (U+000A) is still not allowed for reading, but other control characters should be fine (even possibly U+0000). Perhaps also make some test cases to confirm behaviour.

Also of note:
RFC7169 page 10 section 9 states that parsers must accept all valid JSON, but may also accept invalid JSON and extensions.

@stleary
Copy link
Owner

stleary commented Jun 16, 2016

Good plan, I will update the README and FAQ, probably this weekend (busy last week of sprint). If anyone gets to it before me, feel free to open a pull request.

@mmoghadas
Copy link
Author

Love the activity! Thanks everyone for their feedback!

@johnjaylward
Copy link
Contributor

see PR #242

@johnjaylward
Copy link
Contributor

johnjaylward commented Jun 16, 2016

also, for a full test case on what control characters are allowed, see stleary/JSON-Java-unit-test#50

in summary, all control characters are allowed except \0, \n, and \r.

@stleary
Copy link
Owner

stleary commented Jun 17, 2016

Please add future comments to the pull request #242.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants