Bug: the concept of jstrencode(1) appears to be both very wrong and very buggy #28

Comments
One thing that should be done, and which I will do, is to rename the function.
Renamed utf8encode() to utf8_to_unicode() to be less confusing: although converting a code point is called encoding in JSON (from all sources we have seen), according to encoders **AND** decoders out there, a code point in a string should be converted to Unicode.

A number of bugs have been identified in jstrencode(1) during discussion in 'the other repo' (or one of the 'other repos'). This is in jstrencode(1) now (as of yesterday); prior to yesterday it was in jstrdecode(1) due to the unfortunate swap in names. This swap happened because, when focusing on issue #13 (the decoding - which turned out to be encoding - bug of \uxxxx), we lost sight of the fact that the jstrencode(1) tool is not strictly UTF-8 but rather JSON.

The man page has had these bugs documented, so it is important to remove those notes when the bugs are fixed. A new issue #28 has been opened for these problems.
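Since the thread turns on converting a \uxxxx code point into Unicode/UTF-8, here is a minimal sketch of that conversion per RFC 3629. The function name `codepoint_to_utf8()` is made up for illustration; this is not the repo's utf8_to_unicode():

```c
#include <stddef.h>

/*
 * Hypothetical sketch (not the repo's utf8_to_unicode()): encode a
 * Unicode code point as UTF-8 bytes per RFC 3629.  Returns the number
 * of bytes written to buf (buf must hold at least 4 bytes), or 0 if
 * the code point is out of range or is a UTF-16 surrogate.
 */
size_t codepoint_to_utf8(unsigned long cp, unsigned char *buf)
{
    if (cp <= 0x7F) {
        buf[0] = (unsigned char)cp;                     /* ASCII: 1 byte */
        return 1;
    } else if (cp <= 0x7FF) {
        buf[0] = 0xC0 | (unsigned char)(cp >> 6);       /* 2-byte form */
        buf[1] = 0x80 | (unsigned char)(cp & 0x3F);
        return 2;
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;       /* lone UTF-16 surrogate: not a valid scalar */
    } else if (cp <= 0xFFFF) {
        buf[0] = 0xE0 | (unsigned char)(cp >> 12);      /* 3-byte form */
        buf[1] = 0x80 | (unsigned char)((cp >> 6) & 0x3F);
        buf[2] = 0x80 | (unsigned char)(cp & 0x3F);
        return 3;
    } else if (cp <= 0x10FFFF) {
        buf[0] = 0xF0 | (unsigned char)(cp >> 18);      /* 4-byte form */
        buf[1] = 0x80 | (unsigned char)((cp >> 12) & 0x3F);
        buf[2] = 0x80 | (unsigned char)((cp >> 6) & 0x3F);
        buf[3] = 0x80 | (unsigned char)(cp & 0x3F);
        return 4;
    }
    return 0;           /* beyond U+10FFFF */
}
```

Note the surrogate rejection: a \uD800-\uDFFF escape is only meaningful as half of a UTF-16 pair and must not be emitted as a raw 3-byte sequence.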
Something I do not understand, looking at the so-called JSON spec and the JavaScript output, is the handling of the non-valid escapes. I presume that this is me not understanding the standard correctly (if we can call anything about it 'correct' :-) ), or that I am tired and not reading the grammar right. I rather suspect it's a bit of both.
We are in the process of making some minor edits. We do not expect these edits to be major, just a few tweaks. Please, @xexyl, stand by / try to withhold any edits and PRs for the short term. We will advise when we are done.
Thank you! I won't make any edits. I am done for the day and, given that (as I brought up in the other thread) I don't even know what you have in mind for the functionality, I probably can't do much anyway.
We have a solution that appears to work well for this.
UPDATE 0: When we return after dinner, we plan to update the documentation as per our minor edits.
NOTE: In all of these examples, we do NOT have a trailing newline. Regarding a string with a trailing backslash, consider this string:
When encoded, all is good. But let's now mess with things. 🤭 What should happen if we were to give this back to the decoder?
In the above, we have 4 characters. What should happen? An ERROR should be thrown. Depending on how you want to slice the problem, one could consider the string to end with a dangling escape. Either way, the JSON-encoded string is invalid and the resulting JSON is not valid. It's your choice as to what error you'd prefer to throw. We recommend such a test case be added to the "invalid JSON" test suite.
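The trailing-backslash case can be detected mechanically: a JSON string body is malformed when it ends in an odd run of backslashes, since the final backslash then escapes the closing quote. Here is a hedged sketch in C; the function name is hypothetical and this is not jstrdecode(1) code:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical sketch, not jstrdecode(1) internals: report whether a
 * JSON-encoded string body (the text between the surrounding quotes)
 * ends in a dangling escape, i.e. an odd run of trailing backslashes.
 *
 *   foo\    -> the final backslash escapes nothing: invalid
 *   foo\\   -> an escaped backslash: valid
 */
bool ends_in_dangling_escape(const char *s)
{
    size_t len = strlen(s);
    size_t run = 0;

    while (run < len && s[len - 1 - run] == '\\') {
        run++;                  /* count the trailing backslash run */
    }
    return (run % 2) == 1;      /* odd: last backslash eats the closing quote */
}
```

A decoder could call such a check before unescaping and throw the recommended error on a true result.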
See PR #29.

UPDATE 0: With PR #29 and release 1.6.13 2024-11-16 of the mkiocccentry repo, and commit c47f3abbe47cc63c33a5f4b99ac6ca35cb773dfe of the temp-test-ioccc repo, we have now entered a soft code freeze for the mkiocccentry repo.

UPDATE 1a: Please see, @xexyl, GH-issuecomment-2480929883. Please also see GH-issuecomment-2481266359.
Good things above. I'll reply now. Unfortunately I did not have time to address anything before the mkiocccentry sync, but these things are not really important there, I guess. I don't think you addressed everything I brought up in this issue, but we can discuss the rest later.
Other than this issue there are others as well. It appears to me that we would have to keep track of each character in the decoding code, which would complicate matters greatly. We deliberately designed the regexp in the parser/lexer to not worry about certain details, since we could take care of them in the code outside the parser/scanner. But the ending-quotes problem would be easily solved by passing the strings through the parser for validation.

Some might say this would be overly complicated, but it might be the cleaner way, since otherwise we would have to add even more code to the decode functions (and maybe encode too?). It would be far easier to use the parser to do this, and after all, isn't that the point of it - to verify the input is valid and to provide a way to traverse it? We do not have to traverse it, but using the parse function would sure simplify these tools when it comes to errors. What do you think?
Unsure, but it might also help with some of the other problems in this issue. Not sure - just a thought. I'll look back at this thread and the OT one tomorrow. I got behind in things I have to do today, so I'm doing those now.
We do NOT think that internal string formatting, such as trailing backslashes and Unicode/UTF-8, belongs in the bison/flex scanner/parser. We strongly recommend against involving the bison/flex scanner/parser in details such as determining whether a JSON string is properly encoded.
We need to move on and focus on the Great Fork Merge and getting ready for IOCCC28. After IOCCC28 closes, and the judges go on vacation, and we address the calc v3 fork, we can return to considering improvements to this JSON parser.

UPDATE 0: Sorry (tm Canada 🇨🇦).
I just meant that if we were to invoke it, it would be simple to solve these problems. If not, though, how do you propose to solve them? My point was that the parser would flag these as errors and the problem would be solved. We would not have to update the scanner/parser at all. But if you can think of something better or easier, please let me know.
I'm looking forward to it, so no problem. I also have other things to concern myself with. These do not seem critical, certainly, even if it would be better if they were resolved. With the example you gave it is easy to see what is happening, but solving it is more complicated than diagnosing it. Anyway, best wishes. We can (as I think I said already) talk about this another time, after the next IOCCC.
Let the routines that form the parse tree call functions that check what the scanner/parser thinks of as a string; if the JSON string is not properly encoded (for whatever reason), raise an error and do not return the parse tree. That way, the scanner/parser is agnostic as to the internal content of things such as strings. Meanwhile, from the point of view of an application calling the JSON scanner/parser, a bogus encoding of a JSON string is still flagged as an error. We hope this helps.

UPDATE 0: Some typos fixed.

UPDATE 1: Time to move on for us, as we need to focus on the two significant open Great Fork Merge TODOs.
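That division of labor can be sketched as follows. All names here are hypothetical, not the repo's API: the scanner hands the raw string body to the tree builder, and the tree builder refuses to construct a node when the body is not properly encoded:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parse-tree node for a string; not the repo's struct json. */
struct str_node {
    char *body;         /* raw JSON-encoded string body, no quotes */
};

/* Return true iff every backslash in the body starts a valid JSON escape. */
bool body_properly_encoded(const char *s)
{
    for (; *s != '\0'; s++) {
        if (*s != '\\') {
            continue;
        }
        s++;                                   /* examine the escaped char */
        switch (*s) {
        case '"': case '\\': case '/': case 'b':
        case 'f': case 'n': case 'r': case 't':
            break;                             /* the legal simple escapes */
        case 'u': {
            int i;
            for (i = 0; i < 4; i++) {          /* \u needs 4 hex digits */
                char c = s[1 + i];
                if (!((c >= '0' && c <= '9') ||
                      (c >= 'a' && c <= 'f') ||
                      (c >= 'A' && c <= 'F'))) {
                    return false;
                }
            }
            s += 4;
            break;
        }
        default:
            return false;                      /* \c, a trailing \, etc. */
        }
    }
    return true;
}

/* Tree builder: refuse to make a node for a bogus encoding, so the
 * scanner/parser itself never has to know about escape rules. */
struct str_node *build_string_node(const char *body)
{
    struct str_node *n;

    if (!body_properly_encoded(body)) {
        return NULL;                           /* caller reports a parse error */
    }
    n = malloc(sizeof(*n));
    if (n != NULL) {
        n->body = strdup(body);
    }
    return n;
}
```

The scanner's regexp stays simple (a string is anything between quotes), while the rejection of `\c` or a dangling `\` happens in plain C outside the grammar, which is the agnosticism described above.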
That, as I read it, is exactly what I was getting at. Using the parser, we can determine whether the string is properly encoded. If it is not, an error is raised and the decoding function is not called; if it is valid JSON, the decoding function can be called. Also, as an extra sanity check, we could invoke the JSON parser in the encode tool after encoding, to verify that the output is valid JSON. I'll do that, but I have some other things in mind to do with jparse first. The first, I think, will improve error messages when working with strings rather than files.
Yes it does.
Ever notice that TYPOS is an anagram for SPOT + Y? And TYPO is an anagram for POT+Y? :-)
Great! Today I have a blood draw, unfortunately. Not that vampires really bother me, but I'd rather not go out. Also, this place might insist I sit down, and I really don't want to, especially as I always change clothes after sitting in another's chair, and our washer is not working right (hopefully it can be repaired Wednesday). So I'm doing everything I can to avoid changing outer clothes until it's fixed.

Hope you have a great day! In a while I will probably be afk a bit, but I do hope to do a number of things here today. I also have other things to do, of course. Hope you slept well, or that you're having a nice sleep! I also hope that, if you're back home, you had a nice holiday and that you're safe and sound and well!

THANK YOU for the feedback! It won't solve all the issues I raised in this thread, but it will solve one that you came up with (I did not even think of the example you gave, but it's a great one to fix).
That new functionality is added .. just running things now.
Next thing, actually, is fixing the man page for the newer functions and strings. But I have to go afk now for a while. Once I've done the man pages I'll return to jstrdecode. I already added (as in committed) the new string-parsing functionality wrt better error messages.
I ended up updating the library in a significant way. I noticed that some function names were ambiguous, so they were renamed. However, given that these are internal functions, it should not matter. Namely:

-extern struct json *parse_json_string(char const *string, size_t len);
-extern struct json *parse_json_bool(char const *string);
-extern struct json *parse_json_null(char const *string);
-extern struct json *parse_json_number(char const *string);
-extern struct json *parse_json_array(struct json *elements);
-extern struct json *parse_json_member(struct json *name, struct json *value);
+extern struct json *json_parse_string(char const *string, size_t len);
+extern struct json *json_parse_bool(char const *string);
+extern struct json *json_parse_null(char const *string);
+extern struct json *json_parse_number(char const *string);
+extern struct json *json_parse_array(struct json *elements);
+extern struct json *json_parse_member(struct json *name, struct json *value);

They were too similar to other names in the library. I also updated the jparse.3 man page.

Unfortunately, I also accidentally ODed on a medication, so I am feeling quite awful. This should not have happened, but unfortunate circumstances led to it. I had a blood draw, but that was postponed until Friday. I'll be fine, though.

I hope to look at the parse call in jstrdecode, but I'm not sure if I'll get to it today. If I don't, I most likely won't until Wednesday at least. However, I think once it is done it'll be worth it. It won't solve all the problems here, but it will solve some. Of course, none of this needs to be synced to mkiocccentry right now .. it can wait until after IOCCC28, unless of course you need the jstrdecode bug fixes applied for the website.

I hope you're doing well with the Great Fork Merge! I do have other things I have to do, and given what happened it seems I'll have to start on those soon. Though I would think that adding the parser to jstrdecode won't take much effort, it'll also require a new option.
Functionality added .. still have to fix things.
Hmm .. strange, given the input.

UPDATE 0: Well, okay, it only warned before, but now it shows an error.
Oh great .. now there is a linker error in Linux:
Nothing should have changed here .. no idea how to fix it. Feature test macros do not work :(
This problem has been going on for some time, it seems. But I don't know why. Not good.
My concern is: what if it's a blocker for people on Linux using it for mkiocccentry? I'll open a bug report and then I must leave.
Interesting .. even adding it does not help.
Got it. It was util_test in test_jparse. Okay, that's a relief. I will fix it, but that means it'll have to be updated in mkiocccentry. Sorry :(
With commit ef45fa8 this horrible mess should be prevented in the future. This caused the problems in mkiocccentry, and another problem here too. If you wish me to remove that functionality from dyn_array and dbg, I can do that, but not until tomorrow. I know mkiocccentry is in code-freeze status, so that can happen later. I just hope I can fix my clone .. but that might have to wait until another commit is pushed, as it seems it might want to merge (for reasons beyond my comprehension) commits that were closed.

Good luck with the Great Fork Merge and good day! It's going to be a difficult, long day, and that's why I'm stopping now .. working on other things for the time being, but it's probably a bad idea to work here until tomorrow.
TIME TO PAUSE

We urgently need to FOCUS on IOCCC28 matters, and we will have to respectfully delay, for months, responding to comments about this repo. Unless we do that, IOCCC28 is not going to start this calendar year ... and we don't want to deal with the consequences of that!
Is there an existing issue for this?
Describe the bug
Based on JavaScript encoding of JSON, there appear to be multiple problems with our tool.
First of all, it appears that the \ being converted to a \\ is wrong. I will give examples in the what we should expect section.

The next problem is that code points should be converted to Unicode symbols, just like with jstrdecode(1). This is how it is with JavaScript too: both encode and decode need to do this.

Another problem is that the \ escape chars should not be handled the way we have it. See what to expect for examples.

What you expect
In order of the problems above, here are examples of what jstrencode(1) does and what JavaScript does.
If we have the JavaScript:
it SHOULD (i.e. this is what JavaScript does) encode to:
but our tool converts the string to:
Notice the double \ before the b! Now, on the subject of the escaped quotes at the beginning, see the anything else section.

Now, as far as the code points go, the JavaScript:
converts to:
but our tool does this:
or as just a string:
Now, if the \uxxxx was converted to a Unicode symbol, it might just be that the \" surrounding the output would be the difference, but I am not sure of this.

The third problem appears to be even worse. There might be other cases where something like this happens, but anyway the JavaScript:
... turns into:
Notice how the \c has the \ silently removed and the c is left by itself (or rather, it's after the Unicode symbol and before the o).

But our tool does something extremely wrong. First, as a string by itself:
.. or as the JSON with {}:

As can be seen, the encoding concept we have appears to be totally wrong.
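For contrast with the behavior described above, here is a rough sketch of what a JavaScript-style (RFC 8259) encoder does: only the quote, the backslash, and control characters below 0x20 get escaped, and everything else (UTF-8 multibyte sequences included) passes through unchanged. The function name and the caller-supplied-buffer convention are made up for illustration; this is not the repo's code:

```c
#include <stdio.h>

/*
 * Hypothetical sketch of RFC 8259-style string encoding (roughly what
 * JSON.stringify() does to a string's contents), not jstrencode(1)
 * code.  Writes the encoded body into out (caller must supply enough
 * room: worst case 6x the input length, plus the NUL) and returns out.
 */
char *json_encode_min(const char *in, char *out)
{
    char *p = out;

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;

        switch (c) {
        case '"':  *p++ = '\\'; *p++ = '"';  break;   /* required escapes */
        case '\\': *p++ = '\\'; *p++ = '\\'; break;
        case '\b': *p++ = '\\'; *p++ = 'b';  break;   /* popular short forms */
        case '\f': *p++ = '\\'; *p++ = 'f';  break;
        case '\n': *p++ = '\\'; *p++ = 'n';  break;
        case '\r': *p++ = '\\'; *p++ = 'r';  break;
        case '\t': *p++ = '\\'; *p++ = 't';  break;
        default:
            if (c < 0x20) {
                p += sprintf(p, "\\u%04x", c);        /* other control chars */
            } else {
                *p++ = (char)c;        /* pass through, UTF-8 included */
            }
        }
    }
    *p = '\0';
    return out;
}
```

Note what this does NOT do: it never doubles a lone `\` that is not literally present in the input, it never invents a `\c`-style escape, and it adds no surrounding quotes, which matches the complaints in this report.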
Environment
jparse_bug_report.sh output
n/a
Anything else?
As for the escaped \" surrounding the string: I guess it depends on how this tool will be used, but even so, it would appear to be the wrong default. As for the doubling of \, it also depends on how we need to use it. But clearly the \ handling of non-valid escape chars seems to be wrong.

Of course, it could be that it's because I am tired, or because I do not really know JavaScript, but it might not be. I think it probably isn't either of those. But of course, depending on what our tool needs to do that deviates from a normal JSON encoder ....