[Bug] bin tools are using jstrencode(1) when they should be decoding JSON encoded strings #2752

Closed
1 task done
lcn2 opened this issue Nov 12, 2024 · 94 comments
Labels
bug Something isn't working top priority This a top priory critical path issue for next milestone

Comments

lcn2 commented Nov 12, 2024

Is there an existing issue for this?

  • I have searched for existing issues and did not find anything like this

Describe the bug

Tools such as:

  • bin/cvt-submission.sh
  • bin/entry2csv.sh
  • bin/gen-authors.sh
  • bin/gen-location.sh
  • bin/gen-years.sh
  • bin/output-index-author.sh
  • bin/output-year-index.sh
  • bin/subst.entry-index.sh

Are using jstrencode(1) when they need to take JSON encoded strings and decode them into "real strings" for use in markdown and HTML pages.

What you expect

Tools such as:

  • bin/cvt-submission.sh
  • bin/entry2csv.sh
  • bin/gen-authors.sh
  • bin/gen-location.sh
  • bin/gen-years.sh
  • bin/output-index-author.sh
  • bin/output-year-index.sh
  • bin/subst.entry-index.sh

would use jstrdecode(1) to decode JSON encoded strings, converting them into "real" strings.

Environment

  • OS: n/a
  • Device: n/a
  • Compiler: n/a

Anything else?

Things may have gone amiss with commit e58bd97 (dated: Fri Nov 1 06:45:02 2024 -0700)

This seems to be linked to commit 0a7c9673fa7797f1e9c2c87dea377edb03816f03 (dated: Thu Oct 31 11:57:38 2024 -0700) from the "other repo".

UPDATE 0

This is a Great Fork Merge show stopper.

@lcn2 lcn2 added bug Something isn't working top priority This a top priory critical path issue for next milestone labels Nov 12, 2024

lcn2 commented Nov 12, 2024

We remain concerned with the issue of JSON string encoding and decoding.

We suspect that the details of GH-issuecomment-2471493104 were either unclear or glossed over.

We don't have access to the shell now, but when we return we plan to do more testing. Nevertheless what we saw in the code concerned us enough to file this bug and halt the Great Fork Merge process.

xexyl commented Nov 12, 2024

Okay. Without having the time to read this I am admittedly confused because taking UTF-8 to Unicode is encoding and it seemed to work.

But I will have to look at this tomorrow. Sorry.

xexyl commented Nov 12, 2024

I just read it and it doesn't make sense. It sounds like a terminology issue but taking a UTF-8 code point and converting it into a Unicode symbol is encoding not decoding.

Please explain what you are getting at. I will look at this tomorrow.

lcn2 commented Nov 12, 2024

UTF-8 is an encoding of Unicode stuff.

JSON string encoding of "real" strings, decoding of JSON encoded strings into "real" strings is another matter.

When jstrdecode(1) decodes a JSON encoded string such as:

"\uD83D\uDD25"

one expects to get the "real" string of:

🔥
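As a rough illustration of the decoding described above, here is a minimal sketch that uses Python's standard `json` module as a stand-in (it is not the jparse code, and nothing here reflects how jstrdecode(1) is implemented). It decodes the JSON encoded string holding the surrogate pair and yields the single fire character:

```python
import json

# The JSON encoded string, exactly as it would appear in a JSON document:
# a double quote, two \uXXXX escapes forming a surrogate pair, a double quote.
json_encoded = r'"\uD83D\uDD25"'

# Decoding turns the JSON encoded string into the "real" string.
real_string = json.loads(json_encoded)

print(real_string)        # the fire emoji
print(len(real_string))   # 1 (a single code point, U+1F525)
```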

P.S.

The man page for jstrencode(1), as we think we mentioned before, seems wrong. It reads:

jstrencode encodes JSON decoded strings given on the command line.

The term "JSON decoded strings" is somewhat meaningless. JSON doesn't decode strings. The so-called JSON specification requires all strings to be encoded, and thus produce JSON encoded strings. See GH-issuecomment-2471493104.

The man page for jstrencode(1) should read something like:

jstrencode encodes strings into JSON encoded strings according to the so-called JSON specification.

xexyl commented Nov 12, 2024

All sources say the opposite of what you're saying though. That's why I swapped the terms.

The man page can be updated of course as can documentation.

lcn2 commented Nov 13, 2024

I just read it and it doesn't make sense. It sounds like a terminology issue but taking a UTF-8 code point and converting it into a Unicode symbol is encoding not decoding.

Please explain what you are getting at. I will look at this tomorrow.

JSON requires strings to be encoded. This JSON string encoding has nil to do with how UTF-8 encodes Unicode stuff.

Quoting from GH-issuecomment-2471493104:

JSON string encoding, at a minimum, requires the string to be surrounded by double quotes. At a minimum, encoding will result in prepending and appending a double quote character.

JSON string encoding ALSO requires one to convert things like ASCII newlines into "\n". There are other important backslash-escaping requirements, such as dealing with double quotes within the "real" string, backslashes, tabs, etc. (these need to be handled during the JSON string encoding process).

Quoting from GH-issuecomment-2471493104 again:

Encoding this "real" string:

This "string" has a newline
in the middle and at the end

into this JSON encoded string:

"This \"string\" has a newline\nin the middle and at the end\n"

The above is JSON string encoding.
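The same encoding can be reproduced with a minimal sketch, assuming Python's standard `json` module as a stand-in for jstrencode(1) (the tool itself is written in C and may differ in detail):

```python
import json

# The "real" string: contains double quotes and two newline characters.
real_string = 'This "string" has a newline\nin the middle and at the end\n'

# JSON string encoding: add enclosing double quotes and backslash-escape
# the embedded double quotes and newlines.
json_encoded = json.dumps(real_string)
print(json_encoded)

# Decoding the JSON encoded string recovers the original "real" string.
assert json.loads(json_encoded) == real_string
```

The printed line is the JSON encoded string quoted above, enclosing double quotes included.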

Now the question of what to do with non-ASCII stuff IS a matter for Unicode / UTF-8 encoding and decoding. But that is NOT JSON string encoding, nor is it decoding of JSON encoded strings.

Issue 13 in the other repo raised a concern about how JSON encoded strings that contained \uHexHexHexHex-stuff were being handled as a side effect of decoding JSON encoded strings.

Tools such as jsp(1) formatted or "pretty printed" JSON encoded strings like this:

$ echo '"œßåé"' | jsp --no-color --indent 4
"\u0153\u00df\u00e5\u00e9"

NOTE: The pipe input above is a JSON encoded string. The output produced by the jsp(1) tool is also a JSON encoded string. The jsp(1) tool happens to format or "pretty print" by printing \uHexHexHexHex stuff. While that is a questionable practice from the point of view of formatting or "pretty printing", it is technically correct, as BOTH JSON encoded strings are valid and equivalent.
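The equivalence of the two JSON encoded forms can be checked with a short sketch. It assumes Python's standard `json` module, whose `ensure_ascii` parameter happens to switch between the two output styles; jsp(1) and the jparse tools are not involved:

```python
import json

# The "real" string œßåé, written with escapes so the source file's own
# encoding cannot change which code points are used.
real_string = "\u0153\u00df\u00e5\u00e9"

# Two valid JSON encodings of the same "real" string.
escaped = json.dumps(real_string)                        # ensure_ascii=True is the default
unescaped = json.dumps(real_string, ensure_ascii=False)  # non-ASCII left as-is

print(escaped)     # "\u0153\u00df\u00e5\u00e9"
print(unescaped)   # the same text with the four characters written directly

# Both decode to the identical "real" string.
assert json.loads(escaped) == json.loads(unescaped) == real_string
```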

BTW: we think the jsp(1) tool makes things ugly when it does that sort of formatting.

The jstrdecode(1) tool and the internal JSON encoded string decoding functionality need to be able to take the JSON encoded string:

"\u0153\u00df\u00e5\u00e9"

and produce the original "real" string: œßåé.

Instead, jstrencode(1) does this!

We don't mind if, in the process of JSON string encoding (what jstrencode(1) should do), Unicode stuff is left alone and/or converted in such a way that when one decodes (what jstrdecode(1) should do), produces nice looking Unicode stuff.

So by all means, let jstrencode(1) convert the string: 🔥 by JSON encoding into "🔥" as that is valid JSON.

And by all means, let jstrdecode(1) convert the JSON encoded string "🔥" into the real string: 🔥

We just need to ALSO be sure that when jstrdecode(1) is given the JSON encoded string "\uD83D\uDD25" we get the real string: 🔥 as well.

Both "\uD83D\uDD25" and "🔥" are valid JSON encoded strings. They should convert into the same real string 🔥 as well.
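For reference, the arithmetic that ties the pair \uD83D\uDD25 to the single code point U+1F525 can be sketched as follows. The helper name `combine_surrogates` is hypothetical and used only for illustration; Python's standard `json` module serves as the decoder:

```python
import json

def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

code_point = combine_surrogates(0xD83D, 0xDD25)
print(hex(code_point))   # 0x1f525
fire = chr(code_point)

# Both JSON encoded forms decode to that same one-character string.
assert json.loads(r'"\uD83D\uDD25"') == fire
assert json.loads('"' + fire + '"') == fire
```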

We hope this helps.

lcn2 commented Nov 13, 2024

All sources say the opposite of what you're saying though. That's why I swapped the terms.

If there is a comment bug in the source comments, then those should be fixed.

We looked at the comment for the json_encode(char const *ptr, size_t len, size_t *retlen) function in jparse/json_parse.c in the "other repo". It looks correct.

Are we missing something? There certainly could have been some copy and paste errors when code for one side of encoding was converted into decoding, for example.

UPDATE 0

We recommend prioritizing GH-issuecomment-2472008592 and the core of this issue first, before worrying about source code comments and man pages.

xexyl commented Nov 13, 2024

All sources say the opposite of what you're saying though. That's why I swapped the terms.

If there is a comment bug in the source comments, then those should be fixed.

It could be a comment bug yes: when I tried to (later) correct the comments by swapping from one function to the other, it might have been a mistake. However ... I see something really odd.

We looked at the comment for the json_encode(char const *ptr, size_t len, size_t *retlen) function in jparse/json_parse.c in the "other repo". It looks correct.

Okay so then it maybe isn't a comment bug?

Are we missing something? There certainly could have been some copy and paste errors when code for one side of encoding was converted into decoding, for example.

It could be. But if the encoding comment looks correct and I swapped it then it seems like it might be right and this is not an issue? But even so see below.

UPDATE 0

We recommend prioritizing GH-issuecomment-2472008592 and the core of this issue first, before worrying about source code comments and man pages.

Of course.

I'll post a new comment with something funny though.

xexyl commented Nov 13, 2024

Here's something funny.

All sources say that converting '\u0f0f' to its unicode symbol is ENCODING. But what about JavaScript's JSON object (or whatever it is)? Check this javascript out:

const json_to_encode = {
    name: "\u0f0f"
};
const json_to_decode = '{"name": "\u0f0f"}';

const json_decoded = JSON.parse(json_to_decode);
const json_encoded = JSON.stringify(json_to_encode);

document.writeln(json_decoded.name);
document.write(json_encoded);

shows in the html file:

༏ {"name":"༏"}

which suggests that BOTH encoding and decoding convert the \uxxxx to its unicode symbol!

Now given that the function we have is utf8encode() and given many other sources I would say that the right term to use for what we have is ENCODE and given that it does print out the correct string in the website then this is probably okay.

What do you think?
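One possible reading of this experiment, offered as a sketch rather than a verdict: in the JavaScript above, the \u0f0f inside each source literal is an escape that the JavaScript parser resolves itself, so JSON.parse never sees a backslash and JSON.stringify simply leaves the character alone. The Python analogue below (Python's standard `json` module standing in for the JavaScript JSON object) separates the two cases by using a raw string:

```python
import json

# Non-raw literal: Python resolves \u0f0f while reading the source code,
# so this text already holds the character itself.
resolved_by_language = '{"name": "\u0f0f"}'

# Raw literal: the text holds a backslash, 'u', and four hex digits,
# which is what a JSON file on disk would contain.
escape_left_for_json = r'{"name": "\u0f0f"}'

print(len(resolved_by_language))   # 13
print(len(escape_left_for_json))   # 18

# Only in the raw case does the JSON decoder do any \uXXXX work,
# yet both texts decode to the same value.
assert json.loads(resolved_by_language) == json.loads(escape_left_for_json)

# Encoding may leave the character as-is; it never builds it from an escape.
print(json.dumps({"name": "\u0f0f"}, ensure_ascii=False))
```

Under this reading, turning \uXXXX into a character is part of decoding a JSON encoded string, while encoding only has to pass an already-real character through.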

xexyl commented Nov 13, 2024

Meanwhile I had an interesting idea based on this, as I was waking up (a common thing of programmers as you surely know :-) ).

What if the jstrencode/jstrdecode tools had an option to parse the string as JSON? It would be something like (I think - I would have to look at it more and only just did) something like...

For encoding:

	-j		parse as JSON post encoding (def: do not)
	-J level	set JSON debug level

and for decoding:

	-j		parse as JSON pre decoding (def: do not)
	-J level	set JSON debug level

Now what would the purpose of this be? Perhaps sanity checks or maybe for some experiment or else because part of the parsing is the encoding (which might suggest that only the encoding one should have the option but that's why the post/pre actions).

xexyl commented Nov 13, 2024

I just read it and it doesn't make sense. It sounds like a terminology issue but taking a UTF-8 code point and converting it into a Unicode symbol is encoding not decoding.
Please explain what you are getting at. I will look at this tomorrow.

JSON requires strings to be encoded. This JSON string encoding has nil to do with how UTF-8 encodes Unicode stuff.

Yes. But the point is that when UTF-8 '\u0f0f' (for example) is encoded it turns into ༏ and that's exactly what jstrencode does.

Quoting from GH-issuecomment-2471493104:

JSON string encoding, at a minimum, requires the string to be surrounded by double quotes. At a minimum, encoding will result in prepending and appending a double quote character.

Aha. Now I wonder about this. Perhaps the problem is that some of the options need to be moved from one tool to the other? That seems likely. Though in that case will the output of the UTF-8 code points for the website then show the right string? If it adds "s then it would be the wrong output, right? Perhaps that's why you want the decode tool to also do this?

JSON string encoding ALSO requires one to convert things like ASCII newlines into "\n". There are other important backslash-escaping requirements, such as dealing with double quotes within the "real" string, backslashes, tabs, etc. (these need to be handled during the JSON string encoding process).

Okay so it seems likely that some of the options/functionality has to be moved over too, and the names alone cannot be swapped? For instance:

$ cat nl


$ jstrdecode < nl
\n\n
$ jstrencode < nl
Warning: json_encode: found non-\-escaped char: 0x0a
Warning: jstrencode_stream: error while encoding stdin buffer
Warning: main: error while encoding processing stdin

?

Unless there is also some confusion with the terms?

Quoting from GH-issuecomment-2471493104 again:

Encoding this "real" string:

This "string" has a newline
in the middle and at the end

into this JSON encoded string:

"This \"string\" has a newline\nin the middle and at the end\n"

The above is JSON string encoding.

Okay and I see...

$ jstrdecode  < foo.txt 
This \"string\" has a newline\nin the middle and at the end\n

But the thing is how do we determine when to encode and when to decode, then? I can see what you mean here: JSON with a " inside a quote is wrong unless it's escaped. But on the other hand encoding does seem to also convert a code point into its unicode symbol. This is a mess!

Now the question of what to do with non-ASCII stuff IS a matter for Unicode / UTF-8 encoding and decoding. But that is NOT JSON string encoding, nor is it decoding of JSON encoded strings.

Hmm ... okay so that might be something to consider too. The question we have I think then: what do we do here? Perhaps the tool names should be reverted again but then we have to decide how to proceed with the code points?

Issue 13 in the other repo raised a concern about how JSON encoded strings that contained \uHexHexHexHex-stuff were being handled as a side effect of decoding JSON encoded strings.

Yes.

Tools such as jsp(1) formatted or "pretty printed" JSON encoded strings like this:

$ echo '"œßåé"' | jsp --no-color --indent 4
"\u0153\u00df\u00e5\u00e9"

NOTE: The pipe input above is a JSON encoded string. The output produced by the jsp(1) tool is also a JSON encoded string. The jsp(1) tool happens to format or "pretty print" by printing \uHexHexHexHex stuff. While that is a questionable practice from the point of view of formatting or "pretty printing", it is technically correct, as BOTH JSON encoded strings are valid and equivalent.

Right. It is encoded. Which suggests that the decoded is the \uxxxx string. Though as my example javascript above shows it seems that it's both. Now as you say this is not json encoding/decoding so perhaps I do need to swap the names again. But in this case we do also need to determine how to encode the code points in the case of it happening. Unless we do not want this feature?

BTW: we think the jsp(1) tool makes things ugly when it does that sort of formatting.

When it converts it to \uxxxx you mean?

The jstrdecode(1) tool and the internal JSON encoded string decoding functionality need to be able to take the JSON encoded string:

"\u0153\u00df\u00e5\u00e9"

and produce the original "real" string: œßåé.

Instead, jstrencode(1) does this!

That's because all the sources say that that IS encoding, not decoding. Except for the example I gave above which suggests both do it.

And this is why the tools here use the jstrencode tool, not the jstrdecode tool. So minus the fact that the javascript example suggests that both encoding and decoding should print out the fire based on the emoji I gave (above), it should be good, unless you want to quibble about terminology. But since below you talk about how both should do it then it seems like that matters less if at all.

Now as you say though, the json encoding/decoding is not the same thing as unicode. On the other hand you did raise the problem of it not doing it at all.

We don't mind if, in the process of JSON string encoding (what jstrencode(1) should do), Unicode stuff is left alone and/or converted in such a way that when one decodes (what jstrdecode(1) should do), produces nice looking Unicode stuff.

I wonder if there is a way to do it for both like the javascript above shows.

So by all means, let jstrencode(1) convert the string: 🔥 by JSON encoding into "🔥" as that is valid JSON.

It does this already indeed.

And by all means, let jstrdecode(1) convert the JSON encoded string "🔥" into the real string: 🔥

Well jstrdecode will take the fire emoji and output the fire emoji:

$ cat fire.json | jstrdecode -n
🔥\n

but it appears that the -n option does not work. Hmm.... I wonder why.

Still it should I believe also output the emoji for both encoding and decoding.

We just need to ALSO be sure that when jstrdecode(1) is given the JSON encoded string "\uD83D\uDD25" we get the real string: 🔥 as well.

Is this the real problem then? The tool names should be swapped back and both should print out the fire emoji from the fire code point? This way the json encoded strings will be correctly encoded but still print out the encoded form? I think that sounds reasonable but how to go about it I'm not sure yet.

Both "\uD83D\uDD25" and "🔥" are valid JSON encoded strings. They should convert into the same real string 🔥 as well.

That's true. And that's what happens with jstrencode(1). That's why the confusion. But as above... I'm not sure.

xexyl commented Nov 13, 2024

I guess the following. Please correct me if I'm wrong.

  1. The tools should have their names swapped again. This is because the point was for JSON encoding/decoding, not encoding of the strings themselves.
  2. Then both decoding and encoding of the code points should turn it into the proper unicode symbol.
  3. After this the documentation can be updated for both.

I have a thought on what might allow this to happen but I am not sure. I have to look at the code too.

xexyl commented Nov 13, 2024

.. first step is to swap the names again. That'll be fun.

xexyl commented Nov 13, 2024

Okay the filenames and terms are swapped. The next step would be to make sure that both encode and decode convert code points to unicode symbols. I think I have an idea how to do this. But I will have to go afk very soon.

xexyl commented Nov 13, 2024

Made a commit ... not pushing yet. Have to go afk. Once I'm back I'll work on the unicode problem. Then we can figure out which tool belongs in the website. Hoping I can manage this today.

xexyl commented Nov 13, 2024

Ugh. The real problem is that the json_encode() function (after name change) uses the table ... not the parsing manually. I don't know how to fix that yet. I'll ponder it as I'm afk (or part of the time).

xexyl commented Nov 13, 2024

Well I have an idea but unfortunately it might have to be done first .. but that means the table access will be wrong. This is because the \ in \u, for example, will be changed to \\, turning \u into \\u. So something has to be figured out. But it might end up that this table can be worked out to be not needed or more useful. We shall see!

xexyl commented Nov 13, 2024

Just pushed the changes noted above. In a bit I will look at seeing if I can figure out the encoding/decoding of code points.

xexyl commented Nov 13, 2024

Hmm .. my initial idea will not work. This is turning into a nightmare.

xexyl commented Nov 13, 2024

Have another thought. Looking into it.

xexyl commented Nov 13, 2024

That does not work either because the table converts \ into two \ so the \u is not matched. It is not clear to me if it's okay to change it to be a single one .. yet.

xexyl commented Nov 13, 2024

I think I got it! Have to do more testing ...

xexyl commented Nov 13, 2024

The one problem is that doing...

$ jstrencode '\'

fails when it should print \\. It seems like the encode function might need to go through the string itself, somehow, like the decoding process, but with slightly different rules.
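To make the idea of an encoder that walks the string itself concrete, here is a minimal sketch of a table-driven JSON string encoder. It is written in Python purely for illustration: the `ESCAPES` table and the `encode_json_string` name are hypothetical and are not the table or function used by jparse, and the sketch always adds the enclosing double quotes.

```python
import json

# Hypothetical escape table for illustration; not the table used by jparse.
ESCAPES = {
    '"': '\\"',
    '\\': '\\\\',
    '\b': '\\b',
    '\f': '\\f',
    '\n': '\\n',
    '\r': '\\r',
    '\t': '\\t',
}

def encode_json_string(real: str) -> str:
    """Encode a "real" string as a JSON encoded string, one character at a time."""
    out = ['"']
    for ch in real:
        if ch in ESCAPES:
            out.append(ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters must use the \uXXXX form.
            out.append(f"\\u{ord(ch):04x}")
        else:
            # Everything else, including non-ASCII, is copied unchanged.
            out.append(ch)
    out.append('"')
    return "".join(out)

print(encode_json_string("\\"))    # "\\"
print(encode_json_string("\\a"))   # "\\a"
```

Because every backslash in the "real" string is doubled, a literal backslash followed by `u` in the input comes out as `\\u`, which a decoder reads back as those same two characters rather than as a Unicode escape.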

xexyl commented Nov 13, 2024

I discovered another bug too ... check this:

$ jstrencode '\a'
'\\a

Should not have the first character there, but just \\a.

UPDATE 0

Or perhaps that is a display issue?

lcn2 commented Nov 15, 2024

Recommend Priorities

We recommend a priority on moving to a state where we can do a "soft code freeze 🥶" for the mkiocccentry repo while making sure that make quick_www and then make www work AFTER installation of the tools from mkiocccentry repo and the jparse sub-directory of mkiocccentry and AFTER import and JSON string encoding and the decoding of JSON encoded strings are working.

In particular, fixing stuff (code, code comments, man pages, doc) that flipped around for jstrencode(1) and jstrdecode(1) while making sure that the tools, after they are installed, work well with make quick_www and then make www.

We are aware that there are questions and issues relating to backslashes and perhaps Unicode/UTF-8 matters, and perhaps other jparse matters. While those are interesting and important, they may not necessarily be a problem for the existing web site and thus are not a top priority.

Once mkiocccentry repo is stable and in a position to have a "soft code freeze 🥶", we can consider those "questions and issues" because by then, make www will be happy 😊, we can make progress on the Great Fork Merge 🥳.

And by consider those "questions and issues", assess the size of the task to investigate 🔬, determine the effort to answer open questions, assess the potential impact on the web site, and if there is enough time, install a code change into the mkiocccentry repo before the "hard code freeze 🧊" and final release prior to IOCCC28.

We hope this helps.

UPDATE 0a

Question

Is the mkiocccentry repo "in sync" with the jparse repo? We ask this because we thought we saw some slight differences in jparse/json_util.c via the browser but we cannot say for certain.

xexyl commented Nov 15, 2024

I fixed the man page mix up .. thanks! Not in mkiocccentry just yet. I have to sync it over. It happened due to rushing to get the commit in before I left yesterday for an appointment.

As for jstrdecode and quotes: it's not that simple but I'll find the thread you wrote it in and reply to that later on .. very tired. Woke up at stupid o'clock today :(

xexyl commented Nov 15, 2024

Ah, it's this thread. Okay I'll try replying now but might take a bit of time.

xexyl commented Nov 15, 2024

Man page bugs

Perhaps the man pages have been fixed in some PR that we have not yet considered:

We'd check the man pages in the repo via a browser, but the current https://github.com/ioccc-src/mkiocccentry/blob/master/jparse/man/man1/jstrencode.1 contains text for jstrdecode(1), so we cannot advise of what they had/have in terms command line options.

Thanks for pointing this out! I forgot to do the name swap when I was rushing yesterday. I did that a few minutes ago. Syncing to mkiocccentry now too.

xexyl commented Nov 15, 2024

And no: jparse in mkiocccentry is not up to date because I made some slight changes yesterday. Nothing vital but something that I felt should be done. I'll sync it with the man pages as well. Still the jstrdecode issue with quotes will take more time to consider and it might need an option to say it's a quote. I'll explain in a bit.

xexyl commented Nov 15, 2024

Ugh .. I just found yet another problem from the rush! I'll fix that too.

xexyl commented Nov 15, 2024

Doing a sync ... will then reply to other comments if I can. Otherwise later on today I'm sure.

xexyl commented Nov 15, 2024

Ah .. encoding has the skip quotes. Maybe decoding needs that as well?

IMPORTANT functionality

By default, jstrencode(1) MUST append and prepend double quotes. So:

$ jstrencode foo
"foo"

By default, jstrdecode(1) MUST expect enclosing double quotes or generate an ERROR. So:

$ jstrdecode foo
jstrdecode ERROR: ...

Unfortunately it's not that simple. Because this is also valid json:

true

as is

-5

and so on. If we expected quotes then it would be in error. Instead we need an option to jstrdecode that says the arg is a string and then if that arg is used we expect a " as the first and last character. I can do that I hope today maybe even in a bit if I can find the energy.

There are, or should be, command line options flags that "deal with" enclosing double quotes.

There are but not in the same way you refer to.

For example, a flag to ask jstrencode(1) to NOT enclose the output in double quotes.

Okay but in this case it should actually be that using it means TO enclose the output in double quotes. Why? Because otherwise we'll get something like:

$ jstrencode -- -5
"-5"

which is wrong.

I suggest the option -s to tell it that it is meant to be a string.

For example, a flag to ask jstrdecode(1) to NOT require enclosing double quotes, and thus NOT throw an error if the input/arg(s) lack them.

In this case it has to be the opposite for reasons I cited; it should be that it IS a string. Otherwise the new option you suggest would have to be used most of the time, which is annoying to the user. Here too I suggest -s.

There are, or were, or should have been such command line options flags .. we are not near a terminal to check .. so if such options exist, please be sure they still work. If no such options exist (or no longer exist), then please add them.

There were options that deal with quotes but not like this. As I see it the following options should be added to jstrencode:

-s		assume args are strings, enclosing output in quotes

and jstrdecode should have:

-s		assume arg is a string, expecting a leading and trailing "

I hope to work on these today but as I said I am extremely tired .. woke up too early :( I hope I get more energy sometime soon.

Man page bugs

Perhaps the man pages have been fixed in some PR that we have not yet considered:

We'd check the man pages in the repo via a browser, but the current https://github.com/ioccc-src/mkiocccentry/blob/master/jparse/man/man1/jstrencode.1 contains text for jstrdecode(1), so we cannot advise of what they had/have in terms command line options.

That was fixed in jparse and mkiocccentry (along with other problems discovered). Thanks!

xexyl commented Nov 15, 2024

Recommend Priorities

We recommend a priority on moving to a state where we can do a "soft code freeze 🥶" for the mkiocccentry repo while making sure that make quick_www and then make www work AFTER installation of the tools from mkiocccentry repo and the jparse sub-directory of mkiocccentry and AFTER import and JSON string encoding and the decoding of JSON encoded strings are working.

Well the good thing is that after you merge the commit I made today this part should be fine. Also a mistake with the rush to get those important fixes in yesterday was repaired.

In particular, fixing stuff (code, code comments, man pages, doc) that flipped around for jstrencode(1) and jstrdecode(1) while making sure that the tools, after they are installed, work well with make quick_www and then make www.

This should, as noted above, be fine, though I have not tested it with the jparse subdirectory of mkiocccentry. However given that it's now synced up (though not merged in the master branch) it should be fine.

We are aware that there are questions and issues relating to backslashes and perhaps Unicode/UTF-8 matters, and perhaps other jparse matters. While those are interesting and important, they may not necessarily be a problem for the existing web site and thus are not a top priority.

I agree there. I can always work on it in the interim of course.

Once mkiocccentry repo is stable and in a position to have a "soft code freeze 🥶", we can consider those "questions and issues" because by then, make www will be happy 😊, we can make progress on the Great Fork Merge 🥳.

Great. I hope that the new options I described will be easy enough. I don't know if you want those in the mkiocccentry repo or not. I can guess that you do want them in and I also guess you want the minimum version of the jstrdecode in the bin tools (here) updated too. I can do that for sure.

And by consider those "questions and issues", assess the size of the task to investigate 🔬, determine the effort to answer open questions, assess the potential impact on the web site, and if there is enough time, install a code change into the mkiocccentry repo before the "hard code freeze 🧊" and final release prior to IOCCC28.

That makes sense.

UPDATE 0a

Question

Is the mkiocccentry repo "in sync" with the jparse repo? We ask this because we thought we saw some slight differences in jparse/json_util.c via the browser but we cannot say for certain.

It is now thanks to your comment .. you were right: that file did have some changes as did some others. Some of them I am not surprised as I made those changes after the pull request. Others surprised me. I thought I had synced them. But obviously I didn't. The man page files were swapped too thanks to you. I guess I shouldn't have tried to rush it when I had to leave early morning but at least it should be fine now and we also found some other issues in the process.

xexyl commented Nov 15, 2024

We hope this helps.

!sknahT .did yllautca ti ,yako ... ton did tI

xexyl commented Nov 15, 2024

I guess that this issue here though can be closed as complete? Unless you want the new options to the encode/decode tools in first. I'm going to go afk a bit .. when back I hope I can work on the new options.

xexyl commented Nov 15, 2024

The jstrdecode(1) new option -s has been, I believe, completed.

The jstrencode now needs its version.

xexyl commented Nov 15, 2024

The jstrencode version was done a bit ago .. doing a make test. Increased version of each to 2.5 as it seems like a significant update.

Also you might enjoy the modification to the help string of the -q option to all the tools. I think I might do that for the mkiocccentry repo too or at least txzchk but we'll see.

After these are synced to mkiocccentry I'll do a make install from the mkiocccentry subdirectory and then do a make www though I can't imagine there will be any problems. I will also update the minimum version of the tool in the bin/ scripts.

xexyl commented Nov 15, 2024

Well okay a test did fail .. I wonder why. Probably the new flag but I'll check.

xexyl commented Nov 15, 2024

Hmm ... strange issue. Working on it.

xexyl commented Nov 15, 2024

Oh. I think I see. Investigating.

xexyl commented Nov 15, 2024

Solved. Will do a commit soonish .. then sync to mkiocccentry then work on the website scripts.

xexyl commented Nov 15, 2024

Committed to the jparse repo. Will now make sure it goes well in test workflow. Assuming all is good I'll sync to mkiocccentry and then update the bin/ tools here. After that I'll run make www and make sure things are okay. I will then make a commit here if all is good. I don't think I'll be doing anything else today with code, though.

lcn2 commented Nov 15, 2024

Unfortunately it's not that simple. Because this is also valid json:

true

as is

-5

The jstrencode(1) and jstrdecode(1) tools work on strings and JSON encoded strings. They do NOT work on JSON booleans, JSON numbers, JSON nulls, etc.

So the above JSON examples do not apply.
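A string-only decoder of this kind might look like the following minimal sketch. It assumes Python's standard `json` module as the underlying parser; the function name `decode_json_string` and its error messages are hypothetical and do not describe how jstrdecode(1) behaves or reports errors:

```python
import json

def decode_json_string(json_encoded: str) -> str:
    """Decode a JSON encoded string; reject any other kind of JSON value."""
    text = json_encoded.strip()
    if len(text) < 2 or not (text.startswith('"') and text.endswith('"')):
        raise ValueError("expected a JSON encoded string enclosed in double quotes")
    value = json.loads(text)   # raises a ValueError subclass on malformed input
    if not isinstance(value, str):
        raise ValueError("decoded value is not a string")
    return value

print(decode_json_string('"foo"'))   # foo

# Bare words, JSON booleans and JSON numbers are all rejected.
for not_a_string in ("foo", "true", "-5"):
    try:
        decode_json_string(not_a_string)
    except ValueError as err:
        print(f"rejected {not_a_string!r}: {err}")
```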

lcn2 commented Nov 15, 2024

Thank you for resolving this important issue @xexyl

@lcn2 lcn2 closed this as completed Nov 15, 2024

xexyl commented Nov 15, 2024

Unfortunately it's not that simple. Because this is also valid json:

true

as is

-5

The jstrencode(1) and jstrdecode(1) tools work on strings and JSON encoded strings. They do NOT work on JSON booleans, JSON numbers, JSON nulls, etc.

So the above JSON examples do not apply.

Hmm .... that is a valid point indeed given the name. However the examples have always shown that and I think that also came from you, though perhaps not. But the name would suggest that it's only strings. On the one hand it would be nice to have a general encoder/decoder but on the other ...

I wonder what should be done. Probably need to rethink this and then maybe take the -s option away. It might be good to still have the option I was working on though, to validate as JSON or not. But it might not. I'll close the pull request over at the mkiocccentry repo for now.

xexyl commented Nov 15, 2024

Or actually I'll turn it into a draft.

And you're welcome.

xexyl commented Nov 15, 2024

My guess is the following.

Given the name I can get rid of the -s option.

The library change is still good, however.

Then I can also use the new options I added that have the parser validate it.

That being said if we do this then examples have to be heavily modified.

lcn2 commented Nov 15, 2024

Hello from eastern Greenland 🇬🇱 👋

xexyl commented Nov 15, 2024

Actually .. it turns out that the use of the library update is very useful! It's just that we now will assume it's only strings.

I have added a -j and -J level option to jstrdecode(1).

If -j is specified do not validate as JSON (the reason for doing it normally is it helps out with the decoding). Otherwise do validate it. If validation is desired (-j not used) then -J sets the JSON debug level, of course.

I have already removed the -s option. I have to do the same in jstrencode too. I'll then make a commit over there and then mkiocccentry and make it not a draft.

xexyl commented Nov 15, 2024

Hello from eastern Greenland 🇬🇱 👋

And yet .. sadly, due to human arrogance, it's getting more green - in colour, and less green in environment :(

Anyway glad travels are going well! Stay safe and well!

xexyl commented Nov 15, 2024

Unsure if jstrdecode is supposed to quote the output now .. I think not since it's decoding, not encoding. I'll do it that way but if you can think of a case where an option would be good for that, please let me know.

lcn2 commented Nov 15, 2024

Unsure if jstrdecode is supposed to quote the output now .. I think not since it's decoding, not encoding. I'll do it that way but if you can think of a case where an option would be good for that, please let me know.

By default, jstrdecode(1) does NOT enclose the output in double quotes.

If -Q or -e is given, then it will.

xexyl commented Nov 15, 2024

Unsure if jstrdecode is supposed to quote the output now .. I think not since it's decoding, not encoding. I'll do it that way but if you can think of a case where an option would be good for that, please let me know.

By default, jstrdecode(1) does NOT enclose the output in double quotes.

If -Q or -e is given, then it will.

Thanks for the reminder!
