Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSONDecodeError verbose error mesage #121479

Open
ArielRotem1 opened this issue Jul 8, 2024 · 25 comments
Open

JSONDecodeError verbose error mesage #121479

ArielRotem1 opened this issue Jul 8, 2024 · 25 comments
Labels
type-feature A feature request or enhancement

Comments

@ArielRotem1
Copy link

ArielRotem1 commented Jul 8, 2024

Feature or enhancement

Proposal:

Example is the best way to start:

import json

try:
    raw_json_dict = b"{'a': 1, {'b': 2 'c': 3}}"
    letter_to_index = json.loads(raw_json_dict)
except JSONDecodeError as error:
   print(error.verbose())

# Printing => verbose error:"'b': 2 'c': 3}"

The problem:
Currently, when getting a JSONDecodeError you can't fully understand where the issue is because of unindecative error message.
The message only gives you the location of the error but not what actually caused the error itself.
For example, when you try to json.loads() a response from a server and you don't know what the content will be.
If you got a JSONDecodeError and you didn't log or save the raw response fron the server, you won't be able to figure out what caused the error and fix it.

The solution
First, I am not here to propose one unique solution, there could be multiple possible solutions. I just want to present the simple one to show its impact.

The solution I proposed in the example makes use of new function inside the JSONDecodeError class.
The verbose function prints the actual characters in which the error occurred and print a range of chars around them to get a context of the error.
Furthermore, the function will be able to get a parameter which will defines the range of characters around the error to get back.

I would very like to hear your opinions of this feature.
Thanks!

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

@ArielRotem1 ArielRotem1 added the type-feature A feature request or enhancement label Jul 8, 2024
@gaogaotiantian
Copy link
Member

gaogaotiantian commented Jul 8, 2024

I believe this is something that should be posted in https://discuss.python.org/c/ideas/6 first. There are a lot of things to be discussed:

  • whether this is needed (does it provide extra information that's helpful? the current implementation does not impact the normal usage where the exception is directly printed)
  • how this should be implemented
  • what are the side effects
  • what are the alternatives

The idea should be mature and concrete (preferably discussed) before presented in issues :)

@nineteendo
Copy link
Contributor

nineteendo commented Jul 8, 2024

What a coincidence! I've implemented this a few days ago in my fork of the json library:

def _get_err_context(doc: str, pos: int) -> tuple[int, str]:
    line_start: int = doc.rfind("\n", 0, pos) + 1
    if (line_end := doc.find("\n", pos)) == -1:
        line_end = len(doc)

    max_chars: int = get_terminal_size().columns - 4  # leading spaces
    min_start: int = min(line_end - max_chars, pos - max_chars // 2)
    max_end: int = max(line_start + max_chars, pos + max_chars // 2)
    text_start: int = max(min_start, line_start)
    text_end: int = min(line_end, max_end)
    text: str = doc[text_start:text_end]
    if text_start > line_start:
        text = "..." + text[3:]

    if text_end < line_end:
        text = text[:-3] + "..."

    offset: int = pos - text_start + 1
    return offset, text


class JSONDecodeError(SyntaxError):
    """JSON decode error."""

    def __init__(self: Self, msg: str, doc: str, pos: int, filename: str = "<string>") -> None:
        """Create new JSON decode error."""
        lineno: int = doc.count("\n", 0, pos) + 1
        self.colno: int = pos - doc.rfind("\n", 0, pos)
        offset, text = _get_err_context(doc, pos)
        super().__init__(msg, (filename, lineno, offset, text))

    def __str__(self: Self) -> str:
        return f"{self.msg} (file {self.filename}, line {self.lineno:d}, column {self.colno: d})"

Result:

>>> import jsonc
>>> jsonc.loads("[,]")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/storage/emulated/0/Download/pyvz2-alpha/pyvz2/jsonc/__init__.py", line 124, in loads
    return JSONDecoder(allow=allow).decode(s, filename=filename)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/storage/emulated/0/Download/pyvz2-alpha/pyvz2/jsonc/decoder.py", line 324, in decode
    raise JSONSyntaxError(msg, filename, s, err.value) from None
  File "<string>", line 1
    [,]
     ^
jsonc.scanner.JSONSyntaxError: Expecting value

Ideally the column number should be included in the traceback (and the truncation should happen there):

yield ' File {}"{}"{}, line {}{}{}\n'.format(

yield ' File "{}", line {}\n'.format(

Then it would look even better:

$ python pyvz2 json
[,]

  File '<stdin>', line 1, column 2
    [,]
     ^
jsonc.scanner.JSONSyntaxError: Expecting value

I plan to discuss this in more detail in September.

@ArielRotem1
Copy link
Author

@gaogaotiantian Thank you for letting me know!
It makes more sense to go through reviews of the idea and solidify it.

@ArielRotem1
Copy link
Author

ArielRotem1 commented Jul 8, 2024

@nineteendo Wow! Nice to hear!
It helps to prove that the idea has a meaningful usage.
I would create a post to discuss the idea with the community.
I would love to hear there your support and what you did and why.

@nineteendo
Copy link
Contributor

Sure, but I can't post on Discourse... Can I summarize here?

@ArielRotem1
Copy link
Author

ArielRotem1 commented Jul 8, 2024

Sure, you can post here. I can send a link there to your comment.
But of course it would be better if you could post there.

@nineteendo
Copy link
Contributor

nineteendo commented Jul 9, 2024

(I'm banned)

During a rewrite, I ran into an encoding issue that was impossible to figure out with the current error message:

>>> import json
>>> "{}".encode("utf_16") # redirect stdout
b'\xff\xfe{\x00}\x00'
>>> b'\xff\xfe{\x00}\x00'.decode("cp1252") # redirect stdin
'ÿþ{\x00}\x00'
>>> json.loads('ÿþ{\x00}\x00')
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I had to resort to print() for debugging. After fixing it, I had the idea to display it like syntax errors:

jsonc.scanner.JSONSyntaxError: Expecting value
 File "C:\Users\wanne\empty.json", line 1, column 1
   ÿþ{
   ^

You can even open the file at the location of the error by hovering over the filename in VS Code! It was relatively easy to derive the filename from the open file and pass it around:

def load(fp):
    return loads(fp.read(), filename=getattr(fp, "name", "<string>"))

def loads(s, *, filename="<string>"):
    if not filename.startswith("<") and not filename.endswith(">"):
        filename = realpath(filename)

Unlike python files, lines in a JSON file can get very long, making the error hard to read:

jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter
  File '<stdin>', line 1, column 371
    {"glossary": {"title": "example glossary", "GlossDiv": {"title": "S", "G
lossList": {"GlossEntry": {"ID": "SGML", "SortAs": "SGML", "GlossTerm": "Sta
ndard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1
986", "GlossDef": {"para": "A meta-markup language, used to create markup la
nguages such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}
                                                                            
                                                                            
                                                                            
                                                                            
                                                                      ^

So I tried only displaying 40 characters before and after:

jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter
  File '<stdin>', line 1, column 371
    ...SeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}
                                            ^

But that left a lot of unused space, so I made it dynamic:

jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter
  File '<stdin>', line 1, column 371
    ...s such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}
                                                                       ^

It didn't look quite right in tracebacks, so I inherited from SyntaxError:

File "<string>", line 1

    ^
jsonc.scanner.JSONSyntaxError: Expecting value

Sadly, we lose the column information now, so traceback should be modified. But other than that, it looks great.

@nineteendo
Copy link
Contributor

You can install jsonyx for testing purposes: https://pypi.org/project/jsonyx
It can be run from the command line with python -m jsonyx.tool.

IMPORTANT: Don't use it in production code, the API is unstable and is insufficiently tested.

@nineteendo
Copy link
Contributor

nineteendo commented Jul 17, 2024

As stated previously, you get the information for free in tracebacks:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/site-packages/jsonyx/__init__.py", line 247, in loads
    return JSONDecoder(allow=allow).loads(s, filename=filename)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/jsonyx/__init__.py", line 121, in loads
    return self._scanner(filename, s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1
    [,]
     ^
jsonyx._decoder.JSONSyntaxError: Expecting value

And you can format the exception like this:

>>> import jsonyx, traceback
>>> exc = jsonyx.JSONSyntaxError("Expecting value", "<string>", "[,]", 1)
>>> print(*traceback.format_exception_only(exc), sep="", end="")
  File "<string>", line 1
    [,]
     ^
jsonyx._decoder.JSONSyntaxError: Expecting value

@nineteendo
Copy link
Contributor

nineteendo commented Jul 17, 2024

@elisbyberi, what do you think of my approach? Imagine if syntax errors looked like this:

>>> [,]
SyntaxError: invalid syntax: line 1 column 2 (char 2)

Instead of this:

>>> [,]
  File "<stdin>", line 1
    [,]
     ^
SyntaxError: invalid syntax

Ideally, answer on the Discourse thread, I can't post there.

@nineteendo
Copy link
Contributor

Could you show Chris what the simple approach could look like for indented JSON in a terminal?

  File '<stdin>', line 1, column 371
    {"glossary": {"title": "example glossary", "GlossDiv": {"title": "S", "G
lossList": {"GlossEntry": {"ID": "SGML", "SortAs": "SGML", "GlossTerm": "Sta
ndard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1
986", "GlossDef": {"para": "A meta-markup language, used to create markup la
nguages such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}
                                                                            
                                                                            
                                                                            
                                                                            
                                                                      ^
jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter

@nineteendo
Copy link
Contributor

Also, it's worth noting that including the filename in the error message can be extremely useful.

@nineteendo
Copy link
Contributor

I'm going to try to underline more than one character if applicable.

@ArielRotem1
Copy link
Author

Could you show Chris what the simple approach could look like for indented JSON in a terminal?

  File '<stdin>', line 1, column 371
    {"glossary": {"title": "example glossary", "GlossDiv": {"title": "S", "G
lossList": {"GlossEntry": {"ID": "SGML", "SortAs": "SGML", "GlossTerm": "Sta
ndard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1
986", "GlossDef": {"para": "A meta-markup language, used to create markup la
nguages such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}
                                                                            
                                                                            
                                                                            
                                                                            
                                                                      ^
jsonc.scanner.JSONSyntaxError: Expecting ':' delimiter

Why the caret signaling where the error is so far from the data itself?
Additionally, the caret seems to change location according to the screen size of the device you're using.

@nineteendo
Copy link
Contributor

There are 374 spaces before the caret:

  File '<stdin>', line 1, column 371
    {"glossary": {"title": "example glossary", "GlossDiv": {"title": "S", "GlossList": {"GlossEntry": {"ID": "SGML", "SortAs": "SGML", "GlossTerm": "Standard Generalized Markup Language", "Acronym": "SGML", "Abbrev": "ISO 8879:1986", "GlossDef": {"para": "A meta-markup language, used to create markup languages such as DocBook.", "GlossSeeAlso": ["GML", "XML"]}, "GlossSee"}}}}}
                                                                                                                                                                                                                                                                                                                                                                                      ^
jsonyx._decoder.JSONSyntaxError: Expecting ':' delimiter

Which get wrapped in the terminal (which is dependent of the screen size):

Screenshot 2024-07-18 at 18 12 30

Instead of truncating the text:

Screenshot 2024-07-18 at 18 13 28

@nineteendo
Copy link
Contributor

nineteendo commented Jul 18, 2024

Could you mention jsonyx in the thread? I'm getting a little annoyed about problems being mentioned that are already handled by it. Especially the "Expecting value" is quite undescriptive.

@ArielRotem1
Copy link
Author

Yes, I will do it, but it is pretty inconveniet to handle a conversation like this. So if you can do anything to join the fourm again it would be better.

@nineteendo
Copy link
Contributor

nineteendo commented Jul 18, 2024

Using my second GitHub account is a very bad idea, and the chance of getting unbanned early is low.
The library should suffice for the discussion.

@nineteendo
Copy link
Contributor

I can now underline multiple characters:

Screenshot 2024-07-19 at 08 32 19

@nineteendo
Copy link
Contributor

nineteendo commented Jul 25, 2024

Here's the code to get the error context, supporting multiple characters (which provides useful information in error messages):

def _get_err_context(doc: str, start: int, end: int) -> tuple[int, str, int]:
    line_start: int = doc.rfind("\n", 0, start) + 1
    if (line_end := doc.find("\n", start)) == -1:
        line_end = len(doc)

    end = min(max(start + 1, line_end), end)
    max_chars: int = get_terminal_size().columns - 4  # leading spaces
    text_start: int = max(min(
        line_end - max_chars, end - max_chars // 2 - 1, start - max_chars // 3,
    ), line_start)
    text_end: int = min(max(
        line_start + max_chars, start + max_chars // 2 + 1,
        end + max_chars // 3,
    ), line_end)
    text: str = doc[text_start:text_end].expandtabs(1)
    if text_start > line_start:
        text = "..." + text[3:]

    if len(text) > max_chars:
        end -= len(text) - max_chars
        text = text[:max_chars // 2 - 1] + "..." + text[-max_chars // 2 + 2:]

    if text_end < line_end:
        text = text[:-3] + "..."

    return start - text_start + 1, text, end - text_start + 1

This isn't trivial at all. (not sure if it's even correct)

@nineteendo
Copy link
Contributor

nineteendo commented Jul 29, 2024

I can now also mention the end line and end column number:

[
    1,
    2,
    3
  File '<stdin>', line 1-4, column 1-6
    [
    ^
jsonyx._decoder.JSONSyntaxError: Unterminated array

Summary of all extra information in error messages:

  • filename, allowing the file to be directly opened by an IDE like Visual Studio Code
  • end line and end column number clarifying the exact problem
  • context, which makes it easier to help beginners without having to first ask to share the JSON file...

Ideally we should include this in the standard library, but a third-party library like jsonyx works just as well for me.
It's just a bit annoying that I'll have to include this for every future project where I want to use JSON.

But we should definitely take our time to evaluate this.

@nineteendo
Copy link
Contributor

nineteendo commented Jul 31, 2024

Yeah, it wasn't correct, I forget to count the newline when it was selected. I also added universal newline support:

 def _get_err_context(doc: str, start: int, end: int) -> tuple[int, str, int]:
-   line_start: int = doc.rfind("\n", 0, start) + 1
+   line_start: int = max(doc.rfind("\n", 0, start), doc.rfind("\r", 0, start)) + 1
-   if (line_end := doc.find("\n", start)) == -1:
-       line_end = len(doc)
+   if (match := _match_line_end(doc, start)):
+       line_end: int = match.end()
+   else:
+       line_end = start
 
     end = min(max(start + 1, line_end), end)
     max_chars: int = get_terminal_size().columns - 4  # leading spaces
+    if end == line_end + 1:  # newline
+        max_chars -= 1
 
     text_start: int = max(min(
         line_end - max_chars, end - max_chars // 2 - 1, start - max_chars // 3,
     ), line_start)
     text_end: int = min(max(
         line_start + max_chars, start + max_chars // 2 + 1,
         end + max_chars // 3,
     ), line_end)
     text: str = doc[text_start:text_end].expandtabs(1)
     if text_start > line_start:
         text = "..." + text[3:]
 
     if len(text) > max_chars:
         end -= len(text) - max_chars
         text = text[:max_chars // 2 - 1] + "..." + text[-max_chars // 2 + 2:]
 
     if text_end < line_end:
         text = text[:-3] + "..."
 
     return start - text_start + 1, text, end - text_start + 1

I have 22 tests for this function!

@pytest.mark.parametrize(
    ("columns", "doc", "start", "end", "offset", "text", "end_offset"), {
        # Only current line
        (7, "current", 0, 7, 1, "current", 8),
        #    ^^^^^^^             ^^^^^^^
        (12, "current\nnext", 0, 7, 1, "current", 8),
        #     ^^^^^^^                   ^^^^^^^
        (12, "current\rnext", 0, 7, 1, "current", 8),
        #     ^^^^^^^                   ^^^^^^^
        (12, "current\r\nnext", 0, 7, 1, "current", 8),
        #     ^^^^^^^                     ^^^^^^^
        (16, "previous\ncurrent", 9, 16, 1, "current", 8),
        #               ^^^^^^^              ^^^^^^^
        (16, "previous\rcurrent", 9, 16, 1, "current", 8),
        #               ^^^^^^^              ^^^^^^^
        (16, "previous\r\ncurrent", 10, 17, 1, "current", 8),
        #                 ^^^^^^^               ^^^^^^^

        # No newline
        (17, "start-middle-end", 0, 5, 1, "start-middle-end", 6),
        #     ^^^^^                        ^^^^^
        (8, "current\nnext", 0, 12, 1, "current", 8),
        #    ^^^^^^^^^^^^^              ^^^^^^^
        (8, "current\rnext", 0, 12, 1, "current", 8),
        #    ^^^^^^^^^^^^^              ^^^^^^^
        (8, "current\r\nnext", 0, 13, 1, "current", 8),
        #    ^^^^^^^^^^^^^                ^^^^^^^

        # Newline
        (8, "current", 7, 8, 8, "current", 9),
        #           ^                   ^
        (8, "current\nnext", 7, 12, 8, "current", 9),
        #           ^^^^^^                     ^
        (8, "current\rnext", 7, 12, 8, "current", 9),
        #           ^^^^^^                     ^
        (8, "current\r\nnext", 7, 13, 8, "current", 9),
        #           ^^^^^^                      ^

        # Expand tabs
        (8, "\tcurrent", 1, 8, 2, " current", 9),
        #      ^^^^^^^              ^^^^^^^

        # Truncate start
        (7, "start-middle-end", 5, 6, 4, "...-...", 5),  # end
        #         ^                          ^
        (12, "start-middle-end", 7, 11, 5, "...middle...", 9),  # start
        #            ^^^^                       ^^^^
        (6, "start-middle-end", 13, 16, 4, "...end", 7),  # line_end
        #                 ^^^                  ^^^
        (7, "start-middle-end", 16, 17, 7, "...end", 8),  # newline
        #                    ^                    ^

        # Truncate middle
        (13, "start-middle-end", 0, 16, 1, "start...e-end", 14),
        #     ^^^^^^^^^^^^^^^^              ^^^^^^^^^^^^^

        # Truncate end
        (8, "start-middle-end", 0, 5, 1, "start...", 6),  # line_start
        #    ^^^^^                        ^^^^^
        (7, "start-middle-end", 5, 6, 4, "...-...", 5),  # start
        #         ^                          ^
        (12, "start-middle-end", 7, 11, 5, "...middle...", 9),  # end
        #            ^^^^                       ^^^^
    },
)
# pylint: disable-next=R0913
def test_err_context(  # noqa: PLR0913, PLR0917
    monkeypatch: pytest.MonkeyPatch, columns: int, doc: str, start: int,
    end: int, offset: int, text: str, end_offset: int,
) -> None:
    """Test error context."""
    monkeypatch.setenv("COLUMNS", str(4 + columns))  # leading spaces
    exc: JSONSyntaxError = JSONSyntaxError("", "", doc, start, end)
    assert exc.offset == offset
    assert exc.text == text
    assert exc.end_offset == end_offset

I don't think many other people would spent so much time on nice error messages...

@nineteendo
Copy link
Contributor

The library is finally ready for production: https://jsonyx.readthedocs.io.

@ArielRotem1
Copy link
Author

Hi, as @gaogaotiantian said, the python community has discussed the suggestion in the following link:
https://discuss.python.org/t/jsondecodeerror-exceptions-message-verbose/58472
How we advanced from here toward implementing the feature?

@nineteendo
Copy link
Contributor

nineteendo commented Aug 22, 2024

There's no consensus on Discourse.

This doesn't need to be added to Python, a third party-library can provide this functionality just as well.
If you want to make use of this right now you can pip install jsonyx. What DOES need to be implemented by Python is displaying the end line number and start and end column number in the traceback, so that would be worth discussing.

If you do want this feature to be added to Python, I suggest creating an issue (and mentioning me) on https://github.com/simplejson/simplejson and see how that is received.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature A feature request or enhancement
Projects
Status: No status
Development

No branches or pull requests

3 participants