Bug: JSONDecodeError with valid string #57

brupelo · 2019-05-04T14:56:04Z

@mverleg Hi Mark, nice to meet you. First of all, thanks for creating this library; it's quite handy. Today I found a small bug.

Could you please take a look and advise?

>>> import json
>>> json.loads(json.dumps('a.b("\\\\", "/")\nc = \'"{}"\'.d(e)\nf.g("#")\n'))
'a.b("\\\\", "/")\nc = \'"{}"\'.d(e)\nf.g("#")\n'

>>> import json_tricks
>>> json_tricks.loads(json_tricks.dumps('a.b("\\\\", "/")\nc = \'"{}"\'.d(e)\nf.g("#")\n'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\virtual_envs\py364_32\lib\site-packages\json_tricks\nonp.py", line 213, in loads
    return json_loads(string, object_pairs_hook=hook, **jsonkwargs)
  File "d:\software\python364_32\Lib\json\__init__.py", line 368, in loads
    return cls(**kw).decode(s)
  File "d:\software\python364_32\Lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "d:\software\python364_32\Lib\json\decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 1 (char 0)

Using json-tricks==3.13.1 + python 3.6.2 + win7 over here.

PS: as you can see, the same data is encoded and decoded correctly by json, but decoding fails with json_tricks.


altendky · 2019-05-04T15:35:49Z

A little exploration...

https://repl.it/@altendky/OriginalMoccasinVendor-2

import json

import json_tricks

s = r'\"#'

j = json.dumps(s)
print(repr(j))
jt = json_tricks.dumps(s)
print(repr(jt))

print('json and json_tricks encoding match: {}'.format(j == jt))

print('--- encoded json')
print(j)
print('---')

print('--- json.loads(j)')
json.loads(j)

print('--- json_tricks.loads(j, ignore_comments=False)')
json_tricks.loads(j, ignore_comments=False)

print('--- json_tricks.loads(j)')
json_tricks.loads(j)

'"\\\\\\"#"'
'"\\\\\\"#"'
json and json_tricks encoding match: True
--- encoded json
"\\\"#"
---
--- json.loads(j)
--- json_tricks.loads(j, ignore_comments=False)
--- json_tricks.loads(j)
Traceback (most recent call last):
  File "main.py", line 25, in <module>
    json_tricks.loads(j)
  File "/home/runner/.local/lib/python3.6/site-packages/json_tricks/nonp.py", line 213, in loads
    return json_loads(string, object_pairs_hook=hook, **jsonkwargs)
  File "/usr/local/lib/python3.6/json/__init__.py", line 367, in loads
    return cls(**kw).decode(s)
  File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 1 (char 0)

mverleg · 2019-05-04T16:31:16Z

Thanks for the report and the analysis!

The comment parsing currently happens with regular expressions, which aren't quite powerful enough to understand all of json. But perhaps they could be expanded to work for these cases.

I'm not sure this can be solved completely generally without making a complete json parser that also understands comments. That would also help with some other issues regarding primitives, but it's a big step.

As you've found you can work around it with ignore_comments=False if you're not using comments.
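
To illustrate why regex-based comment stripping struggles with escaped quotes, here is a minimal sketch. The pattern and the helper naive_strip_hash_comments are hypothetical and written only for this illustration; they are not json_tricks' actual implementation. The heuristic assumes a quote preceded by two backslashes is a real quote, which is wrong when those backslashes are not aligned into pairs.

```python
import json
import re

# Hypothetical heuristic (NOT json_tricks' real pattern): a quote counts as
# unescaped if it is not preceded by a backslash, or if it is preceded by
# two backslashes (assumed to be an escaped backslash).
UNESCAPED_QUOTE = re.compile(r'(?<!\\)"|(?<=\\\\)"')


def naive_strip_hash_comments(text):
    """Drop everything after a '#' that appears to be outside a string."""
    cleaned = []
    for line in text.split("\n"):
        parts = line.split("#")
        kept = parts[0]
        quotes = len(UNESCAPED_QUOTE.findall(parts[0]))
        for part in parts[1:]:
            if quotes % 2 == 0:
                break  # even quote count: we think we are outside a string
            kept += "#" + part
            quotes += len(UNESCAPED_QUOTE.findall(part))
        cleaned.append(kept)
    return "\n".join(cleaned)


# The same input as in the exploration above: backslash, quote, hash.
encoded = json.dumps('\\"#')
stripped = naive_strip_hash_comments(encoded)

try:
    json.loads(stripped)
except json.JSONDecodeError as error:
    print("decoding failed:", error)
```

In the encoded text, three backslashes precede the inner quote: the first two form an escaped backslash and the third escapes the quote. A fixed-width lookbehind cannot tell which pairing is the right one, so the heuristic ends the string early and treats the # as the start of a comment. Getting this right in general needs a scan from the start of the string, which is the "complete parser" direction mentioned above.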

brupelo · 2019-05-04T16:36:19Z

@mverleg I posted this question in #python on freenode, which is why @altendky kindly helped out here. The main reason I'd like to solve this issue is that some of the items I'm serializing/deserializing (items used in PyQt code) store Python code as strings. Here is a small example:

# ------ a.py ---------
from json_tricks import dumps, loads
import b

for k, v in [(v, getattr(b, v)()) for v in dir(b) if v.startswith("Item")]:
    print('-' * 80)
    try:
        loads(dumps(v))
        print(f"{k} encoded/decoded successfully")
    except Exception as e:
        print(f"{k} encoded/decoded failed")
        import traceback
        traceback.print_exc()

# ---------- b.py -----------
class ItemInvalid0():

    def __init__(self):
        self.content = 'a.b("\\\\", "/")\nc = \'"{}"\'.d(e)\nf.g("#")\n'

class ItemValid0():

    def __init__(self):
        self.content = 'def foo():\n    print(\"hello world\")'

So for my particular case I'm not really sure how I'm going to serialize/deserialize these types of objects :/

mverleg · 2019-05-04T16:40:20Z

@brupelo Would it help as a workaround to change

loads(dumps(v))

to

loads(dumps(v), ignore_comments=False)

You won't have any comments in the json if you're dumping the data yourself, so it should be okay to skip the comment handling, and it would even be a bit faster.

It's just a workaround, but I'm not sure if/when I can fix this issue.

brupelo · 2019-05-04T17:04:55Z

@mverleg Great, it seems that workaround works... :O/

Just to be extra careful, I've decided that before dumping the state of my PyQt software to disk, I'll check that loads doesn't fail on the result. Why? Well, I had assumed that anything dumpable would also be loadable. When I tried to restore a session and hit this bug, it was quite annoying, because I could no longer recover the project. Of course, this was my fault in the first place for not being more cautious and not reading the docs more carefully ;)
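
A check like that could be sketched roughly as follows. The helper name save_verified is made up for this example, and the standard json functions are used as defaults only so the snippet runs on its own; the assumption is that json_tricks.dumps and json_tricks.loads could be passed in instead.

```python
import json
import os
import tempfile


def save_verified(state, path, dumps=json.dumps, loads=json.loads):
    """Write `state` to `path` only if the encoded text can be decoded again."""
    encoded = dumps(state)
    loads(encoded)  # raises before anything on disk is touched

    # Write to a temporary file in the same directory, then swap it in,
    # so a failure halfway through never leaves a truncated file behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(encoded)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

This only proves that the text decodes without raising; it does not compare the decoded value with the original, since types are not always preserved by a round trip.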

Anyway, I'll leave the issue open but so far the workaround is good for me... About your previous comment:

I'm not sure this can be solved completely generally without making a complete json parser that also understands comments. That would also help with some other issues regarding primitives, but it's a big step.

One solution that comes to mind: https://github.com/dmeranda/demjson. I've used it in some projects, and it handles a more general JSON format (like the one Sublime Text uses). It's a pretty handy library; hope that helps.

mverleg · 2019-05-04T18:01:55Z

Yeah better leave it open, it should ideally still work even when ignoring comments.

In general there's no guarantee that everything can be both encoded and decoded, or that decoding returns the original type. For example, a numpy float gets encoded as a plain number, and then there's no way to know it was a numpy type, so it gets decoded to a float. JSON also doesn't distinguish between lists and tuples.

But where possible, the aim is for the built-in json-tricks types to be exactly the same after encoding and decoding, unless primitives=True is used, in which case they'll be stored as simply as possible (often losing type information).

The 'primitives' like lists, maps, numbers, texts and booleans are encodable and decodable. Most of the extra types in this library are too.
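
The list/tuple point can be seen with the standard library alone. This small sketch uses plain json rather than json_tricks, so it says nothing about how json_tricks itself treats tuples:

```python
import json

original = {"point": (1, 2), "ratio": 0.5}
restored = json.loads(json.dumps(original))

print(restored)              # {'point': [1, 2], 'ratio': 0.5}
print(restored == original)  # False: the tuple came back as a list
```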

mverleg · 2019-05-04T18:37:08Z

@brupelo By the way, after you add ignore_comments=False to the loading code, you should be able to load your broken project. The error is in the loading code, so the data that your program stored should be fine.

mverleg · 2022-11-02T15:52:23Z

Although this is technically a valid problem, I'm going to close it because:

- It's been 3+ years without further interest.
- There is a workaround that's okay for many cases. The workaround will be default behaviour in the next breaking version (4.0), see "Make ignore_comments not default (next major release)" #74.
- The only complete fix is to write a parser, which is a lot of work, and either slow (if in pure Python) or architecture-dependent (if in native code).

If someone has a good solution and is willing to do most of the work, feel free to re-open.

mverleg added a commit that referenced this issue May 4, 2019

Add automated test to check issue #57 (not solved yet)

388fd68

mverleg added the wontfix label Nov 2, 2022

mverleg closed this as completed Nov 2, 2022
