Bug: JSONDecodeError with valid string #57

Closed
brupelo opened this issue May 4, 2019 · 8 comments


brupelo commented May 4, 2019

@mverleg Hi Mark, nice to meet you. First of all, thanks for creating this little library; it's quite a handy one. Today I found a little bug.

Could you please take a look and advise?

>>> import json
>>> json.loads(json.dumps('a.b("\\\\", "/")\nc = \'"{}"\'.d(e)\nf.g("#")\n'))
'a.b("\\\\", "/")\nc = \'"{}"\'.d(e)\nf.g("#")\n'

>>> import json_tricks
>>> json_tricks.loads(json_tricks.dumps('a.b("\\\\", "/")\nc = \'"{}"\'.d(e)\nf.g("#")\n'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\virtual_envs\py364_32\lib\site-packages\json_tricks\nonp.py", line 213, in loads
    return json_loads(string, object_pairs_hook=hook, **jsonkwargs)
  File "d:\software\python364_32\Lib\json\__init__.py", line 368, in loads
    return cls(**kw).decode(s)
  File "d:\software\python364_32\Lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "d:\software\python364_32\Lib\json\decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 1 (char 0)

Using json-tricks==3.13.1 + python 3.6.2 + win7 over here.

P.S. As you can see, the same data is encoded and decoded perfectly using json, but it crashes when using json_tricks.


altendky commented May 4, 2019

A little exploration...

https://repl.it/@altendky/OriginalMoccasinVendor-2

import json

import json_tricks

s = r'\"#'

j = json.dumps(s)
print(repr(j))
jt = json_tricks.dumps(s)
print(repr(jt))

print('json and json_tricks encoding match: {}'.format(j == jt))

print('--- encoded json')
print(j)
print('---')

print('--- json.loads(j)')
json.loads(j)

print('--- json_tricks.loads(j, ignore_comments=False)')
json_tricks.loads(j, ignore_comments=False)

print('--- json_tricks.loads(j)')
json_tricks.loads(j)

Output:

'"\\\\\\"#"'
'"\\\\\\"#"'
json and json_tricks encoding match: True
--- encoded json
"\\\"#"
---
--- json.loads(j)
--- json_tricks.loads(j, ignore_comments=False)
--- json_tricks.loads(j)
Traceback (most recent call last):
  File "main.py", line 25, in <module>
    json_tricks.loads(j)
  File "/home/runner/.local/lib/python3.6/site-packages/json_tricks/nonp.py", line 213, in loads
    return json_loads(string, object_pairs_hook=hook, **jsonkwargs)
  File "/usr/local/lib/python3.6/json/__init__.py", line 367, in loads
    return cls(**kw).decode(s)
  File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 1 (char 0)

Owner

mverleg commented May 4, 2019

Thanks for the report and the analysis!

The comment parsing currently happens with regular expressions, which aren't quite powerful enough to understand all of json. But perhaps they could be expanded to work for these cases.

I'm not sure this can be solved completely generally without making a complete json parser that also understands comments. That would also help with some other issues regarding primitives, but it's a big step.

As you've found you can work around it with ignore_comments=False if you're not using comments.
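
To illustrate why regex-based comment stripping can trip over escaped quotes, here is a minimal sketch using only the standard library. The pattern `NAIVE` and the helper `naive_strip_comments` are hypothetical and written for this example; they are not the expressions json_tricks actually uses. The assumption is simply a stripper that tries to skip double-quoted strings but does not understand backslash escapes.

```python
import json
import re

# Hypothetical pattern, NOT the one json_tricks uses: it tries to skip
# double-quoted strings, but it treats every '"' as a string boundary,
# including an escaped one.
NAIVE = re.compile(r'("[^"]*")|#[^\n]*')


def naive_strip_comments(text):
    # Keep whatever matched as a "string"; drop whatever matched as a comment.
    return NAIVE.sub(lambda m: m.group(1) or "", text)


encoded = json.dumps(r'\"#')        # same input as in the exploration above
stripped = naive_strip_comments(encoded)

# The plain json module decodes the untouched text without trouble.
assert json.loads(encoded) == r'\"#'

# The stripper mistook the escaped quote for the end of the string and then
# removed '#"' as if it were a comment, leaving an unterminated string.
try:
    json.loads(stripped)
    failed = False
except json.JSONDecodeError:
    failed = True
```

With this pattern the text handed to the decoder no longer ends in a closing quote, which is the same kind of "Unterminated string" failure reported here.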

Author

brupelo commented May 4, 2019

@mverleg I posted this question in #python on freenode, which is why @altendky kindly helped out here. That said, the main reason I'd like to solve this issue is that some of the items I'm serializing/deserializing (items used in PyQt code) store Python code as strings. Here is a small example:

# ------ a.py ---------
from json_tricks import dumps, loads
import b

for k, v in [(v, getattr(b, v)()) for v in dir(b) if v.startswith("Item")]:
    print('-' * 80)
    try:
        loads(dumps(v))
        print(f"{k} encoded/decoded successfully")
    except Exception:
        print(f"{k} encoding/decoding failed")
        import traceback
        traceback.print_exc()

# ---------- b.py -----------
class ItemInvalid0():

    def __init__(self):
        self.content = 'a.b("\\\\", "/")\nc = \'"{}"\'.d(e)\nf.g("#")\n'

class ItemValid0():

    def __init__(self):
        self.content = 'def foo():\n    print(\"hello world\")'

So for my particular case I'm not really sure how I'm going to serialize/deserialize these types of objects :/

Owner

mverleg commented May 4, 2019

@brupelo Would it help as a workaround to change

loads(dumps(v))

to

loads(dumps(v), ignore_comments=False)

You won't have any comments in the json if you're dumping the data yourself, so it should be okay to skip the comment-stripping step (which is what ignore_comments=False does), and it would even be a bit faster.

It's just a workaround, but I'm not sure if/when I can fix this issue.

Author

brupelo commented May 4, 2019

@mverleg Great, it seems that workaround works... :O/

Just to be extra careful, I've decided that before dumping the state of my PyQt software to disk I'll check that loads doesn't crash on the result before saving anything. Why? Well, I had assumed that anything dumpable would also be loadable. When I tried to restore a session of my PyQt software and hit this bug, I got quite annoyed, as it was a project I could no longer recover... Of course, this was my fault in the first place for not being extra cautious and for not reading the docs more carefully ;)

Anyway, I'll leave the issue open, but so far the workaround is good enough for me. About your previous comment:
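
A minimal sketch of that "verify before saving" idea, assuming the state can be passed through a `dumps`/`loads` pair. The name `save_verified` and its parameters are hypothetical; the defaults use the standard json module so the example runs on its own.

```python
import json
import os
import tempfile


def save_verified(obj, path, dumps=json.dumps, loads=json.loads):
    """Serialize obj, confirm the text loads again, then write it atomically."""
    text = dumps(obj)
    loads(text)  # raises here, before the existing file is touched
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_path, path)  # swap in the new file only after a full write
    except BaseException:
        os.unlink(tmp_path)  # do not leave a half-written temp file behind
        raise
    return text


# Usage: the previous save file survives if the round trip fails.
target = os.path.join(tempfile.mkdtemp(), "session.json")
save_verified({"content": 'f.g("#")\n'}, target)
```

To exercise the same code path that runs at startup, one could pass `dumps=json_tricks.dumps` and `loads=json_tricks.loads` (or a wrapper carrying the same keyword arguments the application uses when loading).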

I'm not sure this can be solved completely generally without making a complete json parser that also understands comments. That would also help with some other issues regarding primitives, but it's a big step.

One solution that comes to mind: https://github.com/dmeranda/demjson. I've used it in some projects and it handles a more general JSON format (like the one Sublime Text uses). It's a pretty handy library; hope that helps.

Owner

mverleg commented May 4, 2019

Yeah better leave it open, it should ideally still work even when ignoring comments.

In general there's no guarantee that things are both encodable and decodable, or that decoding returns the original type. For example, a numpy float gets encoded as just a number, and then there's no way to know it was a numpy type, so it gets decoded to a float. JSON also doesn't distinguish between lists and tuples.

But where possible, the aim is for the built-in json-tricks types to be exactly the same after encoding and decoding, unless primitives=True is used, in which case they'll be stored as simply as possible (often losing type information).

The 'primitives' like lists, maps, numbers, texts and booleans are encodable and decodable. Most of the extra types in this library are too.
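
A small illustration of the list/tuple point, using only the standard json module (it does not show anything about how json_tricks itself treats these types):

```python
import json

original = {"point": (1, 2), "ratio": 0.5}
restored = json.loads(json.dumps(original))

# JSON only has arrays, so the tuple comes back as a list.
tuple_became_list = isinstance(restored["point"], list)

# The elements survive, but the container type does not.
same_values = list(original["point"]) == restored["point"]
round_trip_equal = original == restored
```

Here `round_trip_equal` is False even though nothing failed, because `(1, 2) != [1, 2]` in Python.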

Owner

mverleg commented May 4, 2019

@brupelo By the way, after you add ignore_comments=False to the loading code, you should be able to load your broken project. The error is in the loading code, so the data that your program stored should be fine.

mverleg added the wontfix label Nov 2, 2022
Owner

mverleg commented Nov 2, 2022

Although this is technically a valid problem, I'm going to close it because

  • It's been 3+ years without further interest
  • There is a workaround that's okay for many cases. The workaround will be default behaviour in the next breaking version (4.0), see Make ignore_comments not default (next major release) #74.
  • The only complete fix is to write a parser, which is a lot of work, and either slow (if in pure Python) or architecture-dependent (if in native code)

If someone has a good solution and is willing to do most of the work, feel free to re-open.
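
For anyone considering picking this up, here is a rough pure-Python sketch of a string-aware comment stripper, somewhere between the regex approach and a full parser. It is not the library's implementation. The function name `strip_comments` is hypothetical, and it assumes only `#` and `//` line comments need handling. It walks the text one character at a time, so the speed concern about pure Python mentioned above applies.

```python
import json


def strip_comments(text):
    """Remove # and // line comments that occur outside JSON strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character verbatim so \" cannot end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif ch == "#" or text.startswith("//", i):
            # Skip to the end of the line, keeping the newline itself.
            while i < n and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The string from this issue passes through unchanged and still decodes.
encoded = json.dumps('a.b("\\\\", "/")\nc = \'"{}"\'.d(e)\nf.g("#")\n')
decoded = json.loads(strip_comments(encoded))

# Real comments outside strings are still removed.
with_comment = json.loads(strip_comments('{"a": "x#y"} # trailing\n'))
```

Block comments, error reporting, and performance on large inputs are all left out of this sketch.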

mverleg closed this as completed Nov 2, 2022

3 participants