Error when trying to set JSONB as a custom type codec #140

Closed
jcugat opened this issue May 12, 2017 · 12 comments

jcugat commented May 12, 2017

  • asyncpg version: 0.11.0
  • PostgreSQL version: 9.6.2
  • Python version: 3.6.1
  • Platform: macOS Sierra
  • Do you use pgbouncer?: No
  • Did you install asyncpg with pip?: Yes
  • If you built asyncpg locally, which version of Cython did you use?: Cython==0.25.2
  • Can the issue be reproduced under both asyncio and uvloop?:

I followed the steps to set up a custom type codec for JSON as described here: https://magicstack.github.io/asyncpg/current/usage.html#custom-type-conversions

It works for tables created with the JSON type, but fails for the JSONB type.

I created a new test locally (similar to test_custom_codec_override but using jsonb instead of json type):

    async def test_custom_codec_override_jsonb(self):
        """Test overriding core codecs."""
        import json

        conn = await self.cluster.connect(database='postgres', loop=self.loop)
        try:
            def _encoder(value):
                return json.dumps(value).encode('utf-8')

            def _decoder(value):
                return json.loads(value.decode('utf-8'))

            await conn.set_type_codec(
                'jsonb', encoder=_encoder, decoder=_decoder,
                schema='pg_catalog', binary=True
            )

            data = {'foo': 'bar', 'spam': 1}
            res = await conn.fetchval('SELECT $1::jsonb', data)
            self.assertEqual(data, res)

        finally:
            await conn.close()

This is the error returned:

======================================================================
ERROR: test_custom_codec_override_jsonb (tests.test_codecs.TestCodecs)
Test overriding core codecs.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/josepcugat/workspace/asyncpg/asyncpg/_testbase.py", line 67, in wrapper
    self.loop.run_until_complete(__meth__(self, *args, **kwargs))
  File "/Users/josepcugat/.pyenv/versions/3.6.1/lib/python3.6/asyncio/base_events.py", line 466, in run_until_complete
    return future.result()
  File "/Users/josepcugat/workspace/asyncpg/tests/test_codecs.py", line 951, in test_custom_codec_override_jsonb
    res = await conn.fetchval('SELECT $1::jsonb', data)
  File "/Users/josepcugat/workspace/asyncpg/asyncpg/connection.py", line 331, in fetchval
    data = await self._execute(query, args, 1, timeout)
  File "/Users/josepcugat/workspace/asyncpg/asyncpg/connection.py", line 997, in _execute
    return await self._do_execute(query, executor, timeout)
  File "/Users/josepcugat/workspace/asyncpg/asyncpg/connection.py", line 1018, in _do_execute
    result = await executor(stmt, None)
  File "asyncpg/protocol/protocol.pyx", line 189, in bind_execute (asyncpg/protocol/protocol.c:60713)
    return await self._new_waiter(timeout)
asyncpg.exceptions.InternalServerError: unsupported jsonb version number 123

----------------------------------------------------------------------
Member

elprans commented May 12, 2017

You need to declare your custom codec as binary=False. jsonb has a different binary layout.

Author

jcugat commented May 15, 2017

Hey @elprans, thanks for your help. Unfortunately, I tried with binary=False, but this error appears instead:

======================================================================
ERROR: test_custom_codec_override_jsonb (tests.test_codecs.TestCodecs)
Test overriding core codecs.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/josepcugat/workspace/asyncpg/asyncpg/_testbase.py", line 67, in wrapper
    self.loop.run_until_complete(__meth__(self, *args, **kwargs))
  File "/Users/josepcugat/.pyenv/versions/3.6.1/lib/python3.6/asyncio/base_events.py", line 466, in run_until_complete
    return future.result()
  File "/Users/josepcugat/workspace/asyncpg/tests/test_codecs.py", line 951, in test_custom_codec_override_jsonb
    res = await conn.fetchval('SELECT $1::jsonb', data)
  File "/Users/josepcugat/workspace/asyncpg/asyncpg/connection.py", line 331, in fetchval
    data = await self._execute(query, args, 1, timeout)
  File "/Users/josepcugat/workspace/asyncpg/asyncpg/connection.py", line 997, in _execute
    return await self._do_execute(query, executor, timeout)
  File "/Users/josepcugat/workspace/asyncpg/asyncpg/connection.py", line 1018, in _do_execute
    result = await executor(stmt, None)
  File "asyncpg/protocol/protocol.pyx", line 181, in bind_execute (asyncpg/protocol/protocol.c:60629)
    state._encode_bind_msg(args),
  File "asyncpg/protocol/prepared_stmt.pyx", line 125, in asyncpg.protocol.protocol.PreparedStatementState._encode_bind_msg (asyncpg/protocol/protocol.c:56512)
    codec.encode(self.settings, writer, arg)
  File "asyncpg/protocol/codecs/base.pyx", line 150, in asyncpg.protocol.protocol.Codec.encode (asyncpg/protocol/protocol.c:14433)
    return self.encoder(self, settings, buf, obj)
  File "asyncpg/protocol/codecs/base.pyx", line 95, in asyncpg.protocol.protocol.Codec.encode_scalar (asyncpg/protocol/protocol.c:13818)
    self.c_encoder(settings, buf, obj)
  File "asyncpg/protocol/codecs/json.pyx", line 13, in asyncpg.protocol.protocol.jsonb_encode (asyncpg/protocol/protocol.c:29782)
    as_pg_string_and_size(settings, obj, &str, &size)
  File "asyncpg/protocol/codecs/text.pyx", line 12, in asyncpg.protocol.protocol.as_pg_string_and_size (asyncpg/protocol/protocol.c:22545)
    raise TypeError('expected str, got {}'.format(type(obj).__name__))
TypeError: expected str, got dict

Not sure if I'm doing something wrong.

Member

elprans commented May 15, 2017

Looks like asyncpg prefers the builtin binary-format codec over a custom text-format one. I'll look into fixing this. In the meantime, you can go back to the binary=True version, but modify your code like this:

        import json

        conn = await self.cluster.connect(database='postgres', loop=self.loop)
        try:
            def _encoder(value):
                return b'\x01' + json.dumps(value).encode('utf-8')

            def _decoder(value):
                return json.loads(value[1:].decode('utf-8'))

            await conn.set_type_codec(
                'jsonb', encoder=_encoder, decoder=_decoder,
                schema='pg_catalog', binary=True
            )

            data = {'foo': 'bar', 'spam': 1}
            res = await conn.fetchval('SELECT $1::jsonb', data)
            self.assertEqual(data, res)

        finally:
            await conn.close()

Author

jcugat commented May 16, 2017

Is this last version backwards-compatible with already existing data in the database? I'm not sure if this magic byte is consumed somewhere in the asyncpg client or actually stored in the database.

Member

elprans commented May 16, 2017

The magic byte is part of the format of the JSONB type in the database. It is physically stored in the database and is consumed by asyncpg when reading from and writing to jsonb columns.

@vitaly-burovoy
Contributor

b'\x01' is the version number of the binary protocol [1]; it is not stored anywhere, just checked at the very beginning of the parse process.
The expectation is that this value can change in the future, once jsonb data is sent in a truly binary format (currently the binary format differs from the text format only by the protocol version number). For now it just wastes time serializing/deserializing to and from the internal representation.

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/jsonb.c;h=164f57ef770e57cb084b232a146f1203b58fd752;hb=HEAD#l115
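This also explains the original error message: without the prefix, the server interprets the first byte of the JSON text, '{' (ASCII 123), as the format version, hence "unsupported jsonb version number 123". A minimal sketch of that wire format (function names here are illustrative, not asyncpg API):

```python
import json

JSONB_FORMAT_VERSION = 1  # the b'\x01' prefix checked by the server

def jsonb_binary_encode(value):
    # Binary-format jsonb on the wire: one version byte, then the JSON text.
    return bytes([JSONB_FORMAT_VERSION]) + json.dumps(value).encode('utf-8')

def jsonb_binary_decode(data):
    version = data[0]
    if version != JSONB_FORMAT_VERSION:
        raise ValueError('unsupported jsonb version number %d' % version)
    return json.loads(data[1:].decode('utf-8'))

payload = jsonb_binary_encode({'foo': 'bar'})
assert payload[0] == 1
assert jsonb_binary_decode(payload) == {'foo': 'bar'}
# Without the prefix, the first byte would be ord('{') == 123.
```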

Member

elprans commented May 16, 2017

@vitaly-burovoy That's correct, but the OP's motivation is to enable structured encoding/decoding. asyncpg decodes jsonb as text.

elprans added a commit that referenced this issue Jul 4, 2017
When a custom codec is set for a type, it should be called regardless of
whether its format matches the preferred format requested by the
pipeline.

Fixes: #140
Fixes: #148

vpzed commented Oct 22, 2017

For anyone else trying this with PostgreSQL 10 and asyncpg 0.13.0, the workaround suggested above in this issue must be updated from:

await conn.set_type_codec('jsonb', encoder=_encoder, decoder=_decoder, schema='pg_catalog', binary=True)

to

await conn.set_type_codec('jsonb', encoder=_encoder, decoder=_decoder, schema='pg_catalog', format='binary')

Looks like the documentation is behind, but I found a reference to the "format" keyword in a commit diff, so I tried that and it worked.

Member

elprans commented Oct 23, 2017

Unless you really need binary, stick to the default "text" mode. The bug that necessitated the workaround was fixed in #164 a while ago. I have updated the docs to reflect this.


vpzed commented Oct 23, 2017

As someone who is new to PostgreSQL, trying different drivers, and testing PostgreSQL against my current MongoDB back-end - do you have any recommended best practices for using asyncpg for a mixed RDBMS and Document Store setup?

The application stores REST API response data in JSONB columns with some key fields pulled out into simple data type columns for more direct SQL activities. Several tables with smaller JSON objects will have 50+ million rows. I've not been able to find many real-world examples using asyncpg and I'd like to give the driver and database a good test, but without practical examples I worry that the testing won't be fair. If you have any recommendations or can point me to projects using asyncpg in a recommended fashion it would be very much appreciated.

Thank you for your consideration, and for your efforts supporting the Python community.

Member

elprans commented Oct 23, 2017

I recommend reading PostgreSQL documentation: https://www.postgresql.org/docs/current/static/datatype-json.html, especially the "Designing JSON documents effectively" and "jsonb Indexing" bits.

There isn't anything special you need to do in asyncpg to work with JSON types, beyond optionally setting a codec that encodes/decodes JSON data automatically.
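On asyncpg versions that include the #164 fix, such a codec can simply be the json module's functions registered in the default text format. A sketch (the registration loop is commented out, as it assumes a live connection):

```python
import json

# With the text format (the default), json.dumps/json.loads can serve
# directly as the codec pair for both JSON types. Against a live
# connection the registration would be:
#
# for typename in ('json', 'jsonb'):
#     await conn.set_type_codec(
#         typename, encoder=json.dumps, decoder=json.loads,
#         schema='pg_catalog')

# The pair round-trips nested documents unchanged:
doc = {'key_field': 42, 'payload': {'nested': [1, 2, 3]}}
assert json.loads(json.dumps(doc)) == doc
```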


dyllan-to-you commented Jun 12, 2021

It seems like the format='binary' workaround is required for a copy_records_to_table call?

But the b"\x01" prefix is only required for jsonb codecs?
