-
Notifications
You must be signed in to change notification settings - Fork 413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
set_type_codec() appears to assume a particular set_type_codec for the "char" datatype #617
Comments
Thank you @zzzeek for the very comprehensive description here! Putting on my asyncpg hat, I'm thinking now perhaps we should run the asyncpg built-in introspection queries with fixed codecs only, so that user-defined codecs won't affect the introspections. People could not only change how |
|
So, the set_type_codec() method uses the internal names. It would help for clarity if these names were noted in the documentation, right in this chart, as well as a distinct row for the internal "char" datatype, which is also exposed by PostgreSQL in its information schema catalogs. This is one of those "everything is fine but I spent a lot of time pulling my hair out" kind of issues, I'm sure you know what I'm talking about. here's a demo to illustrate the name "bpchar" as part of asyncpg's public API and that it is different from "char" (again credit to @fantix for showing me how to actually create a table with the internal "char" type): import asyncio
import asyncpg
async def main():
conn = await asyncpg.connect(
user="scott", password="tiger", database="test"
)
await conn.execute(
"CREATE TABLE IF NOT EXISTS test_char "
'(sql_val char, internal_val "char")'
)
await conn.execute("DELETE FROM test_char")
await conn.execute("INSERT INTO test_char VALUES ($1, $2)", "x", b"x")
await conn.set_type_codec(
"bpchar",
schema="pg_catalog",
encoder=lambda value: value,
decoder=lambda value: "im a bp char: %s" % (value,),
format="text",
)
await conn.set_type_codec(
"char",
schema="pg_catalog",
encoder=lambda value: value,
decoder=lambda value: "im an internal char: %s" % (value,),
format="text",
)
rows = await conn.fetch("SELECT sql_val, internal_val FROM test_char")
print(rows)
asyncio.run(main()) output:
|
Oh yeah, I totally forgot about the funny single-character This is definitely a bug. I'm not sure it makes sense to include internal type names in the type chart given that Postgres itself doesn't document those names. Perhaps |
Currently, `Connection.set_type_codec()` only accepts type names as they appear in `pg_catalog.pg_type` and would refuse to handle a standard SQL spelling of a type like `character varying`. This is an oversight, as the internal type names aren't really supposed to be treated as public Postgres API. Additionally, for historical reasons, Postgres has a single-byte `"char"` type, which is distinct from both `varchar` and SQL `char`, which may lead to massive confusion if a user sets up a custom codec on it expecting to handle the `char(n)` type instead. Issue: #617.
Currently, `Connection.set_type_codec()` only accepts type names as they appear in `pg_catalog.pg_type` and would refuse to handle a standard SQL spelling of a type like `character varying`. This is an oversight, as the internal type names aren't really supposed to be treated as public Postgres API. Additionally, for historical reasons, Postgres has a single-byte `"char"` type, which is distinct from both `varchar` and SQL `char`, which may lead to massive confusion if a user sets up a custom codec on it expecting to handle the `char(n)` type instead. Issue: #617.
Currently, `Connection.set_type_codec()` only accepts type names as they appear in `pg_catalog.pg_type` and would refuse to handle a standard SQL spelling of a type like `character varying`. This is an oversight, as the internal type names aren't really supposed to be treated as public Postgres API. Additionally, for historical reasons, Postgres has a single-byte `"char"` type, which is distinct from both `varchar` and SQL `char`, which may lead to massive confusion if a user sets up a custom codec on it expecting to handle the `char(n)` type instead. Issue: #617
asyncpg version: 0.21.0
PostgreSQL version: 11.8 fedora
Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
the issue with a local PostgreSQL install?: N/A
Python version: 3.8.3
Platform: Fedora 31
Do you use pgbouncer?: no
Did you install asyncpg with pip?: yes
If you built asyncpg locally, which version of Cython did you use?: N/A
Can the issue be reproduced under both asyncio and
uvloop?: N/A
It appears that the implementation for set_type_codec() relies upon the results of the query TYPE_BY_NAME which itself is assumed to return a bytes value from the PostgreSQL "char" datatype.
I was previously unaware that PostgreSQL actually has two "char" variants bpchar and char, and in the documentation at https://magicstack.github.io/asyncpg/current/usage.html#type-conversion this is talking about the "bpchar" datatype. that's fine. However, when trying to normalize asyncpg's behavior against that of the psycopg2 and pg8000 drivers, both of which will give you back string for both of these types (we have determined this is also a bug in those drivers, as they fail to return arbirary bytes for such a datatype and likely was missed when they migrated to Python 3), I tried setting up a type_codec for "char" that would allow it to return strings:
that works, but when you do that, you no longer can use the
set_type_codec
method for anything else, because the behavior of the type is redefined outside of the assumptions made by is_scalar_type.The example program below illustrates this failure when attempting to subsequently set up a codec for the JSONB datatype:
output:
Since the "char" datatype is kind of an obscure construct, it's likely reasonable that asyncpg disallow setting up a type codec for this particular type, or perhaps it could emit a warning, but at the moment there doesn't seem to be documentation suggesting there are limitations on what kinds of type codecs can be constructed.
none of this is blocking us, just something we came across and I hope it's helpful to the asyncpg project. cc @fantix
The text was updated successfully, but these errors were encountered: