Fix: json and jsonb parsing in postgres-js #1785
Conversation
Thank you @Angelelz & @AndriiSherman for your thorough work here! 😃 🙏

One question I've been wondering: when this is fixed, will there be any easy way to migrate all existing data/columns we persisted through Drizzle that were incorrectly stored as JSON strings instead of JSON objects? I'm not aware of any "JSON parse" type function in Postgres, so I'm not sure this will be possible with a simple SQL migration. It might require a Node/TypeScript migration (which drizzle-kit doesn't support, right?) to read existing data with the old parser (parsing JSON strings to JSON objects) and write it back with the new one.

I can easily write such a migration myself, but my main question is: once this fix is merged, will there be any way to read data through the old parser with the new Drizzle code? Perhaps I should update my Drizzle schema to change the type of the existing column?

Lemme know if my question is unclear. Thank you very much!
That's a really good question. Before merging this one, we will think it through. I guess we will write a small guide on how you can migrate this data to be proper JSON.
That'd be great, thank you.

I'm remembering that Drizzle doesn't actually know or ask about the JSON schema/type of JSON columns.

In our case, every JSON(B) value we have is an array or object at the top level, never a string. So we can easily differentiate between "incorrectly" persisted JSON today (top-level strings, from the old double-stringifying behavior) and correctly persisted JSON (top-level objects or arrays). Given this, we should be able to achieve an in-place (not needing to add a new column) and zero-downtime migration; sharing in case it's helpful for y'all or anyone else.

Those are just my theoretical thoughts. Feedback welcome if you think I'm missing anything. Thanks again!
@aseemk If you can afford a short downtime/maintenance window on the app, and you have a migration script that reads all tables with jsonb columns and fixes each jsonb value, then you could run it as a one-time operation on all environments directly after the regular drizzle-kit migrations. Then nothing has to change in your application code: you'd update the dependencies and deploy right after.
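A minimal sketch of such a one-time script, assuming the raw postgres.js client and a hand-maintained list of affected tables/columns (all names here are placeholders, not from this thread):

```ts
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

// Placeholder list: enumerate your own tables (with a primary key "id")
// and jsonb columns here.
const targets = [{ table: "users", column: "tags" }];

for (const { table, column } of targets) {
  // postgres.js (without drizzle's parser overrides) returns jsonb values
  // already parsed, so double-encoded rows arrive as JS strings.
  const rows = await sql`SELECT id, ${sql(column)} AS value FROM ${sql(table)}`;
  for (const row of rows) {
    if (typeof row.value !== "string") continue; // already proper JSON
    await sql`
      UPDATE ${sql(table)}
      SET ${sql(column)} = ${sql.json(JSON.parse(row.value))}
      WHERE id = ${row.id}
    `;
  }
}

await sql.end();
```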
Yeah, you're right that brief downtime/maintenance may be reasonable.

I also realized that I think a migration might be entirely doable in SQL! There's no JSON "parse" function per se, but casting text to `jsonb` effectively parses it, so string manipulation gets us the rest of the way.

It appears to work on my own data, which has both top-level objects and top-level arrays as JSON values (both containing nested objects and arrays). E.g. for a single JSONB column named `tags`:

```sql
select id
  , tags::text as bad_jsonb_str -- should see wrapper quotes and backslash escapes
  , jsonb_typeof(tags) as bad_jsonb_type -- should be "string"
  , left(tags::text, 2) as starting_chars -- should be `"{` or `"[` for top-level objects or arrays respectively
  , right(tags::text, 2) as ending_chars -- should be `}"` or `]"` for top-level objects or arrays respectively
  -- Remove the starting and ending quotes:
  , substring(tags::text from 2 for length(tags::text) - 2) as trimmed
  -- Remove any backslash escape characters:
  , regexp_replace(tags::text, '\\(.)', '\1', 'g') as unescaped
  -- Combine the two above!
  , regexp_replace(substring(tags::text from 2 for length(tags::text) - 2), '\\(.)', '\1', 'g') as good_jsonb_str
  , regexp_replace(substring(tags::text from 2 for length(tags::text) - 2), '\\(.)', '\1', 'g')::jsonb as good_jsonb
  , jsonb_typeof(
      regexp_replace(substring(tags::text from 2 for length(tags::text) - 2), '\\(.)', '\1', 'g')::jsonb
    ) as good_jsonb_type -- should now be object or array!
from users
limit 10
```

You can make this into a migration by changing it into an `UPDATE` statement. Hope that helps others!
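For concreteness, a hedged sketch of that `UPDATE` form run through drizzle's `db.execute` (the `jsonb_typeof` guard is my addition so already-correct rows are skipped; table/column names match the example above):

```ts
import { sql } from "drizzle-orm";

// One-off data fix: strip the outer quotes and unescape, exactly as in
// the SELECT exploration above, then cast back to jsonb.
// Note: "\\\\" in this template literal becomes "\\" in the SQL text.
await db.execute(sql`
  UPDATE users
  SET tags = regexp_replace(
    substring(tags::text from 2 for length(tags::text) - 2),
    '\\\\(.)', '\\1', 'g'
  )::jsonb
  WHERE jsonb_typeof(tags) = 'string'
`);
```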
This doesn't affect pg users, right?
If you can insert and retrieve your JSON data without workarounds or issues, this doesn't affect you.
To clarify: we use postgres.js (this PR's driver), not node-postgres (pg).

We're able to insert and retrieve our JSON data through Drizzle without any issues today. But we're unable to "reach into" that JSON data in SQL, e.g. in migrations (backfills) or for atomic, field-wise updates. This has been a source of inconvenience (but not downtime or anything like that) several times for us, so I'm excited for this fix! Thanks again. =)
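As an illustration of that "can't reach into it" symptom (my sketch, not from the thread; table and column names are hypothetical): when a jsonb column holds a quoted JSON string rather than an object, `jsonb_typeof` reports `string` and the field operators come back empty:

```ts
import { sql } from "drizzle-orm";

// With correctly stored jsonb such as {"color": "red"}, color is 'red'.
// With the double-encoded "{\"color\": \"red\"}", jsonb_typeof is 'string'
// and tags ->> 'color' is NULL, so backfills and field-wise updates
// (e.g. via jsonb_set) have nothing to target.
const rows = await db.execute(sql`
  SELECT jsonb_typeof(tags) AS typeof, tags ->> 'color' AS color
  FROM users
`);
```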
Thanks for the heads-up, I've never tried accessing that data without Drizzle (yet). @Angelelz this PR only addresses postgres.js. Is there another bug that affects pg, then?
Can you clarify here? What do you mean by not being able to reach into the JSON data with SQL? What SQL statement doesn't work for you on JSON?
I don't think pg has an issue. All the tests are passing.
I see my confusion here.

Sorry for the mixup! So it sounds like yes, this bug only affects us because we're using Postgres.js (not node-postgres/pg). Thanks for clarifying!
Any update on this? Is there a workaround?
Is this fix waiting on a migration path for existing data?

If this isn't likely to land soon, would it be possible to identify a recommended workaround? Thanks for the help!
I didn't really see a big difference between Postgres.js and node-postgres (pg), so switching drivers works as a workaround. The only downside is that you'll have to write a data migration that translates the old escaped JSON values into the new ones yourself.
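A quick sketch of that driver swap, in case it helps (my example; connection details are placeholders):

```ts
import { drizzle } from "drizzle-orm/node-postgres";
import { Pool } from "pg";

// node-postgres (pg) serializes and parses json/jsonb itself, so objects
// round-trip as real JSON values instead of double-encoded strings.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const db = drizzle(pool);
```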
I tried switching to node-postgres and had major performance issues. Not sure why. Really hope this issue gets resolved soon.
I also tried passing the JSON with the `sql` template tag.

```ts
sql`${[]}::jsonb` // does not work
```

Produces the following SQL with a syntax error...

```sql
insert into
  "my_table" ("my_column")
values
  (()::jsonb)
```

...and an empty params array:

```ts
// Params
[]
```

```ts
sql`${JSON.stringify([])}::jsonb` // also does not work
```

Produces the correct SQL...

```sql
insert into
  "my_table" ("my_column")
values
  ($1::jsonb)
```

...but passes the JSON array as a string, which gets turned into a JSON string:

```ts
// Params
[
  "[]", // gets turned into a JSON string, i.e. "\"[]\""
]
```

Does anyone know a workaround for that?

Edit: I found a solution.

```ts
sql`${new Param([])}::jsonb` // wrap JSON array in new Param
```

Credit: #724 (comment)
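Expanded a bit, that workaround might look like the following (my sketch; it assumes `Param` is importable from drizzle-orm, as in the linked comment, and `myTable`/`myColumn` are placeholders):

```ts
import { Param, sql } from "drizzle-orm";

// Wrapping the value in new Param makes it bind as a real parameter ($1)
// instead of being inlined (and mangled) in the generated SQL text.
await db
  .insert(myTable)
  .values({ myColumn: sql`${new Param([])}::jsonb` });
```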
I'm looking into migrating from […]

Create a custom jsonb type instead of using the Drizzle-provided one.
It works. I was using it a while back; you can check #666 (comment).
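For reference, a sketch of that custom-type workaround using drizzle's `customType` helper, along the lines of the linked comment (the builder name `customJsonb` is illustrative):

```ts
import { customType } from "drizzle-orm/pg-core";

// Declares a jsonb column but takes over serialization: we stringify the
// value exactly once ourselves, so the driver can't double-encode it.
const customJsonb = <TData>(name: string) =>
  customType<{ data: TData; driverData: string }>({
    dataType() {
      return "jsonb";
    },
    toDriver(value: TData): string {
      return JSON.stringify(value);
    },
  })(name);
```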
```diff
diff --git a/node_modules/drizzle-orm/postgres-js/driver.js b/node_modules/drizzle-orm/postgres-js/driver.js
index 7e48e8c..219e0a0 100644
--- a/node_modules/drizzle-orm/postgres-js/driver.js
+++ b/node_modules/drizzle-orm/postgres-js/driver.js
@@ -12,6 +12,8 @@ function drizzle(client, config = {}) {
     client.options.parsers[type] = transparentParser;
     client.options.serializers[type] = transparentParser;
   }
+  client.options.serializers['114'] = transparentParser;
+  client.options.serializers['3802'] = transparentParser;
   const dialect = new PgDialect();
   let logger;
   if (config.logger === true) {
```
Bumping this PR. The custom-type workaround works like a charm, but this issue needs to be resolved.
As for the migration guide, wouldn't this query fix the issue?

```sql
UPDATE <table_name>
SET <col_1_name> = (<col_1_name> #>> '{}')::jsonb,
    <col_2_name> = (<col_2_name> #>> '{}')::jsonb,
    ...
    <col_n_name> = (<col_n_name> #>> '{}')::jsonb;
```
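One nice property of `#>> '{}'` worth noting: it extracts the top-level value as text, so a quoted JSON string loses one layer of encoding while an object or array simply round-trips, making the statement safe over already-correct rows (only a legitimately stored JSON string primitive would error or be re-parsed, as discussed further down). A hedged sketch of running it through drizzle, with placeholder names:

```ts
import { sql } from "drizzle-orm";

// For double-encoded rows this unwraps the string; for proper objects and
// arrays the text round-trips, so ::jsonb restores the original value.
await db.execute(sql`
  UPDATE users
  SET tags = (tags #>> '{}')::jsonb
`);
```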
Please merge the patch if possible.
@AndriiSherman Is there any update? It's been quite some time.
I've been playing with an `UPDATE` statement to fix existing data. Here is what is working for our project:

```sql
UPDATE {tableName} SET {columnName} = ({columnName}->>0)::jsonb
WHERE {columnName}::text LIKE '"%"';
```

Running that statement will correct bad columns but leave good columns untouched, so it is safe to run multiple times.

**Caveats**

This works when the real values are JSON objects or arrays at the top level; if a column legitimately stores a raw JSON string primitive, the `LIKE '"%"'` test will match it and the statement will unwrap it incorrectly. I'm not sure how much real-world application data uses a JSONB column to store a raw JSON primitive type (that kinda defeats the purpose of JSONB), but I wanted to call attention to this limitation.
Any updates on this PR?
I found a workaround by executing raw SQL that includes the JSON or JSONB data, pre-stringified. Here's an example:

```ts
const json_data = JSON.stringify(data);
db.execute(sql.raw(`INSERT INTO table ("json_data") VALUES ($$${json_data}$$::jsonb)`));
```

I'm using dollar-quoted constants instead of single or double quotes because the stringified JSON itself contains quote characters that would otherwise need escaping. For more information on dollar-quoted constants, refer to the PostgreSQL documentation: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING

Because the code above looks a bit messy, here is a cleaner example:

```ts
const json_data = JSON.stringify(data);
db.execute(
  sql.raw(
    'INSERT INTO table ("json_data") VALUES ($$' + json_data + "$$::jsonb)"
  )
);
```
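One caution worth adding (my note, not the original poster's): `sql.raw` performs no escaping at all, so if the stringified JSON could ever contain the `$$` sequence, the statement breaks, and untrusted data becomes a SQL-injection vector. PostgreSQL lets you name the dollar-quote tag, which shrinks that risk:

```ts
import { sql } from "drizzle-orm";

const json_data = JSON.stringify(data);
// An unlikely tag like $fixjson$ can't collide with "$$" inside the JSON,
// though truly untrusted input should still go through bound parameters.
await db.execute(
  sql.raw(`INSERT INTO table ("json_data") VALUES ($fixjson$${json_data}$fixjson$::jsonb)`)
);
```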
Fixing conflicts and merging this into the beta tag. I will prepare a guide on how to fix an existing database and will reuse and mention some of the comments from this PR |
Merged. It will be available in the beta tag.
Finally, thank you!! Which specific version does this one correspond to?
FYI: This appears to break previous workarounds like the custom jsonb type above. Now I can simply write the values directly.
Second (and better) attempt to close #724 and close #1511
This PR depends on #1659 which has already been merged to beta.
After figuring out how to bypass parsers and serializers in the postgres.js driver, the same technique can be applied to the issue with json and jsonb.
There are additional tests in #1560. @AndriiSherman Please merge that one after merging this.