[Postgresql] Hangul(Korean alphabet) encoding problem #66

bjtj · 2025-02-09T02:56:22Z

This is my first time submitting a GitHub issue. Please let me know if anything is inappropriate; I appreciate your understanding.

Problem

Description:

When I use the Babashka SQL pod with PostgreSQL, Hangul (Korean alphabet) text comes back corrupted.
Inserting Hangul data also stores corrupted text.

Environment:

Windows 10
babashka v1.3.191
PostgreSQL 17.2 on x86_64-windows, compiled by msvc-19.42.34435, 64-bit

bb.edn:

{:pods {org.babashka/postgresql {:version "0.1.0"}}}

main.clj:

(require '[pod.babashka.postgresql :as pg])

(def db {:dbtype   "postgresql"
         :host     "localhost"
         :dbname   "mytest"
         :user     "test"
         :password "test"
         :port     5432})

(pg/execute! db ["SELECT '안녕?'"])

result:

[{:?column? "ì•ˆë…•?"}]

expected:

[{:?column? "안녕?"}]

plus:

Hangul (Korean) data inserted through the pod also appears corrupted when I query it in psql.

Server & Client encoding configuration

(pg/execute! db ["show server_encoding;"])

[{:server_encoding "UTF8"}]

(pg/execute! db ["show client_encoding;"])

[{:client_encoding "UTF8"}]

Comparative experiment

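This Python code works as expected:

import psycopg

def main():
    with psycopg.connect('dbname=mytest user=test password=test') as conn:
        with conn.cursor() as cur:
            # In psycopg 3, execute() returns the cursor, so a fetch can be chained
            print(cur.execute("SELECT '안녕?'").fetchone())

if __name__ == "__main__":
    main()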

Clojure + next.jdbc works as expected

I tried using Babashka with next.jdbc, but next.jdbc doesn't seem to work in Babashka. Is that correct?

Thanks.


borkdude · 2025-02-09T19:04:39Z

I've tried to reproduce this on macOS and I get the following:

(require '[babashka.pods :as p])

(p/load-pod 'org.babashka/postgresql {:version "0.1.3"})

(require '[pod.babashka.postgresql :as pg])

(def db {:dbtype   "postgresql"
         :host     "localhost"
         :dbname   "postgres"
         :user     "test"
         :password "test"
         :port     5432})

(prn (-> (pg/execute! db ["SELECT '안녕?' as foo"])
         first
         :foo))

The output is "안녕?", which looks correct to me.
Note that I used pod version 0.1.3. Could you perhaps try this on Linux, e.g. in WSL2, to see if that works for you? Then we could narrow it down to a bb, pod, or OS issue.

bjtj · 2025-02-09T22:46:34Z

I tried it on WSL2 as you suggested. Connecting from WSL2 to the PostgreSQL server on the Windows host worked fine. 👀

So it appears to be an issue that occurs only on Windows.

Thanks,

borkdude · 2025-02-10T10:56:22Z

Can you try this:

(require '[babashka.pods :as p])

(p/load-pod 'org.babashka/postgresql {:version "0.1.3"})

(require '[pod.babashka.postgresql :as pg])

(def db {:dbtype   "postgresql"
         :host     "localhost"
         :dbname   "postgres"
         :user     "test"
         :password "test"
         :port     5432})

(spit "foo.txt"
      (-> (pg/execute! db ["SELECT '안녕?' as foo"])
          first
          :foo))

and then open foo.txt in a text editor (e.g. VSCode) to check whether the characters are correct there?
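An editor may guess a file's encoding when rendering it, so checking the raw bytes is a more reliable test. The Python sketch below (Python being the language of the comparative experiment earlier in the thread) classifies a file's bytes against the UTF-8 encoding of "안녕?". It rests on two assumptions: that the corruption, if any, comes from UTF-8 bytes being decoded as CP1252 (a hypothesis, not a confirmed cause), and that bb's `spit` writes UTF-8, which is Clojure's default. `classify_bytes` and `classify_file` are hypothetical helpers, not part of any pod.

```python
from pathlib import Path

EXPECTED = "안녕?"


def classify_bytes(data: bytes, expected: str = EXPECTED) -> str:
    """Classify raw bytes instead of trusting how an editor renders them.

    "ok"             -> the bytes are the UTF-8 encoding of `expected`
    "double-encoded" -> assumed failure mode: the UTF-8 bytes were decoded
                        as CP1252 somewhere, and that mojibake string was
                        then written out again as UTF-8
    "unknown"        -> anything else
    """
    good = expected.encode("utf-8")
    # Hypothesis under test: UTF-8 bytes misread as CP1252, re-saved as UTF-8
    mojibake = good.decode("cp1252").encode("utf-8")
    if data == good:
        return "ok"
    if data == mojibake:
        return "double-encoded"
    return "unknown"


def classify_file(path: str, expected: str = EXPECTED) -> str:
    """Read the file as bytes (no decoding) and classify it."""
    return classify_bytes(Path(path).read_bytes(), expected)
```

Running `classify_file("foo.txt")` on each machine would show whether the bytes on disk are correct or double-encoded, independent of how the editor displays them.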

bjtj · 2025-02-10T11:42:08Z

I tested it right away with the code you sent, but the result was the same.

foo.txt:

ì•ˆë…•?


Thanks,

borkdude · 2025-02-10T11:50:49Z

Can you test writing the string directly to the file, without the database, to see if the same problem occurs? Perhaps it's not even an SQL pod issue. Thanks

bjtj · 2025-02-10T12:15:53Z

Of course. I tested it right away, and there was no problem when saving the file directly.

For reference, there was no issue with SQLite (org.babashka/go-sqlite3 {:version "0.1.0"}).

It seems to be a problem that is not easy to solve.

Thanks,

borkdude · 2025-02-10T12:45:47Z

Have you tried pod version 0.1.3 on Windows?

bjtj · 2025-02-10T12:53:17Z

Yes, all attempts were made on Windows. Pod version 0.1.3 was also tested on Windows. So far, the issue has occurred on Windows, but there were no problems in WSL2.

borkdude · 2025-02-10T12:57:20Z

Could you maybe also test hsqldb on Windows with pod version 0.1.3? If that works, then we know it is a problem specific to the postgres pod on Windows.

bjtj · 2025-02-10T13:12:12Z

Yes, no problem. I will test it and let you know.

bjtj · 2025-02-10T13:22:02Z

Oh, it seems the same issue occurs in hsqldb as well.

bb.edn:

{:pods {org.babashka/hsqldb {:version "0.1.3"}}}

borkdude · 2025-02-10T13:22:38Z

Interesting

borkdude · 2025-02-10T13:29:21Z

I think it makes sense to upgrade all the builds to the latest Oracle GraalVM (23), add a test for this, and then see what happens. It could be a matter of setting -J-Dfile.encoding=UTF-8 during the build, as described here:

oracle/graal#2492

borkdude · 2025-02-12T09:23:34Z

One more idea: the problem could be the string decoding via the single-argument constructor (String. v), which uses the platform default charset.

Can you try the following in your version of bb:

(String. (.getBytes "안녕?"))
;; vs
(String. (.getBytes "안녕?") java.nio.charset.StandardCharsets/UTF_8)

to see whether the two give different results?
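On the JVM, `.getBytes` and `(String. bytes)` without a charset argument both use the platform default charset, so within one process they agree and round-trip cleanly; corruption appears only when the side that encodes and the side that decodes use different charsets. The Python sketch below illustrates that idea with charsets passed explicitly, since Python's `str.encode`/`bytes.decode` default to UTF-8 rather than a platform charset. `roundtrips` is a hypothetical helper, and CP949 is included only because it is the usual code page on Korean Windows.

```python
def roundtrips(text: str, encode_cs: str, decode_cs: str) -> bool:
    """True if encoding with `encode_cs` and decoding with `decode_cs`
    gives back exactly `text`; False on mismatch or codec error."""
    try:
        return text.encode(encode_cs).decode(decode_cs) == text
    except UnicodeError:
        # e.g. CP1252 cannot encode Hangul at all
        return False


if __name__ == "__main__":
    s = "안녕?"
    print(roundtrips(s, "utf-8", "utf-8"))    # same charset on both sides
    print(roundtrips(s, "cp949", "cp949"))    # same charset on both sides
    print(roundtrips(s, "utf-8", "cp1252"))   # mismatch: produces mojibake
```

The first two calls agree on both sides and succeed; only the mismatched pair fails, which is the pattern a default-charset difference between two components would produce.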

bjtj · 2025-02-12T11:20:03Z

First, I tested it with the code you provided.

I also saved both results to files, and the file sizes and contents were identical.

I think the values change while the data is being transferred to the database.

(pg/execute! db ["CREATE TABLE IF NOT EXISTS foo (text VARCHAR(256))"])
(pg/execute! db ["INSERT INTO foo (text) VALUES (?)" (String. (.getBytes "안녕?"))])
(pg/execute! db ["INSERT INTO foo (text) VALUES (?)" (String. (.getBytes "안녕?")
                                                              java.nio.charset.StandardCharsets/UTF_8)])

I'm not sure if I fully understood your code, but it seems that you're simply passing the values to next.jdbc, so I'll check the next.jdbc side.

babashka-sql-pods/src/pod/babashka/sql.clj, lines 71 to 78 in 5eb5203:

(defn execute!
  ([db-spec sql-params]
   (execute! db-spec sql-params nil))
  ([db-spec sql-params opts]
   ;; (.println System/err (str sql-params))
   (let [conn (->connectable db-spec)
         res (jdbc/execute! conn sql-params opts)]
     res)))

Thanks,

borkdude · 2025-02-12T12:44:37Z

I didn't mean that you would insert the result into the database. The original output looks similar to this:

(String. (.getBytes "안녕?") (java.nio.charset.Charset/forName "CP1252"))
"ì•ˆë…•?"

so I think it's an encoding mismatch somewhere. I read that from JDK 18 onwards the default charset is UTF-8 unless otherwise specified (JEP 400). The SQL pods are currently still built with JDK 11, so upgrading might fix this.

I'll try to reproduce this problem on my own Windows machine. Thanks for your patience.
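The CP1252 hypothesis can also be checked outside the JVM. This Python sketch, with hypothetical helper names, reproduces the reported output by decoding the UTF-8 bytes of "안녕?" as CP1252, and applies the reverse transformation to recover the original. A clean round trip would be consistent with a charset mismatch rather than data loss; it does not show where in the pod or driver the mismatch happens.

```python
def to_mojibake(text: str) -> str:
    """Simulate the suspected bug: UTF-8 bytes decoded as CP1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo the simulated bug, assuming no bytes were lost:
    re-encode the characters as CP1252, then decode those bytes as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")
```

For example, `to_mojibake("안녕?")` yields the same `ì•ˆë…•?` string shown in the original report, and `repair("ì•ˆë…•?")` returns `"안녕?"`. The repair only works when every byte survived: Python's `cp1252` codec leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so text whose UTF-8 form contains one of them makes `to_mojibake` raise `UnicodeDecodeError` instead.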

bjtj changed the title from "Hangul(Korean alphabet) encoding problem" to "[Postgresql] Hangul(Korean alphabet) encoding problem" on Feb 9, 2025
