Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySQL VARCHAR is mapped to Vec<u8> instead of String #1690

Closed
marcustut opened this issue Feb 8, 2022 · 19 comments · Fixed by #1856
Closed

MySQL VARCHAR is mapped to Vec<u8> instead of String #1690

marcustut opened this issue Feb 8, 2022 · 19 comments · Fixed by #1856
Labels
bug:db Involves a bug in the database server db:mysql Related to MySQL macros

Comments

@marcustut
Copy link
Contributor

marcustut commented Feb 8, 2022

The following is my database schema, I expect the VARCHAR and TEXT types are both mapped to Rust's String. However, I got the error in the next following log.

fyp/slide-service> describe `users`;
+---------------+--------------+------+-----+-------------------+-----------------------------------------------+
| Field         | Type         | Null | Key | Default           | Extra                                         |
+---------------+--------------+------+-----+-------------------+-----------------------------------------------+
| id            | varchar(255) | NO   | PRI | NULL              |                                               |
| username      | varchar(50)  | NO   | UNI | NULL              |                                               |
| email         | varchar(255) | NO   | UNI | NULL              |                                               |
| full_name     | varchar(60)  | YES  |     | NULL              |                                               |
| password_hash | varchar(255) | NO   |     | NULL              |                                               |
| avatar_url    | text         | YES  |     | NULL              |                                               |
| bio           | text         | YES  |     | NULL              |                                               |
| created_at    | timestamp    | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED                             |
| updated_at    | timestamp    | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED on update CURRENT_TIMESTAMP |
+---------------+--------------+------+-----+-------------------+-----------------------------------------------+

This is the compile-time error

error[E0308]: mismatched types
  --> cmd/auth-service/src/db/user.rs:23:20
   |
23 |         let user = sqlx::query_as!(User, "SELECT * FROM users WHERE username = ?", username)
   |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected struct `std::string::String`, found struct `Vec`
   |
   = note: expected struct `std::string::String`
              found struct `Vec<u8>`
   = note: this error originates in the macro `$crate::sqlx_macros::expand_query` (in Nightly builds, run with -Z macro-backtrace for more info)

This following is the Rust struct.

#[derive(Debug, Deserialize, Serialize, sqlx::FromRow, SimpleObject)]
pub struct User {
    pub id: String,
    pub username: String,
    pub email: String,
    #[serde(skip)]
    #[graphql(skip)]
    pub password_hash: String,
    pub full_name: Option<String>,
    pub bio: Option<String>,
    pub avatar_url: Option<String>,
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
}

One more note, I'm not sure if this is related, I'm using PlanetScale's MySQL which uses Vitess under the hood.

@abonander
Copy link
Collaborator

What is the character set on those VARCHAR columns?

In the MySQL protocol, character strings and byte strings can use the same type IDs so we distinguish them based on character set. We only recognize a handful of character sets that are known to be compatible with UTF-8 so if the columns use an unrecognized character set then the query!() macro would fall back to treating the columns as binary strings.

@marcustut
Copy link
Contributor Author

CREATE TABLE `users` (
  `id` varchar(255) COLLATE utf8mb4_bin NOT NULL,
  `username` varchar(50) COLLATE utf8mb4_bin NOT NULL,
  `email` varchar(255) COLLATE utf8mb4_bin NOT NULL,
  `full_name` varchar(60) COLLATE utf8mb4_bin DEFAULT NULL,
  `password_hash` varchar(255) COLLATE utf8mb4_bin NOT NULL,
  `avatar_url` text COLLATE utf8mb4_bin,
  `bio` text COLLATE utf8mb4_bin,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `username` (`username`),
  UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

To be specific, this is the schema generated from PlanetScale. It seems like it is using the utf8mb4 CHARSET

@abonander
Copy link
Collaborator

We do recognize that character set, in fact it's what the connection is set to use by default for sanity reasons.

Can you execute this query against the database?

Select ID, CHARACTER_SET_NAME, COLLATION_NAME
      FROM INFORMATION_SCHEMA.COLLATIONS
      WHERE COLLATION_NAME = 'utf8mb4_bin`;

@marcustut
Copy link
Contributor Author

This is what I got from the query

fyp/slide-service> SELECT ID, CHARACTER_SET_NAME, COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME = 'utf8mb4_bin';
+----+--------------------+----------------+
| ID | CHARACTER_SET_NAME | COLLATION_NAME |
+----+--------------------+----------------+
| 46 | utf8mb4            | utf8mb4_bin    |
+----+--------------------+----------------+
fyp/slide-service>

@abonander
Copy link
Collaborator

Yeah, that's correct, so it doesn't make sense that the macros think it's a binary string.

Could you execute this program against your database and tell me what it outputs:

use sqlx::{MySqlConnection, Executor};

// or #[async_std::main]
#[tokio::main]
async fn main() {
    let mut conn = MySqlConnection::connect("<your database URL>").await.unwrap();

    // This is the method the macros use. It's hidden in the docs because it's not considered part of the stable API.
    let describe = conn.describe("SELECT * FROM users WHERE username = ?").await.unwrap();

    println!("describe output: {:?}", describe);
}

@abonander
Copy link
Collaborator

What would also really help is a Wireshark capture taken while this program executes, although if you're required to use TLS to connect to the database then it'll unfortunately be useless.

@abonander abonander added db:mysql Related to MySQL macros labels Feb 8, 2022
@marcustut
Copy link
Contributor Author

warning: `auth-service` (bin "auth-service") generated 10 warnings
    Finished dev [unoptimized + debuginfo] target(s) in 4.44s
     Running `/Users/marcus/Projects/fyp/backend/target/debug/auth-service`
describe output: Describe { columns: [MySqlColumn { ordinal: 0, name: id, type_info: MySqlTypeInfo { type: VarString, flags: NOT_NULL | PRIMARY_KEY | BINARY | NO_DEFAULT_VALUE, char_set: 255, max_size: Some(1020) }, flags: Some(NOT_NULL | PRIMARY_KEY | BINARY | NO_DEFAULT_VALUE) }, MySqlColumn { ordinal: 1, name: username, type_info: MySqlTypeInfo { type: VarString, flags: NOT_NULL | UNIQUE_KEY | BINARY | NO_DEFAULT_VALUE, char_set: 255, max_size: Some(200) }, flags: Some(NOT_NULL | UNIQUE_KEY | BINARY | NO_DEFAULT_VALUE) }, MySqlColumn { ordinal: 2, name: email, type_info: MySqlTypeInfo { type: VarString, flags: NOT_NULL | UNIQUE_KEY | BINARY | NO_DEFAULT_VALUE, char_set: 255, max_size: Some(1020) }, flags: Some(NOT_NULL | UNIQUE_KEY | BINARY | NO_DEFAULT_VALUE) }, MySqlColumn { ordinal: 3, name: full_name, type_info: MySqlTypeInfo { type: VarString, flags: BINARY, char_set: 255, max_size: Some(240) }, flags: Some(BINARY) }, MySqlColumn { ordinal: 4, name: password_hash, type_info: MySqlTypeInfo { type: VarString, flags: NOT_NULL | BINARY | NO_DEFAULT_VALUE, char_set: 255, max_size: Some(1020) }, flags: Some(NOT_NULL | BINARY | NO_DEFAULT_VALUE) }, MySqlColumn { ordinal: 5, name: avatar_url, type_info: MySqlTypeInfo { type: VarString, flags: BINARY, char_set: 255, max_size: Some(8332) }, flags: Some(BINARY) }, MySqlColumn { ordinal: 6, name: bio, type_info: MySqlTypeInfo { type: VarString, flags: BINARY, char_set: 255, max_size: Some(1020) }, flags: Some(BINARY) }, MySqlColumn { ordinal: 7, name: created_at, type_info: MySqlTypeInfo { type: Timestamp, flags: NOT_NULL | BINARY | TIMESTAMP, char_set: 63, max_size: Some(19) }, flags: Some(NOT_NULL | BINARY | TIMESTAMP) }, MySqlColumn { ordinal: 8, name: updated_at, type_info: MySqlTypeInfo { type: Timestamp, flags: NOT_NULL | BINARY | TIMESTAMP | ON_UPDATE_NOW, char_set: 63, max_size: Some(19) }, flags: Some(NOT_NULL | BINARY | TIMESTAMP | ON_UPDATE_NOW) }], parameters: Some(Right(1)), nullable: [Some(false), Some(false), Some(false), Some(true), Some(false), Some(true), Some(true), Some(false), Some(false)] }

This is the output, I wasn't able to use Wireshark as it is required to use tls to connect to the planetscale database.

@abonander
Copy link
Collaborator

abonander commented Feb 10, 2022

Yep, it's using a charset/collation (they're used interchangeably in places but in MySQL a collation is a charset + sorting rules) that SQLx doesn't recognize: utf8mb4_0900_ai_ci (char_set: 255 in the describe info, try SELECT * FROM INFORMATION_SCHEMA.COLLATIONS WHERE ID = 255 to confirm)

So that collation is UTF-8 4-byte encoding, Unicode version 9.0 with accent-insensitive and case-insensitive sorting. That is not the same as the utf8mb4_bin collation given in your schema. Did you modify it manually?

We probably need some sort of connection-local cache of collations, because there's a whole list of them for different encodings, Unicode versions and sorting rules, but all we really care about is if they're compatible with UTF-8.

We could just add this and more to the hardcoded list that we currently have, but we're already doing something similar in Postgres to map type names to type IDs so I don't think it'd be the worst thing.

@marcustut
Copy link
Contributor Author

No, I didn't modify the collation. However, now I found out it's because of that, so in theory if I change the charset/collation to a supported one then this issue will be resolved right?

I assume sqlx supports either collation of utf8mb4_general_ci or utf8mb4_unicode_ci?

@abonander
Copy link
Collaborator

The collations that SQLx considers compatible with String are listed here: https://github.com/launchbadge/sqlx/blob/master/sqlx-core/src/mysql/types/str.rs#L40-L44

As a sanity check, can you give me the output of SELECT * FROM INFORMATION_SCHEMA.COLLATIONS WHERE ID = 255? Maybe the collation IDs change between versions or Planetscale is doing something funky with them. Both would be reasons to not hardcode the IDs like we're doing now.

@marcustut
Copy link
Contributor Author

fyp/slide-service> SELECT * FROM INFORMATION_SCHEMA.COLLATIONS WHERE ID = 255;
+--------------------+--------------------+-----+------------+-------------+---------+---------------+
| COLLATION_NAME     | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN | PAD_ATTRIBUTE |
+--------------------+--------------------+-----+------------+-------------+---------+---------------+
| utf8mb4_0900_ai_ci | utf8mb4            | 255 | Yes        | Yes         |       0 | NO PAD        |
+--------------------+--------------------+-----+------------+-------------+---------+---------------+
fyp/slide-service>

Here you go, thanks for helping with this issue ;D

@abonander
Copy link
Collaborator

Yeah, that's matching the information I get from MySQL locally.

That's really weird, though. Silently changing from a case-sensitive collation to a case-insensitive one seems like a pretty bad thing for a database to do. It'd certainly change the behavior of indexes.

@abonander
Copy link
Collaborator

abonander commented Feb 10, 2022

According to the MySQL docs, the default collation unless otherwise specified is utf8mb4_0900_ai_ci, but we default that to utf8mb4_unicode_ci in... kind of a convoluted manner:

It's possible that PlanetScale is ignoring the collation we set on the connection. It might be worth opening a thread in their beta discussions: https://github.com/planetscale/beta/discussions

@abonander
Copy link
Collaborator

Opened a bug report and I'm labeling this as bug:db planetscale/discussion#162

@abonander abonander added the bug:db Involves a bug in the database server label Feb 11, 2022
@scottwey
Copy link
Contributor

scottwey commented May 6, 2022

From the Planetscale team:

Vitess (and hence PlanetScale) cannot allow the users to choose a collation for their connection, because our database is a hyper-scalar system.

It looks like they're unable to fix the issue on their end anytime soon.

It looks like adding COLLATE_UTF8MB4_0900_AI_CI = 255; to https://github.com/launchbadge/sqlx/blob/master/sqlx-core/src/mysql/types/str.rs will fix this issue.
I made a PR for this. Would love to be able to use sqlx with Planetscale!

@ekallevik
Copy link

I have run into the same issue: VARCHAR is mapped to Vec instead of String.

describe test_table
Field Type Null Key Default Extra
id int NO PRI auto_increment
data varchar(10) NO
SELECT ID, CHARACTER_SET_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME = 'utf8mb4_bin';
ID CHARACTER_SET_NAME COLLATION_NAME
46 utf8mb4 utf8mb4_bin
SELECT * FROM INFORMATION_SCHEMA.COLLATIONS WHERE ID = 255;
COLLATION_NAME COLLATION_NAME ID IS_DEFAULT IS_COMPILED SORTLEN PAD_ATTRIBUTE
utf8mb4_0900_ai_ci utf8mb4 255 YES YES 0 PAD
    let mut conn = MySqlConnection::connect(URL).await.unwrap();
    let describe = conn.describe("SELECT * FROM test_table WHERE id = ?").await.unwrap();

    println!("{:?}", describe);
Describe 
{
  columns: [
    MySqlColumn
    {
      ordinal: 0,
      name: id,
      type_info: MySqlTypeInfo
    {
      type: Long,
      flags: NOT_NULL
      |
      PRIMARY_KEY
      |
      AUTO_INCREMENT
      |
      NUM,
      char_set: 63,
      max_size: Some(11)
    },
    flags
    :
    Some(NOT_NULL
    |
    PRIMARY_KEY
    |
    AUTO_INCREMENT
    |
    NUM)
    },
    MySqlColumn
    {
      ordinal: 1,
      name: data,
      type_info: MySqlTypeInfo
    {
      type: VarString,
      flags: NOT_NULL
      |
      NO_DEFAULT_VALUE,
      char_set: 255,
      max_size: Some(40)
    },
    flags
    :
    Some(NOT_NULL
    |
    NO_DEFAULT_VALUE)
    }
  ],
  parameters: Some(Right(1)),
  nullable: [
    Some(false),
    Some(false)
  ]
}

Cargo.toml

[package]
name = "backend"
version = "0.1.0"
edition = "2021"

[dependencies]
axum = "0.5.7"
hyper = { version = "0.14.19", features = ["full"] }
serde = { version = "1.0.137", features = ["derive"] }
sqlx = { version = "0.5.13", features = ["mysql", "runtime-tokio-rustls"] }
tokio = { version = "1.19.2", features = ["full"] }
tower = { version = "0.4.12", features = ["util", "timeout"] }
tower-http = { version = "0.3.0", features = ["add-extension", "trace"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
uuid = { version = "1.1.1", features = ["serde", "v4"] }

Is this a bug on my side, or new variant of the previous issue perhaps?

@houseme
Copy link

houseme commented Aug 28, 2024

I have run into the same issue: VARCHAR is mapped to Vec instead of String.我遇到了同样的问题:VARCHAR 映射到 Vec 而不是 String。

describe test_table

Field 场地 Type 类型 Null 无效的 Key 钥匙 Default 默认 Extra 额外的
id int 整数 NO PRI auto_increment 自动增量
data 数据 varchar(10) NO

SELECT ID, CHARACTER_SET_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME = 'utf8mb4_bin';

ID CHARACTER_SET_NAME COLLATION_NAME
46 utf8mb4 utf8mb4_bin

SELECT * FROM INFORMATION_SCHEMA.COLLATIONS WHERE ID = 255;

COLLATION_NAME COLLATION_NAME ID IS_DEFAULT IS_COMPILED SORTLEN PAD_ATTRIBUTE
utf8mb4_0900_ai_ci utf8mb4 255 YES YES 0 PAD

    let mut conn = MySqlConnection::connect(URL).await.unwrap();
    let describe = conn.describe("SELECT * FROM test_table WHERE id = ?").await.unwrap();

    println!("{:?}", describe);
Describe 
{
  columns: [
    MySqlColumn
    {
      ordinal: 0,
      name: id,
      type_info: MySqlTypeInfo
    {
      type: Long,
      flags: NOT_NULL
      |
      PRIMARY_KEY
      |
      AUTO_INCREMENT
      |
      NUM,
      char_set: 63,
      max_size: Some(11)
    },
    flags
    :
    Some(NOT_NULL
    |
    PRIMARY_KEY
    |
    AUTO_INCREMENT
    |
    NUM)
    },
    MySqlColumn
    {
      ordinal: 1,
      name: data,
      type_info: MySqlTypeInfo
    {
      type: VarString,
      flags: NOT_NULL
      |
      NO_DEFAULT_VALUE,
      char_set: 255,
      max_size: Some(40)
    },
    flags
    :
    Some(NOT_NULL
    |
    NO_DEFAULT_VALUE)
    }
  ],
  parameters: Some(Right(1)),
  nullable: [
    Some(false),
    Some(false)
  ]
}

Cargo.toml 货物.toml

[package]
name = "backend"
version = "0.1.0"
edition = "2021"

[dependencies]
axum = "0.5.7"
hyper = { version = "0.14.19", features = ["full"] }
serde = { version = "1.0.137", features = ["derive"] }
sqlx = { version = "0.5.13", features = ["mysql", "runtime-tokio-rustls"] }
tokio = { version = "1.19.2", features = ["full"] }
tower = { version = "0.4.12", features = ["util", "timeout"] }
tower-http = { version = "0.3.0", features = ["add-extension", "trace"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
uuid = { version = "1.1.1", features = ["serde", "v4"] }

Is this a bug on my side, or new variant of the previous issue perhaps?这是我这边的错误,还是上一期的新变体?

Have you solved this problem? How to handle it?

@echelon
Copy link

echelon commented Sep 18, 2024

I just got hit with this when upgrading from 0.7.x to 0.8.2. Previously accepted utf8mb4 fields used to map to Strings with sqlx successfully, and now I receive the following error:

the trait `From<std::option::Option<Vec<u8>>>` is not implemented for `std::option::Option<std::string::String>`, which is required by `std::option::Option<Vec<u8>>: Into<_>

@abonander
Copy link
Collaborator

@echelon there is a more recent issue for this that's pinned at the top of the issues page: #3387

@launchbadge launchbadge locked as resolved and limited conversation to collaborators Sep 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug:db Involves a bug in the database server db:mysql Related to MySQL macros
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants