Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix: inserting Array of nullable primitive #1104

Closed

Conversation

Blackmorse
Copy link
Contributor

Description:
Unable to insert the data into array of nullables

E.g.

CREATE TABLE default.t
(
    `a` Int8,
    `arr` Array(Nullable(Int8))
)
ENGINE = MergeTree
ORDER BY a

Trying to push an array(Scala code):

    val driver = new ClickHouseDriver()
    val connection = driver.connect("jdbc:clickhouse://localhost:8123", new Properties())
    val statement = connection.prepareStatement("insert into default.t (a, arr) values (?, ?)")

    val a = Array(1.toByte, 2.toByte, null).map(_.asInstanceOf[AnyRef])
    val arr = connection.createArrayOf("Array(Nullable(Int8))", a)

    statement.setByte(1, 1.toByte)
    statement.setArray(2, arr)
    statement.addBatch()
    statement.executeBatch()

    connection.close()

Error:

class [Ljava.lang.Object; cannot be cast to class [B ([Ljava.lang.Object; and [B are in module java.base of loader 'bootstrap')
java.lang.ClassCastException: class [Ljava.lang.Object; cannot be cast to class [B ([Ljava.lang.Object; and [B are in module java.base of loader 'bootstrap')
	at com.clickhouse.client.data.ClickHouseRowBinaryProcessor$MappedFunctions.writeArray(ClickHouseRowBinaryProcessor.java:57)
	at com.clickhouse.client.data.ClickHouseRowBinaryProcessor$MappedFunctions.serialize(ClickHouseRowBinaryProcessor.java:486)
	at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.addBatch(InputBasedPreparedStatement.java:333)
	at com.blackmorse.spark.clickhouse.types.Int8Tests.$anonfun$new$4(Int8Tests.scala:37)
...

Reason:
when using (e.g) Array(Nullable(Int8)) type, ClickHouseArrayValue but ClickHouseRowBinaryProcessor still expects ClickHouseByteArrayValue.

@github-actions
Copy link

github-actions bot commented Oct 9, 2022

Benchmark                                (client)  (connection)  (statement)  (type)   Mode  Cnt    Score    Error  Units
Basic.insertOneRandomNumber  clickhouse-http-jdbc         reuse       normal  object  thrpt   20  223.947 ± 35.287  ops/s
Basic.insertOneRandomNumber  clickhouse-http-jdbc         reuse     prepared  object  thrpt   20  207.273 ± 17.772  ops/s
Basic.insertOneRandomNumber  clickhouse-http-jdbc           new       normal  object  thrpt   20  241.529 ± 29.867  ops/s
Basic.insertOneRandomNumber  clickhouse-http-jdbc           new     prepared  object  thrpt   20  196.881 ± 22.172  ops/s
Basic.insertOneRandomNumber  clickhouse-grpc-jdbc         reuse       normal  object  thrpt   20  237.150 ± 35.856  ops/s
Basic.insertOneRandomNumber  clickhouse-grpc-jdbc         reuse     prepared  object  thrpt   20  208.857 ± 13.514  ops/s
Basic.insertOneRandomNumber  clickhouse-grpc-jdbc           new       normal  object  thrpt   20  257.916 ± 32.225  ops/s
Basic.insertOneRandomNumber  clickhouse-grpc-jdbc           new     prepared  object  thrpt   20  202.363 ± 16.366  ops/s
Basic.selectOneRandomNumber  clickhouse-http-jdbc         reuse       normal  object  thrpt   20  687.256 ± 55.872  ops/s
Basic.selectOneRandomNumber  clickhouse-http-jdbc         reuse     prepared  object  thrpt   20  701.576 ± 38.171  ops/s
Basic.selectOneRandomNumber  clickhouse-http-jdbc           new       normal  object  thrpt   20  657.470 ± 64.467  ops/s
Basic.selectOneRandomNumber  clickhouse-http-jdbc           new     prepared  object  thrpt   20  709.362 ± 38.701  ops/s
Basic.selectOneRandomNumber  clickhouse-grpc-jdbc         reuse       normal  object  thrpt   20  650.289 ± 50.945  ops/s
Basic.selectOneRandomNumber  clickhouse-grpc-jdbc         reuse     prepared  object  thrpt   20  651.997 ± 43.356  ops/s
Basic.selectOneRandomNumber  clickhouse-grpc-jdbc           new       normal  object  thrpt   20  626.593 ± 59.309  ops/s
Basic.selectOneRandomNumber  clickhouse-grpc-jdbc           new     prepared  object  thrpt   20  648.822 ± 49.873  ops/s

@zhicwu
Copy link
Contributor

zhicwu commented Oct 9, 2022

Thank you @Blackmorse! Actually this has been covered in my local changes.

Basically, widen unsigned types is probably not the best way, because it consumes more memory and has side effects in type conversion. I added UnsignedLong, UnsignedInteger, UnsignedShort, and UnsignedByte classes to better handle unsigned primitive types along with some changes in *Value classes, and more tests to ensure they work as expected. A few options were added too in case people prefer the old way.

I've completed the read part last night, and I should be able to complete the write part in a day or two.

@Blackmorse
Copy link
Contributor Author

Thanks for the answer, @zhicwu

Why is this related to the unsigned types?

@zhicwu
Copy link
Contributor

zhicwu commented Oct 10, 2022

Why is this related to the unsigned types?

Sorry I only explained the reason of refactoring :p The issue is not directly related to unsigned types, but during the refactoring, I also changed the way to handle nullable types and moved methods like newValue from ClickHouseValues class to ClickHouseColumn, with additional changes and tests to ensure they work for both RowBinary and Native data formats.

@Blackmorse
Copy link
Contributor Author

Hi, @zhicwu
Do you have any idea when you can apply you local code changes mentioned before? This issue is affecting my software, so maybe it's worth merging it now and not to wait for huge changes from your side?
Thanks

@zhicwu
Copy link
Contributor

zhicwu commented Nov 7, 2022

Hi, @zhicwu Do you have any idea when you can apply you local code changes mentioned before? This issue is affecting my software, so maybe it's worth merging it now and not to wait for huge changes from your side? Thanks

Hi @Blackmorse, I'm sorry for blocking your work. I must confess that I didn't spend any time on this project in the past weeks due to work on hand.

Are you expecting a patch release in maven central? As you probably noticed, there are some more changes in develop branch and the legacy driver got removed. So if all you need is the latest release plus this PR, I'd suggest you to build and push new artifact to local maven repository.

If it's fine to use nightly build(v0.3.3-SNAPSHOT) and wait for a few more days, I can commit some changes in local in a day or two, which will surely fix the issue along with performance improvement(in addition to PR #1087). However, this involves breaking changes to reduce serialization cost - if you're using interfaces ClickHouseDeserializer, ClickHouseSerializer and ClickHouseDataProcessor etc., your code will be impacted.

Lastly, I'm sure there's always good reason to use nullable types, but another alternative approach I can think of is to use Array(Int8), which avoids boxed type and null flag in serialization.

@zhicwu zhicwu changed the base branch from master to develop November 10, 2022 15:36
@zhicwu
Copy link
Contributor

zhicwu commented Nov 10, 2022

Fixed in #1124, you may check the test at here.

Couple things worthy of mention here:

  1. for column Array(Nullable(Int8)), use conn.createArrayOf("Nullable(Int8)", new Byte[] {1,null,2})
  2. it's easier and faster to use preparedStmt.setObject() instead of setArray to set complex value including array

@zhicwu zhicwu closed this Nov 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants