feat(go/adbc/driver): add support for ClickHouse #1903

Draft
wants to merge 2 commits into
base: main
from


cocoa-xu
Contributor

@cocoa-xu cocoa-xu commented Jun 3, 2024

Hi, this PR is a preliminary Go implementation of an ADBC driver for ClickHouse. I'll leave it as a draft until most data types are supported and it has some tests. The screenshot below shows its current status: running a query against a CSV file from Elixir.

Mix.install([{:adbc, "~> 0.4.2-dev", github: "elixir-explorer/adbc"}])

defmodule ClickHouseTest do
  def test do
    children = [
      {Adbc.Database,
       "adbc.clickhouse.address": "127.0.0.1:9000",
       "adbc.clickhouse.sql.database": "default",
       "adbc.clickhouse.sql.username": "default",
       driver: "libadbc_driver_clickhouse.dylib",
       process_options: [name: MyApp.DB]},
      {Adbc.Connection, database: MyApp.DB, process_options: [name: MyApp.Conn]}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
    # dbg(Adbc.Connection.query(MyApp.Conn, "SELECT * FROM iris"))
    dbg(Adbc.Connection.query(MyApp.Conn, "SELECT * FROM file('iris.csv')"))
  end
end

ClickHouseTest.test()
Screenshot 2024-06-03 at 21 58 16

@cocoa-xu
Contributor Author

After some experiments, I think there are potentially 3 options for this ClickHouse driver:

  1. use clickhouse-go
  2. use the low-level Go client ch-go with chpool
  3. just use ClickHouse's HTTP/HTTPS interface with raw ArrowStream

Below are the main pros and cons of each. I'm not quite sure which direction to take, but the second option, using ch-go, seems to be a better fit here than the other two.

1. clickhouse-go

Pros
  • Rich features out of the box, such as setting a maximum execution time, TLS, compression methods and connection strategies (round-robin, random and in-order).
  • Supports load balancing and failover.

More key features are listed in their README.md#key-features

Cons
  • Only has a row-oriented API for reading, so we have to use reflect on every single value to translate the query results into Arrow format, which costs some performance.
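
To illustrate where that cost comes from, here is a minimal sketch that uses only the Go standard library. It does not use the clickhouse-go or Arrow APIs: `column` and `rowsToColumns` are hypothetical stand-ins for an Arrow builder and the conversion step, and the sample column names are made up. The point is only that a row-oriented result forces one reflect call per value before it can be placed into a column.

```go
package main

import (
	"fmt"
	"reflect"
)

// column is a hypothetical stand-in for an Arrow array builder.
// A real driver would append into typed Arrow builders instead.
type column struct {
	name   string
	kind   reflect.Kind // kind of the first non-nil value seen
	values []any
}

// rowsToColumns transposes row-oriented results into columns.
// Every single value is inspected with reflect, which is the
// per-value overhead a row-oriented read API implies.
func rowsToColumns(names []string, rows [][]any) ([]column, error) {
	cols := make([]column, len(names))
	for i, n := range names {
		cols[i] = column{name: n, kind: reflect.Invalid}
	}
	for r, row := range rows {
		if len(row) != len(names) {
			return nil, fmt.Errorf("row %d has %d values, want %d", r, len(row), len(names))
		}
		for i, v := range row {
			k := reflect.ValueOf(v).Kind()
			switch {
			case k == reflect.Invalid:
				// nil value (NULL): leaves the column kind unchanged
			case cols[i].kind == reflect.Invalid:
				cols[i].kind = k
			case cols[i].kind != k:
				return nil, fmt.Errorf("column %q: row %d is %s, want %s", names[i], r, k, cols[i].kind)
			}
			cols[i].values = append(cols[i].values, v)
		}
	}
	return cols, nil
}

func main() {
	// Hypothetical sample rows, shaped like a row-oriented scan result.
	rows := [][]any{
		{5.1, "setosa"},
		{4.9, "setosa"},
	}
	cols, err := rowsToColumns([]string{"sepal_length", "species"}, rows)
	if err != nil {
		panic(err)
	}
	for _, c := range cols {
		fmt.Println(c.name, c.kind, len(c.values))
	}
}
```

A columnar client can skip this step entirely, because each column already arrives as one typed block.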

2. ch-go

ch-go is a low-level Go client for ClickHouse.

Pros
  • Provides a columnar read and write interface using ClickHouse's native format.
  • Fast data block streaming with low network, CPU and memory overhead.

More key features here, https://github.com/ClickHouse/ch-go?tab=readme-ov-file#features

Cons

3. HTTP/HTTPS protocol with raw ArrowStream

This means implementing a simple wrapper around API calls like the following bash example with curl:

curl --user 'default:<password>' \
  --data-binary 'SOME QUERY Format ArrowStream' \
  https://CLICKHOUSE-ID.clickhouse.cloud:8443
Pros
  • Read and write in ArrowStream directly, no reflection, no casting.
Cons
  • No load balancing or failover. No fancy features are available out of the box.
  • Requires the ClickHouse instance(s) to have the HTTP/HTTPS interface enabled.
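
As a rough idea of what such a wrapper could look like in Go, here is a minimal sketch using only the standard library. It mirrors the curl call above: the query is sent as the POST body with basic auth, and the response body is returned untouched. The function names (`withArrowStreamFormat`, `newQueryRequest`, `fetchArrowStream`) and the example host are hypothetical, and the sketch assumes the query does not already carry a FORMAT clause. Decoding the returned bytes would be left to an Arrow IPC stream reader, which is not shown here.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// withArrowStreamFormat appends a FORMAT ArrowStream clause to a query.
// Hypothetical helper: assumes the query has no FORMAT clause yet.
func withArrowStreamFormat(query string) string {
	q := strings.TrimSpace(query)
	q = strings.TrimRight(q, ";")
	q = strings.TrimSpace(q)
	return q + " FORMAT ArrowStream"
}

// newQueryRequest builds the same POST request the curl example sends:
// the query text as the body, with basic-auth credentials.
func newQueryRequest(baseURL, user, password, query string) (*http.Request, error) {
	body := strings.NewReader(withArrowStreamFormat(query))
	req, err := http.NewRequest(http.MethodPost, baseURL, body)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, password)
	return req, nil
}

// fetchArrowStream sends the request and returns the raw response body.
// On success the caller owns the body and must close it; the bytes are
// expected to be an Arrow IPC stream that can be decoded without casting.
func fetchArrowStream(client *http.Client, req *http.Request) (io.ReadCloser, error) {
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		defer resp.Body.Close()
		// Read a bounded amount of the error message from the server.
		msg, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
		return nil, fmt.Errorf("clickhouse: HTTP %d: %s", resp.StatusCode, strings.TrimSpace(string(msg)))
	}
	return resp.Body, nil
}

func main() {
	// Placeholder host and password; no network call is made here.
	req, err := newQueryRequest("https://clickhouse.example.invalid:8443", "default", "<password>", "SELECT * FROM iris;")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println(withArrowStreamFormat("SELECT * FROM iris;"))
}
```

Timeouts, TLS settings, retries and failover would all have to be added on top of this, which is the trade-off listed in the cons above.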

/cc @josevalim

@cocoa-xu
Contributor Author

Proof-of-concept code/branch for using ch-go is available here, https://github.com/cocoa-xu/arrow-adbc/pull/new/feat/go-clickhouse-chgo-driver

@josevalim

From my point of view, 3 is the one that makes the most sense, because it avoids additional processing and conversion costs. However, this is really for the ADBC team to decide. :)
