feat(go/adbc/driver): add support for Google BigQuery #1722

Merged (67 commits) on Jul 5, 2024


cocoa-xu
Contributor

@cocoa-xu cocoa-xu commented Apr 15, 2024

Hi, this PR is a preliminary Go implementation for Google BigQuery, as the preferred approach over PR #1717.

Currently it supports query functionality as a proof of concept. Users can:

  • set most supported options for statements
  • send queries and read the result table in Arrow format

Using this driver from Elixir via elixir-explorer/adbc gives the same results as in #1717.

Mix.install([{:adbc, "~> 0.3.2-dev", github: "elixir-explorer/adbc"}])

defmodule BigqueryTest do
  def test do
    children = [
      {Adbc.Database,
       "adbc.bigquery.sql.project_id": "bigquery-poc-418913",
       driver: "libadbc_driver_bigquery.dylib",
       process_options: [name: MyApp.DB]},
      {Adbc.Connection, database: MyApp.DB, process_options: [name: MyApp.Conn]}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)

    dbg(
      Adbc.Connection.query(MyApp.Conn, "SELECT * FROM google_trends.small_top_terms LIMIT 7", [],
        "adbc.bigquery.sql.query.write_disposition": "WRITE_TRUNCATE"
      )
    )
  end
end

BigqueryTest.test()
[bigquery.exs:16: BigqueryTest.test/0]
Adbc.Connection.query(MyApp.Conn, "SELECT * FROM google_trends.small_top_terms LIMIT 7", [],
  "adbc.bigquery.sql.query.write_disposition": "WRITE_TRUNCATE"
) #=> {:ok,
 %Adbc.Result{
   num_rows: nil,
   data: %{
     "dma_id" => [546, 546, 546, 546, 546, 546, 546],
     "dma_name" => ["Columbia SC", "Columbia SC", "Columbia SC", "Columbia SC",
      "Columbia SC", "Columbia SC", "Columbia SC"],
     "rank" => [15, 15, 15, 15, 15, 15, 15],
     "refresh_date" => [~D[2024-03-14], ~D[2024-03-14], ~D[2024-03-14],
      ~D[2024-03-14], ~D[2024-03-14], ~D[2024-03-14], ~D[2024-03-14]],
     "score" => [nil, nil, nil, nil, nil, nil, nil],
     "term" => ["Nex Benedict", "Nex Benedict", "Nex Benedict", "Nex Benedict",
      "Nex Benedict", "Nex Benedict", "Nex Benedict"],
     "week" => [~D[2020-12-13], ~D[2020-12-20], ~D[2021-02-21], ~D[2021-02-28],
      ~D[2021-03-07], ~D[2021-03-14], ~D[2021-04-04]]
   }
 }}

There are still a few things to be done:

  • set credentials when initialising the database; currently Google Cloud SDK will automatically find and use credentials saved on local storage (generated by gcloud auth application-default login)
  • implement GetInfo, GetTableSchema and other functions for BigQuery's AdbcConnection and AdbcStatement
    • get table constraints and return them in corresponding info objects (currently impossible to do so)
    • implement Bind and BindStream
    • implement ExecuteSchema?
    • implement ReadPartition and ExecutePartitions?
    • implement Substrait execution?
  • add tests for this driver

@github-actions github-actions bot added this to the ADBC Libraries 1.0.0 milestone Apr 15, 2024
@lidavidm lidavidm changed the title feat(go/driver/bigquery): add support for Google BigQuery feat(go/adbc/driver/bigquery): add support for Google BigQuery Apr 15, 2024
@cocoa-xu cocoa-xu force-pushed the feat/go-google-bigquery-support branch from 525d7c2 to 568d677 Compare April 16, 2024 06:06
@lidavidm
Member

@zeroshade do you think you could give this a brief scan and make sure things are on the right track?

Member

@zeroshade zeroshade left a comment

This is a great start!! Thanks!

Left a ton of comments for you

go/adbc/driver/bigquery/bigquery_database.go: 5 review threads (outdated, resolved)
go/adbc/driver/bigquery/statement.go: 5 review threads (outdated, resolved)
@cocoa-xu
Contributor Author

This is a great start!! Thanks!

Left a ton of comments for you

Hi @zeroshade, thank you so much for the code review! And sorry, I only picked up some Go skills in the past week, based on the snowflake implementation. The issues you mentioned should now be fixed, and I'll implement the rest of the APIs and try to stick to these standards in Go :)

@cocoa-xu
Contributor Author

Hi, I've updated the PR and implemented a bit more, although I'm not 100% sure this is the right/best way to implement some of the functions... I'll be happy to make any changes.

Besides that, I also updated the todo list in the top comment. While I'd like to implement these functions as much as I can, please do let me know if we can put off any of them and address them in another PR. :)

@lidavidm
Member

all those TODOs are fine to split into later PRs

@cocoa-xu
Contributor Author

all those TODOs are fine to split into later PRs

Got it! Then we can probably merge this first once we're happy with it. I'll do separate PRs for the remaining bits. :)

And once again, thank you all for the great help and your time for the code review. @lidavidm @zeroshade ❤️

@cocoa-xu cocoa-xu marked this pull request as ready for review April 22, 2024 13:01
@cocoa-xu cocoa-xu requested a review from lidavidm as a code owner April 22, 2024 13:01
@zeroshade
Member

I agree with @lidavidm that the TODOs are fine to split into later PRs. Thanks for your work here! I'll give this a new review pass tomorrow. For now I approved the CI to run, looks like there's some pre-commit formatting/linting issues you have to resolve among other failures.

@cocoa-xu
Contributor Author

I agree with @lidavidm that the TODOs are fine to split into later PRs. Thanks for your work here! I'll give this a new review pass tomorrow. For now I approved the CI to run, looks like there's some pre-commit formatting/linting issues you have to resolve among other failures.

Thank you very much @zeroshade!! I'll resolve these issues along with any issues you may point out in the code review 😃

@lidavidm
Member

I think you'll want to try that rebase again 😅

@cocoa-xu cocoa-xu force-pushed the feat/go-google-bigquery-support branch from 72f998a to e68ed26 Compare April 24, 2024 08:55
@cocoa-xu
Contributor Author

I think you'll want to try that rebase again 😅

git is hard... now it should work I guess

Member

@zeroshade zeroshade left a comment

I did a first pass reviewing this. I'll do another pass tomorrow to get the rest

go/adbc/driver/bigquery/connection.go: 9 review threads (outdated, resolved)
go/adbc/driver/bigquery/connection.go: 1 review thread (resolved)
zeroshade pushed a commit that referenced this pull request Apr 25, 2024
As pointed out by @zeroshade
[here](#1722 (comment)),
we should fix the formatting of the comment.
@cocoa-xu
Contributor Author

cocoa-xu commented Apr 25, 2024

is this really the only way we can do this? we can't just make a sql query or otherwise get this information from bigquery, we have to iterate and perform the pattern matching ourselves?

tl;dr: yes.

According to the replies in googleapis/google-cloud-go#10044 and the BigQuery docs, the answer is yes: we have to enumerate projects and datasets ourselves. As the reply says, enumerating projects requires ResourceManager, because the current bigquery implementation is not designed to do this. Without enumeration we are effectively limited to a single region of a project, since a single query cannot cover datasets that are stored in multiple regions.

The alternative would be to wait for Google to implement a way to get all datasets in a project, regardless of their location, with a single SQL query.
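For readers following along, the client-side pattern matching discussed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes the filters use SQL LIKE-style wildcards (`%` for any run of characters, `_` for exactly one), does not handle escape characters, and uses hypothetical names (`likeToRegexp`, `filterNames`) and made-up dataset IDs.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// likeToRegexp converts a LIKE-style pattern into an anchored regular
// expression: "%" becomes ".*", "_" becomes ".", everything else is quoted.
// Escape sequences are deliberately not supported in this sketch.
func likeToRegexp(pattern string) (*regexp.Regexp, error) {
	var sb strings.Builder
	sb.WriteString("(?s)^")
	for _, r := range pattern {
		switch r {
		case '%':
			sb.WriteString(".*")
		case '_':
			sb.WriteString(".")
		default:
			sb.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	sb.WriteString("$")
	return regexp.Compile(sb.String())
}

// filterNames keeps the names that match the pattern. An empty pattern is
// treated as "match everything", which is an assumption of this sketch.
func filterNames(names []string, pattern string) ([]string, error) {
	if pattern == "" {
		return append([]string(nil), names...), nil
	}
	re, err := likeToRegexp(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	// Hypothetical dataset IDs standing in for the enumerated datasets.
	datasets := []string{"google_trends", "google_ads", "sales_2024"}
	matched, err := filterNames(datasets, "google%")
	if err != nil {
		panic(err)
	}
	fmt.Println(matched)
}
```

In a real driver the `names` slice would come from iterating over projects and datasets, which is the expensive part; the matching itself is cheap.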

zeroshade pushed a commit that referenced this pull request Apr 26, 2024
As pointed out by @zeroshade
[here](#1722 (comment)),
this should be handled by doing `adbc.Error{Msg: ctx.Err(), Code:....}`.
@lidavidm lidavidm added this to the ADBC Libraries 14 milestone Jun 30, 2024
@cocoa-xu
Contributor Author

cocoa-xu commented Jul 1, 2024

Hi @zeroshade, sorry for the ping! I've finished writing basic tests for BigQuery. I fully understand that you may be quite busy at the moment, but I was wondering whether you might have a free moment to review this, or to leave me some to-do points, so that it can be merged or pushed forward?

I'll be very happy to share my BigQuery credentials for you to run the tests. Please feel free to email/dm me. My email address is the same as the one used in the commits, and you can also find me on the Gopher channel. ;)

Member

@zeroshade zeroshade left a comment

This is my last set of nitpicks that I can see. I'm happy with the PR as-is after these changes, @joellubi any further thoughts from you on this? Given my recent busy schedule, i'm gonna rely on Joel to help you get this merged!

thanks for all this work, looking forward to getting this in. sorry it's taking so long

c/driver/bigquery/README.md: 1 review thread (outdated, resolved)
c/driver/bigquery/bigquery_test.cc: 2 review threads (outdated, resolved)
go/adbc/driver/bigquery/connection.go: 2 review threads (outdated, resolved)
go/adbc/driver/bigquery/record_reader.go: 5 review threads (outdated, resolved)
@cocoa-xu
Contributor Author

cocoa-xu commented Jul 1, 2024

@zeroshade Many thanks for the code review! I've fixed all of them except the last one: the chan closes early after my refactor, and I couldn't figure out a way to avoid that, so I left it as is. Sorry, I'm not quite sure how to do this properly in Go, so if anyone has any suggestions or hints, I'll be happy to make further changes!

thanks for all this work, looking forward to getting this in. sorry it's taking so long

And no worries! It was all very interesting for me and I've learnt a lot about Go programming from doing it and from these code reviews!
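On the channel question above: without seeing the refactor, the cause of the early close cannot be confirmed here, but a common Go pattern for this situation is to have a single goroutine close the channel only after every producer has finished, using a `sync.WaitGroup`. The sketch below is generic and is not the record reader from this PR; `produce` and its integer payload are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// produce starts one goroutine per input slice and sends every value on a
// shared channel. The channel is closed exactly once, after all producers
// have returned, so the consumer never observes an early close.
func produce(parts [][]int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	for _, part := range parts {
		wg.Add(1)
		go func(p []int) {
			defer wg.Done()
			for _, v := range p {
				out <- v
			}
		}(part)
	}
	// A dedicated goroutine owns the close; producers never close the channel.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	sum := 0
	for v := range produce([][]int{{1, 2}, {3}, {4, 5, 6}}) {
		sum += v
	}
	fmt.Println(sum)
}
```

The key design choice is that closing is decoupled from producing: no producer can close the channel while another is still sending.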

@cocoa-xu
Contributor Author

cocoa-xu commented Jul 2, 2024

I've updated to use TableMetadata for getTableSchemaWithFilter!

@cocoa-xu cocoa-xu requested a review from joellubi July 2, 2024 17:56
Member

@joellubi joellubi left a comment

Just a few comments, looks good to me overall. I don't have a BigQuery environment at the moment to execute the test suite in so I haven't verified the tests myself. I pointed out a few places where I suspect some issues may come up when the tests are run, but any inconsistencies there can be fixed once the tests are added to CI and become load-bearing.

go/adbc/driver/bigquery/record_reader.go: 1 review thread (outdated, resolved)
go/adbc/driver/bigquery/driver_test.go: 2 review threads (outdated, resolved)
@cocoa-xu
Contributor Author

cocoa-xu commented Jul 2, 2024

Many thanks for the code review @joellubi! Sorry, I didn't notice some of these test flags; they should be corrected now. If there are no other apparent issues, perhaps we can merge this PR for now? I plan to make separate PRs for bulk ingestion and the other todo items in this PR. ;)

@zeroshade
Member

Just those last two nitpicks from me, I'm good to merge this after they are addressed. The remaining stuff can be handled in a follow-up

@cocoa-xu cocoa-xu force-pushed the feat/go-google-bigquery-support branch from 39be2ff to 32c9200 Compare July 3, 2024 16:41
@cocoa-xu cocoa-xu requested a review from zeroshade July 3, 2024 18:33
@cocoa-xu
Contributor Author

cocoa-xu commented Jul 3, 2024

This PR should be ready to be merged 🎉~ I'll make a checklist for these todos in the code today or tomorrow. And massive thanks for your time reviewing my code and for this valuable feedback!

Member

@joellubi joellubi left a comment

Thanks @cocoa-xu!

@lidavidm
Member

lidavidm commented Jul 4, 2024

It looks like we're good to merge?

@cocoa-xu
Contributor Author

cocoa-xu commented Jul 4, 2024

It looks like we're good to merge?

Yes, it should be ready! One remaining thing for the repo maintainers: to run any integration tests, we would need a test account, with its JSON credentials generated and saved in the repo secrets. :)

@lidavidm
Member

lidavidm commented Jul 5, 2024

Thank you @cocoa-xu for the huge contribution!

@lidavidm lidavidm merged commit e7e2519 into apache:main Jul 5, 2024
61 of 65 checks passed
kou pushed a commit that referenced this pull request Jul 11, 2024
Quick follow up to #1722

meson test skips everything at the moment - didn't see this getting
tested in CI so assuming it's something done offline