Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(c/driver/postgresql): Implement consuming a PGresult via the copy reader #2029

Merged

Conversation

paleolimbot
Copy link
Member

@paleolimbot paleolimbot commented Jul 22, 2024

I started this PR wanting to get queries with parameters able to return their results; however, this turned into a PR leaning in to the PqResultHelper because it was helpful to export arrays from the PGresult* but wasn't quite general enough. I did a second bit of shuffling to make it (possibly, or maybe just for me) easier to understand what path gets taken on ExecuteQuery().

Some side effects of these changes are that we can now support multiple statements in the same query (by using PQexec() instead of PQexecParams() when there is no output requested) and that we can ExecuteSchema() for all parameterized queries.

The actual feature is that a user can set adbc.postgresql.use_copy = FALSE to force a non-COPY path for queries that aren't supported there. Because we request binary data, we can use all the same infrastructure for converting the results! I have only one test for this although I did run the whole test suite in C++ and Python...there are still a few missing features (batch size hint, large string overflow, error detail, cancel) but most tests pass using either path.

I'm happy to split this up if that is easier! I'm also planning to document the helper (but wanted a first round of review before documenting the behaviour to make sure it's behaviour we actually want).

Closes #855, Closes #2035.

library(adbcdrivermanager)
#> Warning: package 'adbcdrivermanager' was built under R version 4.3.3

con <- adbc_database_init(
  adbcpostgresql::adbcpostgresql(),
  uri = "postgresql://localhost:5432/postgres?user=postgres&password=password"
) |> 
  adbc_connection_init()

nycflights13::flights |> 
  write_adbc(con, "flights")

stream <- nanoarrow::nanoarrow_allocate_array_stream()
rows <- con |> 
  adbc_statement_init(adbc.postgresql.use_copy = FALSE) |> 
  adbc_statement_set_sql_query(
    "SELECT * from flights where month = 1 AND day = 1"
  ) |> 
  adbc_statement_prepare() |>
  adbc_statement_execute_query(stream)

rows
#> [1] 842

tibble::as_tibble(stream)
#> # A tibble: 842 × 19
#>     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
#>  1  2013     1     1      517            515         2      830            819
#>  2  2013     1     1      533            529         4      850            830
#>  3  2013     1     1      542            540         2      923            850
#>  4  2013     1     1      544            545        -1     1004           1022
#>  5  2013     1     1      554            600        -6      812            837
#>  6  2013     1     1      554            558        -4      740            728
#>  7  2013     1     1      555            600        -5      913            854
#>  8  2013     1     1      557            600        -3      709            723
#>  9  2013     1     1      557            600        -3      838            846
#> 10  2013     1     1      558            600        -2      753            745
#> # ℹ 832 more rows
#> # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#> #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#> #   hour <dbl>, minute <dbl>, time_hour <dttm>

con |> 
  execute_adbc("DROP TABLE flights")

Created on 2024-07-25 with reprex v2.1.0

@paleolimbot paleolimbot force-pushed the c-postgres-parameterized-result-stream branch from a375c63 to 63b9459 Compare July 25, 2024 21:34
@paleolimbot paleolimbot marked this pull request as ready for review July 26, 2024 03:19
@github-actions github-actions bot modified the milestone: ADBC Libraries 14 Jul 26, 2024
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is great!

return -1;
}

char* first = PQcmdTuples(result_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...wow, this function has such a weird API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is what you thought the last time you encountered it, too 🙂

// an empty string even for a DELETE. (Also, why does it return a
// string...) Possibly, it doesn't work because we use PQexecPrepared

NANOARROW_RETURN_NOT_OK(field_readers_[i]->InitArray(tmp->children[i]));
}

// TODO: If we get an EOVERFLOW here (e.g., big string data), we
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we file a new issue for it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely! #2064

@paleolimbot paleolimbot merged commit 05fa60d into apache:main Aug 7, 2024
65 checks passed
@paleolimbot paleolimbot deleted the c-postgres-parameterized-result-stream branch August 7, 2024 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants