Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(r): Add callback-based async option for calling ArrowArrayStream::get_next() #211

Closed
wants to merge 2 commits into from

Conversation

paleolimbot
Copy link
Member

@paleolimbot paleolimbot commented Jun 7, 2023

Total proof-of-concept of one way to go about async stream$get_next(), as discussed briefly with @krlmlr, @lidavidm, and @nbenn in connection with ADBC in R. When the stream is wrapping a database result set, the get_next() call can block for a considerable amount of time...this async version would support a world where it's easier to cancel running queries since we can loop + wait in R land (possibly calling the brand-new "cancel" function coming to ADBC 1.1).

This PR currently will leak memory and doesn't communicate any error information beyond "result code".

See also r-dbi/adbc#6

Reprex:

library(nanoarrow)

stream <- basic_array_stream(list(1:5))

# ...more usefully, a promise or future whose value is set from the callback
result <- NULL
nanoarrow:::nanoarrow_array_stream_get_next_async(stream, function(code, array) {
  message("done!")
  result <<- array
})

Sys.sleep(0.5)
nanoarrow:::run_callbacks()
#> done!
convert_array(result)
#> [1] 1 2 3 4 5

Created on 2023-06-07 with reprex v2.0.2

@krlmlr
Copy link
Contributor

krlmlr commented Jun 7, 2023

🤩 🤩 🤩

Give me some time to dig deeper into it. I might add a promises-based wrapper around that.

@codecov-commenter
Copy link

Codecov Report

Merging #211 (3f6214c) into main (47a4ef4) will increase coverage by 0.06%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #211      +/-   ##
==========================================
+ Coverage   87.87%   87.94%   +0.06%     
==========================================
  Files          60       60              
  Lines        9249     9301      +52     
==========================================
+ Hits         8128     8180      +52     
  Misses       1121     1121              
Impacted Files Coverage Δ
r/src/init.c 100.00% <ø> (ø)
r/src/util.h 100.00% <ø> (ø)
r/R/array-stream.R 96.10% <100.00%> (+0.32%) ⬆️
r/R/util.R 90.24% <100.00%> (+0.24%) ⬆️
r/src/array_stream.c 88.67% <100.00%> (+0.32%) ⬆️
r/src/nanoarrow_cpp.cc 97.72% <100.00%> (+1.89%) ⬆️
r/src/util.c 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@paleolimbot
Copy link
Member Author

Just thinking out loud, but maybe a better home for this would be adbcdrivermanager (where a dependency on later/promises would be more than worth it to support async).

@krlmlr
Copy link
Contributor

krlmlr commented Jun 7, 2023

Yeah, if we can expose the relevant C interface here, we can support a higher-level interface in adbcdrivermanager.

Sequence diagrams FTW!

@krlmlr
Copy link
Contributor

krlmlr commented Jun 8, 2023

I was wondering if the thread should be a property of the connection object. That way we don't need to create and destroy threads, only wake them for each operation (limitless producer-consumer design?).

@paleolimbot
Copy link
Member Author

That's a great idea. This functionality is much better suited to adbcdrivermanager anyway and keeping all the async stuff in one place will be very helpful.

@krlmlr
Copy link
Contributor

krlmlr commented Jul 30, 2023

Do you plan to further enhance this?

@paleolimbot
Copy link
Member Author

Sorry for the delay...I was taking some time away from the keyboard!

I think the best place for this right now is the ADBC R bindings since there will also be other async-related code and it is probably best to keep that in the same place. I'm happy to take direction on that...you have spent more time considering async database access than I have and the purpose of this change would be for the express purpose of enabling that in ADBC/R.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants