Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FlightSQL] Stateless prepared statement with parameter support #37720

Closed
alamb opened this issue Sep 14, 2023 · 28 comments
Closed

[FlightSQL] Stateless prepared statement with parameter support #37720

alamb opened this issue Sep 14, 2023 · 28 comments

Comments

@alamb
Copy link
Contributor

alamb commented Sep 14, 2023

Describe the enhancement requested

dev@arrow.apache.com mailing list thread: https://lists.apache.org/thread/f0xb61z4yw611rw0v8vf9rht0qtq8opc

Usecase

InfluxDB IOx / 3.0 would like to allow customers to create prepared SQL statements with parameters so they can send parameterized queries and parameter values to the serve. Without this feature, they have to do the parameter substitution on the client side, which is both subject to possible SQL injection attacks, or (if they use a pre existing library) may not have the same parameter typing as our SQL implementation.

Given the JDBC driver doesn't yet support binding parameters to prepared statements (see #33961) I am not sure how widely used the parameter support is, but I think interest is growing -- for example apache/arrow-rs#4797 adds client side support to the Rust implementation

Background: Stateless services

A common design pattern in cloud services is that the request from a client can be handled by one of a number of identical backend servers as shown in the diagram below.

Subsequent requests may be processed by different backend servers. Any state needed to continue a session is sent to the client which passes it back in subsequent requests.

This design can used to support features such as zero downtime deployments and automatic workload based scaling. It also has the nice property that there is no server side state to clean up (via timeout or other mechanism).

                                                                            ┌────────────────────┐   
                                                                  ┌ ─ ─ ─ ─▶│      Server 1      │   
                                                                            └────────────────────┘   
                                                                  │                                  
                                                                            ┌────────────────────┐   
   ┌────────────────────┐                                         │         │      Server 2      │   
   │     FlightSQL      │                                                   └────────────────────┘   
   │       Client       │─ ─ ─ ─ ─ ▶   ... Network ...    ─ ─ ─ ─ ┘                                  
   │                    │                                                             ...            
   └────────────────────┘                                                                            
                                                                            ┌────────────────────┐   
             ActionCreatePreparedStatementRequest                           │      Server N      │   
                      handled by Server 1                                   └────────────────────┘   
                                                                                                     
                                                                                                     
                                                                                                     
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
                                                                                                     
                                                                            ┌────────────────────┐   
                                                                            │      Server 1      │   
                                                                            └────────────────────┘   
                                                                                                     
                                                                            ┌────────────────────┐   
   ┌────────────────────┐                                                   │      Server 2      │   
   │     FlightSQL      │                                                   └────────────────────┘   
   │       Client       │─ ─ ─ ─ ─ ▶   ... Network ...    ─ ─ ─ ─ ┐                                  
   │                    │                                                             ...            
   └────────────────────┘                                         │                                  
                                                                            ┌────────────────────┐   
                                                                  └ ─ ─ ─ ─▶│      Server N      │   
                ActionPreparedStatementExecute                              └────────────────────┘   
                      handled by Server N                                                            
                                                                                                     

Problem

As currently specified, I don't think we can implement FlightSQL prepared statements with parameters with such a stateless design.

In IOx, the handle returned from ActionCreatePreparedStatementRequest contains the original SQL query text among other things. Thus the subsequent call to ActionPreparedStatementExecute have access to the SQL query.

However, the CommandPrepareStatementQuery message to bind parameters does not return anything to the client that is sent to calls to ActionPreparedStatementExecute. Thus there is no way for the server that processes ActionPreparedStatementExecute to know the values of the parameters.

FlightSQL sequence Diagram

Here is the sequence diagram from https://arrow.apache.org/docs/format/FlightSql.html for reference

CommandPreparedStatementQuery mmd

Strawman Proposal

One way to support stateless implementation of prepared statements with bind parameters would be to extend the response returned from calling DoPut with CommandPrepareStatementQuery to include a new CommandPrepareStatementQueryResponse , similar to

arrow/format/FlightSql.proto

Lines 1782 to 1792 in 15a8ac3

* Returned from the RPC call DoPut when a CommandStatementUpdate
* CommandPreparedStatementUpdate was in the request, containing
* results from the update.
*/
message DoPutUpdateResult {
option (experimental) = true;
// The number of records updated. A return value of -1 represents
// an unknown updated record count.
int64 record_count = 1;
}

/**
* Response returned when `DoPut` is called with `CommandPrepareStatementQuery`
message DoPutStatementPrepareResult {
  option (experimental) = true;

  // (potentially updated) opaque handle for the prepared statement on the server.
  // All subsequent requests for his prepared statement must use this new handle, if specified
  bytes prepared_statement_handle = 1;
}

I think this would be a fairly low overhead and easy extension. Existing clients that support bind parameters would require an update, but existing servers would not. Given that bind parameters are just starting to be used more I think the overall ecosystem impact would be low

See also

See this discussion for more context: https://github.com/apache/arrow-rs/pull/4797/files#r1319807938

Component(s)

FlightRPC

@alamb
Copy link
Contributor Author

alamb commented Sep 14, 2023

cc @lidavidm and @kou

@tustvold
Copy link
Contributor

tustvold commented Sep 14, 2023

no server side state to clean up (via timeout or other mechanism)

I think this alludes to something I think would make the current protocol hard to implement even if you could guarantee that requests are routed to a particular server. The server has no reliable mechanism to tell when it can clean up after a session. This is because, unlike the protocols on which FlightSQL appears to be based, gRPC is not a connection-oriented transport. That is unlike say postgres where a session state can be associated with a particular TCP connection, there is no equivalent concept for gRPC.

Most (if not all) gRPC implementations do not expose connection-level hooks for this reason, as clients should be free to multiplex as many or as few "sessions" on the underlying HTTP/2.0 transport as they deem fit, without this having any implications on the application protocol.

@lidavidm
Copy link
Member

Given the JDBC driver doesn't yet support binding parameters to prepared statements (see #33961) I am not sure how widely used the parameter support is

The ADBC driver has long supported parameter binding.

I think this alludes to something I think would make the current protocol hard to implement even if you could guarantee that requests are routed to a particular server.

Most things rely on a client-side token already (transactions, prepared statements), I think parameters are the only exception.

I think this would be a fairly low overhead and easy extension.

This seems reasonable to me. I suppose you're planning to embed the parameters directly into the client-side handle? This is a little unfortunate since then you're bouncing through Protobuf, but I suppose most systems are only thinking about 'small' sets of parameters anyways (and this can always be optimized if needed).

@lidavidm
Copy link
Member

CC @zeroshade @ywc88 to comment as well

@zeroshade
Copy link
Member

This seems reasonable to me. I suppose you're planning to embed the parameters directly into the client-side handle? This is a little unfortunate since then you're bouncing through Protobuf, but I suppose most systems are only thinking about 'small' sets of parameters anyways (and this can always be optimized if needed).

I agree that this seems to be fairly low overhead, provided that the parameters are few and small in number (which isn't always the case, such as a parameterized INSERT statement etc.) I guess my only real concern is that we end up losing a lot of performance or will require externally preserved state to manage that handle (of course these are orthogonal to FlightSQL as it would be application defined). I wouldn't want the protocol to encourage having to embed the parameters directly in the client-side handle, but I also don't see a better solution to this offhand.

@alamb
Copy link
Contributor Author

alamb commented Sep 14, 2023

I agree that this seems to be fairly low overhead, provided that the parameters are few and small in number (which isn't always the case, such as a parameterized INSERT statement etc.) I guess my only real concern is that we end up losing a lot of performance or will require externally preserved state to manage that handle (of course these are orthogonal to FlightSQL as it would be application defined). I wouldn't want the protocol to encourage having to embed the parameters directly in the client-side handle, but I also don't see a better solution to this offhand.

I agree that the protocol should not require having the parameters in the handle and should not require a new handle to be sent in the response to CommandPrepareStatementQuery . However, allowing for a new handle makes stateless services possible, which I think adds significant value to FlightSQL

In terms of the performance penalty of passing parameter values back and forth, as you say that likely be an application level tradeoff. For example, many systems have special (non SQL) ingest bulk ingest APIs that can avoid INSERT entirely for large values (e.g. the Postgres COPY command is such an API)

@zeroshade
Copy link
Member

So the semantics would be that the response may contain an updated handle and it is the responsibility of the client to check whether it is set, in which case the new handle must be used for future requests, or it is unset, in which case the client continues to use the existing handle?

@lidavidm
Copy link
Member

So the semantics would be that the response may contain an updated handle and it is the responsibility of the client to check whether it is set, in which case the new handle must be used for future requests, or it is unset, in which case the client continues to use the existing handle?

That was my reading, yes. Whether the parameters are embedded there or not, the client would have no clue.

@kou
Copy link
Member

kou commented Sep 15, 2023

I agree that the proposed approach will work.

I thought of some more approaches:

  1. How about returning FlightInfo by DoPut (making GetFlightInfo call omittable) for this use case?

    message DoPutStatementPrepareResult {
      option (experimental) = true;
    
      // If FlightInfo is returned in `DoPut`, the client doesn't call `GetFlightInfo` explicitly.
      // Instead, the client can use the returned FlightInfo.
      FlightInfo info = 1;
    }

    With this approach, parameters aren't needed to send-back to client-side but it breaks consistency. (FlightInfo may be returned from non GetFlightInfo call.)

  2. How about using DoExchange https://arrow.apache.org/docs/format/Flight.html#exchanging-data instead of DoPut + GetFlightInfo + DoGet for this use case?

    With this approach, parameters aren't needed to send-back to client-side but the query execution result can be returned by only one server.

@tustvold
Copy link
Contributor

I like the idea of DoExchange

@alamb
Copy link
Contributor Author

alamb commented Sep 15, 2023

How about using DoExchange https://arrow.apache.org/docs/format/Flight.html#exchanging-data instead of DoPut + GetFlightInfo + DoGet for this use case?

I think this is a good idea as it reduces the number of required network roundtrips for some cases. However, I still think it is important that we change the existing DoPut based sequence to support stateless services to maintain feature parity

Perhaps we can discuss adding a DoExchange option for prepared statement execution as part of a separate ticket. If you agree I will file one

@lidavidm
Copy link
Member

I asked about it on the original proposal but there didn't seem to be much interest. Though I am somewhat concerned about the number of special cases that might need to be implemented and how servers/clients will negotiate all this.

@lidavidm
Copy link
Member

That said let's file the ticket and discuss there

@zeroshade
Copy link
Member

Personally I do like the DoExchange solution, just adding my two cents

@alamb
Copy link
Contributor Author

alamb commented Sep 15, 2023

Filed #37741 to track the DoExchange idea

@alamb
Copy link
Contributor Author

alamb commented Feb 12, 2024

I believe @kallisti-dev is working on a specific proposal for this item

@erratic-pattern
Copy link
Contributor

apache/arrow-rs#5433 - client implementation for Rust

#40243 - documentation updates

will submit a PR soon for the golang client and then look at server implementations

erratic-pattern added a commit to erratic-pattern/arrow that referenced this issue Mar 1, 2024
erratic-pattern added a commit to erratic-pattern/arrow that referenced this issue Mar 1, 2024
@lidavidm
Copy link
Member

Issue resolved by pull request 40311
#40311

@alamb
Copy link
Contributor Author

alamb commented Apr 23, 2024

Thank you @lidavidm

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Apr 24, 2024
…t result (apache#40311)

### Rationale for this change
See discussion on apache#37720 and mailing list: https://lists.apache.org/thread/3kb82ypx99q96g84qv555l6x8r0bppyq

### What changes are included in this PR?

Changes the Go FlightSQL client and server implementations to support returning an updated prepared statement handle to the client as part of the `DoPut(PreparedStatement)` RPC call.

### Are these changes tested?

### Are there any user-facing changes?

See parent issue and docs PR apache#40243  for details of user facing changes.

**This PR includes breaking changes to public APIs.**

* GitHub Issue: apache#37720

Lead-authored-by: Adam Curtis <adam.curtis.dev@gmail.com>
Co-authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
tolleybot pushed a commit to tmct/arrow that referenced this issue May 2, 2024
…t result (apache#40311)

### Rationale for this change
See discussion on apache#37720 and mailing list: https://lists.apache.org/thread/3kb82ypx99q96g84qv555l6x8r0bppyq

### What changes are included in this PR?

Changes the Go FlightSQL client and server implementations to support returning an updated prepared statement handle to the client as part of the `DoPut(PreparedStatement)` RPC call.

### Are these changes tested?

### Are there any user-facing changes?

See parent issue and docs PR apache#40243  for details of user facing changes.

**This PR includes breaking changes to public APIs.**

* GitHub Issue: apache#37720

Lead-authored-by: Adam Curtis <adam.curtis.dev@gmail.com>
Co-authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
tolleybot pushed a commit to tmct/arrow that referenced this issue May 4, 2024
…t result (apache#40311)

### Rationale for this change
See discussion on apache#37720 and mailing list: https://lists.apache.org/thread/3kb82ypx99q96g84qv555l6x8r0bppyq

### What changes are included in this PR?

Changes the Go FlightSQL client and server implementations to support returning an updated prepared statement handle to the client as part of the `DoPut(PreparedStatement)` RPC call.

### Are these changes tested?

### Are there any user-facing changes?

See parent issue and docs PR apache#40243  for details of user facing changes.

**This PR includes breaking changes to public APIs.**

* GitHub Issue: apache#37720

Lead-authored-by: Adam Curtis <adam.curtis.dev@gmail.com>
Co-authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
rok pushed a commit to tmct/arrow that referenced this issue May 8, 2024
…t result (apache#40311)

### Rationale for this change
See discussion on apache#37720 and mailing list: https://lists.apache.org/thread/3kb82ypx99q96g84qv555l6x8r0bppyq

### What changes are included in this PR?

Changes the Go FlightSQL client and server implementations to support returning an updated prepared statement handle to the client as part of the `DoPut(PreparedStatement)` RPC call.

### Are these changes tested?

### Are there any user-facing changes?

See parent issue and docs PR apache#40243  for details of user facing changes.

**This PR includes breaking changes to public APIs.**

* GitHub Issue: apache#37720

Lead-authored-by: Adam Curtis <adam.curtis.dev@gmail.com>
Co-authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
rok pushed a commit to tmct/arrow that referenced this issue May 8, 2024
…t result (apache#40311)

### Rationale for this change
See discussion on apache#37720 and mailing list: https://lists.apache.org/thread/3kb82ypx99q96g84qv555l6x8r0bppyq

### What changes are included in this PR?

Changes the Go FlightSQL client and server implementations to support returning an updated prepared statement handle to the client as part of the `DoPut(PreparedStatement)` RPC call.

### Are these changes tested?

### Are there any user-facing changes?

See parent issue and docs PR apache#40243  for details of user facing changes.

**This PR includes breaking changes to public APIs.**

* GitHub Issue: apache#37720

Lead-authored-by: Adam Curtis <adam.curtis.dev@gmail.com>
Co-authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 10, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 10, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 10, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 10, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 10, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 16, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 16, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 16, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 16, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 16, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 22, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 22, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 22, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 22, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue May 22, 2024
vibhatha pushed a commit to vibhatha/arrow that referenced this issue May 25, 2024
…t result (apache#40311)

### Rationale for this change
See discussion on apache#37720 and mailing list: https://lists.apache.org/thread/3kb82ypx99q96g84qv555l6x8r0bppyq

### What changes are included in this PR?

Changes the Go FlightSQL client and server implementations to support returning an updated prepared statement handle to the client as part of the `DoPut(PreparedStatement)` RPC call.

### Are these changes tested?

### Are there any user-facing changes?

See parent issue and docs PR apache#40243  for details of user facing changes.

**This PR includes breaking changes to public APIs.**

* GitHub Issue: apache#37720

Lead-authored-by: Adam Curtis <adam.curtis.dev@gmail.com>
Co-authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue Jun 3, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue Jun 3, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue Jun 3, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue Jun 3, 2024
stevelorddremio added a commit to stevelorddremio/arrow that referenced this issue Jun 3, 2024
kou pushed a commit to apache/arrow-go that referenced this issue Aug 30, 2024
…lt (#40311)

### Rationale for this change
See discussion on apache/arrow#37720 and mailing list: https://lists.apache.org/thread/3kb82ypx99q96g84qv555l6x8r0bppyq

### What changes are included in this PR?

Changes the Go FlightSQL client and server implementations to support returning an updated prepared statement handle to the client as part of the `DoPut(PreparedStatement)` RPC call.

### Are these changes tested?

### Are there any user-facing changes?

See parent issue and docs PR #40243  for details of user facing changes.

**This PR includes breaking changes to public APIs.**

* GitHub Issue: #37720

Lead-authored-by: Adam Curtis <adam.curtis.dev@gmail.com>
Co-authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants