ExecuteStoredProcedureStreamAsync is only half-Stream #1059

Closed
joshlang opened this issue Nov 27, 2019 · 8 comments · Fixed by #1061
Labels: discussion-wanted (Need a discussion on an area), feature-request (New feature or request)

Comments

@joshlang (Contributor)

tl;dr: I would like an overload of ExecuteStoredProcedureStreamAsync that takes a Stream as the payload parameter.

Scripts.ExecuteStoredProcedureStreamAsync returns a ResponseMessage (i.e., a Stream), so we can handle the result however we wish. Streams are handy because we control serialization (among other reasons).

However, it takes a dynamic[] representing the stored procedure's parameters. In ScriptsCore, this is converted to a stream using this.clientContext.CosmosSerializer.ToStream<dynamic[]>(parameters);. In other words, it goes through CosmosSerializer.

While I suppose this is fine, I would like more control over when things get serialized.

From a higher-level point of view, I would prefer that my entire experience with Cosmos be Stream-based, and that when using Stream methods there were "no such thing" as serialization. In other words, Stream-based methods should never serialize or deserialize "user" objects (but can still serialize internal structures, e.g. change feed [documents:[...]]).
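
To make that concrete, here is a minimal sketch of the stream-in/stream-out flow I'm after. The overload shown is the hypothetical one this issue proposes; container and the payload bytes are placeholders.

// using System.IO; using System.Text;
// using Microsoft.Azure.Cosmos; using Microsoft.Azure.Cosmos.Scripts;

// Parameters serialized by the application, not by CosmosSerializer.
using Stream parameters = new MemoryStream(Encoding.UTF8.GetBytes("[{\"id\":\"doc1\"}]"));

ResponseMessage response = await container.Scripts.ExecuteStoredProcedureStreamAsync(
    "processDocs",              // stored procedure id (hypothetical)
    parameters,                 // raw JSON array; the SDK never touches it
    new PartitionKey("pk"));

// The response body is likewise a raw Stream we deserialize however we wish.
using Stream result = response.Content;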

Challenges with adding an overload to ExecuteStoredProcedureStreamAsync:

  • The system expects an array, but the user might pass something else; as a stream, we can't tell the difference until the server presumably returns a BadRequest.
  • dynamic[] parameters and Stream streamPayload cannot occupy the same parameter position, because null can be passed for either; making existing calls ambiguous would be a breaking change (see the sketch after the options below).

Mitigations:

If the user serializes something that's not an array, just allow it and let the server blow up with a BadRequest. No big deal.

As to the parameter positions:
Option 1: Change the positions. It seems "awkward", but string storedProcedureId, Stream streamPayload, PartitionKey partitionKey is safely differentiated from string storedProcedureId, PartitionKey partitionKey, dynamic[] parameters.
Option 2: Change the name. But I can't think of a non-awkward alternative to ExecuteStoredProcedureStreamAsync.
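
For illustration, here is why same-position overloads would break existing callers (scripts is a placeholder Scripts instance):

// If both overloads took the payload in the same (third) position, an
// existing call like this would stop compiling: null converts to both
// dynamic[] and Stream, so overload resolution becomes ambiguous (CS0121).
await scripts.ExecuteStoredProcedureStreamAsync(
    "sprocId",
    new PartitionKey("pk"),
    null);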

I'll post a PR for this. But I realize I might be stepping on some design-toes so feel free to consider the PR as just a further explanation to this issue.

@joshlang (Author)

This would also bring ExecuteStoredProcedureStreamAsync into line with other stream methods like CreateItemStreamAsync, UpsertItemStreamAsync, and ReplaceItemStreamAsync, which all accept a stream payload and return a ResponseMessage.

@kirankumarkolli (Member)

Stream APIs are for scenarios where the payload is handed to the middle tier and simply streamed through to the Cosmos service (no serialization/deserialization in the SDK or service, except for indexing).

With a SPROC, the stream payload does need to be deserialized at least once by the SPROC on the service before it can process those operations. So some deserialization and serialization still exists.

In 3.4.0, Cosmos included support for TransactionalBatch (#965), which enables pure stream-based batching of operations. Can you please take a look and see whether it is a better fit for your scenarios?
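
For reference, a minimal TransactionalBatch sketch using the stream-based variants (container and the payload streams are placeholders):

// All operations execute atomically against a single partition key.
TransactionalBatch batch = container.CreateTransactionalBatch(new PartitionKey("pk"));

batch.CreateItemStream(createPayload);            // raw JSON stream, no SDK serialization
batch.ReplaceItemStream("doc1", replacePayload);

using (TransactionalBatchResponse response = await batch.ExecuteAsync())
{
    if (!response.IsSuccessStatusCode)
    {
        // On failure, no operation in the batch is applied.
    }
}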

@kirankumarkolli added the discussion-wanted and feature-request labels on Dec 2, 2019
@joshlang (Author) commented on Dec 2, 2019

My goal: Execute a stored procedure without the SDK needing to serialize or deserialize the payload.

The ExecuteStoredProcedureStreamAsync overload I added in ScriptsCore accepts a stream and returns a ResponseMessage (with a stream). This fits my needs (and seems to fit the SDK's goals when it comes to streams).

The existing overload didn't accept a stream - only a dynamic[], which gets serialized by the cosmos serializer.

Once it's out of the SDK's hands, I don't care what Cosmos does with it. I just don't want the SDK to serialize/deserialize it.

...

Separately, regarding TransactionalBatch: it was a great idea. Before it existed, we had our own custom processBatch.js, which simply took createDocs, replaceDocs, and deleteDocs parameters and processed them as a batch. When TransactionalBatch was introduced, we immediately ditched the stored procedure in favor of it! But then we ran into #1057 ... and now we're back to our processBatch.js solution.

That's what brought me to create this issue and PR. In our software, when we add to a "batch" we're preparing, we immediately serialize the payload. Later, when the batch is executed, we've basically got a List<MemoryStream>, which we combine and send to ExecuteStoredProcedureStreamAsync (roughly as sketched below).
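
The combining step looks something like this (a sketch assuming the stored procedure expects a single JSON array; not our production code):

// using System.Collections.Generic; using System.IO;
// Concatenates pre-serialized documents into one JSON array payload.
static MemoryStream CombineBatch(List<MemoryStream> serializedDocs)
{
    var combined = new MemoryStream();
    combined.WriteByte((byte)'[');
    for (int i = 0; i < serializedDocs.Count; i++)
    {
        if (i > 0)
        {
            combined.WriteByte((byte)',');
        }
        serializedDocs[i].Position = 0;
        serializedDocs[i].CopyTo(combined);
    }
    combined.WriteByte((byte)']');
    combined.Position = 0; // rewind so the SDK reads from the start
    return combined;
}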

@joshlang (Author) commented on Dec 2, 2019

Regarding use cases other than our own: while parameters are often scalars, they are also often documents (with no relation to batch operations). Example: https://docs.microsoft.com/en-us/azure/cosmos-db/how-to-write-stored-procedures-triggers-udfs#stored-procedures (second example). Control over serialization/deserialization would still be useful for cases like this.

@kirankumarkolli (Member)

Stream APIs make full sense when the payload arrives at the application layer (Web API, etc.) already as a stream. If the application is preparing the stream itself, then serialization happens at least once on the client anyway (whether in the SDK or in the application).

We are not against having a stream-input API for SPROC. But the best course is to address #1057.

/cc: @ealsur, @abhijitpai

@joshlang (Author) commented on Dec 2, 2019

I would love for #1057 to be addressed.

But I do think that without this issue being fixed, the SDK is inconsistent.

Everywhere a document might be sent as part of the payload, the methods take a stream:

public abstract class Container
{
...
Task<ResponseMessage> CreateItemStreamAsync(Stream streamPayload...
Task<ResponseMessage> ReplaceItemStreamAsync(Stream streamPayload...
Task<ResponseMessage> UpsertItemStreamAsync(Stream streamPayload...
...
}
public abstract class TransactionalBatch
{
...
TransactionalBatch CreateItemStream(Stream streamPayload...
TransactionalBatch ReplaceItemStream(string id, Stream streamPayload...
TransactionalBatch UpsertItemStream(Stream streamPayload...
...
}

Except here, where the payload is dynamic[] parameters:

public abstract class Scripts
{
...
Task<ResponseMessage> ExecuteStoredProcedureStreamAsync(
    string storedProcedureId,
    PartitionKey partitionKey,
    dynamic[] parameters,
    StoredProcedureRequestOptions requestOptions = null,
    CancellationToken cancellationToken = default(CancellationToken));
...
}

Adding this would bring it in line with the rest.

Task<ResponseMessage> ExecuteStoredProcedureStreamAsync(
    string storedProcedureId,
    Stream streamPayload,
    PartitionKey partitionKey,
    StoredProcedureRequestOptions requestOptions = null,
    CancellationToken cancellationToken = default(CancellationToken));

@abhijitpai (Contributor)

@joshlang - what is the processBatch.js you wrote intended for? Are you trying to get higher throughput by trading off latency of individual calls? The SDK has bulk functionality (blog here: https://medium.com/@Ealsur/azure-cosmos-db-net-sql-sdk-bulk-support-9dea1d0ee023) that automatically groups operations into server requests and attempts to achieve the same; see the rough sketch below.
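
A minimal bulk-mode sketch (connectionString, MyDoc, docs, and doc.Pk are placeholders):

// Bulk mode: the SDK transparently groups concurrent point operations
// into fewer, larger server requests (no transactional guarantees).
CosmosClient client = new CosmosClient(connectionString, new CosmosClientOptions
{
    AllowBulkExecution = true
});
Container container = client.GetContainer("db", "container");

List<Task> tasks = new List<Task>();
foreach (MyDoc doc in docs)
{
    tasks.Add(container.CreateItemAsync(doc, new PartitionKey(doc.Pk)));
}
await Task.WhenAll(tasks); // operations are batched behind the scenes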

If you are instead trying to achieve atomic multi-document transactions, do you have a bound on the number and size of the operations? What are those bounds? In the absence of such bounds, calls would hit some other limit even if we figured out a way to raise the 100-operation limit.

@joshlang (Author) commented on Dec 5, 2019

@abhijitpai Transactions.

The bounds are the 2MB payload limit.

What I'd like is predictability in transactions. It's one of the reasons I was so excited to see TransactionalBatch, and why I was so disappointed to run into #1057, the 100-document limit.

I suppose that 100 documents is... well, predictable.

The way I see it, stored procedures have an unpredictable capacity for work before saying "Welp, that's all for you this time!". But whatever that limit is, it handled whatever we threw at it within the 2MB payload limit without any problems.

My perception when I saw TransactionalBatch was that now there was predictable functionality and as long as I stayed within the 2MB payload limit, the server would accept it.

In many ways, 100 documents is more predictable than 2MB. Maybe it's the right way to go (if you need limits). Selfishly, I'd prefer that only the 2MB limit existed. Even more selfishly, I'd prefer that there were no limits at all. But knowing the feasibility of this in your back-end is beyond me.

kirankumarkolli pushed a commit that referenced this issue Dec 12, 2019
* #1059 - Introducing ExecuteStoredProcedureStreamAsync overload accepting a stream as a parameter

* Update changelog to add draft pull request #1061

* Fixing ExecuteTestWithStreamParameter to use the streaming version in both calls.

* Fixing a broken test: serialized parameters must be an array.

* Executed "testbaseline.cmd /update" to update DotNetSDKAPI.json

* Adding unit tests for stored procedure calls to cover some edge cases: Unparseable json, not-an-object, "null" literal, and array-like objects

* Correction to docs: an Object is accepted (but ignored if not array-like) by the stored procedure executor.

* Additional edge cases covered in unit tests

* Adding ExecuteStoredProcedureStreamAsync to ScriptsInlineCore