Adds Data Pipeline Operators #1378
Draft: WhitWaldo wants to merge 19 commits into dapr:master from WhitWaldo:dapr-data-operator
Signed-off-by: Whit Waldo <whit.waldo@innovian.net>
…ention for each operation. Added more unit tests, including E2E of standard and duplicated operations. Signed-off-by: Whit Waldo <whit.waldo@innovian.net>
Description
For some time now, there's been an active discussion about how much the SDK should do in terms of handling serialization for developers. I'm in the camp that the SDK should do exactly as much as the developer wants: that is, if the developer wants to provide the SDK some generic type and get that type back, the SDK should handle all the serialization necessary behind the scenes.
Today, the SDK does this, but it handles it differently throughout the clients.
- Dapr.Client uses System.Text.Json under the hood, and while the options themselves can be customized by the developer, the serializer itself cannot be replaced with another.
- Dapr.Actors.Client gives developers the option of using Data Contracts or System.Text.Json, but again the former is baked into the project and the developer can only configure it with the provided options.
I've seen issues like this and requests in Discord for support for compression, and today there's not a great place to put that in the SDK. Do we want to bake in support for Gzip, Zip and LZMA? How are options configured for each?
I've also thought there could be an opportunity for interplay between different functionality. Today, the state store provides transparent encryption handled by the runtime with a few limited options, but what if the developer wants to utilize the more broadly capable cryptography block to perform that encryption/decryption operation in a more customizable manner?
I can think of other such operations a developer might want to optionally run against data they're putting into and taking out of the various SDKs:
I propose that instead of handling all of this declaratively and uniformly on a per-component basis in the runtime, it should be done at the SDK level (utilizing the runtime functionality where possible).
However, the .NET SDK as built today doesn't have a clear place to put any of that, much less make it optional. As I intend to make the SDK more DI-friendly, I further propose a solution to this problem: adding shared functionality that relies heavily on DI to provide an opt-in, bidirectional data processing pipeline on a per-type basis.
Terminology
I'll review some terminology first so we're on the same page.
- A DaprEncodingPipeline converts from the generic type passed in to the ReadOnlyMemory<byte> it produces, and a DaprDecodingPipeline handles the reverse, from ReadOnlyMemory<byte> back to the generic type.
- DaprDataPipeline: used to process or reverse a payload.
Now, this system relies heavily on the DI registration of the various operations used so it can inject them later on, but it does not rely on reflection, by design. I'll cover registration itself in a moment. Do note that if the following seems more convoluted than it should be, I agree, but C# doesn't support second-order generics and I really didn't want to get into reflection, so we're stuck with what we have until that happy day.
Pipeline Lifecycle
The processing lifecycle and the types of operations are as follows in this order (note that you only need to know the interface names if you're going to extend this pipeline with your own operations):
1. The generic type is serialized to a string by the IDaprTStringTransitionOperation.
2. Any IDaprStringBasedOperation will then process this string value and return a string, in the order in which they were specified in the type's attribute. More on this later.
3. The string is encoded to a ReadOnlyMemory<byte> by the IDaprStringByteTransitionOperation.
4. Any IDaprByteBasedOperation will then process this byte array and return a ReadOnlyMemory<byte>, in the order in which they were specified in the type's attribute.
If the developer opts into this pipeline, they must register at least a serializer and an encoder; the other operations are optional.
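As a rough illustration of the four stages above, here is a minimal, self-contained sketch that walks a value forward and back using plain BCL calls rather than the PR's interfaces. LifecycleSketch, SampleRecord and the "v1:" prefix step are hypothetical names invented for this example; only the use of System.Text.Json, UTF-8 encoding and Gzip is taken from the description.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.Json;

// Hypothetical sketch: none of these names come from the PR itself.
public static class LifecycleSketch
{
    public sealed record SampleRecord(string Name, int Count);

    private const string Prefix = "v1:";

    // Stage 1: generic type -> string (the serializer role).
    public static string Serialize<T>(T value) => JsonSerializer.Serialize(value);

    // Stage 2: string -> string (a made-up, reversible string-based operation).
    public static string AddPrefix(string input) => Prefix + input;

    public static string RemovePrefix(string input) =>
        input.StartsWith(Prefix, StringComparison.Ordinal)
            ? input.Substring(Prefix.Length)
            : throw new FormatException("Missing version prefix.");

    // Stage 3: string -> bytes (the encoder role).
    public static ReadOnlyMemory<byte> Encode(string input) => Encoding.UTF8.GetBytes(input);

    // Stage 4: bytes -> bytes (a byte-based operation, here Gzip compression).
    public static ReadOnlyMemory<byte> Compress(ReadOnlyMemory<byte> input)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            gzip.Write(input.Span);
        }
        return output.ToArray();
    }

    public static ReadOnlyMemory<byte> Decompress(ReadOnlyMemory<byte> input)
    {
        using var source = new MemoryStream(input.ToArray());
        using var gzip = new GZipStream(source, CompressionMode.Decompress);
        using var output = new MemoryStream();
        gzip.CopyTo(output);
        return output.ToArray();
    }

    // Forward pass: serialize, string operation, encode, byte operation.
    public static ReadOnlyMemory<byte> Forward<T>(T value) =>
        Compress(Encode(AddPrefix(Serialize(value))));

    // Reverse pass: undo each stage in the opposite order.
    public static T Reverse<T>(ReadOnlyMemory<byte> payload)
    {
        var text = Encoding.UTF8.GetString(Decompress(payload).Span);
        return JsonSerializer.Deserialize<T>(RemovePrefix(text));
    }
}
```

Calling LifecycleSketch.Forward on a SampleRecord and passing the result to LifecycleSketch.Reverse<SampleRecord> should yield a record equal to the original. The actual pipeline resolves each stage from DI instead of hard-coding it as done here.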
This implementation provides the following:
- System.Text.Json serialization in SystemTextJsonSerializer, implemented as an IDaprTStringTransitionOperation.
- Utf8Encoder, implemented as an IDaprStringByteTransitionOperation.
- Sha256Verifier, implemented as an IDaprByteBasedOperation.
- GzipCompressor, implemented as an IDaprByteBasedOperation.
Dependency Injection Registration
So how should these types be registered in DI so we can ensure they're mapped up to the right lifecycle portions? There's a fluent registration API provided that looks like the following:
Operation lifetimes default to singletons, but can be registered as scoped or transient. They can be registered as types or via a Func<IServiceProvider>, which is provided an instance of IServiceProvider in case other services need to be injected for the operation registration. You must provide a registration for the operation if it is going to be referenced in an attribute, and that's an opportunity for an analyzer down the road. You're welcome to register as many compressors, serializers, integrity operations and encoders as you'd like here, as they'll only be used as indicated in the pipeline attribute.
Defining operational order
The order in which the operations are processed is important: one cannot get from a generic type to a ReadOnlyMemory<byte> and do things in the middle without some sort of process. Because the developer is welcome to do any number of string-based and byte-based operations between the mandatory serialization and encoding steps, the system needs to understand the order so it can properly run the operations in reverse when the developer pulls the data back out of the backing store. The developer can specify the order using the DataPipelineAttribute as illustrated below:
Now, order is important, but only with regard to the IDaprStringBasedOperation and IDaprByteBasedOperation types. Order will be recognized from first to last as provided in the arguments for this attribute with regard to each of those types, but the encoder, the serializer and the other types can be defined anywhere at all. If more than one IDaprTStringTransitionOperation (e.g. serializer) or IDaprStringByteTransitionOperation (e.g. encoder) is specified, only the first is used and the others are ignored. You're also welcome to duplicate operations in the attribute as much as you'd like, and it will run them in their lifecycle order in the order presented. For example:
This will run the operations in the following order:
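As a hypothetical sketch of the ordering rules just described (not the PR's actual code or types), the following resolves a list of declared operation types into an execution plan: the first serializer, then string-based operations in declared order, then the first encoder, then byte-based operations in declared order, with duplicates preserved. The marker interfaces and operation classes are stand-ins invented for this example, and the sketch uses runtime type checks purely for brevity, whereas the PR states that it avoids reflection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the four operation roles; not the PR's interfaces.
public interface ISerializerOp { }
public interface IStringOp { }
public interface IEncoderOp { }
public interface IByteOp { }

public sealed class JsonOp : ISerializerOp { }
public sealed class OtherSerializerOp : ISerializerOp { }
public sealed class PrefixOp : IStringOp { }
public sealed class Utf8Op : IEncoderOp { }
public sealed class GzipOp : IByteOp { }
public sealed class HashOp : IByteOp { }

public static class OrderingSketch
{
    // Turns the types as declared (in any order) into the forward execution order.
    public static IReadOnlyList<Type> Plan(params Type[] declared)
    {
        // Only the first serializer and the first encoder are used; extras are ignored.
        var serializer = declared.FirstOrDefault(t => typeof(ISerializerOp).IsAssignableFrom(t))
            ?? throw new InvalidOperationException("A serializer is required.");
        var encoder = declared.FirstOrDefault(t => typeof(IEncoderOp).IsAssignableFrom(t))
            ?? throw new InvalidOperationException("An encoder is required.");

        var plan = new List<Type> { serializer };
        // String-based operations keep their declared order, duplicates included.
        plan.AddRange(declared.Where(t => typeof(IStringOp).IsAssignableFrom(t)));
        plan.Add(encoder);
        // Byte-based operations keep their declared order, duplicates included.
        plan.AddRange(declared.Where(t => typeof(IByteOp).IsAssignableFrom(t)));
        return plan;
    }
}
```

Under these assumptions, declaring GzipOp, Utf8Op, PrefixOp, OtherSerializerOp, JsonOp, HashOp, GzipOp yields the plan OtherSerializerOp, PrefixOp, Utf8Op, GzipOp, HashOp, GzipOp: JsonOp is dropped as a second serializer, and GzipOp runs twice because it was declared twice.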
Some operations may need to create and persist metadata (e.g. it's hard to validate an operation down the line without persisting the hash), so the pipeline supports a shared dictionary of string-based values keyed to the ordered type that created them. In the same vein, the pipeline retains the order in which the operations were performed. This is because reverse operations are performed in the order specified in the metadata, irrespective of the order on the type attribute (which only matters during forward processing). That said, any configuration options provided to each operation are not persisted in the encoding exercise, so in this version of the implementation any migration using different options would require versioning these operations.
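One way to picture the metadata and recorded-order behavior is the following minimal sketch. Envelope, MetadataSketch, the "index:name" key format and the toy "Reverse" operation are all assumptions made for illustration; the PR's real storage format is not shown in this description. The point is only that the forward pass records what ran and in which order, and the reverse pass walks that record backwards rather than consulting the attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical container for a processed payload plus what the pipeline recorded.
public sealed class Envelope
{
    public byte[] Payload { get; set; } = Array.Empty<byte>();
    // Keys of the operations in the order they ran, e.g. "0:Sha256", "1:Reverse".
    public List<string> AppliedOperations { get; } = new List<string>();
    // Per-operation metadata stored under the same ordered key.
    public Dictionary<string, string> Metadata { get; } = new Dictionary<string, string>();
}

public static class MetadataSketch
{
    public static Envelope Forward(byte[] input, params string[] operations)
    {
        var envelope = new Envelope { Payload = (byte[])input.Clone() };
        for (var i = 0; i < operations.Length; i++)
        {
            var key = $"{i}:{operations[i]}";
            switch (operations[i])
            {
                case "Sha256":
                    // Integrity step: the payload is unchanged, the hash is persisted.
                    envelope.Metadata[key] = Convert.ToHexString(SHA256.HashData(envelope.Payload));
                    break;
                case "Reverse":
                    // Toy reversible byte operation standing in for compression.
                    Array.Reverse(envelope.Payload);
                    break;
                default:
                    throw new ArgumentException($"Unknown operation '{operations[i]}'.");
            }
            envelope.AppliedOperations.Add(key);
        }
        return envelope;
    }

    public static byte[] Backward(Envelope envelope)
    {
        var payload = (byte[])envelope.Payload.Clone();
        // Walk the recorded order from last to first, ignoring any attribute order.
        for (var i = envelope.AppliedOperations.Count - 1; i >= 0; i--)
        {
            var key = envelope.AppliedOperations[i];
            var name = key.Substring(key.IndexOf(':') + 1);
            switch (name)
            {
                case "Sha256":
                    var actual = Convert.ToHexString(SHA256.HashData(payload));
                    if (actual != envelope.Metadata[key])
                        throw new InvalidOperationException("Integrity check failed.");
                    break;
                case "Reverse":
                    Array.Reverse(payload);
                    break;
                default:
                    throw new InvalidOperationException($"Unknown operation '{name}'.");
            }
        }
        return payload;
    }
}
```

Keying metadata by position as well as name is what lets a duplicated operation (for example, hashing both before and after a transform) keep separate values in this sketch.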
Conclusion
On the SDK side then, there need only be a check on the optionally injected DataPipelineFactory to see if it was populated or not and if it is either:
This is submitted as a draft for review, but is fully functional and tested at the time of this writing.
Checklist
Please make sure you've completed the relevant tasks for this PR, out of the following list: