Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure grain state serializer per storage provider #7416

Merged
merged 4 commits into from
Jan 20, 2022

Conversation

benjaminpetit
Copy link
Member

@benjaminpetit benjaminpetit commented Dec 2, 2021

Replace #6936

Newtonsoft.Json or the Orleans SerializationManager.

Users can configure the SerializationManager to use a custom serializer for certains types, but this method has two main limitations:

  • The custom serializer will be used for wire-transmission
  • The output is still padded by some binary content from SerializationManager.

This PR enable configuration of custom serializer for grain state per storage provider.

Here is the new way to configure the Azure table storage provider:

siloBuilder.AddGrainStorage(name, configure =>
{
    configure.UseOrleansSerializer(); // Configure the serializer here
    configure.UseAzureTable(options => options.ConnectionString = "xxx");
});

For now, old configuration method are still available, and the GrainStorageSerializer will be used by default.

The configuration may seems more complicated, but maybe we can add some configuration helper methods like that:

siloBuilder.UseAzureTable<JsonGrainStorageSerializer>(options => options.ConnectionString = "xxx")

for serialization implementation that requires none or few configuration steps.

Microsoft Reviewers: Open in CodeFlow

@benjaminpetit benjaminpetit added this to the 4.0.0 milestone Dec 2, 2021
@benjaminpetit benjaminpetit force-pushed the feature/grainstorageserializer branch from ac786a0 to aed253a Compare December 2, 2021 12:45
@veikkoeeva
Copy link
Contributor

@benjaminpetit I see the serializers inherit from IGrainStorageSerializer, such as JsonGrainStorageSerializer.

Is there a way to dynamically design which (de)serializer to use based on { grainType, grainId, grainState }? I am thinking if it were possible to configure a lambda that were called when a new state is being either serialized or deserialized and then that returns a configured IGrainStorageSerializer for the given parameters (it can be a constant too).

Would it be possible to add streaming to IGrainStorageSerializer API with the idea that internally it can also be unstreaming (as IGrainStorage does not stream in any case (yet))?

@benjaminpetit
Copy link
Member Author

I would like to keep the interface as clean as possible.

I think the serializer to use should be configured on the storage provider itself. I guess it's not ideal when using the Grain<T> pattern, but could be easier to setup using facet.

If we were to add parameters to IGrainStorageSerializer, I guess we could add IGrainContext to every operations, would that fit your scenario?

But again, I think a clean lean interface is better, and would avoid the temptation expensive operations at runtime (like doing reflection to get some attribute).

BinaryData should support streams.... but it seems that it's reading and copying the data to another MemoryStream. I wonder how streaming would work with Grain<T> ?

@benjaminpetit benjaminpetit force-pushed the feature/grainstorageserializer branch from f4f87bd to 64084d8 Compare January 5, 2022 16:03
@veikkoeeva
Copy link
Contributor

veikkoeeva commented Jan 6, 2022

@benjaminpetit The scenario I have mind is in a way or another to be able to dynamically decide which (de)serialization library is used for some data. Reasons being of changing storage format (easily) or due to change in how the data arrives. If it is possible in any way, it would be nice. Can you give an example how to get access to the storage provider itself during configuration? I am a bit out of touch regarding this aspect at the moment.

I agree keeping it simple. Maybe easy access to underlying provider is the way to expose "all the functionality there can be".

Indeed, I do not think streaming through persistence is possible at the moment. At most internally within persistence provider it can be done. I think it would be an interesting scenario to enable as in if there are objects megabytes in sizes such as files or maybe continuous streaming of data (this has its own problems, of course). This could make also patching larger chunks of data easier.

@benjaminpetit benjaminpetit force-pushed the feature/grainstorageserializer branch from 442146f to b2ca6e5 Compare January 7, 2022 16:47
@benjaminpetit
Copy link
Member Author

The scenario I have mind is in a way or another to be able to dynamically decide which (de)serialization library is used for some data. Reasons being of changing storage format (easily) or due to change in how the data arrives. If it is possible in any way, it would be nice.

I agree that would be nice, that's what I tried to do in the first PR. But it made the code a bit difficult to manage, since the provider needs to pass and store some additional information.

It should still be possible to change the serialization format, using some ugly try/catch mechanism... Not ideal.

Can you give an example how to get access to the storage provider itself during configuration? I am a bit out of touch regarding this aspect at the moment.

Using storage facets (not updated here), you could specify the serializer to use along with the provider name maybe?

Indeed, I do not think streaming through persistence is possible at the moment. At most internally within persistence provider it can be done. I think it would be an interesting scenario to enable as in if there are objects megabytes in sizes such as files or maybe continuous streaming of data (this has its own problems, of course). This could make also patching larger chunks of data easier.

We definitely could implement new providers for that. But I think we should use facets for that instead of relying on Grain<T>

@@ -81,7 +74,7 @@ public async Task ReadStateAsync<T>(string grainType, GrainReference grainId, IG
return;
}

if (contents == null || contents.Length == 0)
if (contents == null) // TODO bpetit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO pre-merge or post? Either way, we should clean it up

@rafikiassumani-msft rafikiassumani-msft linked an issue Jan 13, 2022 that may be closed by this pull request
@veikkoeeva
Copy link
Contributor

veikkoeeva commented Jan 14, 2022

@benjaminpetit I think I need to get a lot more familiar with facets this spring! Thanks for taking time to explain!

(About streaming, I'm thinking IoT scenarios and various range reads and updates. FlatGeobuf, Arrow (which could be interesting in any case) and such).

@ReubenBond ReubenBond merged commit 1aa703f into dotnet:main Jan 20, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Dec 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Configurable serializers for grain persistence
3 participants