Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Schema Registry Avro] New Encoder Design #19842

Merged
merged 16 commits into from
Jan 26, 2022

Conversation

deyaaeldeen
Copy link
Member

@deyaaeldeen deyaaeldeen commented Jan 14, 2022

Fixes #20061

Overview

Revamps the schema registry encoder to work on messages instead of buffers based on the recommendation of the Azure messaging architect.

This changes the APIs as follows:

  const buffer: NodeJS.Buffer = await serializer.serialize(value, schema);

becomes

  const message: MessageWithMetadata = await encoder.encodeMessageData(value, schema);

where MessageWithMetadata has a body field as well as a contentType field. The latter's format is avro/binary+<Schema ID>

For derserializing, the change is as follows:

  const deserializedValue = await serializer.deserialize(buffer);

becomes:

  const decodedObject = await encoder.decodeMessageData(message);

Improvement upon #15959

This design introduces a new messageAdapter option in the encoder constructor to support processing of any message type (e.g. cloud event):

  const encoder = new SchemaRegistryAvroEncoder(schemaRegistryClient, {
    groupName,
    messageAdapter: adapter
  });

where adapter is a message adapter that follows the following contract:

interface MessageAdapter<MessageT> {
  produceMessage: (messageWithMetadata: MessageWithMetadata) => MessageT;
  consumeMessage: (message: MessageT) => MessageWithMetadata;
}

 interface MessageWithMetadata {
  body: Uint8Array;
  contentType: string;
}

For convenience, the PR adds a couple of convenience adapter factories for Event Hubs's EventData and Event Grid's SendCloudEventInput<Uint8Array>. For example, the createCloudEventAdapter factory can be called to construct an adapter for the latter as follows:

const adapter = createCloudEventAdapter({
      type: "azure.sdk.eventgrid.samples.cloudevent",
      source: "/azure/sdk/schemaregistry/samples/withEventGrid",
    }),

Note that these adapter factories are exported by their respective messaging package without explicitly implementing the contract and the PR adds new encoder tests that check whether the produced adapters follow the contract. This organization could change in the future if we create a new core place for the contract to be imported from.

See the newly added samples for how to send such messages with Event Hubs and Event Grid.

Schema Registry commitment tracking: #15959
Tracking issue: #18608
First iteration design: #18365

@ghost ghost added the Schema Registry label Jan 14, 2022
@azure-sdk
Copy link
Collaborator

API changes have been detected in @azure/schema-registry-avro. You can review API changes here

1 similar comment
@azure-sdk
Copy link
Collaborator

API changes have been detected in @azure/schema-registry-avro. You can review API changes here

@azure-sdk
Copy link
Collaborator

API changes have been detected in @azure/event-hubs. You can review API changes here

API changes

+ export declare function createEventDataAdapter(params?: EventDataAdapterParameters): MessageAdapter<EventData>;
+             [key: string]: any;
+         };
+ }
+ export interface EventDataAdapterParameters {
+     correlationId?: string | number | Buffer;
+     messageId?: string | number | Buffer;
+     properties?: {

@deyaaeldeen deyaaeldeen force-pushed the schemaregistry/new-avro-encoder branch from b3d505a to 068d9de Compare January 20, 2022 00:19
@deyaaeldeen
Copy link
Member Author

@ellismg Hey Matt, I am updating @azure/event-grid to export cloud event adapter for the Schema Registry Avro encoder, could you take a look? Also, the APIView warnings are because I am not exporting the adapter interface for now. Eventually, we could move it to a place where all messaging libraries could import it from.

Also, could you take a look at the withEventGrid sample being added here?

Copy link
Member

@xirzec xirzec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New API looks great!

Can you also file an issue for tracking moving the common interfaces to some core package if you haven't already?

@@ -344,6 +344,17 @@ export interface CloudEvent<T> {
type: string;
}

// @public
export interface CloudEventAdapterParameters {
dataschema?: string;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataSchema? Or is this an unfortunate external standard?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that this particular field is not coming from the cloud event spec but I noticed that all fields in the SendCloudEventInput<T> follow the same all-lower-case pattern. /cc @ellismg

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sdk/eventgrid/eventgrid/review/eventgrid.api.md Outdated Show resolved Hide resolved
sdk/eventgrid/eventgrid/src/cloudEventAdapter.ts Outdated Show resolved Hide resolved
sdk/eventhub/event-hubs/review/event-hubs.api.md Outdated Show resolved Hide resolved
@deyaaeldeen
Copy link
Member Author

@ellismg Hey Matt, @JoshLove-msft and I talked and we agreed on updating Event Grid later when we hear more from Event Hubs customers. I moved Event Grid changes to a draft in #20037.

@deyaaeldeen deyaaeldeen requested a review from xirzec January 25, 2022 00:26
@azure-sdk
Copy link
Collaborator

API changes have been detected in @azure/event-hubs. You can review API changes here

@@ -79,6 +82,15 @@ export interface EventData {
};
}

// @public
export interface EventDataAdapterParameters {
correlationId?: string | number | Buffer;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did we choose these properties for the adapter parameters? Do you have an example of how they are used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're chosen by selecting everything from the message type sane the payload and content type fields. See the EventData type for reference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this will need to be kept in sync with the EventData surface manually? Is there a way to define it once and use it in both places?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could take out those fields from EventData and make it extend this new interface instead. The only drawbacks I can think of for doing this is if the customer wants to look at the type, they'll have to hop into another place to see the rest of it. @ramya-rao-a should we expect EventData to gain new fields in the future? if so, how do you feel about splitting EventData interface into two like that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to keep EventData as is and you can use the Omit magic to omit payload and content type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fair. I remember that @bterlson does not like Omit to show up in our public APIs so I will just keep things as they're assuming that we do not expect EventData to evolve. I can add a comment to point out the coupling between it and this interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a unit test that checks whether EventDataAdapterParameters has the same keys as Omit<Omit<EventData, "body">, "contentType">.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the way 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a unit test that checks whether EventDataAdapterParameters has the same keys as Omit<Omit<EventData, "body">, "contentType">.

Nit: Omit<Omit<EventData, "body">, "contentType"> can be expressed as Omit<EventData, "body" | "contentType">.

Copy link
Member

@jeremymeng jeremymeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I added some minor comments

sdk/schemaregistry/schema-registry-avro/README.md Outdated Show resolved Hide resolved
@@ -9,3 +9,12 @@ SCHEMA_REGISTRY_GROUP=<Group name for schemas in registry>
AZURE_TENANT_ID=<AD tenant id or name>
AZURE_CLIENT_ID=<ID of the user/service principal to authenticate as>
AZURE_CLIENT_SECRET=<client secret used to authenticate to Azure AD>

# Used in samples that use Event Hubs. Retrieve these values from an Event Hub in the Azure portal.
EVENTHUB_CONNECTION_STRING=<Event Hub connection string>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it okay for the samples to depend on a beta version of event hubs? Like in the case of samples of this lib are published but event hubs hasn't released yet. Probably fine if they are published not too long apart.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will double check, but I believe the release pipeline will not let me release @azure/schema-registry-avro without releasing @azure/event-hubs@5.7.0-beta.2 first since the semver for the EH dep is ^5.7.0-beta.2.

sdk/schemaregistry/schema-registry-avro/README.md Outdated Show resolved Hide resolved
Copy link
Member

@xirzec xirzec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new API surface looks good!

@deyaaeldeen deyaaeldeen merged commit be4578a into Azure:main Jan 26, 2022
@deyaaeldeen deyaaeldeen deleted the schemaregistry/new-avro-encoder branch January 26, 2022 01:42
@deyaaeldeen
Copy link
Member Author

Can you also file an issue for tracking moving the common interfaces to some core package if you haven't already?

I created #20086

azure-sdk pushed a commit to azure-sdk/azure-sdk-for-js that referenced this pull request Aug 5, 2022
[Hub Generated] Review request for Microsoft.MachineLearningServices to add version preview/2022-06-01-preview (Azure#19842)

* Adds base for updating Microsoft.MachineLearningServices from version stable/2022-05-01 to version 2022-06-01-preview

* Updates readme

* Updates API version in new specs and examples

* Add idleShutdownSetting to Compute Instances (Azure#19748)

* Added idleTimeBeforeShutdown to compute instance properties

* Added api UpdateIdleShutdownSetting

Co-authored-by: Teo Magnino Chaban <teom@microsoft.com>

* Add mfe.json 2022-06-01-preview changes (Azure#19740)

* update mfe.json with 2022-06-01-preview spec

* update go.md with preview-2022-06

* add AutoMLJob examples (initial)

* add LabelingJob examples (initial)

* add Schedule examples (initial)

* prettier

* update AutoMLJob examples

* update LabelingJob examples

* update LabelingJob examples

* prettier

* typo fixes

* add Vectorizer to custom-words.txt

* add missing

* is this a breaking change?

* Revert "update go.md with preview-2022-06"

This reverts commit b49c64ae0eb258069251c8c13085fd1e5dde56e2.

* Updating Swagger Spec for MLC for CustomServices (Azure#19781)

* Updating Swagger Spec for MLC for CustomServices

* Added Examples

* Removing update API in favor of exisiting Patch API and Fixing some Validation issues

* Fixed Pretty Issue

* Fixed Id Issue

* Changed case

* Updated Examples

* Changing Description

* Added nullable fields

* Added update APIs

* Fixing validation errors

* Fix model validation

Co-authored-by: Srivatsa Sinha <srsinha@microsoft.com>

* Add encryption property to workspace update parameters (Azure#19893)

* Add encryption property to workspace update parameters

Adding the ability for users to update CMK keyIdentifier property for workspace update.

* Apply prettier changes

Co-authored-by: Joshua Loeffler <jloeffler@microsoft.com>

* Add enum value to OutputDataDeliveryMode (Azure#19964)

* add soft delete related properties (Azure#19907)

* add soft delete related properties

* resolve lint diff issue that enum name duplicaton

Co-authored-by: Xi Jin <jinxi@microsoft.com>

* Remove SoftDeleteEnabled. (Azure#20005)

* reorder DeploymentResourceConfiguration Definition (Azure#20057)

Co-authored-by: Teddy Todorov <forteddyt@gmail.com>
Co-authored-by: teochaban <teochaban@hotmail.com>
Co-authored-by: Teo Magnino Chaban <teom@microsoft.com>
Co-authored-by: Teddy Todorov <thtodoro@microsoft.com>
Co-authored-by: srivatsasinha <102133347+srivatsasinha@users.noreply.github.com>
Co-authored-by: Srivatsa Sinha <srsinha@microsoft.com>
Co-authored-by: jloeffler7 <102193043+jloeffler7@users.noreply.github.com>
Co-authored-by: Joshua Loeffler <jloeffler@microsoft.com>
Co-authored-by: XI JIN <giovannijin1990@gmail.com>
Co-authored-by: Xi Jin <jinxi@microsoft.com>
Co-authored-by: ZhidaLiu <86350902+ZhidaLiu@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Schema Registry Avro] Improve the encoder according to architects feedback
6 participants