Pulsar Functions: allow a Function<GenericObject,?> to access the original Schema of the Message and use it #14847
@congbobo184 This patch is an early preview. I am going to add integration tests that cover the example function.
This is another kind of Function that is unblocked by this patch.
@eolivelli: Thanks for your contribution. For this PR, do we need to update docs?
LGTM
looks good to me.
…ginal Schema of the Message and use it (apache#14847) (cherry picked from commit 193f5b2)
Motivation
Currently a Function cannot access the original Schema of the Message. It only receives AutoConsumeSchema, a special schema that is not suitable for producing messages.
An example is an identity Function that picks up any message, regardless of its Schema, and writes it to an output topic.
This kind of Function must also work well with KeyValue<GenericRecord, GenericRecord> input messages and preserve the schema properties (such as KeyValueEncoding.SEPARATED, or the SchemaType of the components).
Please note that Function<GenericObject, GenericObject> cannot work, because GenericObject (or GenericRecord) does not carry full Schema information, so you cannot set a Schema on the output Record just by returning a POJO or a GenericObject. The user has to use newOutputMessage(topic, Schema) instead.
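The identity-function idea can be sketched in plain Java. This is only an illustration under stated assumptions: every type below (SketchSchema, SketchRecord, SentMessage, SketchContext, IdentityFunctionSketch) is a hypothetical stand-in and not a Pulsar class. Only the shape of the newOutputMessage(topic, Schema) call is taken from the description above. The point it tries to show is that the function does not return a value but forwards the input together with the original schema read from the current record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the Pulsar classes.
final class SketchSchema {
    final String description;
    SketchSchema(String description) { this.description = description; }
}

final class SketchRecord {
    final Object value;
    final SketchSchema schema; // the original schema of the incoming message
    SketchRecord(Object value, SketchSchema schema) {
        this.value = value;
        this.schema = schema;
    }
}

final class SentMessage {
    final String topic;
    final SketchSchema schema;
    final Object value;
    SentMessage(String topic, SketchSchema schema, Object value) {
        this.topic = topic;
        this.schema = schema;
        this.value = value;
    }
}

// Stand-in for a function context: exposes the current record and collects
// everything that was sent, so the behaviour can be inspected.
final class SketchContext {
    final List<SentMessage> sent = new ArrayList<>();
    private final SketchRecord currentRecord;
    private final String outputTopic;

    SketchContext(SketchRecord currentRecord, String outputTopic) {
        this.currentRecord = currentRecord;
        this.outputTopic = outputTopic;
    }

    SketchRecord getCurrentRecord() { return currentRecord; }
    String getOutputTopic() { return outputTopic; }

    // Loosely modeled on newOutputMessage(topic, Schema): the schema is
    // passed explicitly and travels with the outgoing message.
    void newOutputMessage(String topic, SketchSchema schema, Object value) {
        sent.add(new SentMessage(topic, schema, value));
    }
}

final class IdentityFunctionSketch {
    // Returns nothing on purpose: a returned object alone could not carry
    // the schema, so the output goes through the context instead.
    static void process(Object input, SketchContext context) {
        SketchSchema original = context.getCurrentRecord().schema;
        context.newOutputMessage(context.getOutputTopic(), original, input);
    }
}
```

In this sketch the outgoing message keeps the same schema instance as the incoming record, which is how properties such as the key/value encoding would be preserved.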
Modifications
Unwrap AutoConsumeSchema in PulsarSource when a Message is picked from the Pulsar topic, and set the wrapped Schema on the PulsarRecord.
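The unwrapping step might look roughly like the following. This is a simplified model, not the actual patch: SourceSchema, ConcreteSchema, AutoConsumeWrapper, and SchemaUnwrapSketch are hypothetical names introduced here, and the real AutoConsumeSchema and PulsarSource classes have more responsibilities than shown.

```java
// Hypothetical stand-ins for illustration only; these are NOT the Pulsar classes.
interface SourceSchema {}

// A schema that can be used directly for producing messages.
final class ConcreteSchema implements SourceSchema {
    final String type;
    ConcreteSchema(String type) { this.type = type; }
}

// Models a consume-only wrapper around the schema actually used on the topic.
final class AutoConsumeWrapper implements SourceSchema {
    private final SourceSchema wrapped;
    AutoConsumeWrapper(SourceSchema wrapped) { this.wrapped = wrapped; }
    SourceSchema getWrapped() { return wrapped; }
}

final class SchemaUnwrapSketch {
    // If the schema is the consume-only wrapper, hand back the schema it
    // wraps; any other schema is passed through unchanged.
    static SourceSchema unwrap(SourceSchema schema) {
        if (schema instanceof AutoConsumeWrapper) {
            return ((AutoConsumeWrapper) schema).getWrapped();
        }
        return schema;
    }
}
```

With the unwrapped schema stored on the record, a function can later reuse it when producing to another topic.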
Verifying this change
I will add tests