schema registry support #110

brewkode · 2021-11-19T00:11:19Z

I started looking into franz-go and I'm quite excited by what I'm seeing. I wanted to check whether there's a plan to support a schema registry as part of this project. I checked the existing issues but did not find anything relevant.

twmb · 2021-11-19T01:20:54Z

Supporting a schema registry itself is easy enough (HTTP requests); the problem is encoding/decoding. Produce expects records whose values are already serialized, and PollFetches gives you raw bytes.

Some registry flow would have to go in front of Produce and at the end of PollFetches. I'm not quite sure what that API would look like, or if I should even provide an API to produce structs, vs. just having a separate registry package.

I think the clearest API would be for an end user to say "encode/decode type T with schema S". This sidesteps problems of different languages encoding schemas differently based on types (nullable stuff in Java, non-nullable in Go), but it means that you have to generate schemas ahead of time, as well as register types to schemas ahead of time, before encoding/decoding.

What do you think?

Also, I don't have time for this right now, but perhaps in a month or two, unless somebody else gets to it or I find some random time.

brewkode · 2021-11-19T15:26:47Z

I agree, the schema registry API support itself is a bunch of HTTP requests. Tying a schema into the produce/consume workflow means we need a serde layer.

> I think the clearest API would be for an end user to say "encode/decode type T with schema S". This sidesteps problems of different languages encoding schemas differently based on types (nullable stuff in Java, non-nullable in Go), but it means that you have to generate schemas ahead of time, as well as register types to schemas ahead of time, before encoding/decoding.

The fundamental premise of a schema registry is to provide some level of type guarantees for the events/messages in a topic, so for most workflows a schema is already generated and registered ahead of time. Here are the docs on the serialization formats supported by the Confluent Schema Registry: https://docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html
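For context on what a serde layer has to handle: the Confluent serializers frame each value with a 5-byte header, a magic byte of 0 followed by the schema ID as a 4-byte big-endian integer, and then the encoded payload (Protobuf additionally writes message indexes after the header, which this sketch ignores). A minimal sketch of writing and reading that header; `appendHeader` and `splitHeader` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const magicByte = 0

// appendHeader appends the wire-format header (magic byte 0, then the
// schema ID as a 4-byte big-endian integer) followed by payload to dst.
func appendHeader(dst []byte, schemaID uint32, payload []byte) []byte {
	var id [4]byte
	binary.BigEndian.PutUint32(id[:], schemaID)
	dst = append(dst, magicByte)
	dst = append(dst, id[:]...)
	return append(dst, payload...)
}

// splitHeader validates the header and returns the schema ID and the
// remaining encoded payload.
func splitHeader(b []byte) (uint32, []byte, error) {
	if len(b) < 5 {
		return 0, nil, errors.New("record too short for wire-format header")
	}
	if b[0] != magicByte {
		return 0, nil, fmt.Errorf("unknown magic byte %d", b[0])
	}
	return binary.BigEndian.Uint32(b[1:5]), b[5:], nil
}

func main() {
	framed := appendHeader(nil, 42, []byte("payload"))
	id, payload, err := splitHeader(framed)
	fmt.Println(id, string(payload), err) // 42 payload <nil>
}
```

A consumer reads the ID from the header, looks up (and caches) the schema, and only then decodes the rest of the bytes.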

I do think a separate API to encode/decode T with schema S is cleaner, since it decouples the serde layer. Some thoughts around this:

- Should the schema registry serde be passed in as a kgo.Opt? If we wanted the entire schema registry support in a separate Go package, I'd need to look at the codebase to see whether we would run into dependency cycles with the way it's structured today. I'm not saying we will; I just don't know.
- Can we leverage the hooks infrastructure to transparently serialize/deserialize based on the serde option? Or is that too much magic?

twmb · 2021-11-29T19:08:09Z

AFAIK, this cannot be done with generics on kgo.Client today without a breaking change: Go does not allow generic methods, so the entire kgo.Client type would need to be generic.

Hooks are also not an option for essentially the same reason. The only way to do it with hooks would be to pass an interface{}.

I think a wrapper type may be possible, or some top level package functions (which would be a bit ugly).

peterbourgon · 2021-12-05T21:14:30Z

> if I should even provide an API to produce structs, vs. just having a separate registry package

In Go, even with generics, it's awkward to the point of infeasibility to generate instances of types dynamically. I think you'd want to offload most of the responsibility to your callers, i.e. to deserialize you'd provide a schema in one of the supported formats along with some bytes, and callers would hydrate a struct which they themselves have defined and which would be opaque to you; to serialize you'd provide a schema and they'd return you some bytes.

vtolstov · 2021-12-06T06:22:13Z

Maybe if we had a function that registers a schema description with a Go struct, like
`RegisterSchema(schema string, iface interface{})`,
we could create a new copy of the passed type and unmarshal into it?
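The "create a new copy and unmarshal into it" step can be done with `reflect.New`. A minimal sketch, assuming a hypothetical in-memory registry keyed by schema string and JSON as a stand-in encoding; `RegisterSchema` follows the signature suggested above but is not an existing API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// registry maps a schema description to the Go type registered for it.
var registry = map[string]reflect.Type{}

// RegisterSchema remembers the concrete type of iface for schema.
// iface may be a value (User{}) or a pointer (&User{}); only its type is kept.
func RegisterSchema(schema string, iface interface{}) {
	t := reflect.TypeOf(iface)
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	registry[schema] = t
}

// Decode allocates a fresh zero value of the registered type and
// unmarshals data into it, returning the value (not a pointer).
func Decode(schema string, data []byte) (interface{}, error) {
	t, ok := registry[schema]
	if !ok {
		return nil, fmt.Errorf("unknown schema %q", schema)
	}
	ptr := reflect.New(t) // *T pointing at a new zero value
	if err := json.Unmarshal(data, ptr.Interface()); err != nil {
		return nil, err
	}
	return ptr.Elem().Interface(), nil
}

type User struct {
	Name string `json:"name"`
}

func main() {
	RegisterSchema("user-v1", User{})
	v, err := Decode("user-v1", []byte(`{"name":"ada"}`))
	fmt.Println(v.(User).Name, err) // ada <nil>
}
```

The caller still gets an `interface{}` back and must type-assert, which is the limitation discussed elsewhere in this thread.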

owenhaynes · 2022-01-28T12:48:29Z

Yeah I feel this is best left to the calling code or a wrapper around the client.

I have something mocked up locally:

```go
type Decoder interface {
	Decode(ctx context.Context, record *kgo.Record) (interface{}, error)
}
```

This gets called by a wrapper on each record, and the caller passes a consume function that looks like this:

```go
func(ctx context.Context, record *kgo.Record, value interface{})
```

With Go generics, a middleware-style helper can then do the type assertion for you, something like:

```go
// ConsumeAs adapts a typed consume function to the untyped signature.
func ConsumeAs[T any](consume func(ctx context.Context, record *kgo.Record, value T)) func(ctx context.Context, record *kgo.Record, value interface{}) {
	return func(ctx context.Context, record *kgo.Record, value interface{}) {
		c := value.(T) // panics if the decoded value is not a T
		consume(ctx, record, c)
	}
}
```

twmb · 2022-05-23T00:29:25Z

I've added some support for the schema registry in these two commits:

This currently exists as a separate, unstable module github.com/twmb/franz-go/pkg/sr. I've tested most of the HTTP API, and I aim to test the Serde client soon (tm) to see if it's an alright API for actual usage in code.

I've punted on the entire issue of accepting arbitrary types because there really isn't a way to do it properly in Go. The Serde option is actually close to what @vtolstov proposed above: to encode or decode a type, you must register the ID and the type along with its encode or decode functions ahead of time. I'm going to close this for now since it is fully implemented in a separate module, but if anybody in this thread wants to play with the unstable API, that'd be great. I'm probably going to leave this as a separate module for now because I don't trust the HTTP API not to change underfoot, which may prompt major version bumps of the Go API, and I'd like to keep that separate from franz-go itself.

I'm mostly OK with the Client API (although query parameters get odd fast), and I think the Serde API is OK, though I have yet to use it. I'll probably 1.0 this separate module with franz-go v1.6 (and I might make it a part of franz-go proper, but am leaning no for the reasons just above).

twmb · 2022-05-23T00:30:23Z

I'll also add an example or two of how to use this new package, and mention it in the readme, so that the usage is a lot more obvious. This will be done before 1.6.

twmb · 2022-05-26T05:17:13Z

An example is added here, and I've tested the sr.Serde type as well: efc48f0

yuzhichang mentioned this issue on Dec 3, 2021 in "schema registry" housepower/clickhouse_sinker#137 (open)

twmb mentioned this issue on May 9, 2022 in "1.6 release status" #164 (closed)

twmb closed this as completed May 23, 2022
