transform-sdk: Introduce sr package #13049

rockwotj · 2023-08-28T18:46:59Z

Introduce a sr (schema registry) client within our tinygo SDK.

Exposes the ability to read/write from schema registry in a limited way.
We currently only support registering and looking up schemas.

Additionally add a test case that shows the ability to convert from avro
-> json using a schema registry encoded schema.

NOTE: The host side of the schema-registry was checked in as part of #12666

Backports Required

Release Notes

none

emaxerrno · 2023-08-28T19:44:14Z

src/go/transform-sdk/sr/client.go

+	if ok {
+		return cached, nil
+	}
+	s, err = sr.underlying.LookupSchemaById(id)


rockwotj · 2023-08-29T01:43:03Z

CI Failure: #12120

dotnwat

c++ bits look good to me. since it's SR stuff maybe @BenPope @NyaliaLui @oleiman also wanna take a peek

BenPope

I've had a glance at this. In general it looks ok to me.

Without documentation it's hard to know what limitations are expected.

For example, I see no [en|de]coding of the protobuf message offsets in the wire format.

If possible, it would be nice to break commits into slightly smaller than 1200 line chunks so that discrete functionality is easier to review.

rockwotj · 2023-08-30T16:31:49Z

> Without documentation it's hard to know what limitations are expected.
> For example, I see no [en|de]coding of the protobuf message offsets in the wire format.

As far as I know Tinygo doesn't support protobuf right now: tinygo-org/tinygo#2667

I do have a task to check whether that's fixed in the most recent version, so I'm going to give that a go; I may then need a follow-up PR to add the message offsets, in a similar manner to Tinygo. Honestly, I'd love to have a much better API here, but fleshing out the right API needs a lot of time, so that will have to come later.
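For readers following the wire-format point: the commonly used schema registry framing prefixes each payload with a zero magic byte and a 4-byte big-endian schema ID, and Protobuf payloads additionally carry a list of message indexes after the ID, which is the part under discussion here. Below is a minimal sketch of the header handling only, without message indexes. The names `encodeHeader` and `decodeHeader` are hypothetical and are not the SDK's API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// magicByte is the leading byte of the schema registry framing.
const magicByte = 0x0

// headerLen is the magic byte plus a 4-byte big-endian schema ID.
const headerLen = 5

var errBadHeader = errors.New("payload is missing a schema registry header")

// encodeHeader (hypothetical name) prepends the magic byte and schema ID to payload.
func encodeHeader(id int32, payload []byte) []byte {
	out := make([]byte, headerLen, headerLen+len(payload))
	out[0] = magicByte
	binary.BigEndian.PutUint32(out[1:], uint32(id))
	return append(out, payload...)
}

// decodeHeader (hypothetical name) splits a framed record into its schema ID
// and the remaining payload. It does not parse Protobuf message indexes.
func decodeHeader(b []byte) (int32, []byte, error) {
	if len(b) < headerLen || b[0] != magicByte {
		return 0, nil, errBadHeader
	}
	id := int32(binary.BigEndian.Uint32(b[1:headerLen]))
	return id, b[headerLen:], nil
}

func main() {
	framed := encodeHeader(7, []byte("avro bytes"))
	id, rest, err := decodeHeader(framed)
	fmt.Println(id, string(rest), err)
}
```

Supporting Protobuf would mean reading the message-index list between the ID and the payload, which this sketch deliberately leaves out.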

> If possible, it would be nice to break commits into slightly smaller than 1200 line chunks so that discrete functionality is easier to review.

Apologies, I had meant to break this up before I submitted it for review. I've gone ahead and broken it up now.

BenPope · 2023-09-05T09:38:45Z

src/go/transform-sdk/sr/client.go

+package sr
+
+// schemaId is an ID of a schema registered with schema registry
+type schemaId uint32


In Redpanda, this is signed, is the difference intentional?

redpanda/src/v/pandaproxy/schema_registry/types.h

Line 75 in 57abf1c

using schema_version = named_type<int32_t, struct schema_version_tag>;

Good catch - it was not.
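To illustrate why the signedness matters once IDs cross the ABI boundary: the same four bytes decode to different values under the two types when the high bit is set. This is a small sketch with hypothetical names (`schemaID`, `decodeID`), assuming a big-endian 32-bit encoding for illustration; it is not the SDK's code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// schemaID mirrors a signed 32-bit identifier (assumed here for illustration).
type schemaID int32

// decodeID reads a big-endian 32-bit value and reinterprets it as signed.
func decodeID(b [4]byte) schemaID {
	return schemaID(int32(binary.BigEndian.Uint32(b[:])))
}

func main() {
	raw := [4]byte{0xff, 0xff, 0xff, 0xff}
	fmt.Println(binary.BigEndian.Uint32(raw[:])) // interpreted as unsigned
	fmt.Println(decodeID(raw))                   // interpreted as signed
}
```

Matching the broker's type on the SDK side avoids the two ends disagreeing about such values.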

src/go/transform-sdk/sr/encoding.go

BenPope · 2023-09-05T09:54:26Z

src/go/transform-sdk/sr/client.go

+
+type (
+	clientOpts struct {
+		disableCaches bool


I think specifying a cache size might give some more flexibility here, otherwise the cache is unbounded.

Agreed, my thought process was:

I expect people to likely call LookupSchemaById for every record initially, so we should cache by default

schemas change infrequently

changes in schema will likely also mean code will be deployed soon (because you'll want your transform to use the new data your schema just added, right?), resetting the cache.

if you want to do something custom we should just disable caches and let you write whatever fancy caching logic you want.

I've documented the unbounded cache default more and added a simple entry-count-based map. My hope is that this is simple enough for basic usage, and more complex use cases can just disable this level of caching and then use whatever custom logic they want.
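The three modes discussed above (unbounded by default, bounded by entry count, or disabled) could be exposed through functional options. The sketch below is one way to shape that; `ClientOpt`, `DisableCache`, `MaxCacheEntries` and `newOpts` are hypothetical names chosen for illustration, and the actual option names in the package may differ.

```go
package main

import "fmt"

// clientOpts holds cache settings.
// maxCacheSize < 0 means unbounded, 0 means caching is disabled.
type clientOpts struct {
	maxCacheSize int
}

// ClientOpt mutates clientOpts (hypothetical functional-option type).
type ClientOpt func(*clientOpts)

// DisableCache turns caching off so callers can layer their own logic on top.
func DisableCache() ClientOpt {
	return func(o *clientOpts) { o.maxCacheSize = 0 }
}

// MaxCacheEntries bounds the cache by number of entries.
func MaxCacheEntries(n int) ClientOpt {
	return func(o *clientOpts) { o.maxCacheSize = n }
}

// newOpts applies options on top of the unbounded default.
func newOpts(opts ...ClientOpt) clientOpts {
	o := clientOpts{maxCacheSize: -1} // default: cache everything
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	fmt.Println(newOpts().maxCacheSize)
	fmt.Println(newOpts(MaxCacheEntries(128)).maxCacheSize)
	fmt.Println(newOpts(DisableCache()).maxCacheSize)
}
```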

src/go/transform-sdk/sr/client.go

rockwotj · 2023-09-06T01:15:09Z

CI Failures: #13181, #13278

BenPope

Are there any useful go-packages with cache implementations? I wonder if LRU or LFU evictions would be better, and it's mostly a solved problem.

Most of them seem to be "thread safe", which I assume is undesirable?

BenPope · 2023-09-07T11:18:53Z

src/go/transform-sdk/internal/cache/cache.go

+		value           V
+		insertionNumber int
+	}
+	// A cache that evicts based on number of entries


It appears to use FIFO eviction, correct?

Correct, updated the documentation.

BenPope · 2023-09-07T11:21:12Z

src/go/transform-sdk/internal/cache/cache.go

+
+// Put adds an entry into the cache
+func (c *Cache[K, V]) Put(k K, v V) {
+	c.underlying[k] = entry[V]{value: v, insertionNumber: c.latestEntryInsertionNumber}


This temporarily inserts one more than the limit, which means if you're really unlucky, the number of buckets could be twice what's required.

Nice catch - fixed.
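For reference, one way to avoid the temporary n+1 state is to make room before inserting rather than pruning afterwards. The sketch below reuses the `entry` and `underlying` names from the diff, but everything else (`fifoCache`, `newFIFOCache`, `evictOldest`, the `next` counter) is hypothetical and not necessarily what was merged.

```go
package main

import "fmt"

type entry[V any] struct {
	value           V
	insertionNumber int
}

// fifoCache holds at most maxSize entries and evicts the oldest insertion first.
type fifoCache[K comparable, V any] struct {
	underlying map[K]entry[V]
	maxSize    int
	next       int // insertion number handed to the next new entry
}

func newFIFOCache[K comparable, V any](maxSize int) *fifoCache[K, V] {
	return &fifoCache[K, V]{underlying: make(map[K]entry[V], maxSize), maxSize: maxSize}
}

// Put makes room first, so the map never holds more than maxSize entries.
func (c *fifoCache[K, V]) Put(k K, v V) {
	if c.maxSize <= 0 {
		return
	}
	if old, ok := c.underlying[k]; ok {
		// Overwrite in place; the entry keeps its original queue position.
		c.underlying[k] = entry[V]{value: v, insertionNumber: old.insertionNumber}
		return
	}
	if len(c.underlying) >= c.maxSize {
		c.evictOldest()
	}
	c.underlying[k] = entry[V]{value: v, insertionNumber: c.next}
	c.next++
}

// Get returns the cached value, if any, without changing eviction order.
func (c *fifoCache[K, V]) Get(k K) (V, bool) {
	e, ok := c.underlying[k]
	return e.value, ok
}

// Len reports the number of cached entries.
func (c *fifoCache[K, V]) Len() int { return len(c.underlying) }

// evictOldest scans for the smallest insertion number: O(n) per eviction.
func (c *fifoCache[K, V]) evictOldest() {
	var oldestKey K
	oldest, found := 0, false
	for k, e := range c.underlying {
		if !found || e.insertionNumber < oldest {
			oldestKey, oldest, found = k, e.insertionNumber, true
		}
	}
	if found {
		delete(c.underlying, oldestKey)
	}
}

func main() {
	c := newFIFOCache[int, string](2)
	c.Put(1, "a")
	c.Put(2, "b")
	c.Put(3, "c") // evicts key 1 before inserting
	_, ok := c.Get(1)
	fmt.Println(c.Len(), ok)
}
```

The linear scan in `evictOldest` keeps bookkeeping minimal at the cost of insertion time once the cache is full.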

BenPope · 2023-09-07T11:26:22Z

src/go/transform-sdk/internal/cache/cache.go

+	return len(c.underlying)
+}
+
+func (c *Cache[K, V]) prune() {


The values (overall) are quite large, so I wonder if it's worth choosing a different data structure that would trade off some memory for bookkeeping, but reduce insertion time?

rockwotj · 2023-09-07T15:01:01Z

> Are there any useful go-packages with cache implementations? I wonder if LRU or LFU evictions would be better, and it's mostly a solved problem.
> Most of them seem to be "thread safe", which I assume is undesirable?

All of the major ones that I've seen either use goroutines, which are currently not supported in the way we use tinygo + wasm, or don't have friendly licenses. I've written a small LRU cache that hopefully works well enough for 99% of users; if folks want to do something more complex, they can disable the cache and put something on top of our client.
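As a rough picture of what a small single-threaded LRU can look like, here is a sketch built on a map plus the standard library's `container/list`, with no locks and no goroutines. It assumes `container/list` is usable in the target build; the names `lruCache`, `lruEntry` and `newLRU` are hypothetical and this is not the code in the PR.

```go
package main

import (
	"container/list"
	"fmt"
)

type lruEntry[K comparable, V any] struct {
	key   K
	value V
}

// lruCache is a single-threaded LRU cache bounded by entry count.
type lruCache[K comparable, V any] struct {
	maxSize int
	order   *list.List          // front = most recently used
	items   map[K]*list.Element // key -> node in order
}

func newLRU[K comparable, V any](maxSize int) *lruCache[K, V] {
	return &lruCache[K, V]{maxSize: maxSize, order: list.New(), items: make(map[K]*list.Element)}
}

// Get returns the value and marks the entry as most recently used.
func (c *lruCache[K, V]) Get(k K) (V, bool) {
	el, ok := c.items[k]
	if !ok {
		var zero V
		return zero, false
	}
	c.order.MoveToFront(el)
	return el.Value.(lruEntry[K, V]).value, true
}

// Put inserts or updates an entry, evicting the least recently used one first
// when the cache is full, so the size never exceeds maxSize.
func (c *lruCache[K, V]) Put(k K, v V) {
	if c.maxSize <= 0 {
		return
	}
	if el, ok := c.items[k]; ok {
		el.Value = lruEntry[K, V]{key: k, value: v}
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.maxSize {
		oldest := c.order.Back()
		delete(c.items, oldest.Value.(lruEntry[K, V]).key)
		c.order.Remove(oldest)
	}
	c.items[k] = c.order.PushFront(lruEntry[K, V]{key: k, value: v})
}

func main() {
	c := newLRU[int, string](2)
	c.Put(1, "a")
	c.Put(2, "b")
	c.Get(1)      // key 1 becomes most recently used
	c.Put(3, "c") // evicts key 2
	_, ok := c.Get(2)
	fmt.Println(ok)
}
```

Both lookup and eviction are constant time here, which trades a little memory per entry for cheaper inserts than a scan-based prune.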

These functions were already defined in redpanda-data#12666, along with their encoding. We also add a stub version of the ABI for IDEs that don't use the tinygo build tags (and we can build/test this package in standard go). Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

This wraps the ABI exposed in previous commits in a user friendly fashion. By default, all data from schema registry is cached forever in the client; since we expect schemas to change infrequently, this seems to be the right tradeoff. This is also the default in most kafka clients. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
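The cache-forever default described in this commit message amounts to a read-through lookup: check the map, fall back to the host, remember the result. A minimal sketch follows; `LookupSchemaById` and `underlying` are taken from the diff quoted earlier in the thread, while `Schema`, `registry`, `cachingClient` and `countingRegistry` are hypothetical stand-ins rather than the SDK's types.

```go
package main

import "fmt"

// Schema is a simplified stand-in for a registered schema.
type Schema struct{ Schema string }

// registry is a hypothetical stand-in for the raw ABI-backed client.
type registry interface {
	LookupSchemaById(id int32) (*Schema, error)
}

// cachingClient remembers every successful lookup for the life of the client.
type cachingClient struct {
	underlying registry
	byID       map[int32]*Schema
}

func (c *cachingClient) LookupSchemaById(id int32) (*Schema, error) {
	if cached, ok := c.byID[id]; ok {
		return cached, nil
	}
	s, err := c.underlying.LookupSchemaById(id)
	if err != nil {
		return nil, err // errors are not cached, so a later call retries
	}
	c.byID[id] = s
	return s, nil
}

// countingRegistry is a fake that records how often the host is consulted.
type countingRegistry struct{ calls int }

func (r *countingRegistry) LookupSchemaById(id int32) (*Schema, error) {
	r.calls++
	return &Schema{Schema: fmt.Sprintf("schema-%d", id)}, nil
}

func main() {
	fake := &countingRegistry{}
	c := &cachingClient{underlying: fake, byID: map[int32]*Schema{}}
	for i := 0; i < 3; i++ {
		c.LookupSchemaById(1) // only the first call reaches the fake
	}
	fmt.Println(fake.calls)
}
```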

The API is mostly the same as the one in franz-go, but it uses generics for some additional type safety. We currently only support avro schemas here. At the time of writing, Redpanda does not support JSON schema, and tinygo does not support protocol buffers. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

This will allow us to test the schema registry functionality in wasm without needing to pull in or spin up all of Redpanda. It's very simple and certainly wrong, but will be enough functionality to write some simple tests that use schema registry. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

rockwotj · 2023-09-07T15:14:20Z

Force push: fixed adding in the client then removing it in the first two commits.

github-actions bot added area/redpanda area/wasm WASM Data Transforms labels Aug 28, 2023

rockwotj requested review from dotnwat, BenPope and michael-redpanda August 28, 2023 18:50

rockwotj force-pushed the sr branch 2 times, most recently from 978f052 to fc67782 Compare August 28, 2023 19:16

emaxerrno reviewed Aug 28, 2023

dotnwat previously approved these changes Aug 30, 2023

dotnwat requested a review from oleiman August 30, 2023 01:01

BenPope reviewed Aug 30, 2023

rockwotj dismissed dotnwat’s stale review via e0eb30b August 30, 2023 16:24

rockwotj force-pushed the sr branch from 64f5e51 to e0eb30b Compare August 30, 2023 16:24

rockwotj requested review from dotnwat and BenPope August 30, 2023 16:34

BenPope reviewed Sep 5, 2023

rockwotj force-pushed the sr branch from e0eb30b to f37ebb2 Compare September 5, 2023 19:33

rockwotj requested a review from BenPope September 5, 2023 20:17

BenPope reviewed Sep 7, 2023

rockwotj force-pushed the sr branch from 8495d60 to 417c654 Compare September 7, 2023 15:03

rockwotj added 6 commits September 7, 2023 10:13

Add encoding/decoding for the schema registry ABI

1f9b387

The corresponding side of this contract on the broker lies in src/v/wasm/schema_registry_module.cc Also add tests that this can be roundtripped. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

Add module documentation for sr

31353e7

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

Add an example of the schema registry for internal tests

7a683a0

We'll use this transform to test the schema registry code in Redpanda. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

rockwotj added 8 commits September 7, 2023 10:13

wasm: Add a test for schema registry functionality in Transforms

1bd980d

This ensures our ABI contract works with the SDK and we can perform basic operations using the fake schema registry. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

Give each module it's own gopath

6838063

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

Add -modcacherw to tinygo builds

fb7bd16

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

Add a simple number of entries based cache for the schema registry

e814403

client To give more control than all or nothing. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

Prevent cache from temporarily being n+1 in size

2733c83

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

Clarify eviction policy in cache

66f2794

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

Add a small LRU queue with a linked list

71b9745

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

rockwotj force-pushed the sr branch from 417c654 to 71b9745 Compare September 7, 2023 15:13

rockwotj requested a review from BenPope September 7, 2023 15:14

BenPope approved these changes Sep 11, 2023

rockwotj merged commit 935193c into redpanda-data:dev Sep 11, 2023
10 checks passed

rockwotj deleted the sr branch September 11, 2023 11:34
