kafkit fails to register schema under new subject if schema_id already exists in cache #4

Open
ulrikjohansson opened this issue Feb 12, 2020 · 2 comments · May be fixed by #12

ulrikjohansson commented Feb 12, 2020

Hi! Thanks again for a really useful library.

I'm working on moving centrally managed schemas into the application that actually owns them, and I've stumbled on an issue with the register_schema method.

We're using Avro encoding for the keys as well as the values in our Kafka messages, and in this instance the key schema is just a simple string schema. All schemas owned by the application share the same key schema.

While testing schema creation from the application using kafkit, I noticed that only the key schema for the first subject is actually created in the schema registry. The others get short-circuited in the register_schema method here:

    # look in cache first
    try:
        schema_id = self.schema_cache[schema]
        return schema_id
    except KeyError:
        pass  # cache miss: fall through and register with the registry

This becomes a problem when other clients rely on the key schema being registered under their own subjects in the schema registry as well: they now crash when they can't find a key schema for those subjects.
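To make the failure mode concrete, here is a minimal sketch that reproduces it with a toy in-memory registry rather than kafkit itself. `ToyRegistry`, its `subjects` dict, and the ID counter are hypothetical stand-ins; only the cache-first early return mirrors the snippet above.

```python
import json


class ToyRegistry:
    """Hypothetical stand-in for a registry client with a schema-keyed cache."""

    def __init__(self):
        self.schema_cache = {}  # serialized schema -> schema ID
        self.subjects = {}      # subject -> schema ID (plays the remote registry)
        self._next_id = 1

    def register_schema(self, schema, subject):
        key = json.dumps(schema, sort_keys=True)
        # Cache-first early return: the subject is never consulted,
        # so a second subject with the same schema is silently skipped.
        if key in self.schema_cache:
            return self.schema_cache[key]
        schema_id = self._next_id
        self._next_id += 1
        self.subjects[subject] = schema_id
        self.schema_cache[key] = schema_id
        return schema_id


key_schema = {"type": "string"}
registry = ToyRegistry()
first_id = registry.register_schema(key_schema, "topic-a-key")
second_id = registry.register_schema(key_schema, "topic-b-key")

print(first_id == second_id)      # True: the cached ID is returned
print(sorted(registry.subjects))  # ['topic-a-key']: second subject never registered
```

The second call returns a valid ID, so nothing looks wrong from the caller's side; the missing subject only shows up later, when a consumer looks for it.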

Update: A quick workaround is to reset the registry client's schema cache manually on each pass of the loop over the schemas I need to ensure exist in the Confluent Schema Registry, like so:

registry._schema_cache = SchemaCache()

I'm happy to provide a PR with a solution once I know a bit more about the rationale and use case for short-circuiting the schema registration.

jonathansick (Member) commented

That's a really good point. I don't think we've run into this issue yet because we've always set the name in the Avro schema to match the subject (so we haven't been properly exercising the subject keyword argument).

Your use-case is totally valid, though, to support the {topic}-key naming convention.

If you'd like to PR a solution, I'd be happy to accept it. Or I can get to it :)

Off the top of my head, I'm trying to think what the best way to handle this is. Maybe add a subject-based caching layer that gets used if the subject name is provided?
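One possible shape for such a subject-based layer is sketched below. This is not kafkit's code and not necessarily what a fix would look like: `SubjectAwareCache`, `register`, and `fake_remote` are hypothetical names, and the remote call is replaced by a plain function so the example runs on its own.

```python
import json


class SubjectAwareCache:
    """Hypothetical cache keyed on (subject, schema) instead of schema alone."""

    def __init__(self):
        self._ids = {}

    @staticmethod
    def _key(subject, schema):
        return (subject, json.dumps(schema, sort_keys=True))

    def get(self, subject, schema):
        return self._ids.get(self._key(subject, schema))

    def insert(self, subject, schema, schema_id):
        self._ids[self._key(subject, schema)] = schema_id


def register(cache, remote, subject, schema):
    """Skip the remote call only if this exact (subject, schema) pair is cached."""
    cached = cache.get(subject, schema)
    if cached is not None:
        return cached
    schema_id = remote(subject, schema)  # hypothetical callable doing the request
    cache.insert(subject, schema, schema_id)
    return schema_id


calls = []


def fake_remote(subject, schema):
    calls.append(subject)
    return 1  # a fixed ID is enough for this sketch


key_schema = {"type": "string"}
cache = SubjectAwareCache()
for subject in ["topic-a-key", "topic-b-key", "topic-a-key"]:
    register(cache, fake_remote, subject, key_schema)

print(calls)  # ['topic-a-key', 'topic-b-key']
```

Keying on the pair keeps the benefit of the cache for repeated registrations under the same subject while still letting a shared key schema reach every {topic}-key subject once.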

ulrikjohansson (Author) commented

Hi!
Sorry for the long wait. I stopped working on services using Kafka/kafkit a while ago and made do with the workaround in our code. It still bugged me that I complained about this and then didn't attempt to resolve it, so here goes 😃 Please have a look at #12 and see whether you think it solves the problem in an acceptable way.
