Avro support #1844
We have had Avro working for a while, but the code is not generic enough and is very specific to our schemas. In fact, it will not be possible to take an arbitrary Avro schema and convert it to a Druid row, because Avro supports a great many complex types, so we will have to compromise somewhere. Also, it was written in the pre-druid-0.8.0 era, when InputFormats could not return anything but Text records.
I'm using Avro with Druid in production for batch indexing; that part is not complicated, based on @himanshug's #1472. For realtime indexing it's a bit more cumbersome, because you need a schema to deserialize an Avro object from the binary stream, and you don't want to send the schema with every serialized record to Kafka. So you need a schema registry; currently we are using schemarepo. I'll try to clean up my code and submit a PR for this this weekend if I get some time.
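To illustrate the registry approach described above, here is a minimal sketch of prefixing each Kafka message with a numeric schema id instead of the full schema. The 4-byte id framing and the `SchemaRegistry` interface with `toId`/`getSchema` are illustrative assumptions, not schemarepo's actual API; only the Avro encode/decode calls are the real library API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaIdFraming
{
  // Hypothetical registry lookup: maps a schema to a stable numeric id and back.
  // schemarepo's real API differs; this is just the shape of the dependency.
  public interface SchemaRegistry
  {
    int toId(Schema schema);
    Schema getSchema(int id);
  }

  // Serialize as: [4-byte schema id][Avro binary payload]
  public static byte[] serialize(GenericRecord record, SchemaRegistry registry) throws IOException
  {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(ByteBuffer.allocate(4).putInt(registry.toId(record.getSchema())).array());
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(record.getSchema()).write(record, encoder);
    encoder.flush();
    return out.toByteArray();
  }

  // Deserialize: read the id, fetch the writer schema from the registry, then decode the rest.
  public static GenericRecord deserialize(byte[] bytes, SchemaRegistry registry) throws IOException
  {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    Schema writerSchema = registry.getSchema(buf.getInt());
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(
        bytes, buf.position(), buf.remaining(), null);
    return new GenericDatumReader<GenericRecord>(writerSchema).read(null, decoder);
  }
}
```

The point of the framing is that the Kafka consumer can resolve the writer schema from the 4-byte id at read time, so the full schema never travels with the record.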
Please check #1858
Should probably be an extension.
For realtime we need a ByteBufferInputRowParser (something similar to the ProtoBufInputRowParser, but for Avro).
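For reference, a minimal sketch of what that parser's decode path could look like, assuming a fixed reader schema is supplied up front. The Druid side (the ByteBufferInputRowParser interface, parse spec handling, and row construction) is deliberately omitted; only the Avro decoding and the flattening of top-level fields into the map shape a Druid row would consume are shown.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

// In the real extension this would implement Druid's ByteBufferInputRowParser;
// here only the Avro decoding path is sketched.
public class AvroByteBufferParser
{
  private final GenericDatumReader<GenericRecord> reader;

  public AvroByteBufferParser(Schema readerSchema)
  {
    this.reader = new GenericDatumReader<GenericRecord>(readerSchema);
  }

  // Decode one Avro-encoded record from the buffer and flatten its
  // top-level fields into an event map.
  public Map<String, Object> parseToMap(ByteBuffer input) throws IOException
  {
    byte[] bytes = new byte[input.remaining()];
    input.get(bytes);
    GenericRecord record = reader.read(
        null, DecoderFactory.get().binaryDecoder(bytes, null));

    Map<String, Object> event = new HashMap<String, Object>();
    for (Schema.Field field : record.getSchema().getFields()) {
      Object value = record.get(field.name());
      // Avro strings decode as Utf8; Druid dimensions want java.lang.String.
      event.put(field.name(), value instanceof CharSequence ? value.toString() : value);
    }
    return event;
  }
}
```

Note that this flattens only top-level fields; nested records, unions, and maps would need the kind of compromise mentioned earlier in the thread.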
For batch we need a recommended Avro-aware InputFormat and an InputRowParser that can read whatever type is returned by that InputFormat. I haven't used Avro before so I'm not sure what the right choice of InputFormat is.
AvroKeyInputFormat from https://avro.apache.org/docs/1.7.0/api/java/org/apache/avro/mapreduce/AvroKeyInputFormat.html seems like a possible candidate.
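If AvroKeyInputFormat is the choice, wiring it into a Hadoop job follows the standard avro-mapreduce setup: the mapper receives `AvroKey<GenericRecord>` keys and `NullWritable` values, so a batch InputRowParser would need to accept a GenericRecord. A rough sketch (schema, path, and the no-op mapper body are placeholders):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class AvroBatchSetup
{
  // Each input record arrives as an AvroKey<GenericRecord>; this is the
  // object an Avro-aware InputRowParser would be handed.
  public static class AvroMapper
      extends Mapper<AvroKey<GenericRecord>, NullWritable, NullWritable, NullWritable>
  {
    @Override
    protected void map(AvroKey<GenericRecord> key, NullWritable value, Context context)
    {
      GenericRecord record = key.datum();
      // ... hand the record to the InputRowParser here ...
    }
  }

  public static Job configure(Schema readerSchema, String inputPath) throws Exception
  {
    Job job = Job.getInstance(new Configuration(), "avro-ingest-sketch");
    job.setInputFormatClass(AvroKeyInputFormat.class);
    AvroJob.setInputKeySchema(job, readerSchema);   // reader schema for the input keys
    FileInputFormat.addInputPath(job, new Path(inputPath));
    job.setMapperClass(AvroMapper.class);
    return job;
  }
}
```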