Add support for Apache Avro #62
Would it be possible to add support for Apache Avro (https://avro.apache.org)?
Avro is a very useful library for serializing objects to an interoperable format with a strongly typed schema. Avro is an interesting alternative to Protobuf (see http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html) and supported out of the box by Apache Kafka (Confluent): https://www.confluent.io/blog/avro-kafka-data/.
Avro supports code generation based on JSON/IDL schema files, but also programmatic schema generation and derivation from annotated classes.
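For readers unfamiliar with Avro, here is a minimal sketch of the programmatic-schema side, using the official Apache Avro Java library from Kotlin; the schema and its fields are made up for illustration:

```kotlin
import org.apache.avro.Schema
import org.apache.avro.generic.GenericData

// A hypothetical schema, declared in Avro's JSON schema language.
val schema: Schema = Schema.Parser().parse(
    """
    {
      "type": "record",
      "name": "User",
      "namespace": "com.example",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "age",  "type": "int"}
      ]
    }
    """.trimIndent()
)

// Build a record against that schema without any generated classes.
val user = GenericData.Record(schema).apply {
    put("name", "Ada")
    put("age", 36)
}
```

A code-generated or annotation-derived class could stand in for the `GenericData.Record` here; deriving that from Kotlin classes is essentially what this issue asks for.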
Comments
We were reviewing different Apache formats, indeed. The idea was to implement Apache Parquet first, mainly because it's more efficient than Avro for the Hadoop ecosystem. However, I'm not very familiar with either of these formats, so if you can comment on the Avro vs Parquet choice or show different use cases for both of them, feel free to post here.
Actually, I don't have any experience with Parquet. I'm using Avro in an event-sourced system, mainly because I want to facilitate schema evolutions/transitions and also because of its natural integration with streaming platforms.
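To make the schema-evolution point concrete, here is a sketch of Avro's writer/reader schema resolution using the plain Apache Avro Java API (the Event schema and its fields are hypothetical): a record written with the old schema is read back with a newer schema that adds a defaulted field.

```kotlin
import org.apache.avro.Schema
import org.apache.avro.generic.GenericData
import org.apache.avro.generic.GenericDatumReader
import org.apache.avro.generic.GenericDatumWriter
import org.apache.avro.generic.GenericRecord
import org.apache.avro.io.DecoderFactory
import org.apache.avro.io.EncoderFactory
import java.io.ByteArrayOutputStream

val writerSchema = Schema.Parser().parse(
    """{"type":"record","name":"Event","fields":[{"name":"id","type":"string"}]}"""
)
// Evolved schema: a new field with a default, so old data stays readable.
val readerSchema = Schema.Parser().parse(
    """{"type":"record","name":"Event","fields":[
         {"name":"id","type":"string"},
         {"name":"source","type":"string","default":"unknown"}]}"""
)

fun main() {
    // Write a record with the old schema...
    val out = ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    GenericDatumWriter<GenericRecord>(writerSchema)
        .write(GenericData.Record(writerSchema).apply { put("id", "42") }, encoder)
    encoder.flush()

    // ...and read it back with the new one; "source" resolves to its default.
    val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null)
    val event = GenericDatumReader<GenericRecord>(writerSchema, readerSchema).read(null, decoder)
    println(event) // {"id": "42", "source": "unknown"}
}
```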
+1 for Avro. Parquet seems to target a more specialized use case, afaik. Avro plays a powerful (but optional) role in the Spring Cloud Stream framework alongside the Avro-based schema registry. Since the Spring folks have embraced Kotlin so completely, I would love to see key Kotlin frameworks reciprocate.
Big +1 for Avro. Avro is a closer comparison to protobuf than parquet, as parquet is a more compressed columnar format often used for Spark and HDFS (where you need a rigid schema that doesn't change). Avro and protobuf seem better suited to scenarios where data is communicated across boundaries, with protobuf integrating well with HTTP layers and Avro integrating well with many data ingestion pipelines (e.g. Spark, Kafka, AWS Kinesis, AWS Redshift). Here's a decent read...
I'm also looking at using Avro. It would also be nice if …
I'd also love to see Avro support and code generation from …
Avro is a first-class format (along with JSON) in the Confluent Kafka ecosystem. Avro is also the preferred format for loading data into Google BigQuery, though BigQuery does support other formats as well: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro.
Support for Avro, Parquet & ORC would be great, with priority on Avro. It's more appropriate when data needs to be serialized to be sent over a stream (e.g. Kafka). Streaming approaches are a key concept in distributed systems/microservices, so IMHO Avro is of higher practical relevance than the other two. Those are mainly used to store data in blob storage like S3 or HDFS, and the files are often processed in a big-data context, which is more specific, e.g. using Spark.
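A hedged sketch of that streaming use case, assuming plain Kafka with the Avro binary payload sent as raw bytes (broker address, topic, and key are placeholders; in a Confluent setup one would typically use the schema-registry-aware Avro serializer instead):

```kotlin
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.ProducerRecord
import java.util.Properties

// avroBytes would be an Avro binary payload, e.g. produced as in the
// schema-evolution sketch above.
fun publish(avroBytes: ByteArray) {
    val props = Properties().apply {
        put("bootstrap.servers", "localhost:9092")
        put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")
    }
    // KafkaProducer is Closeable, so `use` flushes and releases it.
    KafkaProducer<String, ByteArray>(props).use { producer ->
        producer.send(ProducerRecord("events", "event-key", avroBytes))
    }
}
```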
Take a look at my implementation of an Avro format here: https://github.com/sksamuel/avro4k
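From avro4k's README, basic usage looks roughly like the following; this is a sketch, and the exact package and method names (`Avro.default`, `schema`, `toRecord`) may differ between avro4k versions:

```kotlin
import com.sksamuel.avro4k.Avro // package name has changed in later avro4k releases
import kotlinx.serialization.Serializable

@Serializable
data class Pizza(val name: String, val vegetarian: Boolean, val kcals: Int)

fun main() {
    // Derive an Avro schema from the @Serializable class...
    val schema = Avro.default.schema(Pizza.serializer())
    println(schema.toString(true))

    // ...and convert an instance to an Avro GenericRecord.
    val record = Avro.default.toRecord(Pizza.serializer(), Pizza("hawaiian", false, 716))
    println(record)
}
```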
Is there any library or tool to generate data classes from a given Avro schema?
I started something a while ago (but it's not stable yet and I didn't proceed with it for quite a while)... so if you'd like to support me there :-)
Sure @fuchsst. I will take a look.
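For reference, Avro itself ships a code generator, but it emits Java classes rather than Kotlin data classes, which is presumably why a Kotlin-native tool is being asked about. A sketch of invoking it programmatically, assuming the org.apache.avro:avro-compiler artifact and placeholder file paths:

```kotlin
import org.apache.avro.compiler.specific.SpecificCompiler
import java.io.File

// Generates Java (not Kotlin) sources for every record in user.avsc.
fun main() {
    SpecificCompiler.compileSchema(File("src/main/avro/user.avsc"), File("build/generated"))
}
```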
Unfortunately, we don't have the resources to maintain one more full-blown format at production-level quality.