
Add support for Apache Avro #62

Closed
dsebastien opened this issue Dec 22, 2017 · 13 comments

Comments

@dsebastien

dsebastien commented Dec 22, 2017

Would it be possible to add support for Apache Avro (https://avro.apache.org)?

Avro is a very useful library for serializing objects to an interoperable format with a strongly typed schema. Avro is an interesting alternative to Protobuf (see http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html) and supported out of the box by Apache Kafka (Confluent): https://www.confluent.io/blog/avro-kafka-data/.

Avro supports code generation based on JSON/IDL schema files, but also programmatic schema generation and derivation from annotated classes.
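For illustration, the JSON schema files mentioned above (.avsc) might look roughly like the following. This is only a sketch: the record name `UserCreated`, the namespace, and the fields are hypothetical, and the snippet uses Python's standard `json` module to parse and inspect the schema rather than an actual Avro library.

```python
import json

# Hypothetical .avsc content; the record name, namespace, and fields
# are made up for illustration and do not come from this thread.
AVSC = """
{
  "type": "record",
  "name": "UserCreated",
  "namespace": "com.example.events",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
"""

# An Avro schema file is plain JSON, so any JSON parser can read it.
schema = json.loads(AVSC)

# Collect the declared field names in schema order.
field_names = [field["name"] for field in schema["fields"]]
```

The union type `["null", "string"]` with a `null` default is the usual way to declare an optional field in an Avro schema.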

@sandwwraith
Member

We have indeed been reviewing different Apache formats. The idea was mainly to try implementing Apache Parquet, because it's more efficient than Avro for the Hadoop ecosystem. However, I'm not very familiar with either of these formats, so if you can comment on the Avro vs. Parquet choice or show different use cases for both of them, feel free to post here.

@dsebastien
Author

Actually I don't have any experience with Parquet.

I'm using Avro in an event sourced system mainly because I want to facilitate schema evolutions/transitions and also because of the natural integration with streaming platforms.
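As a rough illustration of what schema evolution means here: Avro matches record fields by name, fills in fields the writer did not know about from the reader schema's defaults, and ignores fields the reader no longer declares. The sketch below mimics that rule for flat records in plain Python. It is a simplified model, not the Avro library, and the `resolve` helper, the field lists, and the sample event are all hypothetical.

```python
def resolve(record, writer_fields, reader_fields):
    """Project a record written with writer_fields onto reader_fields.

    Simplified model of Avro record resolution (flat records only):
    each field is a dict with a "name" and, optionally, a "default".
    """
    written = {field["name"] for field in writer_fields}
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in written:
            # Field exists in both schemas: take the written value.
            result[name] = record[name]
        elif "default" in field:
            # Field is new in the reader schema: fall back to its default.
            result[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    # Fields present only in the writer schema are dropped.
    return result


# Hypothetical schema versions: v2 drops "legacy_flag" and adds "email".
v1_fields = [{"name": "id"}, {"name": "legacy_flag"}]
v2_fields = [{"name": "id"}, {"name": "email", "default": None}]

old_event = {"id": "42", "legacy_flag": True}
upgraded = resolve(old_event, v1_fields, v2_fields)
```

In an event-sourced system this is what allows events stored under an older schema to be read by code that only knows the newer one, provided every added field carries a default.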

@snoop244

snoop244 commented Jan 6, 2018

+1 for Avro. Parquet seems to support a somewhat more specialized use case, afaik. Avro plays a powerful (but optional) role in the Spring Cloud Stream framework alongside the Avro-based schema registry. Since the Spring folks have embraced Kotlin so completely, I would love to see key Kotlin frameworks reciprocate.
Thanks for all the great work!

@baseman

baseman commented May 9, 2018

Big +1 for Avro.

Avro is a closer comparison to Protobuf than Parquet is, as Parquet is a more compressed columnar format often used with Spark and HDFS (where you need a rigid schema that doesn't change).

Avro and Protobuf both seem better suited to scenarios where data is communicated across boundaries, with Protobuf integrating well with HTTP layers and Avro integrating well with many data ingestion pipelines (e.g. Spark, Kafka, AWS Kinesis, AWS Redshift).

Here's a decent read...

@jaccozilla

I'm also looking at using Avro. It would also be nice if .avsc files could be generated along with #34

@richardcase

I'd also love to see Avro support and code generation from .avsc files. This would be especially useful for working with Kafka Streams.

@rocketraman

rocketraman commented May 14, 2019

Avro is a first-class format (along with JSON) in the Confluent Kafka ecosystem.

Avro is the preferred format for loading data into Google BigQuery, though BigQuery does support other formats as well: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro.

@fuchsst

fuchsst commented May 26, 2019

Support for Avro, Parquet & ORC would be great, with priority on Avro. It's more appropriate when data needs to be serialized to be sent over a stream (e.g. Kafka). Streaming approaches are a key concept in distributed systems/microservices, so IMHO Avro is of higher practical relevance than the other two. Those are mainly used to store data in blob storage like S3 or HDFS. The files are often processed in a big data context, which is more specific, e.g. using Spark.

@sksamuel
Contributor

sksamuel commented Oct 7, 2019

Take a look at my implementation of an Avro format here: https://github.com/sksamuel/avro4k
It's a port of my Scala Avro library, https://github.com/sksamuel/avro4s, with the test suite and functionality pretty much the same.
There are still some rough edges to finish around nested collections, such as lists of maps, so bug reports are welcome (and expected!)

@ArpanMajumdar

Is there any library or tool to generate data classes from a given Avro schema?

@fuchsst

fuchsst commented Nov 6, 2019

Is there any library or tool to generate data classes from a given Avro schema?

I started something a while ago (but it's not stable yet and I haven't proceeded with it for quite a while), so if you'd like to support me there :-)
https://github.com/fuchsst/avro-kotlin-maven-plugin

@ArpanMajumdar

Sure, @fuchsst. I will take a look.

@qwwdfsad
Contributor

Unfortunately, we don't have the resources to maintain one more full-blown format at production-level quality.
We encourage the use of community-developed libraries instead. Thanks, @sksamuel, for implementing it!
