Add support for Apache Avro #62
Would it be possible to add support for Apache Avro (https://avro.apache.org)?
Avro is a very useful library for serializing objects to an interoperable format with a strongly typed schema. Avro is an interesting alternative to Protobuf (see http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html) and supported out of the box by Apache Kafka (Confluent): https://www.confluent.io/blog/avro-kafka-data/.
Avro supports code generation based on JSON/IDL schema files, but also programmatic schema generation and derivation from annotated classes.
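For readers unfamiliar with Avro, here is a minimal sketch of the programmatic-schema side, using the official Apache Avro Java library from Kotlin; the schema and its fields are made up for illustration:

```kotlin
import org.apache.avro.Schema
import org.apache.avro.generic.GenericData

// A hypothetical schema, declared in Avro's JSON schema language.
val schema: Schema = Schema.Parser().parse(
    """
    {
      "type": "record",
      "name": "User",
      "namespace": "com.example",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "age",  "type": "int"}
      ]
    }
    """.trimIndent()
)

// Build a record against that schema without any generated classes.
val user = GenericData.Record(schema).apply {
    put("name", "Ada")
    put("age", 36)
}
```

A code-generated or annotation-derived class could stand in for the `GenericData.Record` here; deriving that from Kotlin classes is essentially what this issue asks for.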
Comments
We were reviewing different Apache formats, indeed. The idea was to implement Apache Parquet first, mainly because it's more efficient than Avro for the Hadoop ecosystem. However, I'm not very familiar with either of these formats, so if you can comment on the Avro vs Parquet choice or show different use cases for both of them, feel free to post here.
Actually, I don't have any experience with Parquet. I'm using Avro in an event-sourced system, mainly because I want to facilitate schema evolutions/transitions and also because of its natural integration with streaming platforms.
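To make the schema-evolution point concrete, here is a sketch of Avro's writer/reader schema resolution using the plain Apache Avro Java API (the Event schema and its fields are hypothetical): a record written with the old schema is read back with a newer schema that adds a defaulted field.

```kotlin
import org.apache.avro.Schema
import org.apache.avro.generic.GenericData
import org.apache.avro.generic.GenericDatumReader
import org.apache.avro.generic.GenericDatumWriter
import org.apache.avro.generic.GenericRecord
import org.apache.avro.io.DecoderFactory
import org.apache.avro.io.EncoderFactory
import java.io.ByteArrayOutputStream

val writerSchema = Schema.Parser().parse(
    """{"type":"record","name":"Event","fields":[{"name":"id","type":"string"}]}"""
)
// Evolved schema: a new field with a default, so old data stays readable.
val readerSchema = Schema.Parser().parse(
    """{"type":"record","name":"Event","fields":[
         {"name":"id","type":"string"},
         {"name":"source","type":"string","default":"unknown"}]}"""
)

fun main() {
    // Write a record with the old schema...
    val out = ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    GenericDatumWriter<GenericRecord>(writerSchema)
        .write(GenericData.Record(writerSchema).apply { put("id", "42") }, encoder)
    encoder.flush()

    // ...and read it back with the new one; "source" resolves to its default.
    val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null)
    val event = GenericDatumReader<GenericRecord>(writerSchema, readerSchema).read(null, decoder)
    println(event) // {"id": "42", "source": "unknown"}
}
```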
+1 for Avro. Parquet seems to target a more specialized use case, afaik. Avro plays a powerful (but optional) role in the Spring Cloud Stream framework alongside the Avro-based schema registry. Since the Spring folks have embraced Kotlin so completely, I would love to see key Kotlin frameworks reciprocate.
Big +1 for Avro. Avro is a closer comparison to protobuf than parquet, as parquet is a more compressed columnar format often used for Spark and HDFS (where you need a rigid schema that doesn't change). Avro and protobuf seem better suited to scenarios where data is communicated across boundaries, with protobuf integrating well with HTTP layers and Avro integrating well with many data ingestion pipelines (e.g. Spark, Kafka, AWS Kinesis, AWS Redshift). Here's a decent read...
I'm also looking at using Avro. It would also be nice if …
I'd also love to see Avro support and code generation from …
Avro is a first-class format (along with JSON) in the Confluent Kafka ecosystem. Avro is also the preferred format for loading data into Google BigQuery, though BigQuery does support other formats as well: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro.
Support for Avro, Parquet & ORC would be great, with priority on Avro. It's more appropriate when data needs to be serialized to be sent over a stream (e.g. Kafka). Streaming approaches are a key concept in distributed systems/microservices, so IMHO Avro is of higher practical relevance than the other two. Those are mainly used to store data in blob storage like S3 or HDFS, and the files are often processed in a big-data context, which is more specific, e.g. using Spark.
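A hedged sketch of that streaming use case, assuming plain Kafka with the Avro binary payload sent as raw bytes (broker address, topic, and key are placeholders; in a Confluent setup one would typically use the schema-registry-aware Avro serializer instead):

```kotlin
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.ProducerRecord
import java.util.Properties

// avroBytes would be an Avro binary payload, e.g. produced as in the
// schema-evolution sketch above.
fun publish(avroBytes: ByteArray) {
    val props = Properties().apply {
        put("bootstrap.servers", "localhost:9092")
        put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")
    }
    // KafkaProducer is Closeable, so `use` flushes and releases it.
    KafkaProducer<String, ByteArray>(props).use { producer ->
        producer.send(ProducerRecord("events", "event-key", avroBytes))
    }
}
```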
Take a look at my implementation of an Avro format here: https://github.com/sksamuel/avro4k
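From avro4k's README, basic usage looks roughly like the following; this is a sketch, and the exact package and method names (`Avro.default`, `schema`, `toRecord`) may differ between avro4k versions:

```kotlin
import com.sksamuel.avro4k.Avro // package name has changed in later avro4k releases
import kotlinx.serialization.Serializable

@Serializable
data class Pizza(val name: String, val vegetarian: Boolean, val kcals: Int)

fun main() {
    // Derive an Avro schema from the @Serializable class...
    val schema = Avro.default.schema(Pizza.serializer())
    println(schema.toString(true))

    // ...and convert an instance to an Avro GenericRecord.
    val record = Avro.default.toRecord(Pizza.serializer(), Pizza("hawaiian", false, 716))
    println(record)
}
```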
Is there any library or tool to generate data classes from a given Avro schema?
I started something a while ago (but it's not stable yet and I didn't proceed with it for quite a while)... so if you'd like to support me there :-)
Sure @fuchsst. I will take a look.
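For reference, Avro itself ships a code generator, but it emits Java classes rather than Kotlin data classes, which is presumably why a Kotlin-native tool is being asked about. A sketch of invoking it programmatically, assuming the org.apache.avro:avro-compiler artifact and placeholder file paths:

```kotlin
import org.apache.avro.compiler.specific.SpecificCompiler
import java.io.File

// Generates Java (not Kotlin) sources for every record in user.avsc.
fun main() {
    SpecificCompiler.compileSchema(File("src/main/avro/user.avsc"), File("build/generated"))
}
```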
Unfortunately, we don't have the resources to maintain one more full-blown format at production-level quality.