-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add joda-time and ByteBuffer support in avro #740
Conversation
Codecov Report
@@ Coverage Diff @@
## main #740 +/- ##
==========================================
+ Coverage 94.96% 95.04% +0.07%
==========================================
Files 52 52
Lines 1789 1856 +67
Branches 168 168
==========================================
+ Hits 1699 1764 +65
- Misses 90 92 +2
|
override def from(v: ByteBuffer)(cm: CaseMapper): Array[Byte] = | ||
ju.Arrays.copyOfRange(v.array(), v.position(), v.limit()) | ||
override def to(v: Array[Byte])(cm: CaseMapper): ByteBuffer = ByteBuffer.wrap(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we have a non symmetric definition:
- in the
from
we copy the data - in the
to
we only wrap, meaning a change to the converted avro object will modify the scala array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting--yeah, I can't see why we would need to do an array-copy here, your change makes sense to me.
} | ||
implicit val afBytes: AvroField[Array[Byte]] = from[ByteBuffer](_.array())(ByteBuffer.wrap) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the behavior to pass the same mutable array from avro to scala
override def from(v: ByteBuffer)(cm: CaseMapper): Array[Byte] = | ||
ju.Arrays.copyOfRange(v.array(), v.position(), v.limit()) | ||
override def to(v: Array[Byte])(cm: CaseMapper): ByteBuffer = ByteBuffer.wrap(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting--yeah, I can't see why we would need to do an array-copy here, your change makes sense to me.
implicit val afDate: AvroField[LocalDate] = | ||
logicalType[Int](LogicalTypes.date())(x => LocalDate.ofEpochDay(x.toLong))(_.toEpochDay.toInt) | ||
private val EpochJodaDate = new org.joda.time.LocalDate(1970, 1, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be lazy?
@@ -16,7 +16,7 @@ | |||
|
|||
package magnolify.avro | |||
|
|||
import java.nio.ByteBuffer | |||
import java.nio.{ByteBuffer, ByteOrder} | |||
import java.time._ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could import joda as import org.joda.{time => joda}
so that the Joda APIs could simply refer to joda.LocalDate
?
// A duration logical type annotates Avro fixed type of size 12, which stores three little-endian unsigned integers | ||
// that represent durations at different granularities of time. | ||
// The first stores a number in months, the second stores a number in days, and the third stores a number in milliseconds. | ||
val afDuration: AvroField[(Long, Long, Long)] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use java.time.Duration
for this? It supports a nanosecond granularity which we could use for encoding in the ByteBuffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duration
is not made to hold month
but Period
can, so we could define as (Period, Duration)
.
I was unsure though where to store the days
, as both can hold the value. I think we can split like this:
Period
: months + daysDuration
: milliseconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, Period
is defined as (years: Int, months: Int, days: Int), so we can have an overflow when converting from avro, as the specification tells the value are unsinged integers (handled as long)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, got it. This is probably the least painful solution, then -- can you add documentation (both on how to import it, and on what it represents) to https://github.com/spotify/magnolify/blob/main/docs/avro.md + https://github.com/spotify/magnolify/blob/main/docs/mapping.md ?
@@ -196,14 +196,14 @@ object AvroField { | |||
implicit val afLong: AvroField[Long] = id[Long](Schema.Type.LONG) | |||
implicit val afFloat: AvroField[Float] = id[Float](Schema.Type.FLOAT) | |||
implicit val afDouble: AvroField[Double] = id[Double](Schema.Type.DOUBLE) | |||
implicit val afBytes: AvroField[Array[Byte]] = new Aux[Array[Byte], ByteBuffer, ByteBuffer] { | |||
implicit val afByteBuffer: AvroField[ByteBuffer] = new Aux[ByteBuffer, ByteBuffer, ByteBuffer] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like it could be declared as id[ByteBuffer](Schema.Type. BYTES)
, but id
needs to have an override to accommodate d.array()
|
||
implicit val afAvroDuration: AvroField[AvroDuration] = AvroField.from[(Long, Long, Long)] { | ||
case (months, days, millis) => AvroDuration(months, days, millis) | ||
}(d => (d.months, d.days, d.millis))(AvroField.afDuration) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We either need explain how to use AvroField.afDuration
in docs, or introduce AvroDuration
in magnolify. It is concerning that it won't be useful to users
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not for exposing a custom AvroDuration
class. IHMO, the avro java lib does not expose the duration
logical type because there does not exists a proper java data structure to handle the value as defined in the spec.
In the end afDuration
is not different from any other type, so users should not be lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I was thinking is that it might be not obvious for a user how to discover afDuration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can include that in #612 . We'll need to create a proper doc for all modules
* Add joda-time and ByteBuffer support in avro * Move joda-time to common * Add avro duration logical-type * handle unsigned integers values properly * Add all logical types tests * FIx constant naming * Fix default bytes behavior * Use deepEquals for DefaultBytes test * Import org.joda.time package
avro module part of #708 and #726
Added
val afDuration: AvroField[(Long, Long, Long)]
definition (not implicit) for matching duration logical type spec. Part of #255