Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add joda-time and ByteBuffer support in avro #740

Merged
merged 9 commits into from
May 12, 2023
Merged

Add joda-time and ByteBuffer support in avro #740

merged 9 commits into from
May 12, 2023

Conversation

RustedBones
Copy link
Contributor

@RustedBones RustedBones commented May 4, 2023

avro module part of #708 and #726

Added val afDuration: AvroField[(Long, Long, Long)] definition (not implicit) for matching duration logical type spec. Part of #255

@codecov
Copy link

codecov bot commented May 4, 2023

Codecov Report

Merging #740 (f0114e0) into main (67eb61f) will increase coverage by 0.07%.
The diff coverage is 95.74%.

@@            Coverage Diff             @@
##             main     #740      +/-   ##
==========================================
+ Coverage   94.96%   95.04%   +0.07%     
==========================================
  Files          52       52              
  Lines        1789     1856      +67     
  Branches      168      168              
==========================================
+ Hits         1699     1764      +65     
- Misses         90       92       +2     
Impacted Files Coverage Δ
...rc/main/scala/magnolify/avro/logical/package.scala 93.82% <94.66%> (+0.49%) ⬆️
avro/src/main/scala/magnolify/avro/AvroType.scala 97.10% <100.00%> (+1.19%) ⬆️

Comment on lines -203 to -205
override def from(v: ByteBuffer)(cm: CaseMapper): Array[Byte] =
ju.Arrays.copyOfRange(v.array(), v.position(), v.limit())
override def to(v: Array[Byte])(cm: CaseMapper): ByteBuffer = ByteBuffer.wrap(v)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we have a non symmetric definition:

  • in the from we copy the data
  • in the to we only wrap, meaning a change to the converted avro object will modify the scala array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting--yeah, I can't see why we would need to do an array-copy here, your change makes sense to me.

}
implicit val afBytes: AvroField[Array[Byte]] = from[ByteBuffer](_.array())(ByteBuffer.wrap)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the behavior to pass the same mutable array from avro to scala

Comment on lines -203 to -205
override def from(v: ByteBuffer)(cm: CaseMapper): Array[Byte] =
ju.Arrays.copyOfRange(v.array(), v.position(), v.limit())
override def to(v: Array[Byte])(cm: CaseMapper): ByteBuffer = ByteBuffer.wrap(v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting--yeah, I can't see why we would need to do an array-copy here, your change makes sense to me.

implicit val afDate: AvroField[LocalDate] =
logicalType[Int](LogicalTypes.date())(x => LocalDate.ofEpochDay(x.toLong))(_.toEpochDay.toInt)
private val EpochJodaDate = new org.joda.time.LocalDate(1970, 1, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be lazy?

@@ -16,7 +16,7 @@

package magnolify.avro

import java.nio.ByteBuffer
import java.nio.{ByteBuffer, ByteOrder}
import java.time._
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could import joda as import org.joda.{time => joda} so that the Joda APIs could simply refer to joda.LocalDate?

// A duration logical type annotates Avro fixed type of size 12, which stores three little-endian unsigned integers
// that represent durations at different granularities of time.
// The first stores a number in months, the second stores a number in days, and the third stores a number in milliseconds.
val afDuration: AvroField[(Long, Long, Long)] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use java.time.Duration for this? It supports a nanosecond granularity which we could use for encoding in the ByteBuffer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duration is not made to hold month but Period can, so we could define as (Period, Duration).
I was unsure though where to store the days, as both can hold the value. I think we can split like this:

  • Period: months + days
  • Duration: milliseconds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, Period is defined as (years: Int, months: Int, days: Int), so we can have an overflow when converting from avro, as the specification tells the value are unsinged integers (handled as long)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, got it. This is probably the least painful solution, then -- can you add documentation (both on how to import it, and on what it represents) to https://github.com/spotify/magnolify/blob/main/docs/avro.md + https://github.com/spotify/magnolify/blob/main/docs/mapping.md ?

@@ -196,14 +196,14 @@ object AvroField {
implicit val afLong: AvroField[Long] = id[Long](Schema.Type.LONG)
implicit val afFloat: AvroField[Float] = id[Float](Schema.Type.FLOAT)
implicit val afDouble: AvroField[Double] = id[Double](Schema.Type.DOUBLE)
implicit val afBytes: AvroField[Array[Byte]] = new Aux[Array[Byte], ByteBuffer, ByteBuffer] {
implicit val afByteBuffer: AvroField[ByteBuffer] = new Aux[ByteBuffer, ByteBuffer, ByteBuffer] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it could be declared as id[ByteBuffer](Schema.Type. BYTES), but id needs to have an override to accommodate d.array()


implicit val afAvroDuration: AvroField[AvroDuration] = AvroField.from[(Long, Long, Long)] {
case (months, days, millis) => AvroDuration(months, days, millis)
}(d => (d.months, d.days, d.millis))(AvroField.afDuration)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We either need explain how to use AvroField.afDuration in docs, or introduce AvroDuration in magnolify. It is concerning that it won't be useful to users

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not for exposing a custom AvroDuration class. IHMO, the avro java lib does not expose the duration logical type because there does not exists a proper java data structure to handle the value as defined in the spec.

In the end afDuration is not different from any other type, so users should not be lost.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was thinking is that it might be not obvious for a user how to discover afDuration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can include that in #612 . We'll need to create a proper doc for all modules

@RustedBones RustedBones merged commit 4754196 into main May 12, 2023
@RustedBones RustedBones deleted the avro-types branch May 12, 2023 07:28
RustedBones added a commit that referenced this pull request Sep 12, 2023
* Add joda-time and ByteBuffer support in avro

* Move joda-time to common

* Add avro duration logical-type

* handle unsigned integers values properly

* Add all logical types tests

* FIx constant naming

* Fix default bytes behavior

* Use deepEquals for DefaultBytes test

* Import org.joda.time package
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants