AnyVal blows up TypedEncoder / Injection #161

Voltir · 2017-07-21T23:00:03Z

import frameless._
import frameless.syntax._

case class CustomerId(value: Long) //extends AnyVal

object CustomerId {
  implicit val _injection: Injection[CustomerId, Long] =
    Injection[CustomerId, Long](_.value,CustomerId.apply)

  implicit val _encoder: TypedEncoder[CustomerId] =
    frameless.TypedEncoder.usingInjection[CustomerId, Long]
}

Im using CustomerId in a larger Product type, if I decide to uncomment the //extends AnyVal I get a lovely explosion in a catalyst.codegen step:

(P.S - It seems odd that i have to explicitly add the field _encoder -- for some reason, the implicit def for usingInjection isnt being implicitly found - but im not ruling out me forgetting something basic about implicits)

[info] - can load the past travel sample *** FAILED ***
[info]   java.lang.RuntimeException: Error while decoding: java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 805, Column 89: No applicable constructor/method found for actual parameters "ai.takt.data.valkyrie.domain.CustomerId, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option"; candidates are: "ai.takt.data.valkyrie.csv.schemas.PastTravel$Row(long, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option, scala.Option)"
[info] /* 001 */ public java.lang.Object generate(Object[] references) {
[info] /* 002 */   return new SpecificSafeProjection(references);
[info] /* 003 */ }
[info] /* 004 */
[info] /* 005 */ class SpecificSafeProjection extends org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
[info] /* 006 */
[info] /* 007 */   private Object[] references;
[info] /* 008 */   private InternalRow mutableRow;
[info] /* 009 */   private boolean resultIsNull;
[info] /* 010 */   private ai.takt.data.valkyrie.domain.CustomerId argValue;
[info] /* 011 */   private scala.Option argValue1;
[info] /* 012 */   private scala.Option argValue2;
[info] /* 013 */   private scala.Option argValue3;
[info] /* 014 */   private scala.Option argValue4;
[info] /* 015 */   private scala.Option argValue5;
[info] /* 016 */   private scala.Option argValue6;

The text was updated successfully, but these errors were encountered:

imarios · 2017-07-21T23:03:37Z

Good catch! Ok, can you remove the injection? Any case class should be straightforwardly encoded without Injection.

Also, what version are you on?

imarios · 2017-07-21T23:15:08Z

The following works just fine for me:

import frameless._
import frameless.syntax._

case class CustomerId(value: Long) //extends AnyVal

object CustomerId {
  implicit val _injection: Injection[CustomerId, Long] =
    Injection[CustomerId, Long](_.value,CustomerId.apply)

  implicit val _encoder: TypedEncoder[CustomerId] =
    frameless.TypedEncoder.usingInjection[CustomerId, Long]
}

TypedDataset.create(Seq(CustomerId(2))).show().run

Can you provide a minimal example that showcases this issue?

kanterov · 2017-07-22T13:59:34Z

I've reproduced this issue with following code:

case class CustomerId(value: Long) extends AnyVal

object CustomerId {
  implicit def injection: Injection[CustomerId, Long] =
    Injection(_.value, CustomerId(_))

  implicit def encoder: TypedEncoder[CustomerId] =
    TypedEncoder.usingInjection[CustomerId, Long]

  implicit def arbitrary: Arbitrary[CustomerId] = Arbitrary {
    Arbitrary.arbLong.arbitrary.map(CustomerId(_))
  }
}

case class Customer(id: CustomerId, name: String)

object Customer {
  implicit def arbitrary: Arbitrary[Customer] = Arbitrary {
    for {
      id <- Arbitrary.arbitrary[CustomerId]
      name <- Gen.alphaNumStr
    } yield Customer(id, name)
  }
}

There are 3 related issues:

TypedEncoder derivation isn't friendly if you have both injection and case class, or regular AnyVal class with a single field.
Invoke in Spark always assumes that output is nullable unless it's a primitive type, therefore codegen isn't optimal, it was fixed in Spark 2.2.0, we should start using returnNullable once we migrate, see apache/spark@38b9e69
The way Scala encoders AnyVal if it's a field in case class is just creating a method to return an underlying object:

case class Customer(id: CustomerId)

becomes

class Customer {
  long id() { ... }
}

But we generate code assuming id() would return CustomerId. it's possible to fix this using TypeTag instead of ClassTag. It would involve using a lot of reflection and could be very dirty.

The other way is to provide specialize encoder for AnyVal automatically doing "fixed" injection that would work, that makes a lot of sense to be a default behavior. I think we should go with the second approach.

One of workarounds is to manually declare encoders:

  implicit def encoder: TypedEncoder[CustomerId] = new TypedEncoder[CustomerId] {
    def nullable: Boolean = false
    def sourceDataType: DataType = LongType
    def targetDataType: DataType = LongType

    def constructorFor(path: Expression): Expression = path
    def extractorFor(path: Expression): Expression = path
  }

But it would work only if CustomerId is a field of a case class, it wouldn't work to collect DataFrame of CustomerId, or call UDF that accepts CustomerId.

cchantep · 2021-09-07T13:55:06Z

Closed by #546

imarios added the bug label Nov 11, 2017

kanterov mentioned this issue Apr 14, 2021

TypedEncoder not working with UDF (or not properly documented) for class having a field typed with a ValueClass #516

Closed

imarios pinned this issue Apr 15, 2021

larsrh unpinned this issue Apr 26, 2021

cchantep closed this as completed Sep 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AnyVal blows up TypedEncoder / Injection #161

AnyVal blows up TypedEncoder / Injection #161

Voltir commented Jul 21, 2017

imarios commented Jul 21, 2017 •

edited

Loading

imarios commented Jul 21, 2017

kanterov commented Jul 22, 2017 •

edited

Loading

cchantep commented Sep 7, 2021

AnyVal blows up TypedEncoder / Injection #161

AnyVal blows up TypedEncoder / Injection #161

Comments

Voltir commented Jul 21, 2017

imarios commented Jul 21, 2017 • edited Loading

imarios commented Jul 21, 2017

kanterov commented Jul 22, 2017 • edited Loading

cchantep commented Sep 7, 2021

imarios commented Jul 21, 2017 •

edited

Loading

kanterov commented Jul 22, 2017 •

edited

Loading