Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simply extending Iterable would break on Java serialization #11192

Closed
eed3si9n opened this issue Oct 6, 2018 · 4 comments
Closed

Simply extending Iterable would break on Java serialization #11192

eed3si9n opened this issue Oct 6, 2018 · 4 comments

Comments

@eed3si9n
Copy link
Member

eed3si9n commented Oct 6, 2018

This is a generalization of scala/scala-xml#254 reported by @ashawley and analyzed by @xuwei-k

steps

To minimize scala.xml.XMLTestJVM.serializeAttribute failure Yoshida-san created a minimization using a custom collection that looks like this:

class MyCollection[B](val list: List[B]) extends scala.collection.Iterable[B] {
  override def iterator = list.iterator
  // protected[this] override def writeReplace(): AnyRef = this
}

I'm breaking up the test into the following steps:

  @Test
  def testMyCollection: Unit = {
    val list = List(1, 2, 3)
    val arr = serialize(new MyCollection(list))
    val obj2 = deserialize[MyCollection[Int]](arr)
    assert(obj2.list == list)
  }

  def serialize[A <: Serializable](obj: A): Array[Byte] = {
    val o = new ByteArrayOutputStream()
    val os = new ObjectOutputStream(o)
    os.writeObject(obj)
    o.toByteArray()
  }

  def deserialize[A <: Serializable](bytes: Array[Byte]): A = {
    val s = new ByteArrayInputStream(bytes)
    val is = new ObjectInputStream(s)
    is.readObject().asInstanceOf[A]
  }

problem

When I run this the error I get is as follows:

[error] Test issue.IssueTest.testMyCollection failed: java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to issue.MyCollection, took 0.009 sec
[error]     at issue.IssueTest.testMyCollection(IssueTest.scala:20)
[error]     ...

In other words, I don't get to the assertion, but instead it's failing at casting in the deserialization which deserialized $colon$colon instead of MyCollection. This looks to be a different problem than #9237.

expectation

Either the serialization works out of the box, or MyCollection does not compile without providing some serialization mechanism.

workaround

A workaround identified by Yoshida-san is uncommenting the following:

// protected[this] override def writeReplace(): AnyRef = this

note

scala/scala#6676 makes Iterable Serializable by default.

trait Iterable[+A] extends IterableOnce[A] with IterableOps[A, Iterable, Iterable[A]] with Serializable {

with writeReplace implemented as follows:

  protected[this] def writeReplace(): AnyRef = new DefaultSerializationProxy(iterableFactory.iterableFactory, this)

In other words, the serialization of all things Iterable are passed into DefaultSerializationProxy, including all subtypes that exist in the wild. Perhaps it should fail at the point of serialization when it detects a type that it cannot handle.

Letting it serialize the data, but not deserialize sounds like a potentially data-losing behavior. Another thing to consider is forcing subclasses of Iterable to implement a serialization method. The situation where it's easy to roll your own collection, but it will blow up on Spark by default is not a happy experience.

@ashawley

This comment has been minimized.

xerial added a commit to wvlet/airframe that referenced this issue Oct 31, 2018
* Upgrade to Scala 2.13.0-M5
* Upgrade ScalaTest and Scalacheck
* Update travis config for Scala 2.13.0-M5
* Scala 2.13.0-M5 confused java.util.logging.LogRecord and wvlet.log.LogRecord
* Fix scala/bug#11202 by explicitely specifying JMX annotation package path
* Use Scala test 3.0.6-SNAP4
* Add a workaround for Scala 2.13.0-M5 serialization bug: scala/bug#11192
* Exclude airframe-fluentd from Scala 2.13 build
* Exclude serilaization test for Scala 2.13.0-M5 instead of forking JVM
@szeiger
Copy link

szeiger commented Nov 8, 2018

I think the current behavior fits well into the collections design and should be considered correct. Deserialization will deserialize to the C type of the collection, so when you're extending Iterable you get back the IterableC. The default for this is List. This is the same collection type you would get when you call, for example, filter on your collection. If you want to rebuild to a more specific collection type (whether through serialization/deserialization or a filtering method) you need to extend IterableOps with a more specific C type and provide a factory for it.

@szeiger
Copy link

szeiger commented Nov 14, 2018

We can make the new serialization scheme opt-in by putting it into a mix-in trait for the concrete collection types. This leaves serialization undefined by default, which was the case in 2.12. (Note that simply extending Iterable would not have worked in 2.12, either, because Iterable is not Serializable)

@eed3si9n
Copy link
Member Author

If it fails at serialization as opposed to deserialization, that would be a safer failure mode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants