Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

investigate possible serialization regression(s) in 2.13 milestones #562

Closed
SethTisue opened this issue Sep 25, 2018 · 28 comments
Closed

investigate possible serialization regression(s) in 2.13 milestones #562

SethTisue opened this issue Sep 25, 2018 · 28 comments
Milestone

Comments

@SethTisue
Copy link
Member

@SethTisue SethTisue added this to the 2.13.0-RC1 milestone Sep 25, 2018
@SethTisue SethTisue changed the title investigate possible serialization-regression(s) in 2.13 milestones investigate possible serialization regression(s) in 2.13 milestones Sep 25, 2018
@SethTisue
Copy link
Member Author

@kailuowang did cats hit something in this area, or am I misremembering?
@xuwei-k are you aware of any other repos that hit issues in this area?

@kailuowang
Copy link

@SethTisue I am sorry I don't quite remember either, I quickly checked the PR I added 2.13 support for cats and didn't find any mention of serialization.

@milessabin
Copy link

There was an issue with shapeless which I worked around by suppressing a couple of tests.

See discussion here: scala/scala#6729 (comment).

@SethTisue
Copy link
Member Author

SethTisue commented Sep 26, 2018

for any given project, it's useful to know whether the problem goes away with set fork in Test := true or not (see comments on the ScalaCheck ticket)

@xuwei-k
Copy link
Contributor

xuwei-k commented Sep 26, 2018

I think these are different problems (scala/scala-xml#254, typelevel/scalacheck#427 (comment))

see my reproduce example scala/scala-xml#254 (comment). scala-xml test fail whether fork is true or false.

@SethTisue
Copy link
Member Author

@kailuowang found it! typelevel/cats#2360

@ashawley
Copy link
Member

Looks like that is the community build showing the ScalaCheck bug, but in Cats's test suite?

@eed3si9n
Copy link
Member

eed3si9n commented Oct 6, 2018

It's not mentioned yet here so I'll link to an existing issue surrounding Java serialization, List, and sbt's layered class loader - scala/bug#9237 (SI-9237) "Can't deserialize scala.List[M] if scala-library is not in the same classloader as M" (see also https://bugs.openjdk.java.net/browse/JDK-4340158 and sbt/sbt#89)

@retronym foretold this in scala/collection-strawman#523:

However, there is a gotcha with custom serialization: scala/bug#9237

In some classloader setups, we end up trying to deserialize the elements in the wrong classloader.

Here's the reproduction of scala/bug#9237 modified to Scala 2.13.0-M5 and JUnit: https://github.com/eed3si9n/minimized-list-serialization/blob/v0.1.0/src/test/scala/issue/IssueTest.scala

running test from sbt should produce:

[debug] Test issue.IssueTest.testSerialization started
[error] Test issue.IssueTest.testSerialization failed: java.lang.ClassNotFoundException: issue.Meh, took 0.027 sec
[error]     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
[error]     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
[error]     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
[error]     at java.lang.Class.forName0(Native Method)
[error]     at java.lang.Class.forName(Class.java:348)
[error]     at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:686)
[error]     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1866)
[error]     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1749)
[error]     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2040)
[error]     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
[error]     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
[error]     at scala.collection.generic.DefaultSerializationProxy.readObject(DefaultSerializationProxy.scala:46)
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error]     at java.lang.reflect.Method.invoke(Method.java:498)
[error]     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
[error]     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176)
[error]     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
[error]     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
[error]     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
[error]     at issue.IssueTest.deserialize(IssueTest.scala:25)
[error]     at issue.IssueTest.testSerialization(IssueTest.scala:11)

This is caused by ObjectInputStream not handling layered class loaders, which explains why it often goes away if you fork the test.

workaround 1: fork the test and/or run

If you use Java serialization, fork your test.

Test / fork := true

Would that work?

workaround 2: custom ObjectInputStream

As a workaround, when possible you can use a custom ObjectInputStream. Here's something I implemented as a demonstration: https://github.com/eed3si9n/minimized-list-serialization/blob/v0.2.0/src/main/scala/issue/ScalaObjectInputStream.scala

Using that deserialize would look like this:

  def deserialize[A <: Serializable: ClassTag](bytes: Array[Byte]): A = {
    val s = new ByteArrayInputStream(bytes)
    val is = ScalaObjectInputStream[A](s)
    is.readObject
  }

If your code is the one calling the deserialization that might be ok, but I'm guessing that's not always the case.

workaround 3: provide alternative class loader for sbt

The other workaround might be to provide an alternative flatter class loader for sbt.
At that point, however, might as well just use forked run. I guess it depends on how often Java serialization is used in the wild.

@eed3si9n
Copy link
Member

eed3si9n commented Oct 6, 2018

About the other issue reported by Yoshida-san. To minimize scala.xml.XMLTestJVM.serializeAttribute failure he created a minimization using a custom collection that looks like this:

class MyCollection[B](val list: List[B]) extends scala.collection.Iterable[B] {
  override def iterator = list.iterator
  // protected[this] override def writeReplace(): AnyRef = this
}

I'm breaking up the test into the following steps:

  @Test
  def testMyCollection: Unit = {
    val list = List(1, 2, 3)
    val arr = serialize(new MyCollection(list))
    val obj2 = deserialize[MyCollection[Int]](arr)
    assert(obj2.list == list)
  }

When I run this the error I get is as follows:

[error] Test issue.IssueTest.testMyCollection failed: java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to issue.MyCollection, took 0.01 sec
[error]     at issue.IssueTest.testMyCollection(IssueTest.scala:20)
[error]     ...

In other words, I don't get to the assertion, but instead it's failing at casting in the deserialization which deserialized $colon$colon instead of MyCollection. This looks to be a different problem than scala/bug#9237.

what's going on?

scala/scala#6676 makes Iterable Serializable by default.

trait Iterable[+A] extends IterableOnce[A] with IterableOps[A, Iterable, Iterable[A]] with Serializable {

with writeReplace implemented as follows:

  protected[this] def writeReplace(): AnyRef = new DefaultSerializationProxy(iterableFactory.iterableFactory, this)

In other words, the serialization of all things Iterable are passed into DefaultSerializationProxy, including all subtypes that exist in the wild. Perhaps it should fail at the point of serialization when it detects a type that it cannot handle. Letting it serialize the data, but not deserialize sounds like a potentially data-losing behavior. Another thing to consider is forcing subclasses of Iterable to implement a serialization method. /cc @szeiger

@ashawley
Copy link
Member

ashawley commented Oct 6, 2018

These are great write-ups! Sounds like the first issue can be carried forward in SI-9237.

For the second issue, should a new bug ticket be created so it can be tracked?

@eed3si9n
Copy link
Member

eed3si9n commented Oct 6, 2018

@ashawley Thanks. Yea. The added context is that Scala 2.13 adopts Serialization Proxy pattern, so it became all collection problem, not just List?

I opened scala/bug#11192 for the second issue.

@xerial
Copy link

xerial commented Oct 11, 2018

I also hit a similar issue when serializing Vector. It seems scala.collection.generic.DefaultSerializationProxy is wrongly involved during serialization targets:
wvlet/airframe#232

[info]   java.lang.ClassCastException: cannot assign instance of scala.collection.generic.DefaultSerializationProxy to field wvlet.airframe.Design.binding of type scala.collection.immutable.Vector in instance of wvlet.airframe.Design
[info]   at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
[info]   at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
[info]   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2251)
[info]   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
[info]   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
[info]   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
[info]   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
[info]   at wvlet.airframe.DesignSerializationTest$.deserialize(DesignSerializationTest.scala:33)
[info]   at wvlet.airframe.SerializationTest.$anonfun$new$2(SerializationTest.scala:43)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)

@xerial
Copy link

xerial commented Oct 11, 2018

@eed3si9n workaround 2 (custom serializer) is not ideal because, for example, Spark uses its own serializer (based on ObjectInputStream of Java) for sending closures to remote workers. It means almost all code in Spark will fail when using Scala 2.13.0-M5.

Actually my above test cases are added for supporting Spark. I think we should add serialization/deserialization test cases that use ObjectInput/OutputStream against Scala collection classes (Maybe it's already addressed in scala/bug#11192)

@eed3si9n
Copy link
Member

@xerial If the issue goes away if you fork the test, then you're hitting scala/bug#9237 (also known as sbt/sbt#89) as long as we are using Serialization Proxy pattern together with sbt, this will remain an issue unless we either fork or fix sbt.

@xerial
Copy link

xerial commented Oct 12, 2018

fork in test := true doesn't resolve the issue in my case. I'll try to create a minimization.

@xerial
Copy link

xerial commented Oct 12, 2018

Created a minimization https://gist.github.com/xerial/58e14b21a4492d6f87765fe9bff4b61a andfork in test := true resolved the error, but the error message is different from existing error reports.

I'll file another ticket. Thanks.

@retronym
Copy link
Member

retronym commented Oct 15, 2018

The other workaround might be to provide an alternative flatter class loader for sbt.
At that point, however, might as well just use forked run. I guess it depends on how often Java serialization is used in the wild.

@eed3si9n I think there is still value in non-forked execution without the layered classpaths. The Java standard library will still be warmed up. And it doesn't require configuration of settings to specify the -Xmx, etc, of the child process, as it shares these with the already-running SBT process.

@smarter
Copy link
Member

smarter commented Oct 15, 2018

So if I understand the problem correctly, when we get to:
https://github.com/scala/scala/blob/2d9e6acce8c4246e6cba5a22ff9d965ec3bd117c/src/library/scala/collection/generic/DefaultSerializationProxy.scala#L46
We cannot safely call in.readObject because the default implementation of ObjectInputStream is dumb, but @eed3si9n has demonstrated that a custom implementation can work: https://github.com/eed3si9n/minimized-list-serialization/blob/v0.2.0/src/main/scala/issue/ScalaObjectInputStream.scala, so why not simply wrap the ObjectInputStream we get into a non-dumb one ?

private[this] def readObject(in: ObjectInputStream): Unit = {
  val in2 = in match {
     case in: ScalaObjectInputStream => in
     case _ => new ScalaObjectInputStream(in)
  }
  // rest of the method as before but replacing "in" by "in2"
}

@milessabin
Copy link

I'm suddenly seeing a bunch of serialization failures running testAll against the latest from 2.13.x,

test/files/run/lambda-serialization.scala
test/files/run/t5262.scala
test/files/run/t8188.scala
test/files/jvm/manifests-new.scala
test/files/jvm/manifests-old.scala
test/files/jvm/serialization-new.scala
test/files/jvm/serialization.scala
test/files/jvm/t1600.scala

Also,

scala.collection.convert.WrapperSerializationTest.test
scala.collection.immutable.SerializationTest.vector
scala.collection.immutable.SerializationTest.list
scala.collection.immutable.SerializationTest.hashMap
scala.collection.immutable.SerializationTest.hashSet
scala.SerializationStabilityTest.testAll

Although I saw some issues earlier this seems more severe: I've just updated my JDK to 1.8.0_192 ... could that be relevant?

@milessabin
Copy link

Sorry I should have said, that's both Oracle and OpenJDK 1.8.0_192.

@milessabin
Copy link

Downgrading to OpenJDK 1.8.0_181 doesn't help, so it's not the version bump. Is anyone else seeing this?

@NthPortal
Copy link

I've been seeing it locally, yeah

@milessabin
Copy link

milessabin commented Oct 24, 2018

When did it start? Is there an existing issue for it (other than this one)?

@NthPortal
Copy link

It started for me when I rebased my month-and-a-half-old LazyList PR on 2.13.x. Oddly though, I haven't seen the tests fail on Jenkins.

@milessabin
Copy link

I've been busy with Dotty for the last couple of weeks. My last successful testAll would have been around 2nd October.

@SethTisue
Copy link
Member Author

new ticket on just the test failures in scala/scala: #569

@SethTisue
Copy link
Member Author

ScalaTest was bitten, here: scalatest/scalatest#1423

@SethTisue
Copy link
Member Author

SethTisue commented Nov 15, 2018

closing because there are now separate tickets for everything that has turned up (most notably scala/bug#9237 and scala/bug#11192 and sbt/sbt#89) (#569 is now fixed)

unless or until sbt moves to a flatter classloader structure for running test in-VM, people doing deserialization in their test suites will either have to use fork in Test := true, or alter their code to use a custom ObjectInputStream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants