Skip to content

Serialization of static modules does not need to contain any data #10412

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lrytz opened this issue Jul 17, 2017 · 11 comments
Closed

Serialization of static modules does not need to contain any data #10412

lrytz opened this issue Jul 17, 2017 · 11 comments
Assignees
Milestone

Comments

@lrytz
Copy link
Member

lrytz commented Jul 17, 2017

Static modules get a readResolve method that returns the singleton instance, so the de-serialized object is not used. So there's no need to serialize all the fields. See also #9375.

@lrytz lrytz added this to the 2.12.4 milestone Jul 17, 2017
@lrytz lrytz self-assigned this Jul 17, 2017
@SethTisue
Copy link
Member

nice catch

@lrytz
Copy link
Member Author

lrytz commented Jul 17, 2017

(It was some else's 🎣 )

@adriaanm adriaanm modified the milestones: 2.12.4, 2.12.5 Oct 10, 2017
@SethTisue SethTisue modified the milestones: 2.12.5, Backlog Mar 1, 2018
@SethTisue
Copy link
Member

@lrytz I'm reassigning most 2.12.5 tickets to "Backlog", but this seems like it might be low-hanging fruit you might actually like to tackle for 2.12.5 or 2.12.6?

@lrytz lrytz modified the milestones: Backlog, 2.12.6 Mar 2, 2018
@SethTisue SethTisue modified the milestones: 2.12.6, 2.12.7 Mar 23, 2018
@szeiger
Copy link

szeiger commented May 22, 2018

The problem gets worse when the object contains non-serializable state or when you use serialization proxies and circular dependencies (in which case a proxy of the wrong type can leak out into other deserialization code).

There does not seem to be an easy way to avoid this problem in a generic way (i.e. in a trait which could be implemented by multiple objects). It can't extend Externalizable because an object's class does not have a public no-args constructor. It can't serialize a dummy via writeReplace because this dummy is visible to other proxies in case of circular dependencies. The only workaround I found is to implement custom serialization code in every object (and superclass with state):

  private[this] def writeObject(out: ObjectOutputStream): Unit = ()
  private[this] def readObject(in: ObjectInputStream): Unit = ()

A proper fix could consist of giving module classes a public constructor and implementing Externalizable.

@lrytz
Copy link
Member Author

lrytz commented May 25, 2018

Hmm, I just merged scala/scala#6676 which adds a number of these overrides that @szeiger suggests above. However, reading https://docs.oracle.com/javase/8/docs/platform/serialization/spec/output.html#a861

writeObject is responsible for saving the state of the class. Either defaultWriteObject or writeFields method must be called before writing any optional data; Even if no optional data is written, defaultWriteObject or writeFields must still be invoked once.

Maybe we're fine because modules have a readResolve method that discards the deserialized object. But can we be sure?

For superclasses of modules, this is not the case; they don't have a readResolve method.

@szeiger
Copy link

szeiger commented May 25, 2018

I also read that section before making the change. Continuing in the same paragraph:

If defaultWriteObject or writeFields is not invoked once prior to the writing of optional data (if any), then the behavior of instance deserialization is undefined in cases where the ObjectInputStream cannot resolve the class which defined the writeObject method in question.

Maybe I don't understand the implications but this looks to me like it is perfectly safe not do do default serialization. What's the point of deserializing an object when the class cannot be resolved? The only use case appears to be partial parsing of object streams, i.e. skipping over objects that cannot be deserialized. Maybe deserializing with a new, not binary compatible, version of a class that omits a superclass? Since we don't even guarantee serialization compatibility for binary compatible classes we can ignore this case.

For superclasses of modules, this is not the case; they don't have a readResolve method.

Note that readResolve affects the whole object whereas writeObject and readObject only affect the state that is defined by the current class. If you use Serializable instead of Externalizable there is no way to prevent serialization of superclass state other than using writeReplace (which is too fragile for a default solution that could be implemented automatically by the compiler). I think we have to live with the fact that superclass state will be serialized unnecessarily.

The best we can safely do is to not serialize the object's own state. If we implement this directly in the compiler, an alternative to writeObject / readObject that avoids possible defaultWriteObject problems is to mark all fields as transient. This includes synthetic fields like outer pointers of nested objects.

Another alternative that would prevent serialization of superclass state is to automatically implement Externalizable along with the required methods whenever an object implements Serializable. This would require changes to the initialization of objects. The implementing class must get a public no-args constructor (which is called during deserialization) that initializes all final fields with default values. The real initialization for the static MODULE$ requires a second constructor with a dummy argument. But this is not a perfect solution for superclass state, either. Now you won't get unnecessary serialization and deserialization of that state but instead the constructors of superclasses are run, which could be even worse.

@mkeskells
Copy link

mkeskells commented Jun 24, 2018

Cant this simple be fixed with writeReplace/readResolve, that the normal way to do singleton serialisation,
quoting from https://docs.oracle.com/javase/7/docs/api/java/io/Serializable.html

Serializable classes that need to designate an alternative object to be used when writing an object to the stream should implement this special method with the exact signature:

ANY-ACCESS-MODIFIER Object writeReplace() throws ObjectStreamException;

This writeReplace method is invoked by serialization if the method exists and it would be accessible from a method defined within the class of the object being serialized. Thus, the method can have private, protected and package-private access. Subclass access to this method follows java accessibility rules.
Classes that need to designate a replacement when an instance of it is read from the stream should implement this special method with the exact signature.
ANY-ACCESS-MODIFIER Object readResolve() throws ObjectStreamException;

so we generate in the static module

@throws[classOf[ObjectStreamException]]
private[this] def writeReplace = new ModuleToken(getClass)

and a utility class

final class ModuleToken(cls:Class) extends Serialisable {
  @throws[classOf[ObjectStreamException]]
  private def readResolve = ... // return the singleton
}

@retronym
Copy link
Member

@szeiger What problems are you thinking about when you say:

which is too fragile for a default solution that could be implemented automatically by the compiler

@szeiger
Copy link

szeiger commented Jun 29, 2018

I was referring to the cyclic references problem with serialization proxies but a quick test run of serialization tests for the collections (with my current workaround replaced by a proxy) shows no problems. I'll have to perform a broader test by letting the compiler generate this pattern for all singleton objects.

@szeiger
Copy link

szeiger commented Jun 29, 2018

Here's my attempt so far: scala/scala@2.13.x...szeiger:issue/10412

But I can't make it work. Whatever way I use to generate the call to getClass() (see SyntheticMethods.scala), if it gets past type-checking I end up with an error like this:

scala> object O extends Serializable
ReplGlobal.abort: Unexpected tree in genLoad: type/class scala.reflect.internal.Trees$TypeTree at: source-<console>,line-1,offset=7
error:
  Unexpected tree in genLoad: type/class scala.reflect.internal.Trees$TypeTree at: source-<console>,line-1,offset=7
     while compiling: <console>
        during phase: jvm
     library version: version 2.13.0-20180627-110910-02a4e88
    compiler version: version 2.13.0-20180627-110910-02a4e88
  reconstructed args: -deprecation -usejavacp -feature

  last tree to typer: Select(Select(Select(Select(Ident($line3), $read), $iw), $iw), O)
       tree position: line 1 of <console>
            tree tpe: type
              symbol: object iw$O in package $line3
   symbol definition: class iw$O extends Serializable (a ModuleClassSymbol)
      symbol package: $line3
       symbol owners: object iw$O
           call site: constructor $read$O in package $line3

== Source file context for tree position ==

     1 object O extends Serializable
     2
error: Error while emitting <console>

  Unexpected tree in genLoad: type/class scala.reflect.internal.Trees$TypeTree at: source-<console>,line-1,offset=7
     while compiling: <console>
        during phase: jvm
     library version: version 2.13.0-20180627-110910-02a4e88
    compiler version: version 2.13.0-20180627-110910-02a4e88
  reconstructed args: -deprecation -usejavacp -feature

  last tree to typer: Select(Select(Select(Select(Ident($line3), $read), $iw), $iw), O)
       tree position: line 1 of <console>
            tree tpe: type
              symbol: object iw$O in package $line3
   symbol definition: class iw$O extends Serializable (a ModuleClassSymbol)
      symbol package: $line3
       symbol owners: object iw$O
           call site: constructor $read$O in package $line3

== Source file context for tree position ==

     1 object O extends Serializable
     2

Any suggestions welcome on how to do that adaptation to the existential type correctly.

@SethTisue
Copy link
Member

(scala/scala#7297 replaced scala/scala#6877)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants