Skip to content

NodeSeq.Empty is not serializable #154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
joescii opened this issue Aug 2, 2017 · 5 comments
Closed

NodeSeq.Empty is not serializable #154

joescii opened this issue Aug 2, 2017 · 5 comments
Milestone

Comments

@joescii
Copy link
Contributor

joescii commented Aug 2, 2017

It appears that the typical use case of NodeSeq serializes with no problems (I'm speaking of Node, which we can see here is marked as Serializable), but the NodeSeq.Empty surprisingly doesn't. I would expect it to behave like String where the empty string is trivially serializable.

Is there any reason that NodeSeq should not extend Serializable? The offender in the case of NodeSeq.Empty could trivially be marked to extend Serializable here

You can see the behavior below in this console session. Note that java version details are printed at the end. This was run on a MBP running Sierra 10.12.6.

[info] Starting scala interpreter...
[info] 
Welcome to Scala 2.12.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.

scala> import java.io._
import java.io._

scala> import scala.xml.NodeSeq
import scala.xml.NodeSeq

scala>       def serialize(in: Any): Array[Byte] = {
     |         val bos = new ByteArrayOutputStream()
     |         val oos = new ObjectOutputStream(bos)
     |         oos.writeObject(in)
     |         oos.flush()
     |         bos.toByteArray()
     |       }
serialize: (in: Any)Array[Byte]

scala> serialize("joescii")
res0: Array[Byte] = Array(-84, -19, 0, 5, 116, 0, 7, 106, 111, 101, 115, 99, 105, 105)

scala> serialize("")
res3: Array[Byte] = Array(-84, -19, 0, 5, 116, 0, 0)

scala> serialize(<joescii/>)
res1: Array[Byte] = Array(-84, -19, 0, 5, 115, 114, 0, 14, 115, 99, 97, 108, 97, 46, 120, 109, 108, 46, 69, 108, 101, 109, -121, -95, 0, -70, 78, 93, -40, -69, 2, 0, 6, 90, 0, 13, 109, 105, 110, 105, 109, 105, 122, 101, 69, 109, 112, 116, 121, 76, 0, 10, 97, 116, 116, 114, 105, 98, 117, 116, 101, 115, 116, 0, 20, 76, 115, 99, 97, 108, 97, 47, 120, 109, 108, 47, 77, 101, 116, 97, 68, 97, 116, 97, 59, 76, 0, 5, 99, 104, 105, 108, 100, 116, 0, 22, 76, 115, 99, 97, 108, 97, 47, 99, 111, 108, 108, 101, 99, 116, 105, 111, 110, 47, 83, 101, 113, 59, 76, 0, 5, 108, 97, 98, 101, 108, 116, 0, 18, 76, 106, 97, 118, 97, 47, 108, 97, 110, 103, 47, 83, 116, 114, 105, 110, 103, 59, 76, 0, 6, 112, 114, 101, 102, 105, 120, 113, 0, 126, 0, 3, 76, 0, 5, 115, 99, 111, 112, 101, 11...

scala> val xml = <joescii/>
xml: scala.xml.Elem = <joescii/>

scala> serialize(xml)
res4: Array[Byte] = Array(-84, -19, 0, 5, 115, 114, 0, 14, 115, 99, 97, 108, 97, 46, 120, 109, 108, 46, 69, 108, 101, 109, -121, -95, 0, -70, 78, 93, -40, -69, 2, 0, 6, 90, 0, 13, 109, 105, 110, 105, 109, 105, 122, 101, 69, 109, 112, 116, 121, 76, 0, 10, 97, 116, 116, 114, 105, 98, 117, 116, 101, 115, 116, 0, 20, 76, 115, 99, 97, 108, 97, 47, 120, 109, 108, 47, 77, 101, 116, 97, 68, 97, 116, 97, 59, 76, 0, 5, 99, 104, 105, 108, 100, 116, 0, 22, 76, 115, 99, 97, 108, 97, 47, 99, 111, 108, 108, 101, 99, 116, 105, 111, 110, 47, 83, 101, 113, 59, 76, 0, 5, 108, 97, 98, 101, 108, 116, 0, 18, 76, 106, 97, 118, 97, 47, 108, 97, 110, 103, 47, 83, 116, 114, 105, 110, 103, 59, 76, 0, 6, 112, 114, 101, 102, 105, 120, 113, 0, 126, 0, 3, 76, 0, 5, 115, 99, 111, 112, 101, 11...

scala> serialize(NodeSeq.Empty)
java.io.NotSerializableException: scala.xml.NodeSeq$$anon$1

  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1182)
  at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
  at .serialize(<console>:18)
  ... 39 elided

scala> 
descartes:lift-framework joescii$ java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
@ashawley
Copy link
Member

ashawley commented Aug 3, 2017

Is there any reason that NodeSeq should not extend Serializable?

It's a good question. I'm not sure there is. For completeness, NodeSeq including Empty should be serializable. Perhaps it just got missed. Combing through the commit history, it seems there were multiple attempts to improve or modify serialization in this library, including the use and subsequent dropping of the old compiler annotation @serialization. I predict just no one has ever generated an empty NodeSeq and attempted to serialize it.

How are you coming across this? I presume you're loading XML, and then writing code that is filtering stuff out that is generating NodeSeq.Empty. Am I close?

@joescii
Copy link
Contributor Author

joescii commented Aug 3, 2017

Lift uses NodeSeq all over the place. I'm working on storing more of Lift's session state into the underlying web container, which then uses java serialization to form clusters. It is very common for page state to include snippets of markup, which are passed around as instances of NodeSeq.

It turns out this problem is more than just NodeSeq.Empty. The implicit conversion from Seq[Node] => NodeSeq also uses this code, causing it to have serialization failures.

I simply marked NodeSeq as Serializable in PR #155. I included a few tests to cover cases I'm aware of.

@ashawley
Copy link
Member

After looking at this a bit, I'm not sure serialization ever worked given both this defect and possibly others. I have hard time believing this, since there is at least a semblance of serialization sprinkled around the library including some basic tests of its function.

I worry that supporting serialization may have consequences, but I suppose most people using it will know and accept the risks.

@joescii
Copy link
Contributor Author

joescii commented Aug 15, 2017

I've fortunately or unfortunately (not sure which LOL) been exposed to java serialization often over the past decade, hence why I wasn't a bit hesitant to depend on it for a new feature for Lift. So if you have any specific questions or concerns, I can certainly help us discuss them and understand how everyone involved feels about fixing this issue.

FWIW, the way you finish sums it up well. Serialization has its surprises, but if you're reaching for serialization you're ready to watch for them. Furthermore if we add Serializable to NodeSeq, users who are NOT using serialization are unaffected.

As it stands today, the surprise is that most instances of NodeSeq are serializable until you get to the most basic corner case, NodeSeq.Empty. AFAIK, NodeSeq can never be anything other than pure data so it's a great candidate for extending Serialization, much like how a case class does so by default. I feel quite strongly this should be regarded as a bug.

What do you believe is the best due diligence to determine the best way to resolve this issue? I suspect I will eventually find any other corner cases as I ago (for instance, fixing Empty also fixes an implicit conversion tested here). I could comb through the source a bit more and add more tests to give better coverage for future-proofing as serialization is easily missed in the Scala community because of how much we're used to case class giving it to us for free. Then they're aren't caught until some downstream project like Lift or Akka tries to do some complicated clustering junk where the easiest path to serialization is what the JVM provides by default.

@ashawley
Copy link
Member

ashawley commented Oct 6, 2017

Thanks for reporting this and contributing the fix. I've merged #155 and it will be in the next release, version 1.1.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants