This is Serialysis, a library to parse serialized Java objects. It was originally published in this blog entry.
Thanks to Chris Frohoff (@frohoff) for rescuing the source code from oblivion. It had not survived the migration of blogs from weblogs.java.net to community.oracle.com.
The text below is closely based on the original blog entry, dating from 12 June 2007.
Presenting Serialysis, a library that allows you to disassemble the serial form of Java objects. This can allow you to retrieve information about an object that is not available through its public API. It is also a useful tool when testing the serialization of your classes.
My reason for writing this library is that I encountered a couple of problems where I found that I needed information from an object that was not available through its public API, but that was available through its serial form.
One example is if you have a stub for a remote RMI object, and you want to know what address it will connect to, or what port, or using what socket factory. The standard RMI API doesn't give you any way to extract this information from the stub. But the information is there, and it must be included when the stub is serialized so that the stub is usable when it is later deserialized. So if we could somehow parse the serialized stub we could get the information we want.
A second example comes from the JMX API. Queries to the MBean Server are represented by the interface QueryExp. QueryExp instances are constructed using the methods of the Query class. If you have an object implementing QueryExp, how can you know what query it executes? The JMX API doesn't include any method to find out. The information must be present in the serial form, so that when a client sends a query to a remote server it can be reconstituted on the server. If we could look at the serial form, we could find out what the query was.
This second example is what prompted me to write this library. The existing standard JMX connectors are based on Java serialization, so they don't need to do anything special for QueryExp instances. But the new Web Services Connector being defined by JSR 262 uses XML for serialization. How can it analyze a QueryExp in order to convert it into XML? The answer is that the WS Connector uses a version of this library to look at the Java-serialized QueryExp.
What these examples have in common is that they illustrate gaps in the relevant APIs. There ought to be methods that allow you to extract the information contained in an RMI stub. There ought to be methods that convert back from a QueryExp object to the original Query methods that constructed it. (Even a standardized parseable toString() would be enough.) But those methods aren't there today, and if we want code that works with those APIs as they are now, we need another approach.
If you have the source code of the classes you're interested in, it's tempting just to barrel in and grab the information you need. In the RMI stub example, we can find out by experiment that the stub's getRef() method returns a sun.rmi.server.UnicastRef, and by studying the JDK source we might be able to figure out that this class contains a field ref of type sun.rmi.transport.LiveRef with the information we need. So we might end up with code like this:
// This is NOT a good idea!!!
import sun.rmi.server.*; // !
import sun.rmi.transport.*; // !
import java.lang.reflect.Field;
import java.rmi.*;
import java.rmi.server.*; // !
public class StubDigger {
    public static int getPort(RemoteStub stub) throws Exception {
        RemoteRef ref = stub.getRef();
        UnicastRef uref = (UnicastRef) ref; // !
        Field refField = UnicastRef.class.getDeclaredField("ref"); // !
        refField.setAccessible(true); // !
        LiveRef lref = (LiveRef) refField.get(uref); // !
        return lref.getPort();
    }
}
You might be satisfied with this, but you shouldn't be. The lines marked // ! are full of horrors. First of all, you should never depend on sun.* classes, because there's no guarantee they won't change unrecognizably in any JDK update, plus of course your code probably won't be portable to platforms other than the JDK. Secondly, it's a huge red flag when you see Field.setAccessible being called. That means the code is depending on undocumented fields, which again could change between releases, or, worse, which might continue to exist but with subtly different semantics.
(The above code was written for JDK 5. It turns out that in JDK 6, LiveRef acquires a public getPort() method, so you no longer need Field.setAccessible. But you still need to depend on sun.* classes.)
Well, sometimes you can't do any better than this. But if the class you're interested in is serializable, often you can. The reason is that the serial form of a class is part of its public interface. If the API is any good at all then its public interfaces will evolve compatibly in every update. This is a very strong requirement on the JDK platform in particular.
So if the information you need isn't available through a class's public methods, but is part of the documented serial form, then you can rely on it remaining in the serial form in the future.
The serial form is included in the Javadoc output as part of the See Also for each serializable class. You can see the serial form of all public JDK classes in a single giant page.
My library to parse serialized objects is called Serialysis, the result of cramming the words "serial analysis" too close together.
Here's a simple example of what it looks like in action. This code...
SEntity sint = SerialScan.examine(new Integer(5));
System.out.println(sint);
...produces this output...
SObject(java.lang.Integer){
    value = SPrim(int){5}
}
This tells us that the java.lang.Integer that we gave to SerialScan.examine serializes as an object with a single field value of type int. If we check out the documented serialized form of java.lang.Integer we can see that this is indeed what is expected.
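The declared serial fields of a class can also be checked at runtime with nothing but the JDK. The following is a minimal sketch, not part of Serialysis; the class name SerialFields and its describe method are made up for illustration. It uses java.io.ObjectStreamClass to list the serializable fields of a class, and for java.lang.Integer it should report the single int field value.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Serialysis): lists the serializable
// fields that make up the serial form of a class, as "type name" strings.
final class SerialFields {
    static List<String> describe(Class<?> cls) {
        List<String> out = new ArrayList<>();
        ObjectStreamClass osc = ObjectStreamClass.lookup(cls);
        if (osc == null) {
            return out; // lookup returns null for non-serializable classes
        }
        for (ObjectStreamField f : osc.getFields()) {
            out.add(f.getType().getName() + " " + f.getName());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe(Integer.class)); // [int value]
    }
}
```

This only shows the declared fields. Data written by a class's own writeObject method does not appear here, which is one reason a stream-level tool is still useful.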
If you check out the source code of java.lang.Integer, you'll see that the class itself also has a single field value of type int:
/**
* The value of the <code>Integer</code>.
*
* @serial
*/
private final int value;
But private fields are an implementation detail. An update could rename this field, or replace it with a new field inherited from the parent class java.lang.Number, or whatever. There's no guarantee that that won't happen, but there is a guarantee that the serial form will remain the same. Serialization provides mechanisms to keep the serial form the same even when the class's fields change.
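One such mechanism is the serialPersistentFields declaration, combined with ObjectOutputStream.putFields and ObjectInputStream.readFields. Here is a small sketch of how it can be used. The Temperature class and its field names are invented for illustration; the point is only that the in-memory field is called celsius while the serial form keeps a field called value.

```java
import java.io.*;

// Hypothetical class: the in-memory field is "celsius", but the declared
// serial form is a single int field named "value".
final class Temperature implements Serializable {
    private static final long serialVersionUID = 1L;

    // Declares the serial form explicitly, independently of the real fields.
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("value", int.class)
    };

    private int celsius;

    Temperature(int celsius) { this.celsius = celsius; }

    int celsius() { return celsius; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        ObjectOutputStream.PutField fields = out.putFields();
        fields.put("value", celsius);   // map the real field to the serial name
        out.writeFields();
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField fields = in.readFields();
        celsius = fields.get("value", 0);
    }

    // Serializes and deserializes a copy, to show that the mapping round-trips.
    static Temperature roundTrip(Temperature t) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(t);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Temperature) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Temperature(21)).celsius());
        System.out.println(ObjectStreamClass.lookup(Temperature.class)
                .getFields()[0].getName());
    }
}
```

If the private field were renamed again in a later version, only writeObject and readObject would need to change; the stream would still carry an int called value.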
Here's a more complicated example. Suppose that, for some reason, we want to know how big the array in an ArrayList is. The API doesn't allow us to find out, though it does allow us to force the array to be at least a certain size.
If we check the serial form of ArrayList, we see that it does contain the information we're looking for. There's a serialized field size, which is the number of elements in the list. That's not what we want. But the Serial Data in the writeObject method does have what we want:
- Serial Data:
- The length of the array backing the ArrayList instance is emitted (int), followed by all of its elements (each an Object) in the proper order.
If we execute this code...
List<Integer> list = new ArrayList<Integer>();
list.add(5);
SObject slist = (SObject) SerialScan.examine(list);
System.out.println(slist);
...we get this output...
SObject(java.util.ArrayList){
    size = SPrim(int){1}
    -- data written by class's writeObject:
    SBlockData(blockdata){4 bytes of binary data}
    SObject(java.lang.Integer){
        value = SPrim(int){5}
    }
}
This is where we get into the gory details of serialization. In addition to, or instead of, serializing an object's fields, its class can declare a method writeObject(ObjectOutputStream) that writes arbitrary data to the serial stream using methods like ObjectOutputStream.writeInt. It must declare a corresponding readObject that reads the same data, and it should document via a @serialData tag what the writeObject method writes, as ArrayList does.
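To make the writeObject/readObject contract concrete, here is a sketch of a class that follows the same pattern. IntBag is an invented class, not something from the JDK or from Serialysis; it writes its count field in the default way, then appends the length of its backing array and its contents, and documents that with @serialData.

```java
import java.io.*;
import java.util.Arrays;

// Hypothetical class for illustration, loosely following the ArrayList pattern.
final class IntBag implements Serializable {
    private static final long serialVersionUID = 1L;

    /** @serial the number of ints held in the bag */
    private int count;
    private transient int[] items = new int[8];

    void add(int x) {
        if (count == items.length) {
            items = Arrays.copyOf(items, count * 2);
        }
        items[count++] = x;
    }

    int get(int i) { return items[i]; }
    int count() { return count; }
    int capacity() { return items.length; }

    /**
     * @serialData The length of the backing array (int), followed by
     *             each stored value (int) in order.
     */
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();      // writes the count field
        out.writeInt(items.length);
        for (int i = 0; i < count; i++) {
            out.writeInt(items[i]);
        }
    }

    // Must read back exactly what writeObject wrote, in the same order.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();        // restores count
        items = new int[Math.max(in.readInt(), count)];
        for (int i = 0; i < count; i++) {
            items[i] = in.readInt();
        }
    }

    // Copies a bag by serializing and deserializing it.
    static IntBag copyViaSerialization(IntBag bag) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(bag);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (IntBag) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        IntBag bag = new IntBag();
        bag.add(5);
        IntBag copy = copyViaSerialization(bag);
        System.out.println(copy.count() + " item(s), capacity " + copy.capacity());
    }
}
```

The capacity is not reachable through any serialized field here; it exists in the stream only because writeObject puts it there, which is exactly the kind of information the @serialData tag is meant to describe.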
The writeObject data is accessible in Serialysis through the method SObject.getAnnotations(), which returns a List<SEntity>. Each Object that was written via the method ObjectOutputStream.writeObject(Object) appears as an SObject in this list. Each chunk of data written by one or more consecutive calls to the methods that ObjectOutputStream gets from DataOutput (writeInt, writeUTF, etc.) appears as an SBlockData.
The serial stream doesn't include enough information to separate out individual items within the chunk; that information is an agreement between writer and reader that is documented by the @serialData tag.
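The way consecutive primitive writes merge into one chunk can be seen directly in the raw stream. The sketch below uses only the JDK (the class name BlockDataDemo is made up). It writes an int and a UTF string to an ObjectOutputStream and then inspects the bytes: after the 4-byte stream header there is a single block-data record of 8 bytes, with nothing marking where the int ends and the string begins.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;

// Hypothetical demo class: shows that writeInt and writeUTF end up in one
// undifferentiated block-data record.
final class BlockDataDemo {
    static byte[] write() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(42);      // 4 bytes
            out.writeUTF("ab");    // 2-byte length + 2 bytes of text
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] b = write();
        // b[0..3] is the stream header (magic number and version).
        System.out.println("block data tag: "
                + (b[4] == ObjectStreamConstants.TC_BLOCKDATA));
        System.out.println("payload length: " + (b[5] & 0xff));
    }
}
```

A reader that did not know the agreed layout would see only 8 opaque bytes, which is why a parser can expose such a chunk only as binary data.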
Based on the ArrayList documentation, we can find the size of the array like this:
SObject slist = (SObject) SerialScan.examine(list);
List<SEntity> writeObjectData = slist.getAnnotations();
SBlockData data = (SBlockData) writeObjectData.get(0);
DataInputStream din = data.getDataInputStream();
int alen = din.readInt();
System.out.println("Array length: " + alen);
Without showing all the details of the code, here's the outline of the solution to the QueryExp problem I mentioned.
Suppose I have a QueryExp constructed like this:
QueryExp query =
Query.or(Query.gt(Query.attr("Version"), Query.value(5)),
Query.eq(Query.attr("SupportsSpume"), Query.value(true)));
This means, "MBeans where the Version attribute is greater than 5 or the SupportsSpume attribute is true". The toString() of this query in the JDK looks like this:
((Version) > (5)) or ((SupportsSpume) = (true))
The result of SerialScan.examine looks like this:
SObject(javax.management.OrQueryExp){
    exp1 = SObject(javax.management.BinaryRelQueryExp){
        relOp = SPrim(int){0}
        exp1 = SObject(javax.management.AttributeValueExp){
            attr = SString(String){"Version"}
        }
        exp2 = SObject(javax.management.NumericValueExp){
            val = SObject(java.lang.Long){
                value = SPrim(long){5}
            }
        }
    }
    exp2 = SObject(javax.management.BinaryRelQueryExp){
        relOp = SPrim(int){4}
        exp1 = SObject(javax.management.AttributeValueExp){
            attr = SString(String){"SupportsSpume"}
        }
        exp2 = SObject(javax.management.BooleanValueExp){
            val = SPrim(boolean){true}
        }
    }
}
You can imagine code that descends into this structure producing an XML equivalent. Every conformant implementation of the JMX API is required to produce this same serial form, so the code that parses it is guaranteed to work everywhere.
Now here's the code that solves the RMI stub port number problem:
public static int getPort(RemoteStub stub) throws IOException {
    SObject sstub = (SObject) SerialScan.examine(stub);
    // The data written by RemoteObject.writeObject is the first annotation.
    List<SEntity> writeObjectData = sstub.getAnnotations();
    SBlockData sdata = (SBlockData) writeObjectData.get(0);
    DataInputStream din = sdata.getDataInputStream();
    // The block starts with the name of the ref type.
    String type = din.readUTF();
    if (type.equals("UnicastRef"))
        return getPortUnicastRef(din);
    else if (type.equals("UnicastRef2"))
        return getPortUnicastRef2(din);
    else
        throw new IOException("Can't handle ref type " + type);
}
private static int getPortUnicastRef(DataInputStream din) throws IOException {
    String host = din.readUTF(); // read and skip the host to reach the port
    return din.readInt();
}
private static int getPortUnicastRef2(DataInputStream din) throws IOException {
    byte hasCSF = din.readByte(); // read and skip the socket factory flag
    String host = din.readUTF();  // read and skip the host to reach the port
    return din.readInt();
}
To understand this, you need to see the serial form for RemoteObject.
This code is admittedly difficult, but it is portable and futureproof. It should be fairly clear how to extract the other information I mentioned from RMI stubs using the same approach.
You really don't want to get into disassembling serial forms unless you have to. But if you do have to, then Serialysis should make your task a little less painful.
It's also a good way to check that your own classes serialize the way you expect them to.