Skip to content

Conversation

@MattWhelan
Copy link

The fundamental issue is that you can't change the delegation scheme
without overriding loadClass (rather than findClass). And, if you
override loadClass, you kind of have to do it in Java, because you need
a static initializer call to register yourself as parallel-capable.

Note that the current PR changes the Java dep to 1.7. That's necessary for the current implementation of GreedyClassLoader. 1.7 adds a fix to a long-standing deadlock issue with classloaders that don't follow the default delegation model. Basically, locking is now fine-grained, which is good.

That said, if requiring 1.7 is perceived to be a big problem, then I can strip out that code and submit a 1.6 solution.

The fundamental issue is that you can't change the delegation scheme
without overriding loadClass (rather than findClass). And, if you
override loadClass, you kind of have to do it in Java, because you need
a static initializer call to register yourself as parallel-capable.
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@nchammas
Copy link
Contributor

I'll leave it to a maintainer to confirm, but I believe that Java 6 support is an official commitment of the Spark project. Dropping that support would be a major decision.

@vanzin
Copy link
Contributor

vanzin commented Jan 22, 2015

The 1.7 dependency will probably be shot down pretty quickly. But one way you could work around it in Scala is to have a factory method to create instances of this classloader. That method would then make sure the registration method is called before any instances of the class are created. That would complicate inheritance a bit, though.

I've played with fixing parts of this code in #3233, and it seems like the only thing missing is the locking, which shouldn't be hard to re-create in Java 6.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is missing a copyright header BTW)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'll fix that.

@MattWhelan
Copy link
Author

@vanzin
The way the registration method works is also really weird. It takes no params, you'll notice. In ClassLoader, it calls a Reflection native method to actually get a reference to the caller's Class. Which feels all kinds of horrible to me, though it does make for a really simple interface (in Java). But, the upshot is that registerAsParallelCapable() must be called from the class being registered, before an instance of that class has been created. So basically, a static initializer.

Since there aren't real static methods in Scala, I don't think the factory thing would work. Even though you could probably get the timing right, the next level down the stack would be a companion object class, or something, not the CL itself.

Really, I'm pretty OK with that. ClassLoaders and all their related issues are deep JVM voodoo. It's a low-level concept, and Scala is a high level language.

@vanzin
Copy link
Contributor

vanzin commented Jan 22, 2015

@MattWhelan I see. Still, I'm not sure we need to call that method at all. From the docs:

In environments in which the delegation model is not strictly hierarchical, class loaders need to be
parallel capable

I don't think Spark is one of those cases; the class loaders are still hierarchical. The findClass vs. loadClass difference is a real issue, but I'm not sure the locking is.

@MattWhelan
Copy link
Author

@vanzin
There needs to be some locking. The 1.6 version of loadClass is simply a synchronized method (on 'this'). That's a viable alternative, and the way I'll do in in 1.6, since it's likely that 1.7 isn't an option. I haven't thought through the deadlock scenario in this context; I'm not sure whether it applies.

I don't know what 1.7's fallback is for non-registered CLs. The docs imply that it's not better, but that 1.7 can still support 1.6 CLs, so it's not the end of the world.

The 1.6 way has the nice property that you could implement it entirely in Scala though. Probably even as a trait.

See http://www.ibm.com/developerworks/java/library/j-dclp4/index.html for more on CL deadlocks.

MattWhelan pushed a commit to MattWhelan/spark that referenced this pull request Jan 22, 2015
The fundamental issue is that you can't change the delegation scheme
without overriding loadClass (rather than findClass). And, if you
override loadClass, you kind of have to do it in Java, because you need
a static initializer call to register yourself as parallel-capable.

This is an alternative to PR apache#4165.  That PR requires Java 1.7.  This
one sticks with 1.6, and implements the classloader as a trait, because
it can.
@srowen
Copy link
Member

srowen commented Feb 16, 2015

@MattWhelan Given that you closed #4166 is this live too? and/or is SPARK-5358 live or resolved already?

@MattWhelan
Copy link
Author

@srowen I forgot about this one :) This should be covered by @vanzin's #3233

@MattWhelan MattWhelan closed this Feb 16, 2015
@MattWhelan MattWhelan deleted the greedyClassloader branch February 16, 2015 23:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants