Contributor

sylvinus commented Aug 4, 2016

What changes were proposed in this pull request?

The java.net.URL class has a globally synchronized Hashtable, which limits the throughput of any single executor making many calls to parse_url(). Tests have shown that a 36-core machine can only reach 10% CPU use because the threads are blocked on that lock most of the time.

This patch switches to java.net.URI, which has fewer features than java.net.URL but focuses on URI parsing, which is enough for parse_url().
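
As a rough illustration of why java.net.URI is sufficient here, the sketch below pulls the usual URL parts out of a string using URI accessors only. It is not the code from this patch: the object name UriParts, the helpers parse and extract, and the part names are assumptions made up for the example.

```scala
import java.net.{URI, URISyntaxException}

object UriParts {
  // Hypothetical helper: None when the input is not a syntactically valid URI.
  def parse(url: String): Option[URI] =
    try Some(new URI(url))
    catch { case _: URISyntaxException => None }

  // Hypothetical helper, not Spark's implementation: returns the requested
  // part, or None if the URI is invalid or the part is absent.
  def extract(url: String, part: String): Option[String] =
    parse(url).flatMap { uri =>
      val value = part match {
        case "PROTOCOL"  => uri.getScheme
        case "HOST"      => uri.getHost
        case "PATH"      => uri.getRawPath
        case "QUERY"     => uri.getRawQuery
        case "REF"       => uri.getRawFragment
        case "AUTHORITY" => uri.getRawAuthority
        case "USERINFO"  => uri.getRawUserInfo
        case _           => null // unknown part name
      }
      Option(value) // the URI accessors return null for missing components
    }
}
```

For example, UriParts.extract("http://user@spark.apache.org/path?query=1#Ref", "HOST") yields Some("spark.apache.org"), while an input containing an unescaped space yields None because the URI constructor rejects it.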

New tests were added to make sure a few common edge cases didn't change behaviour.
https://issues.apache.org/jira/browse/SPARK-16826
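
To make the idea of an edge case concrete: the two classes do not accept exactly the same inputs, which is why behaviour needs checking when swapping one for the other. The sketch below is an illustration only and is not one of the tests added by this patch; the names ParserDifferences, urlAccepts and uriAccepts are hypothetical.

```scala
import java.net.{MalformedURLException, URI, URISyntaxException, URL}

object ParserDifferences {
  // True if java.net.URL can construct an instance from `s`.
  // (The URL(String) constructor is deprecated in recent JDKs but still works.)
  def urlAccepts(s: String): Boolean =
    try { new URL(s); true }
    catch { case _: MalformedURLException => false }

  // True if java.net.URI can construct an instance from `s`.
  def uriAccepts(s: String): Boolean =
    try { new URI(s); true }
    catch { case _: URISyntaxException => false }
}
```

With a default JVM setup, a scheme that has no registered protocol handler, such as "foo://example.com/", is rejected by URL but accepted by URI, since URI only checks syntax. In the other direction, URI rejects an unescaped space such as "http://example.com/a b".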

How was this patch tested?

I've kept the old URL code commented out for now, so that people can verify that the new unit tests also pass with java.net.URL.

Thanks to @srowen for the help!

Contributor Author

sylvinus commented Aug 4, 2016

Grr, looks like I pushed an extra commit (in HashedRelation.scala). Should I rebase?

Contributor

rxin commented Aug 4, 2016

Can you make the JIRA description self-contained? It makes it much easier to understand commits. Thanks.

Contributor Author

sylvinus commented Aug 4, 2016

@rxin is that better?

Member

srowen commented Aug 4, 2016

Description looks good. You can use git rebase -i HEAD~4 or similar to drop the extra commit here. Pending that and tests passing, looks good.

Contributor Author

sylvinus commented Aug 4, 2016

Rebase done!

    new URI(url.toString)
  } catch {
    case e: MalformedURLException => null
    case e: URISyntaxException => null
Member

Don't change this unless you need to make another change anyway, but this can be case _: ... I know it wasn't like that before.
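
The suggestion refers to Scala's wildcard in a typed pattern: when the caught exception is never read, case _: T avoids binding an unused name. A small sketch of the difference follows; it uses hypothetical parseOrNullNamed and parseOrNull helpers rather than the code under review.

```scala
import java.net.{URI, URISyntaxException}

object WildcardCatch {
  // Binds the exception to `e` even though `e` is never read.
  def parseOrNullNamed(s: String): URI =
    try new URI(s)
    catch { case e: URISyntaxException => null }

  // Same behaviour; `_` makes it explicit that the exception is ignored.
  def parseOrNull(s: String): URI =
    try new URI(s)
    catch { case _: URISyntaxException => null }
}
```

Both versions behave identically at runtime; the change is purely stylistic, which is presumably why it was flagged as optional.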

Member

srowen commented Aug 5, 2016

Jenkins test this please

SparkQA commented Aug 5, 2016

Test build #63265 has finished for PR 14488 at commit 4ff8200.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

srowen commented Aug 5, 2016

LGTM, that does fix a fairly important bottleneck for anyone doing URL parsing.

Member

srowen commented Aug 5, 2016

Merged to master

asfgit closed this in 2460f03 Aug 5, 2016

Contributor Author

sylvinus commented Aug 5, 2016

Thanks a lot!
