-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-16826][SQL] Switch to java.net.URI for parse_url() #14488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Grr, looks like I pushed an extra commit (in HashedRelation.scala). Should I rebase? |
|
Can you make the JIRA description self-contained? It makes it much easier to understand commits. Thanks. |
|
@rxin is that better? |
|
Description looks good. You can use |
|
rebase done! |
| new URI(url.toString) | ||
| } catch { | ||
| case e: MalformedURLException => null | ||
| case e: URISyntaxException => null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't change this unless you need to make another change anyway, but this can be case _: ... I know it wasn't like that before
|
Jenkins test this please |
|
Test build #63265 has finished for PR 14488 at commit
|
|
LGTM, that does fix a fairly important bottleneck for anyone doing URL parsing. |
|
Merged to master |
|
Thanks a lot! |
What changes were proposed in this pull request?
The java.net.URL class has a globally synchronized Hashtable, which limits the throughput of any single executor doing lots of calls to parse_url(). Tests have shown that a 36-core machine can only get to 10% CPU use because the threads are locked most of the time.
This patch switches to java.net.URI which has less features than java.net.URL but focuses on URI parsing, which is enough for parse_url().
New tests were added to make sure a few common edge cases didn't change behaviour.
https://issues.apache.org/jira/browse/SPARK-16826
How was this patch tested?
I've kept the old URL code commented for now, so that people can verify that the new unit tests do pass with java.net.URL.
Thanks to @srowen for the help!