[SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs #14308
Test build #62693 has finished for PR 14308 at commit
Test build #62974 has finished for PR 14308 at commit
ping @mengxr @jkbradley @MLnick, would any of you mind taking a look at this? There were a few Java examples I fixed up that wouldn't run because of using mllib.linalg.Vectors. If it would be easier, I could separate those in another PR to get that in asap. Thanks!
@BryanCutler yeah if there are some changes that are more bug-fixes to make the examples work, let's separate those out into a new JIRA & PR. That should be a little higher priority for
System.out.println("Boundaries in increasing order: " + model.boundaries());
System.out.println("Predictions associated with the boundaries: " + model.predictions());
System.out.println("Boundaries in increasing order: " + model.boundaries() + "\n");
No big deal, but why the extra line break?
The 2 arrays that are printed are large and all the output gets clumped together, looking like a huge block of text, so adding some separation makes it a bit more readable.
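As a minimal sketch of the separation described above (plain Java, no Spark; the class `SeparatedOutput`, the helper `format`, and the array values are all made up for illustration and are not the example's actual code), appending a newline to each labeled array leaves a blank line after it when printed with `println`:

```java
import java.util.Arrays;

// Hypothetical stand-in for the example's output step; it has no Spark dependency.
class SeparatedOutput {
    // Returns the label followed by the array contents and a trailing newline,
    // so that println() leaves a blank line after each block.
    static String format(String label, double[] values) {
        return label + Arrays.toString(values) + "\n";
    }

    public static void main(String[] args) {
        double[] boundaries = {0.01, 0.17, 0.18, 0.27};   // made-up values
        double[] predictions = {0.16, 0.16, 0.19, 0.20};  // made-up values
        System.out.println(format("Boundaries in increasing order: ", boundaries));
        System.out.println(format("Predictions associated with the boundaries: ", predictions));
    }
}
```

With large arrays, the blank line is what keeps the two printed blocks from running together on the terminal.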
It's probably OK on the whole, improving or standardizing examples slightly. I left a number of small questions. Some of the changes didn't feel quite worth making but maybe I'm missing the logic.
Sure @MLnick, I realized I should probably do that about halfway into this. I'll make another JIRA and fix the Java errors there. Opened #14405 for this.
Thanks for the review @srowen! I added some before/after outputs, so hopefully some of the changes make more sense. I'll fix up the rest after I make another JIRA for the Java errors.
…lib.Vectors" This reverts commit d2d0671.
…ve-output-SPARK-16260
Test build #63087 has finished for PR 14308 at commit
Test build #63089 has finished for PR 14308 at commit
There's a lot of change here; I skimmed it and it all looks generally positive, adding some consistency or clarification, or a fix in some cases. Is sample_libsvm_data.txt used anymore then? It's low risk to merge because they're example changes. I'm OK with it.
Thanks for taking another look @srowen. I can't place where
Attaching a quick audit of example data files and what examples reference them, taken from this branch
I think it's fine to remove files that aren't referenced here too.
Ok, I removed these data files and added example usage to reference
Test build #63274 has finished for PR 14308 at commit
Merged to master
Thanks @srowen!
What changes were proposed in this pull request?
Improve example outputs to better reflect the functionality that is being presented. This mostly consisted of modifying what was printed at the end of the example, such as calling show() with truncate=False, but sometimes required minor tweaks in the example data to get relevant output. Explicitly set parameters when they are used as part of the example. Fixed Java examples that failed to run because they used old-style MLlib Vectors or had a problem with the schema. Synced examples between different APIs.
How was this patch tested?
Ran each example for Scala, Python, and Java and made sure output was legible on a terminal of width 100.
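For illustration only: the width check above was done by eye, but a check along these lines could flag output lines that exceed 100 columns. `WidthCheck` and `tooWide` are hypothetical names, not part of this patch or of Spark, and the sketch counts Java chars, which assumes plain ASCII output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; not part of the patch or of Spark.
class WidthCheck {
    static final int MAX_WIDTH = 100;  // terminal width mentioned in the PR description

    // Returns every line of the captured output that is longer than maxWidth chars.
    static List<String> tooWide(String output, int maxWidth) {
        return Arrays.stream(output.split("\n"))
                .filter(line -> line.length() > maxWidth)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up text shaped like a small table printout.
        String sample = "+---+-----+\n| id|label|\n+---+-----+";
        System.out.println("Lines over " + MAX_WIDTH + " columns: " + tooWide(sample, MAX_WIDTH));
    }
}
```

An empty result means every line fits; any returned line is a candidate for trimming the example data or the printed columns.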