Skip to content

Conversation

@BryanCutler
Copy link
Member

What changes were proposed in this pull request?

Improve example outputs to better reflect the functionality that is being presented. This mostly consisted of modifying what was printed at the end of the example, such as calling show() with truncate=False, but sometimes required minor tweaks in the example data to get relevant output. Explicitly set parameters when they are used as part of the example. Fixed Java examples that failed to run because of using old-style MLlib Vectors or problem with schema. Synced examples between different APIs.

How was this patch tested?

Ran each example for Scala, Python, and Java and made sure output was legible on a terminal of width 100.

@SparkQA
Copy link

SparkQA commented Jul 21, 2016

Test build #62693 has finished for PR 14308 at commit ae2249a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler BryanCutler changed the title [SPARK-16260][EXAMPLES][ML] Improve ML Example Outputs [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs Jul 22, 2016
@SparkQA
Copy link

SparkQA commented Jul 28, 2016

Test build #62974 has finished for PR 14308 at commit bb2fcee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler
Copy link
Member Author

ping @mengxr @jkbradley @MLnick , any of you mind taking a look at this? There were a few Java examples I fixed up that wouldn't run because of using mllib.linalg.Vectors. If it would be easier, I could separate those in another PR to get that in asap. Thanks!

@MLnick
Copy link
Contributor

MLnick commented Jul 29, 2016

@BryanCutler yeah if there are some changes that are more bug-fixes to make the examples work, let's separate those out into a new JIRA & PR. That should be a little higher priority for 2.0.1


System.out.println("Boundaries in increasing order: " + model.boundaries());
System.out.println("Predictions associated with the boundaries: " + model.predictions());
System.out.println("Boundaries in increasing order: " + model.boundaries() + "\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No big deal, but why the extra line break?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2 arrays that are printed are large and all the output get clumped together, looking like a huge block of text, so adding some separation makes it a bit more readable.

@srowen
Copy link
Member

srowen commented Jul 29, 2016

It's probably OK on the whole, improving or standardizing examples slightly. I left a number of small questions. Some of the changes didn't feel quite worth making but maybe I miss the logic.

@BryanCutler
Copy link
Member Author

BryanCutler commented Jul 29, 2016

@BryanCutler yeah if there are some changes that are more bug-fixes to make the examples work, let's separate those out into a new JIRA & PR. That should be a little higher priority for 2.0.1

Sure @MLnick , I realized I should probably do that about half-way into this. I'll make another JIRA and fix the Java errors there.

opened #14405 for this

@BryanCutler
Copy link
Member Author

Thanks for the review @srowen! I added some before/after outputs, so hopefully some of the changes make more sense. I'll fix up the rest after I make another JIRA for the Java errors.

@SparkQA
Copy link

SparkQA commented Aug 1, 2016

Test build #63087 has finished for PR 14308 at commit 479819d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 1, 2016

Test build #63089 has finished for PR 14308 at commit a556742.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler
Copy link
Member Author

This has been updated since fixing the errors in Java @srowen @MLnick . I know most of these changes are trivial, but will hopefully make some of the examples easier to follow. Thanks!

@srowen
Copy link
Member

srowen commented Aug 4, 2016

There's a lot of change here; I skimmed it and it all looks generally positive, adding some consistency or clarification, or a fix in some cases. Is sample_libsvm_data.txt used anymore then? it's low risk to merge because they're example changes. I'm OK with it.

@BryanCutler
Copy link
Member Author

BryanCutler commented Aug 4, 2016

Thanks for taking another look @srowen. sample_libsvm_data.txt is still used but it looks these are
never referenced

sample_tree_data.csv
pagerank_data.txt
lr_data.txt
random.data

I can't place where sample_tree_data.csv might have belonged, pagerank_data.txt is obvious (just missing reference in usage), and lr_data.txt & random.data look like labeled points probably from some older MLlib examples.

@BryanCutler
Copy link
Member Author

BryanCutler commented Aug 4, 2016

attaching a quick audit of example data files and what examples reference them, taken from this branch
spark_example_data_audit.txt

@srowen
Copy link
Member

srowen commented Aug 5, 2016

I think it's fine to remove files that aren't referenced here too.

@BryanCutler
Copy link
Member Author

Ok, I removed these data files

sample_tree_data.csv
lr_data.txt
random.data

and added example usage to reference pagerank_data.txt

@SparkQA
Copy link

SparkQA commented Aug 5, 2016

Test build #63274 has finished for PR 14308 at commit b634f9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Aug 5, 2016

Merged to master

@asfgit asfgit closed this in 180fd3e Aug 5, 2016
@BryanCutler
Copy link
Member Author

Thanks @srowen!

@BryanCutler BryanCutler deleted the ml-examples-improve-output-SPARK-16260 branch December 2, 2016 01:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants