
@staple
Contributor

@staple staple commented Sep 16, 2014

The NaiveBayes, ALS, and DecisionTree learners do not require external caching to prevent repeated RDD re-evaluation during learning iterations. NaiveBayes only evaluates its input RDD once, while ALS and DecisionTree internally persist transformations of their input RDDs.
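As a rough illustration of the distinction drawn above, here is a minimal pure-Python sketch. It does not use Spark: `LazyDataset` and the three learner functions are hypothetical stand-ins invented for this example, and the class only imitates the way an uncached RDD is recomputed on every pass. The counter shows why a single-pass learner, or one that persists its own intermediate data, gains nothing from external caching, whereas a learner that rereads its input on every iteration does.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not a Spark API).

    The source is recomputed on every collect() unless cache() was called.
    """

    def __init__(self, compute):
        self._compute = compute  # zero-argument callable producing the rows
        self._use_cache = False
        self._cached = None
        self.evaluations = 0     # how many times the source was recomputed

    def cache(self):
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache and self._cached is not None:
            return self._cached
        self.evaluations += 1
        rows = list(self._compute())
        if self._use_cache:
            self._cached = rows
        return rows


def single_pass_learner(data):
    # Reads the input exactly once (the NaiveBayes-like case).
    rows = data.collect()
    return sum(rows) / len(rows)


def iterative_learner(data, iterations):
    # Rereads the input on every iteration, so an uncached input is
    # recomputed `iterations` times.
    estimate = 0.0
    for _ in range(iterations):
        rows = data.collect()
        estimate += 0.5 * (sum(rows) / len(rows) - estimate)
    return estimate


def internally_persisting_learner(data, iterations):
    # Derives an intermediate dataset once and persists it itself
    # (the ALS/DecisionTree-like case), then iterates over that.
    intermediate = LazyDataset(lambda: [2 * x for x in data.collect()]).cache()
    return iterative_learner(intermediate, iterations)
```

In this toy model, only `iterative_learner` on an uncached input recomputes the source more than once; the other two evaluate it a single time regardless of whether the caller cached it.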

@SparkQA

SparkQA commented Sep 16, 2014

Can one of the admins verify this patch?

@mengxr
Contributor

mengxr commented Sep 16, 2014

add to whitelist

@mengxr
Contributor

mengxr commented Sep 16, 2014

this is ok to test

@SparkQA

SparkQA commented Sep 16, 2014

QA tests have started for PR 2412 at commit c8ff120.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 16, 2014

QA tests have finished for PR 2412 at commit c8ff120.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class NonASCIICharacterChecker extends ScalariformChecker

@davies
Contributor

davies commented Sep 17, 2014

@staple I also addressed this in #2378. Could you help review that part?

@davies
Contributor

davies commented Sep 24, 2014

@staple could you rebase this PR?

@staple
Contributor Author

staple commented Sep 25, 2014

@davies It looks like in your #2378 you already disabled caching for NaiveBayes and DecisionTree. The only difference from this patch is that I disabled caching for ALS as well.

We discussed this a bit here: #2378 (comment). I filed SPARK-3550 as a follow-up to the work on uncached input warnings (#2347). The warnings are only supposed to be printed if the input data is accessed repeatedly, across many iterations, during learning. That's not the case with ALS, so a warning shouldn't be printed there. But I can see a case for caching, because the input data is accessed not once but twice when constructing an intermediate representation of the data. I don't have a strong preference on whether or not we should cache in Python for the ALS learner.

If you are fine with continuing to cache in Python for ALS, then there's no more work to be done for this ticket, SPARK-3550.
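To make the warning rule described in this comment concrete, here is a small hedged sketch in plain Python. The function name, its parameters, and the message text are all hypothetical and are not taken from the Spark code base; the sketch only encodes the stated policy that a warning is due when a learner rereads an uncached input on every iteration, and not when it makes a small fixed number of passes, as ALS does.

```python
import warnings


def warn_if_uncached(learner_name, input_is_cached, rereads_input_each_iteration):
    """Return True and emit a warning only when caching would clearly help.

    Hypothetical helper for illustration; not a Spark API.
    """
    if rereads_input_each_iteration and not input_is_cached:
        warnings.warn(
            "%s rereads its input on every iteration, but the input is not cached"
            % learner_name
        )
        return True
    # A fixed number of passes (for example, the two passes mentioned for ALS)
    # does not trigger the warning, although caching may still save one pass.
    return False
```

Under this rule, an ALS-like learner with an uncached input stays silent, which matches the behaviour argued for above while leaving open whether the Python wrapper should cache anyway.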

@davies
Contributor

davies commented Sep 25, 2014

@staple Thanks. I'd like to keep it as before for ALS. Could you close this PR (and maybe the issue as well)?

@staple
Contributor Author

staple commented Sep 25, 2014

@davies Sure, will do.

@staple staple closed this Sep 25, 2014

4 participants