@techaddict
Contributor

@techaddict techaddict commented Nov 9, 2016

What changes were proposed in this pull request?

Added the new handleInvalid param for QuantileDiscretizer and Bucketizer to Python to maintain API parity with Scala.

How was this patch tested?

existing tests
testing is done with new doctests

@SparkQA

SparkQA commented Nov 9, 2016

Test build #68378 has finished for PR 15817 at commit b4720aa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class GBTClassifierWrapperWriter(instance: GBTClassifierWrapper)
    • class GBTClassifierWrapperReader extends MLReader[GBTClassifierWrapper]
    • class GBTRegressorWrapperWriter(instance: GBTRegressorWrapper)
    • class GBTRegressorWrapperReader extends MLReader[GBTRegressorWrapper]

@techaddict
Contributor Author

techaddict commented Nov 11, 2016

cc: @sethah @jkbradley

Contributor

@MLnick MLnick left a comment

A few minor things, otherwise looks good.

+              handleInvalid="error"):
      """
-     __init__(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001)
+     __init__(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001,
Contributor

I think this needs to be

__init__(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001, \
         handleInvalid="error")

for API doc formatting
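
As a minimal sketch of why the trailing backslash matters: inside an ordinary (non-raw) triple-quoted string, a backslash followed by a newline is a line continuation, so the two source lines become one line in `__doc__` and the doc tooling sees a single signature line. The function `set_params` and its snake_case parameters below are hypothetical stand-ins, not the PySpark code under review.

```python
def set_params(num_buckets=2, input_col=None, relative_error=0.001,
               handle_invalid="error"):
    """
    set_params(num_buckets=2, input_col=None, relative_error=0.001, \
               handle_invalid="error")
    Sets params for this hypothetical transformer.
    """
    return {
        "num_buckets": num_buckets,
        "input_col": input_col,
        "relative_error": relative_error,
        "handle_invalid": handle_invalid,
    }


# The backslash-newline pair is dropped when the string literal is parsed,
# so the wrapped signature is a single line in the stored docstring.
doc_lines = set_params.__doc__.strip().splitlines()
signature_line = doc_lines[0]
print(len(doc_lines))
print(signature_line)
```

Without the backslash, the signature would be split over two docstring lines, which is presumably what the formatting concern above is about.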

@since("2.0.0")
- def setParams(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001):
+ def setParams(self, numBuckets=2, inputCol=None, outputCol=None, relativeError=0.001,
+               handleInvalid="error"):
Contributor

missing handleInvalid in doc string below.

Contributor Author

fixed

@keyword_only
@since("1.4.0")
- def setParams(self, splits=None, inputCol=None, outputCol=None):
+ def setParams(self, splits=None, inputCol=None, outputCol=None, handleInvalid="error"):
Contributor

Missing handleInvalid in doc string below.

Contributor Author

fixed

@SparkQA

SparkQA commented Nov 11, 2016

Test build #68525 has finished for PR 15817 at commit 234d165.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

typeConverter=TypeConverters.toListFloat)

handleInvalid = Param(Params._dummy(), "handleInvalid", "how to handle invalid entries. " +
"Options are skip (filter out rows with invalid values), " +
Contributor

can we put the options in single quotes, e.g. "Options are 'skip' ..."

Contributor

@techaddict I don't think you addressed this comment?

Contributor

To be fair, we don't have it quoted in the Scala param description, so if we want to make this change we should probably also change it on the Scala side, just for consistency's sake.

Contributor

Yeah it's pretty minor. Maybe we can do it later in a follow up

Contributor

Cool. Since we've already cut RC1, it would be nice to have these params in sooner rather than later, and @techaddict seems to be a bit busy, I've created a follow-up JIRA (SPARK-18628) for this so that we can move ahead with this as is.

... inputCol="values", outputCol="buckets", relativeError=0.01, handleInvalid="error")
>>> qds.getRelativeError()
0.01
>>> qds.getHandleInvalid()
Contributor

We didn't add anything to the doctest of bucketizer. Actually, I think it would be nice in both places to set handleInvalid='skip' and then add an invalid value to the example data. That way we can show what we mean by invalid and prove that it works.
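
A rough, pure-Python sketch of the behaviour such a doctest would demonstrate, assuming NaN is the "invalid" value and that the options are 'error', 'skip', and 'keep' (with 'keep' using one extra bucket). The helper `bucketize` is hypothetical and only mimics the semantics; it is not the PySpark implementation.

```python
import math
from bisect import bisect_right


def bucketize(values, splits, handle_invalid="error"):
    """Return (value, bucket_index) pairs; NaN is treated as the invalid value."""
    if handle_invalid not in ("error", "skip", "keep"):
        raise ValueError(f"unsupported handle_invalid: {handle_invalid!r}")
    num_buckets = len(splits) - 1
    result = []
    for v in values:
        if math.isnan(v):
            if handle_invalid == "error":
                raise ValueError("NaN found; use 'skip' or 'keep' to tolerate it")
            if handle_invalid == "keep":
                # Invalid values go into one additional bucket after the regular ones.
                result.append((v, num_buckets))
            continue  # with 'skip' the row is simply dropped
        if v < splits[0] or v > splits[-1]:
            raise ValueError(f"value {v} is outside the split range")
        # Buckets are [lo, hi), except the last one, which also includes its upper bound.
        idx = num_buckets - 1 if v == splits[-1] else bisect_right(splits, v) - 1
        result.append((v, idx))
    return result


rows = [0.1, 0.4, 1.2, 1.5, float("nan")]
splits = [float("-inf"), 0.5, 1.4, float("inf")]
skipped = [b for _, b in bucketize(rows, splits, handle_invalid="skip")]
kept = [b for _, b in bucketize(rows, splits, handle_invalid="keep")]
print(skipped)  # [0, 0, 1, 2]
print(kept)     # [0, 0, 1, 2, 3]
```

Putting a NaN in the example data makes both points at once: it shows what counts as invalid, and the shorter 'skip' result shows the row was filtered out.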

Contributor Author

good idea! adding

@SparkQA

SparkQA commented Nov 11, 2016

Test build #68534 has finished for PR 15817 at commit d589515.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class AesCipher
    • public class AesConfigMessage implements Encodable
    • public class ByteArrayReadableChannel implements ReadableByteChannel

@jkbradley
Member

jkbradley commented Nov 14, 2016

Can you please add "[ML]" to the PR title? Thanks!

@techaddict changed the title from "[SPARK-18366][PYSPARK] Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer" to "[SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer" on Nov 14, 2016
@jkbradley
Member

Can you please implement the Param directly in Bucketizer and QuantileDiscretizer? Just like in Scala, HasHandleInvalid has built-in Param doc which applies to existing use cases but not Bucketizer and QuantileDiscretizer. It will be better to copy the Param, setter, and getter into Bucketizer and QuantileDiscretizer so that we can specialize the built-in Param doc.
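
To illustrate the request, here is a minimal sketch using a hypothetical stand-in `Param` class rather than the real PySpark one. It contrasts a shared mixin, whose doc text is fixed for every user, with a class that declares its own Param, setter, and getter so the doc can describe its own options. The doc wording below is illustrative, not quoted from the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    """Hypothetical stand-in for a PySpark Param: just a name and its doc text."""
    name: str
    doc: str


class HasHandleInvalid:
    """Shared mixin: one generic doc string for every class that inherits it."""
    handleInvalid = Param(
        "handleInvalid",
        "how to handle invalid entries (generic wording shared by all users).")


class Bucketizer:
    """Owns its Param, setter, and getter, so the doc can list its own options."""
    handleInvalid = Param(
        "handleInvalid",
        "how to handle invalid entries. Options are skip (filter out rows with "
        "invalid values), error (throw an error), or keep (keep invalid values "
        "in a special additional bucket).")

    def __init__(self, handleInvalid="error"):
        self._handle_invalid = handleInvalid

    def setHandleInvalid(self, value):
        self._handle_invalid = value
        return self  # returning self allows chained setter calls

    def getHandleInvalid(self):
        return self._handle_invalid


b = Bucketizer().setHandleInvalid("keep")
print(b.getHandleInvalid())
print(Bucketizer.handleInvalid.doc)
```

The cost is some duplicated boilerplate in each class; the benefit is that the documented options match what that class actually accepts.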

@techaddict
Contributor Author

@jkbradley done 👍

@SparkQA

SparkQA commented Nov 15, 2016

Test build #68649 has finished for PR 15817 at commit 6687d3c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@techaddict
Contributor Author

ping @davies @jkbradley

@holdenk
Contributor

holdenk commented Nov 26, 2016

Thanks for working on this @techaddict. One super minor point: could you also update the PR description to mention that the testing is done with new doctests? For people skimming the changelog, the PR description will end up as the commit message.

@holdenk
Contributor

holdenk commented Nov 28, 2016

OK, let's re-ping @MLnick / @sethah. I know we asked to update the docstring, but the current one is consistent with the Scala docstring, so maybe it makes sense as is (otherwise we should probably also update the Scala docstring).

@MLnick
Contributor

MLnick commented Nov 29, 2016

Jenkins retest this please

@SparkQA

SparkQA commented Nov 29, 2016

Test build #69331 has finished for PR 15817 at commit 6687d3c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Contributor

holdenk commented Nov 29, 2016

LGTM given our planned follow up to update the documentation for both Python and Scala.

asfgit pushed a commit that referenced this pull request Nov 30, 2016
…iscretizer and Bucketizer

## What changes were proposed in this pull request?
added the new handleInvalid param for these transformers to Python to maintain API parity.

## How was this patch tested?
existing tests
testing is done with new doctests

Author: Sandeep Singh <sandeep@techaddict.me>

Closes #15817 from techaddict/SPARK-18366.

(cherry picked from commit fe854f2)
Signed-off-by: Nick Pentreath <nickp@za.ibm.com>
@MLnick
Contributor

MLnick commented Nov 30, 2016

Sorry for delay - this LGTM. Given it's been around for a while and given RC2 is likely to be cut, I've gone ahead and merged to master / branch-2.1. Thanks!

@asfgit asfgit closed this in fe854f2 Nov 30, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017