Skip to content

Conversation

@srowen
Copy link
Member

@srowen srowen commented Sep 29, 2016

What changes were proposed in this pull request?

Partial revert of #15277 to instead sort and store input to model rather than require sorted input

How was this patch tested?

Existing tests.

@yanboliang
Copy link
Contributor

@srowen I think this PR may fail MiMa tests, since it makes binary incompatible change. The major disagreement between this and #15277 is whether to keep selectedFeatures sorted. I think both is OK and these two PRs have no difference comparing computation cost and others. So I still not prefer to make this change(due to binary incompatible), since we don't have strong requirements to make it. I'm still open to hear others' thoughts. Thanks!

@yanboliang
Copy link
Contributor

We need sort cost in any case, and put it in fit/training or model has no difference. So I think if we want to introduce this binary incompatible change, there should be strong requirements or users complain. Thanks!

@SparkQA
Copy link

SparkQA commented Sep 29, 2016

Test build #66103 has finished for PR 15299 at commit c0c0174.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member Author

srowen commented Sep 29, 2016

What is binary incompatible here?

@yanboliang
Copy link
Contributor

Oh, isSorted is left and it's not introduce binary incompatible right now. Thanks for your remind. I'm neutral for this change. Thanks!

@srowen
Copy link
Member Author

srowen commented Sep 29, 2016

Yeah I left it deprecated because I really can't figure why it was left protected. If you're ok with this then yes I'd like to restore these parts from the original change. I think it makes slightly more sense API wise

@yanboliang
Copy link
Contributor

@srowen It looks strange to left it protected, and deprecating it looks ok to me except someone tells me any reason. BTW, please update Python API docs to reflect that selectedFeatures is not necessary to be sorted. Thanks.

@mpjlu
Copy link

mpjlu commented Sep 29, 2016

hi @srowen , is @transient needed for val selectedFeatures or val filterIndices, one of them?
is it good to define filterIndices lazy?

@srowen
Copy link
Member Author

srowen commented Sep 29, 2016

I don't think it can be transient, because if serialized, this data must be part of the model. It is just an array of ints. I don't think lazy makes sense because it is just holding on to an unsorted array to later sort it. But this is an array of, in any reasonable case, less than a thousand elements. quicksorting it is trivial.

Done, I added the selectedFeatures change.

@SparkQA
Copy link

SparkQA commented Sep 29, 2016

Test build #66107 has finished for PR 15299 at commit 02a2b40.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MLnick
Copy link
Contributor

MLnick commented Sep 30, 2016

Hey folks - what exactly is the issue we're concerned about here for binary compat? SPARK-17017 is for 2.1, not branch-2.0.

Is MiMa really an issue since 2.1 is not released yet?

@yanboliang
Copy link
Contributor

@MLnick This change will not break binary compatibility currently. It marks isSorted as deprecated and will break binary compatibility when we delete that method. This should be not a big issue then, since a deprecated method may not be used. What's your opinion about this change? Thanks.

@MLnick
Copy link
Contributor

MLnick commented Sep 30, 2016

Ah sorry ok I was a bit confused - I thought the isSorted method had only been added after 2.0, but it has been around since the beginning. It is indeed public (well protected) so is a MiMa issue if removed.

require(isSorted(selectedFeatures), "Array has to be sorted asc")
private val filterIndices = selectedFeatures.sorted

@deprecated("not intended for subclasses to use", "2.1.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I also fail to see why this needs to be exposed. +1 on deprecation.

* Chi Squared selector model.
*
* @param selectedFeatures list of indices to select (filter). Must be ordered asc
* @param selectedFeatures list of indices to select (filter).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should say "since the model requires sorted indices, selectedFeatures will be sorted" or something - just to make it clear the model does have this requirement, but takes care of that itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind that, though, my original theory behind this little change was that the sorting is wholly an implementation detail that callers don't need to promise or be promised about these features. It's very small, but, do we need to even promise these are sorted here?

@srowen
Copy link
Member Author

srowen commented Oct 1, 2016

Merged to master

@asfgit asfgit closed this in b88cb63 Oct 1, 2016
@srowen srowen deleted the SPARK-17704.2 branch October 1, 2016 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants