Skip to content

Conversation

@feynmanliang
Copy link
Contributor

Adds unit test for equals on mllib.linalg.Matrix class and equals to both SparseMatrix and DenseMatrix. Supports equality testing between SparseMatrix and DenseMatrix.

@mengxr

@feynmanliang
Copy link
Contributor Author

To start the discussion; should a SparseMatrix and a DenseMatrix be equal if they contain the same underlying content (i.e. do we define == as equality in semantics or as equality in representation)?

@SparkQA
Copy link

SparkQA commented Aug 7, 2015

Test build #40202 has finished for PR 8042 at commit 78f9426.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@feynmanliang
Copy link
Contributor Author

Jenkins test please

@feynmanliang
Copy link
Contributor Author

Jenkins test this please

@jkbradley
Copy link
Member

To start the discussion; should a SparseMatrix and a DenseMatrix be equal if they contain the same underlying content (i.e. do we define == as equality in semantics or as equality in representation)?

If we follow Breeze's example, then we should define equality in semantics, not representation. I think this PR is valid.

@SparkQA
Copy link

SparkQA commented Aug 10, 2015

Test build #1420 has finished for PR 8042 at commit 78f9426.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case should be for Matrix, not just SparseMatrix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

Will also change DenseMatrix.equals.

@feynmanliang feynmanliang changed the title [SPARK-9750][MLlib] Add equals to SparseMatrix [SPARK-9750][MLlib] Improve equals on SparseMatrix and DenseMatrix Aug 10, 2015
@SparkQA
Copy link

SparkQA commented Aug 10, 2015

Test build #40320 has finished for PR 8042 at commit 22782df.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like what you had before better. Using Breeze, I think you'd avoid converting to dense representations. With toArray, it becomes dense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally forgot about that, thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could replace toArray with toBreeze here too, in case "o" is sparse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jkbradley
Copy link
Member

LGTM pending tests

@SparkQA
Copy link

SparkQA commented Aug 11, 2015

Test build #40488 has finished for PR 8042 at commit bb70d5e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

Merging with master and branch-1.5

asfgit pushed a commit that referenced this pull request Aug 11, 2015
Adds unit test for `equals` on `mllib.linalg.Matrix` class and `equals` to both `SparseMatrix` and `DenseMatrix`. Supports equality testing between `SparseMatrix` and `DenseMatrix`.

mengxr

Author: Feynman Liang <fliang@databricks.com>

Closes #8042 from feynmanliang/SPARK-9750 and squashes the following commits:

bb70d5e [Feynman Liang] Breeze compare for dense matrices as well, in case other is sparse
ab6f3c8 [Feynman Liang] Sparse matrix compare for equals
22782df [Feynman Liang] Add equality based on matrix semantics, not representation
78f9426 [Feynman Liang] Add casts
43d28fa [Feynman Liang] Fix failing test
6416fa0 [Feynman Liang] Add failing sparse matrix equals tests

(cherry picked from commit 520ad44)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
@asfgit asfgit closed this in 520ad44 Aug 11, 2015
@mengxr
Copy link
Contributor

mengxr commented Aug 12, 2015

@feynmanliang The contract for Java's Object is that a.equals(b) implies a.hashCode == b.hashCode. So usually we need to implement both. The problem with hashCode is that we shouldn't compute it based on all values, which could be very expensive. You can use the implementation of Vector.hashCode as a template, but that requires some changes to avoid hash code collisions.

@feynmanliang
Copy link
Contributor Author

@mengxr made SPARK-9919 to track this

CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015
Adds unit test for `equals` on `mllib.linalg.Matrix` class and `equals` to both `SparseMatrix` and `DenseMatrix`. Supports equality testing between `SparseMatrix` and `DenseMatrix`.

mengxr

Author: Feynman Liang <fliang@databricks.com>

Closes apache#8042 from feynmanliang/SPARK-9750 and squashes the following commits:

bb70d5e [Feynman Liang] Breeze compare for dense matrices as well, in case other is sparse
ab6f3c8 [Feynman Liang] Sparse matrix compare for equals
22782df [Feynman Liang] Add equality based on matrix semantics, not representation
78f9426 [Feynman Liang] Add casts
43d28fa [Feynman Liang] Fix failing test
6416fa0 [Feynman Liang] Add failing sparse matrix equals tests
@feynmanliang feynmanliang deleted the SPARK-9750 branch August 17, 2015 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants