[SPARK-4406] [MLib] FIX: Validate k in SVD #3945

Closed
wants to merge 2 commits into from


MechCoder
Contributor

Raise exception when k is non-positive in SVD

@AmplabJenkins

Can one of the admins verify this patch?

@MechCoder
Contributor Author

Ping @jkbradley. I think I have fixed the issue. Can you have a look?

@@ -102,6 +102,9 @@ class IndexedRowMatrix(
k: Int,
computeU: Boolean = false,
rCond: Double = 1e-9): SingularValueDecomposition[IndexedRowMatrix, Matrix] = {
if (k < 1) {
Member

Can you use require here? The message is inconsistent with the check, since the text says that k == 1 is not allowed, but it is.
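
A minimal sketch of the `require` form suggested here, using a hypothetical standalone helper `checkK` that is not part of the patch. Scala's `require` throws `IllegalArgumentException` when its predicate is false, and in this sketch the message states exactly what the predicate tests, so `k == 1` is accepted and the text does not claim otherwise:

```scala
object RequireSketch {
  // Hypothetical helper for illustration; in the patch the check sits inside computeSVD.
  // require(cond, msg) throws IllegalArgumentException("requirement failed: " + msg)
  // when cond is false, replacing a hand-written if/throw.
  def checkK(k: Int): Unit =
    require(k > 0, s"k must be positive but got k=$k")
}
```

`RequireSketch.checkK(1)` returns normally, while `RequireSketch.checkK(0)` throws an `IllegalArgumentException` whose message starts with `requirement failed: `.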

@MechCoder MechCoder force-pushed the spark-4406 branch 2 times, most recently from 3174584 to 49d653a Compare January 8, 2015 11:36
@MechCoder
Contributor Author

@srowen I've fixed that up and pushed it. Any more comments?
(I removed the line for RowMatrix, since it is tested just below)

@MechCoder
Contributor Author

Are merge conflicts the reason that Travis is not running?

@srowen
Member

srowen commented Jan 8, 2015

@MechCoder your patch has no merge conflicts. Do you mean Jenkins? As you can see above he's waiting for authorization from an admin to test your patch, since you're not whitelisted yet. The PR looks fine.

@MechCoder
Contributor Author

Ah, thanks a lot! Would you be able to whitelist my name?

@mengxr
Contributor

mengxr commented Jan 8, 2015

add to whitelist

@mengxr
Contributor

mengxr commented Jan 8, 2015

ok to test

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25241 has started for PR 3945 at commit 12dae73.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25241 has finished for PR 3945 at commit 12dae73.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25241/
Test PASSed.

@MechCoder
Contributor Author

@mengxr Thanks :) . Good to go?

@@ -113,6 +113,16 @@ class IndexedRowMatrixSuite extends FunSuite with MLlibTestSparkContext {
assert(closeToZero(U * brzDiag(s) * V.t - localA))
}

test("validate k in svd") {
val A = new IndexedRowMatrix(indexedRows)
try {
Contributor

This test would always pass. The following is more common in Spark's codebase:

intercept[IllegalArgumentException] {
  A.computeSVD(-1)
}
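
To illustrate the concern, here is a sketch that assumes the original test had the common try/catch shape with no failure branch (its full body is not shown above). `computeSVD`, `weakCheck`, and `expectIllegalArgument` are hypothetical stand-ins written with the standard library only; in the real suite, ScalaTest's `intercept` plays the role of the last one.

```scala
import scala.util.{Failure, Try}

object InterceptSketch {
  // Hypothetical stand-in for the argument check inside computeSVD.
  def computeSVD(k: Int): Unit =
    require(k > 0, s"k must be positive but got k=$k")

  // Assumed shape of the weak test: it catches the exception but never
  // fails when no exception is thrown, so it reports success either way.
  def weakCheck(body: => Unit): Boolean =
    try { body; true }
    catch { case _: IllegalArgumentException => true }

  // Minimal analog of intercept: succeeds only if the body
  // actually throws IllegalArgumentException.
  def expectIllegalArgument(body: => Unit): Boolean =
    Try(body) match {
      case Failure(_: IllegalArgumentException) => true
      case _ => false
    }
}
```

With these definitions, `weakCheck` returns `true` for both `computeSVD(1)` and `computeSVD(-1)`, whereas `expectIllegalArgument` returns `true` only for `computeSVD(-1)`, so it would catch a regression that removed the validation.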

@MechCoder
Contributor Author

@jkbradley @mengxr Thanks for your reviews. I have addressed them in the last commit.

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25259 has started for PR 3945 at commit fcb4ca5.

  • This patch merges cleanly.

@@ -212,7 +212,7 @@ class RowMatrix(
tol: Double,
mode: String): SingularValueDecomposition[RowMatrix, Matrix] = {
val n = numCols().toInt
- require(k > 0 && k <= n, s"Request up to n singular values but got k=$k and n=$n.")
+ require(k > 0 && k <= n, s"Request up to n singular values but got k=$k and numCols=$n.")
Contributor Author

Should it be $n here?

Member

Yes, that's fine; I just meant the printed text.

Contributor Author

Alright, please let me know if there is anything else left to be done.

Member

Oops, I just realized the original error message is incorrect. It should say "Requested up to k" (not "n"). The rest is fine. Can you please fix this in IndexedRowMatrix as well?

Contributor Author

I think this is right as it is. Does this not mean that k can be up to n?

Member

In the phrase "Requested up to SOMETHING", the value SOMETHING should be the maximum number of singular values requested, which is specified by k.

Contributor Author

Thanks, I got confused by the condition k <= n, which made me read it as "The maximum number of singular values possible is n, but k (which is greater than n) was provided."
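
The two readings can be compared against a concrete message. The sketch below uses one wording consistent with the reviewer's suggestion; the exact text that was merged is not shown in this thread, so treat the string as an assumption. `SvdBounds`, `checkK`, and `messageFor` are hypothetical names, and `n` stands for the number of columns.

```scala
import scala.util.Try

object SvdBounds {
  // Hypothetical standalone version of the bounds check in computeSVD.
  // The message names k as the number requested and reports n separately
  // as numCols (assumed wording, following the review comment).
  def checkK(k: Int, n: Int): Unit =
    require(k > 0 && k <= n,
      s"Requested up to k singular values but got k=$k and numCols=$n.")

  // Returns the error message if the check fails, or None if it passes.
  def messageFor(k: Int, n: Int): Option[String] =
    Try(checkK(k, n)).failed.toOption.map(_.getMessage)
}
```

For example, `SvdBounds.messageFor(3, 5)` is `None`, while `SvdBounds.messageFor(6, 5)` yields a message reporting both `k=6` and `numCols=5`, which makes it clear that the request exceeded the column count.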

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25259 has finished for PR 3945 at commit fcb4ca5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25259/
Test PASSed.

@jkbradley
Member

@MechCoder After that last comment I just made, I believe this will be ready. Thanks!

@MechCoder
Contributor Author

@jkbradley I've fixed it up. Thanks!

@SparkQA

SparkQA commented Jan 9, 2015

Test build #25341 has started for PR 3945 at commit 64e6d2d.

  • This patch merges cleanly.

@jkbradley
Member

@MechCoder Thanks!

LGTM pending tests

CC: @mengxr

@SparkQA

SparkQA commented Jan 9, 2015

Test build #25341 has finished for PR 3945 at commit 64e6d2d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25341/
Test PASSed.

@asfgit asfgit closed this in 4554529 Jan 10, 2015
@mengxr
Contributor

mengxr commented Jan 10, 2015

Merged into master. Thanks!

@MechCoder MechCoder deleted the spark-4406 branch January 10, 2015 06:09
@MechCoder
Contributor Author

@jkbradley @mengxr Thanks for the quick reviews and merge. Looking to contribute more.

yaooqinn pushed a commit that referenced this pull request Aug 26, 2024
…42.7.4 and `mssql` to 12.8.1.jre11

### What changes were proposed in this pull request?

This PR aims to upgrade `h2` to 2.3.232, `postgresql` to 42.7.4 and `mssql` to 12.8.1.jre11.

### Why are the changes needed?

1. For `h2`, there are some issues fixed in version 2.3.232 (full release notes: https://www.h2database.com/html/changelog.html):

    - [Issue #3945](h2database/h2database#3945): Column not found in correlated subquery, when referencing outer column from LEFT JOIN .. ON clause
    - [Issue #4097](h2database/h2database#4097): StackOverflowException when using multiple SELECT statements in one query (2.3.230)
    - [Issue #3982](h2database/h2database#3982): Potential issue when using ROUND
    - [Issue #3894](h2database/h2database#3894): Race condition causing stale data in query last result cache
    - [Issue #4075](h2database/h2database#4075): infinite loop in compact
    - [Issue #4091](h2database/h2database#4091): Wrong case with linked table to postgresql
    - [Issue #4088](h2database/h2database#4088): BadGrammarException when the same alias is used within two different CTEs

2. For `postgresql`, there are some issues fixed and improvements in version 42.7.4 (full release notes: https://jdbc.postgresql.org/changelogs/2024-08-22-42.7.4-release/):

    - fix: PgInterval ignores case for represented interval string [PR #3344](pgjdbc/pgjdbc#3344)
    - perf: Avoid extra copies when receiving int4 and int2 in PGStream [PR #3295](pgjdbc/pgjdbc#3295)
    - fix: Add support for Infinity::numeric values in ResultSet.getObject [PR #3304](pgjdbc/pgjdbc#3304)
    - fix: Ensure order of results for getDouble [PR #3301](pgjdbc/pgjdbc#3301)
    - perf: Replace BufferedOutputStream with unsynchronized PgBufferedOutputStream, allow configuring different Java and SO_SNDBUF buffer sizes [PR #3248](pgjdbc/pgjdbc#3248)
    - fix: Fix SSL tests [PR #3260](pgjdbc/pgjdbc#3260)
    - fix: Support bytea in preferQueryMode=simple [PR #3243](pgjdbc/pgjdbc#3243)
    - fix: Fix [Issue #3234](pgjdbc/pgjdbc#3234) - Return -1 as update count for stored procedure calls [PR #3235](pgjdbc/pgjdbc#3235)
    - fix: Fix [Issue #3224](pgjdbc/pgjdbc#3224) - conversion for TIME ‘24:00’ to LocalTime breaks in binary-mode [PR #3225](pgjdbc/pgjdbc#3225)

3. For `mssql`, there are some issues fixed in 12.8.1.jre11 (full release notes: https://github.com/microsoft/mssql-jdbc/releases/tag/v12.8.1):

    - Adjusted DESTINATION_COL_METADATA_LOCK, in SQLServerBulkCopy, so that is properly released in all cases [PR #2492](microsoft/mssql-jdbc#2492)
    - Reverted "Execute Stored Procedures Directly" feature, as well as subsequent changes related to the feature [PR #2493](microsoft/mssql-jdbc#2493)
    - Changed driver behavior to allow prepared statement objects to be reused, preventing a "multiple queries are not allowed" error [PR #2494](microsoft/mssql-jdbc#2494)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47810 from wayneguow/ug_h2.

Authored-by: Wei Guo <guow93@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
IvanK-db pushed a commit to IvanK-db/spark that referenced this pull request Sep 20, 2024
…42.7.4 and `mssql` to 12.8.1.jre11

attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…42.7.4 and `mssql` to 12.8.1.jre11

himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…42.7.4 and `mssql` to 12.8.1.jre11


6 participants