[SPARK-18218][ML][MLLib] Reduce shuffled data size of BlockMatrix multiplication and solve potential OOM and low parallelism usage problem By split middle dimension in matrix multiplication #15730
Conversation
@yanboliang @sethah I would be very pleased to hear your opinions, thanks! This optimization may also provide some inspiration for a series of algorithms based on distributed matrices.
Test build #67969 has finished for PR 15730 at commit

Test build #67977 has finished for PR 15730 at commit
ping @brkyvz who wrote these libraries.
Hi @WeichenXu123 Thank you for this PR. Sorry for taking so long to get back to you. Your optimization would be very helpful. I have a couple of thoughts, though. Your examples always assume fully dense matrices, i.e. that all blocks exist all the time. How would sparsity affect shuffling? Would there ever be a case where sparsity of blocks and unlucky alignment of blocks could actually cause a lot more shuffling with your parameter? Nevertheless, I can see fully dense matrix multiplications benefiting significantly from your optimization. I guess we will need to work on the APIs a bit and document it a bit more clearly.
Good question about the shuffled data in the sparse case. Here is some simple (perhaps not very rigorous) analysis of it: Now I am considering improving the API interface to make it easier to use. Thanks for the careful review!
You shouldn't change the signature of a public method and still call it `@Since("1.3.0")`. Maybe move the inside of this to

```
private def multiplyImpl(
    other: BlockMatrix,
    shufflePartitioner: GridPartitioner,
    midDimPartNum: Int,
    resultPartitioner: GridPartitioner): BlockMatrix
```

but keep the

```
@Since("1.3.0")
def multiply(other: BlockMatrix): BlockMatrix = {
  multiplyImpl(other, ...)
}
```
I guess this is not used?
@WeichenXu123 How about if we only add:

```
def multiply(other: BlockMatrix, numMidDimSplits: Integer): BlockMatrix
```

as the public API. Users shouldn't have to define
This way, all that needs to change in the implementation of `multiply` is:

```
val intermediatePartitioner = new Partitioner {
  override def numPartitions: Int = resultPartitioner.numPartitions * numMidDimSplits
  override def getPartition(key: Any): Int = key.asInstanceOf[Int]
}
val newBlocks = flatA.cogroup(flatB, intermediatePartitioner).flatMap { case (pId, (a, b)) =>
```
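For context, a minimal sketch of how the flat key consumed by such a partitioner could be computed upstream; this is illustrative only, and the helper name and exact key layout are assumptions rather than the actual MLlib code:

```
// Assumed flat-key encoding: one partition id per (result partition, middle split).
// A block A(i, k) or B(k, j) that contributes to result block (i, j) would be keyed by:
def intermediateKey(
    resultPartitioner: org.apache.spark.Partitioner,
    i: Int, j: Int, k: Int,
    numMidDimSplits: Int): Int = {
  val resultPart = resultPartitioner.getPartition((i, j)) // partition of result block (i, j)
  val midSplit = k % numMidDimSplits                      // which middle-dimension split
  resultPart * numMidDimSplits + midSplit                 // flat id in [0, numPartitions)
}
```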
@brkyvz All right, I'll update the code ASAP. Thanks!
Test build #70574 has finished for PR 15730 at commit
@brkyvz I updated the code and attached a screenshot of a running result; waiting for your review, thanks!
@WeichenXu123 Thanks! Will take a look once I get back from vacation (in a week). Happy new year!
2.2.0
Could you please add documentation for this? This is the most important part of this PR: understanding how this parameter improves performance. You may copy most of your PR description.
I would rather write a test function `testMultiply` which takes A, B, and the expected C, and makes `numSplits` configurable. Here you can simply have:

```
testMultiply(largeA, largeB, largeC, 1)
testMultiply(largeA, largeB, largeC, 2)
testMultiply(largeA, largeB, largeC, 3)
testMultiply(largeA, largeB, largeC, 4)
```
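A minimal sketch of what such a helper could look like (hypothetical; the actual test code in the PR may differ):

```
import org.apache.spark.mllib.linalg.Matrix
import org.apache.spark.mllib.linalg.distributed.BlockMatrix

def testMultiply(
    a: BlockMatrix,
    b: BlockMatrix,
    expectedC: Matrix,
    numMidDimSplits: Int): Unit = {
  val actualC = a.multiply(b, numMidDimSplits).toLocalMatrix()
  // Element-wise comparison with a small tolerance.
  actualC.toArray.zip(expectedC.toArray).foreach { case (x, y) =>
    assert(math.abs(x - y) < 1e-8)
  }
}
```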
Looks pretty good overall. Left one major comment about documentation and one about tests.
@brkyvz Done. Thanks!
Test build #71219 has finished for PR 15730 at commit
Jenkins, test this please.
Test build #71242 has finished for PR 15730 at commit
cc @yanboliang Thanks!
```
 * Left multiplies this [[BlockMatrix]] to `other`, another [[BlockMatrix]]. This method add
```
Ok, I feel this is very verbose in terms of documentation. Can we summarize it somehow?

```
/**
 * Left multiplies this [[BlockMatrix]] to `other`, another [[BlockMatrix]]. The `colsPerBlock`
 * of this matrix must equal the `rowsPerBlock` of `other`. If `other` contains
 * `SparseMatrix`, they will have to be converted to a `DenseMatrix`. The output
 * [[BlockMatrix]] will only consist of blocks of `DenseMatrix`. This may cause
 * some performance issues until support for multiplying two sparse matrices is added.
 * Blocks with duplicate indices will be added with each other.
 *
 * @param other Matrix `B` in `A * B = C`
 * @param numMidDimSplits Number of splits to cut on the middle dimension when doing multiplication.
   For example, when multiplying a Matrix `A` of size `m x n` with Matrix `B` of size `n x k`, this parameter
   configures the parallelism to use when grouping the matrices. The parallelism will increase from `m x k` to
   `m x k x numMidDimSplits`, which in some cases also reduces total shuffled data.
 */
```

I wrote out a sketch; please put it in proper format (i.e. I omitted the `*` on the last lines).
All right, thanks!
Test build #71390 has finished for PR 15730 at commit
@WeichenXu123 This LGTM, thanks. @mengxr Would you also like to take a look?
The API looks good to me. I have not reviewed the internals carefully. One comment: let's add a check to verify that `numMidDimSplits` is > 0.
Test build #71691 has finished for PR 15730 at commit
| s"of B must be equal. A.numCols: ${numCols()}, B.numRows: ${other.numRows()}. If you " + | ||
| "think they should be equal, try setting the dimensions of A and B explicitly while " + | ||
| "initializing them.") | ||
| require(numMidDimSplits > 0, "numMidDimSplits should be positive value.") |
ultra nit: the message should be "should be a positive value" or "should be a positive integer"
@WeichenXu123 This LGTM! Thank you for providing this functionality. I left one final comment, then I'll merge this.
@brkyvz Also, thanks for your careful code review! ^_^
Test build #72063 has finished for PR 15730 at commit
Merging to master! Thanks!
## What changes were proposed in this pull request?

### The problem in current block matrix multiplication
As described in JIRA https://issues.apache.org/jira/browse/SPARK-18218, block matrix multiplication in Spark may cause problems. Suppose we multiply an `M*N` matrix A by an `N*P` matrix B. When N is much larger than M and P, the following problems may occur:

- When the middle dimension N is too large, it causes reducer OOM.
- Even if OOM does not occur, parallelism is still too low.
- When N is much larger than M and P, and matrices A and B have many partitions, there may be too many partitions on the M and P dimensions, which greatly increases the shuffled data size. (I will explain this in detail below.)

### Key point of my improvement

In this PR, I introduce a `numMidDimSplits` parameter and improve the algorithm to resolve this problem.

To understand the improvement, let me first give a simple case that explains how the current multiplication works and what causes the problems above:
Suppose block matrix A contains 200 blocks (`2 numRowBlocks * 100 numColBlocks`), arranged in 2 block rows and 100 block columns:

```
A00 A01 A02 ... A0,99
A10 A11 A12 ... A1,99
```

and block matrix B also contains 200 blocks (`100 numRowBlocks * 2 numColBlocks`), arranged in 100 block rows and 2 block columns:

```
B00   B01
B10   B11
B20   B21
...
B99,0 B99,1
```

Suppose all blocks in the two matrices are dense for now.
Now we call A.multiply(B). Suppose the generated `resultPartitioner` contains 2 rowPartitions and 2 colPartitions (there cannot be more partitions because the result matrix only contains `2 * 2` blocks). The current algorithm contains two shuffle steps:

**step-1**

Step-1 generates 4 reducers, which I tag as reducer-00, reducer-01, reducer-10, reducer-11, and shuffles data as follows:

```
A00 A01 A02 ... A0,99  B00 B10 B20 ... B99,0  shuffled into reducer-00
A00 A01 A02 ... A0,99  B01 B11 B21 ... B99,1  shuffled into reducer-01
A10 A11 A12 ... A1,99  B00 B10 B20 ... B99,0  shuffled into reducer-10
A10 A11 A12 ... A1,99  B01 B11 B21 ... B99,1  shuffled into reducer-11
```
The shuffling above is a `cogroup` transform; note that each reducer contains **only one group**.

**step-2**
Step-2 does an `aggregateByKey` transform on the result of step-1. It also generates 4 reducers and produces the final result RDD, which contains 4 partitions, each holding one block.

The main problems are in step-1. We have only 4 reducers, but matrices A and B have 400 blocks in total, so the reducer count is obviously too small. Moreover, each reducer contains only one group (in the `cogroup` sense), and that group contains 200 blocks. This is bad because `cogroup` loads each group into memory when computing, so the algorithm does not scale: if matrix A had 10000 column blocks or more instead of 100, each reducer would load 20000 blocks into memory, easily causing reducer OOM.

This PR tries to resolve the problem described above.
When a matrix A of dimension M * N multiplies a matrix B of dimension N * P, the middle dimension N is the key point. If N is large, the current multiplication implementation works badly.

In this PR, I introduce a `numMidDimSplits` parameter representing how many splits to cut the middle dimension N into.

Still using the example described above, if we now set `numMidDimSplits = 10`, we can generate 40 reducers in **step-1**: each reducer-ij above is split into 10 reducers, reducer-ij0, reducer-ij1, ..., reducer-ij9, and each reducer receives 20 blocks.
Now the shuffle works as follows:

**reducer-000 to reducer-009**

```
A0,0 A0,10 A0,20 ... A0,90  B0,0 B10,0 B20,0 ... B90,0  shuffled into reducer-000
A0,1 A0,11 A0,21 ... A0,91  B1,0 B11,0 B21,0 ... B91,0  shuffled into reducer-001
A0,2 A0,12 A0,22 ... A0,92  B2,0 B12,0 B22,0 ... B92,0  shuffled into reducer-002
...
A0,9 A0,19 A0,29 ... A0,99  B9,0 B19,0 B29,0 ... B99,0  shuffled into reducer-009
```

**reducer-010 to reducer-019**

```
A0,0 A0,10 A0,20 ... A0,90  B0,1 B10,1 B20,1 ... B90,1  shuffled into reducer-010
A0,1 A0,11 A0,21 ... A0,91  B1,1 B11,1 B21,1 ... B91,1  shuffled into reducer-011
A0,2 A0,12 A0,22 ... A0,92  B2,1 B12,1 B22,1 ... B92,1  shuffled into reducer-012
...
A0,9 A0,19 A0,29 ... A0,99  B9,1 B19,1 B29,1 ... B99,1  shuffled into reducer-019
```

**reducer-100 to reducer-109** and **reducer-110 to reducer-119** are similar to the above, so I omit them.
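To make the routing above concrete, here is a small hedged sketch (illustrative only; the helper name and the split-selection rule `k % numMidDimSplits` are assumptions that match the example, not necessarily the exact MLlib code):

```
// Which (result block, middle split) sub-reducers a block of A is sent to.
// A block A(i, k) contributes to every result block (i, j), and in this example
// lands in the middle-dimension split k % numMidDimSplits.
def destinationsForABlock(
    i: Int, k: Int,
    numColBlocksB: Int,
    numMidDimSplits: Int): Seq[(Int, Int, Int)] = {
  val split = k % numMidDimSplits
  (0 until numColBlocksB).map(j => (i, j, split)) // (result row, result col, mid split)
}

// e.g. with numMidDimSplits = 10 and B having 2 block columns:
// destinationsForABlock(0, 10, 2, 10) == Seq((0, 0, 0), (0, 1, 0))
// i.e. A(0,10) goes to reducer-000 and reducer-010, matching the listing above.
```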
### API for this optimized algorithm

I add a new API as follows:

```
def multiply(
    other: BlockMatrix,
    numMidDimSplits: Int // middle dimension split number, explained above
  ): BlockMatrix
```
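For illustration, here is a hedged usage sketch; the block sizes and values are made up, and it assumes an existing `SparkContext` named `sc`:

```
import org.apache.spark.mllib.linalg.Matrices
import org.apache.spark.mllib.linalg.distributed.BlockMatrix

// A: 2 x 4 matrix stored as one row of two 2x2 blocks.
val blocksA = sc.parallelize(Seq(
  ((0, 0), Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))),
  ((0, 1), Matrices.dense(2, 2, Array(5.0, 6.0, 7.0, 8.0)))))
// B: 4 x 2 matrix stored as one column of two 2x2 blocks.
val blocksB = sc.parallelize(Seq(
  ((0, 0), Matrices.dense(2, 2, Array(1.0, 0.0, 0.0, 1.0))),
  ((1, 0), Matrices.dense(2, 2, Array(2.0, 0.0, 0.0, 2.0)))))

val matA = new BlockMatrix(blocksA, 2, 2)
val matB = new BlockMatrix(blocksB, 2, 2)

// Split the middle dimension (the 4 shared rows/columns) into 2 groups.
val matC = matA.multiply(matB, 2)
println(matC.toLocalMatrix())
```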
### Shuffled data size analysis (compared under the same parallelism)

The optimization has a subtle influence on the total shuffled data size. An appropriate `numMidDimSplits` will significantly reduce the shuffled data size, but a too-large `numMidDimSplits` may increase it instead. For now I don't want to introduce a formula and make things too complex, so I only use a simple case to illustrate it here:

Suppose we have two same-size square matrices X and Y, both with `16 numRowBlocks * 16 numColBlocks`. X and Y are both dense. Now let me analyze the shuffled data size in the following cases:

**case 1: X and Y both partitioned in 16 rowPartitions and 16 colPartitions, numMidDimSplits = 1**
ShufflingDataSize = (16 * 16 * (16 + 16) + 16 * 16) blocks = 8448 blocks

parallelism = 16 * 16 * 1 = 256  // use the step-1 reducer count as the parallelism, because step-1 costs most of the computation time in this algorithm.

**case 2: X and Y both partitioned in 8 rowPartitions and 8 colPartitions, numMidDimSplits = 4**

ShufflingDataSize = (8 * 8 * (32 + 32) + 16 * 16 * 4) blocks = 5120 blocks

parallelism = 8 * 8 * 4 = 256  // use the step-1 reducer count as the parallelism, because step-1 costs most of the computation time in this algorithm.
**Both cases above have parallelism = 256.** Case 1 (`numMidDimSplits = 1`) is equivalent to the current implementation in MLlib, but case 2 shuffles only 60.6% as much data as case 1. **This shows that, under the same parallelism, a proper `numMidDimSplits` will significantly reduce the shuffled data size.**
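As a sanity check of the arithmetic above, a small hedged helper (the function and its simplifying assumptions, fully dense square matrices with square partition grids, are mine and not part of the PR):

```
// Counts shuffled blocks: step-1 cogroup stripes plus step-2 partial-result blocks.
def shuffledBlocks(nBlocks: Int, resultParts: Int, numMidDimSplits: Int): Int = {
  val blockRowsPerPart = nBlocks / resultParts
  // step-1: each of the resultParts^2 groups pulls a stripe of X and a stripe of Y.
  val step1 = resultParts * resultParts * (2 * blockRowsPerPart * nBlocks)
  // step-2: each of the nBlocks^2 result blocks arrives as numMidDimSplits partial sums.
  val step2 = nBlocks * nBlocks * numMidDimSplits
  step1 + step2
}

shuffledBlocks(16, 16, 1) // 8448 blocks, case 1
shuffledBlocks(16, 8, 4)  // 5120 blocks, case 2
```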
## How was this patch tested?

Test suites added.

Running result: