Add Large Dim Checks for linalg Operators #18816

Zha0q1 · 2020-07-29T04:17:52Z

Add large-dimension checks for linalg operators. Although external BLAS libraries support large tensors (more than 2^32 elements), a large dimension (>= 2^31) triggers an OpenBLAS int overflow error under the current configuration. This PR adds checks so that those use cases exit properly.
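
The overflow described above can be illustrated without OpenBLAS: a 32-bit signed integer holds at most 2^31 - 1, so a dimension of 2^31 wraps to a negative value when narrowed. The following is only a minimal Python sketch of that idea; `to_int32` and `check_blas_dim` are hypothetical helpers introduced for illustration, not the C++ checks added in this PR.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a 32-bit signed int can hold


def to_int32(value):
    """Simulate narrowing a Python int to a 32-bit signed int (wraps around)."""
    return ctypes.c_int32(value).value


def check_blas_dim(dim, name="dim"):
    """Hypothetical guard: reject a dimension that a 32-bit index cannot represent."""
    if dim > INT32_MAX:
        raise ValueError(f"{name}={dim} exceeds INT32_MAX ({INT32_MAX})")
    return dim


if __name__ == "__main__":
    # A dimension of 2^31 no longer fits and becomes negative after narrowing.
    print(to_int32(INT32_MAX + 1))
    try:
        check_blas_dim(INT32_MAX + 1, name="n")
    except ValueError as err:
        print("rejected:", err)
```

Failing early with a clear error, as in `check_blas_dim`, is the general approach the PR describes; the actual operator-level checks live in the C++ sources.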

Done:

gemm and gemm2
trmm and trsm
syrk
gelqf

TODO:
All done. The rest of the operators take square matrices as inputs, which cannot possibly have large dimensions (>= 2^31) because of memory constraints. For example, a 2^31 * 2^31 float matrix would take up 2^24 TB of memory!
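
The memory estimate can be checked with a few lines of arithmetic. This sketch assumes 4-byte (float32) elements and 1 TB = 2^40 bytes; `square_matrix_bytes` is a hypothetical helper name.

```python
TIB = 2**40  # bytes per terabyte, assuming binary units


def square_matrix_bytes(side, itemsize=4):
    """Bytes needed for a dense side x side matrix of itemsize-byte elements."""
    return side * side * itemsize


if __name__ == "__main__":
    size = square_matrix_bytes(2**31)
    # 2^31 * 2^31 * 4 bytes = 2^64 bytes = 2^24 TB
    print(size == 2**64, size // TIB == 2**24)
```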

ubuntu@ip-172-31-6-47:~/mx/incubator-mxnet/build$  nosetests --logging-level=DEBUG --verbose -s ../tests/nightly/test_large_array.py:test_linalg_large_dim
test_large_array.test_linalg_large_dim ... ok

----------------------------------------------------------------------
Ran 1 test in 0.002s

OK


mxnet-bot · 2020-07-29T04:17:57Z

Hey @Zha0q1 , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

To trigger all jobs: @mxnet-bot run ci [all]
To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [unix-gpu, centos-cpu, windows-cpu, unix-cpu, sanity, centos-gpu, website, edge, clang, windows-gpu, miscellaneous]

Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

src/operator/tensor/la_op.h

ChaiBapchya · 2020-07-30T00:13:31Z

tests/nightly/test_large_array.py

@@ -1350,6 +1351,50 @@ def run_trsm(inp):
    check_batch_trsm()


+def test_linalg_large_dim():
+    def check_gemm():
+        A = nd.ones(shape=(1, INT32_MAX + 1, 1))


This should go in test_large_vector, since the input has one large dimension while the remaining dimensions are small. @access2rohit please confirm.

The file names could be better.
Basically, the idea was to have 2 separate files:

test_large_array.py [more like test_large_size.py]
tests inputs whose individual dimensions are each less than 2**32 but whose total size is > 2**32

test_large_vector.py [more like test_large_shape.py]
tests inputs with at least 1 individual dimension > 2**32

Maybe we should add more explicit comments on what those tests do? I can do so in my next commit. I still think the two cases fall into the same category, which is testing with tensors of large dimensions.

In the comments I would say something like: 1. these tests are for overflowing the total size; 2. those other tests are for overflowing index calculations, i.e. row dim, col dim, etc.
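
The two categories discussed in this thread (large total size versus a single large dimension) can be made concrete with a small classifier. This is only an illustrative sketch: `classify_shape` is a hypothetical helper, and it uses INT32_MAX as the threshold (as the test in this PR does) even though the comments here quote 2**32.

```python
from math import prod

INT32_MAX = 2**31 - 1


def classify_shape(shape):
    """Label a shape by which kind of overflow it would exercise."""
    if any(dim > INT32_MAX for dim in shape):
        return "large_dim"   # a single dimension overflows index calculation
    if prod(shape) > INT32_MAX:
        return "large_size"  # only the total element count overflows
    return "small"


if __name__ == "__main__":
    print(classify_shape((1, INT32_MAX + 1, 1)))  # shape used in this PR's test
    print(classify_shape((2**16, 2**16)))
    print(classify_shape((10, 10)))
```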

From a consistency standpoint, I'd:

put these tests in the test_large_vector.py file,

rename that file to [whatever sounds right; I just gave a suggestion above],

add a comment in that file.

All dimensions in test_large_array.py are less than INT32_MAX.
A large dimension [>2**32] was introduced in test_large_vector.py for the same reason.

So, to keep the testing approach consistent, I'd do that.

Even if both files deal with "large tensors", one does it for large "size" and the other specifically for large "shape".

Well, one thing to note is that they are not vectors per se. The inputs are all 3D, whereas in test_large_vector they are all 1D. The dim checks happen on both the row and col dims, which is why I used both (1, 1, x) and (1, x, 1).

I will add more comments in the next commit.

Yes, I know they aren't "vectors", hence I recommended renaming the file.

If we keep these tests in this file, it defeats the purpose of a few tests in test_large_vector.py:

https://github.com/apache/incubator-mxnet/blob/6bbd53107aa16fc41e8d462cf5dc46fb70d592df/tests/nightly/test_large_vector.py#L99-L126

@Zha0q1 the vector file generally houses tests for operators with a single dimension that exceeds the 2^32 range. Please address what @ChaiBapchya suggested and move these to the vector tests. Feel free to rename the file to test_large_dimensions.py.

Yeah, that makes sense. I have moved the tests to test_large_vector.py; I kept the original file name to avoid a naming discrepancy with master. I also added comments to the tests.

ChaiBapchya

Functionality-wise looks good to me. Had other thoughts about "where" this test should be placed. Feel free to disagree & merge.
Looks good other than that. Thanks!

Zha0q1 · 2020-07-30T00:50:03Z

Functionality-wise looks good to me. Had other thoughts about "where" this test should be placed. Feel free to disagree & merge.
Looks good other than that. Thanks!

Thanks!

Ubuntu added 4 commits July 29, 2020 00:30

initial

447e204

Merge branch 'v1.x' of github.com:apache/incubator-mxnet into add_lin…

42d0bc7

…alg_large_dim_check

test

235d04d

gemm and gemm2

a3844cb

Zha0q1 commented Jul 29, 2020

src/operator/tensor/la_op.h (outdated, resolved)

type fix

36a74c9

access2rohit reviewed Jul 29, 2020

src/operator/tensor/la_op.h (resolved)

Ubuntu added 2 commits July 29, 2020 23:42

syrk trmm trsm

9d7a6e8

gelqf

465644f

Zha0q1 changed the title ~~[WIP] Add Large Dim Checks for linalg Operators~~ Add Large Dim Checks for linalg Operators Jul 30, 2020

ChaiBapchya reviewed Jul 30, 2020

josephevans approved these changes Jul 30, 2020

ChaiBapchya approved these changes Jul 30, 2020

Ubuntu and others added 5 commits July 30, 2020 17:12

move tests from test_large_array.py to test_large_vector.py

3996c53

Merge branch 'v1.x' into add_linalg_large_dim_check

377d866

fix white space issue

ca8f4df

Merge branch 'add_linalg_large_dim_check' of github.com:Zha0q1/incuba…

3e3c666

…tor-mxnet into add_linalg_large_dim_check

Merge branch 'v1.x' into add_linalg_large_dim_check

b8682a0

access2rohit approved these changes Jul 30, 2020

sandeep-krishnamurthy merged commit f4e62df into apache:v1.x Jul 31, 2020
