
Avoid one unnecessary memory allocation in XNNPACK integration. #35350

Closed

@AshkanAliabadi (Contributor) commented Mar 25, 2020

Stack from ghstack:

Currently we call input.contiguous() on the input tensor, resulting in an unnecessary allocation and copy in cases where the input is not contiguous with regard to the requested memory format. The reason is that, in such scenarios, this call re-allocates and copies the input tensor into contiguous storage, only for this newly allocated tensor to be used as the source of another copy to the final destination. Instead, if we copy into the destination directly in such circumstances, we save an allocation and a copy.

Differential Revision: D20656798

[ghstack-poisoned]
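To make the pattern concrete, here is a minimal Python sketch of the before/after behavior described above. The actual change lives in the C++ XNNPACK integration code, and the helper names below are made up for this illustration; only the allocation/copy pattern is the point.

```python
# Illustration only: the real change is in the C++ XNNPACK integration.
# The helper names (pad_input_via_contiguous / pad_input_direct) are
# hypothetical and express the same pattern with the Python tensor API.

import torch

def pad_input_via_contiguous(src, memory_format=torch.channels_last):
    # Before: materialize a contiguous temporary first (extra allocation +
    # copy whenever `src` is not already contiguous in `memory_format`) ...
    tmp = src.contiguous(memory_format=memory_format)
    # ... then copy that temporary into the destination buffer.
    dst = torch.empty_like(tmp, memory_format=memory_format)
    dst.copy_(tmp)  # second copy
    return dst

def pad_input_direct(src, memory_format=torch.channels_last):
    # After: allocate the destination and copy straight from the original
    # input; copy_() performs the layout conversion itself, so the temporary
    # tensor (one allocation and one copy) is avoided.
    dst = torch.empty_like(src, memory_format=memory_format)
    dst.copy_(src)
    return dst
```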
AshkanAliabadi pushed a commit that referenced this pull request Mar 25, 2020

ghstack-source-id: 1c498a6e2287be38c252642a50f310676f258998
Pull Request resolved: #35350

dr-ci bot commented Mar 25, 2020

💊 CircleCI build failures summary and remediations

As of commit ac58495 (more details on the Dr. CI page):


None of the build failures appear to be your fault 💚


  • 1/1 broken upstream at merge base 2a4ca70 since Apr 02

    Please rebase on the viable/strict branch:

    If your commit is newer than viable/strict, you can try basing on an older, stable commit:

    git fetch https://github.com/pytorch/pytorch viable/strict
    git rebase --onto FETCH_HEAD $(git merge-base origin/master HEAD)
    

    If your commit is older than viable/strict:

    git fetch https://github.com/pytorch/pytorch viable/strict
    git rebase FETCH_HEAD
    

    Check out the recency history of this "viable master" tracking branch.


🚧 1 upstream failure:

These were probably caused by upstream breakages:


This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

See how this bot performed.

This comment has been revised 30 times.

@AshkanAliabadi (Contributor, Author) commented:

Do we have any tests that I can use to validate that the results are correct? I'm not sure this code is exercised through CI.

@kimishpatel (Contributor) commented:

Do we have any tests that I can use to validate that the results are correct? I'm not sure this code is exercised through CI.

You can try python test/test_xnnpack_integration.py. I suggest adding, to one or two tests, different memory formats for the inputs, including channels-first and channels-last.

@kimishpatel (Contributor) left a comment:

LG for the most part. Can you add the test I mentioned in the comment or try the existing tests?
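A minimal sketch of the kind of check kimishpatel suggests, assuming a PyTorch build with XNNPACK enabled and the prepacked::conv2d_clamp_prepack / conv2d_clamp_run ops it registers; the shapes and tolerances are illustrative, and the real tests in test/test_xnnpack_integration.py are structured differently.

```python
import torch
import torch.nn.functional as F

def check_conv2d_memory_formats():
    # Small NCHW input with a matching conv weight/bias (illustrative shapes).
    x = torch.rand(1, 8, 16, 16)
    weight = torch.rand(16, 8, 3, 3)
    bias = torch.rand(16)
    stride, padding, dilation, groups = [1, 1], [1, 1], [1, 1], 1

    # Reference result from the regular eager-mode convolution.
    reference = F.conv2d(x, weight, bias, stride, padding, dilation, groups)

    # Prepack once, then run the XNNPACK-backed op on inputs in different
    # memory formats; every output should match the reference.
    packed = torch.ops.prepacked.conv2d_clamp_prepack(
        weight, bias, stride, padding, dilation, groups)
    for memory_format in (torch.contiguous_format, torch.channels_last):
        formatted_input = x.contiguous(memory_format=memory_format)
        output = torch.ops.prepacked.conv2d_clamp_run(formatted_input, packed)
        torch.testing.assert_allclose(output, reference, rtol=1e-3, atol=1e-4)

check_conv2d_memory_formats()
```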

Ashkan Aliabadi added 5 commits March 25, 2020 12:48
Ashkan Aliabadi added 2 commits April 1, 2020 13:51
Ashkan Aliabadi added 2 commits April 2, 2020 15:31
@facebook-github-bot (Contributor) commented:

@AshkanAliabadi merged this pull request in d0ce94d.

@facebook-github-bot deleted the gh/AshkanAliabadi/11/head branch April 6, 2020 14:17