Adding ROIAlign backwards for CPU #504

Closed
wants to merge 4 commits into from

xssChauhan

Adding ROIAlign backwards implementation for CPU.

It is based on torchvision's CUDA implementation of the same operation.

Currently the layers branch is not compiling.

@fmassa
Member

fmassa commented May 17, 2018

Thanks for the PR!

I believe we still need to modify this file in order for the CPU dispatch to work?

@xssChauhan
Author

@fmassa My bad! Fixing this.

@xssChauhan
Author

@fmassa Fixed the python interface

@sampepose

Can you make sure flake8 runs successfully on your code? CI is failing since there are some linter errors.

@fmassa
Member

fmassa commented May 19, 2018

@sampepose I believe the flake8 issues are on my end. I need to fix them before merging the layers branch into master.
And we don't currently have a linter for C++ in torchvision, so this should be fine.

@fmassa
Member

fmassa commented May 19, 2018

@xssChauhan could you please write a small Python file that tests that the gradients are computed correctly?
For that, you can use PyTorch's torch.autograd.gradcheck. Also, when running the code, make sure that you are using double tensors. With float tensors you'll run into problems, because they lack the precision needed by the finite-difference differentiation that gradcheck uses to compare the gradients.

Once we know that the gradcheck is passing, I'll merge this patch.

Thanks!
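
The check requested above can be illustrated without PyTorch. The following is a minimal sketch in plain Python, not the ROIAlign code from this PR and not torch.autograd.gradcheck itself: `bilinear`, `bilinear_backward` and `numerical_grad` are hypothetical helpers that show the principle gradcheck relies on, namely comparing an analytic backward pass against central finite differences. Bilinear interpolation is used as the stand-in function because ROIAlign samples the feature map with it.

```python
import math


def bilinear(grid, y, x):
    """Bilinearly interpolate a 2D list `grid` at the real-valued point (y, x)."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    ly, lx = y - y0, x - x0          # distance to the upper-left neighbour
    hy, hx = 1.0 - ly, 1.0 - lx      # distance to the lower-right neighbour
    return (hy * hx * grid[y0][x0] + hy * lx * grid[y0][x0 + 1]
            + ly * hx * grid[y0 + 1][x0] + ly * lx * grid[y0 + 1][x0 + 1])


def bilinear_backward(shape, y, x, grad_out=1.0):
    """Analytic gradient of bilinear(grid, y, x) with respect to every grid cell."""
    h, w = shape
    grad = [[0.0] * w for _ in range(h)]
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    ly, lx = y - y0, x - x0
    hy, hx = 1.0 - ly, 1.0 - lx
    # Each of the four neighbours receives its interpolation weight.
    grad[y0][x0] += grad_out * hy * hx
    grad[y0][x0 + 1] += grad_out * hy * lx
    grad[y0 + 1][x0] += grad_out * ly * hx
    grad[y0 + 1][x0 + 1] += grad_out * ly * lx
    return grad


def numerical_grad(grid, y, x, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    h, w = len(grid), len(grid[0])
    grad = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            orig = grid[i][j]
            grid[i][j] = orig + eps
            plus = bilinear(grid, y, x)
            grid[i][j] = orig - eps
            minus = bilinear(grid, y, x)
            grid[i][j] = orig            # restore the perturbed cell
            grad[i][j] = (plus - minus) / (2 * eps)
    return grad


grid = [[0.5, 1.0, -2.0], [3.0, 0.25, 4.0], [1.5, -1.0, 2.0]]
analytic = bilinear_backward((3, 3), 0.3, 1.6)
numeric = numerical_grad(grid, 0.3, 1.6)
max_err = max(abs(a - n)
              for row_a, row_n in zip(analytic, numeric)
              for a, n in zip(row_a, row_n))
```

Python floats are double precision, which matches the advice to use double tensors. In single precision, the rounding error in `plus - minus` is divided by a very small `eps` and can easily exceed a tight tolerance even when the analytic gradient is correct.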

@xssChauhan
Author

@fmassa Will do so

@xssChauhan
Author

xssChauhan commented May 22, 2018

@fmassa layers branch is currently not compiling.

Here's the error:

$ python setup.py install                                                                                                                                                                                   
running install                                                                                                                                                                                             
running bdist_egg                                                                                                                                                                                           
running egg_info                                                                                                                                                                                            
writing torchvision.egg-info/PKG-INFO                                                                                                                                                                       
writing dependency_links to torchvision.egg-info/dependency_links.txt                                                                                                                                       
writing requirements to torchvision.egg-info/requires.txt 
writing top-level names to torchvision.egg-info/top_level.txt                                         
reading manifest file 'torchvision.egg-info/SOURCES.txt'                                              
reading manifest template 'MANIFEST.in'            
warning: no previously-included files matching '__pycache__' found under directory '*'                
warning: no previously-included files matching '*.py[co]' found under directory '*'                   
writing manifest file 'torchvision.egg-info/SOURCES.txt'                                              
installing library code to build/bdist.linux-x86_64/egg                                               
running install_lib                                
running build_py                                   
running build_ext                                  
building 'torchvision._C' extension                
gcc -pthread -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/home/schauhan/vision/torchvision/csrc -I/home/schauhan/vision/env/lib/python3.6/site-packages/torch/lib/include -I/home/schauhan/vision/env/lib/python3.6/site-packages/torch/lib/include/TH -I/home/schauhan/vision/env/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/include/python3.6m -c /home/schauhan/vision/torchvision/csrc/vision.cpp -o build/temp.linux-x86_64-3.6/home/schauhan/vision/torchvision/csrc/vision.o -DTORCH_EXTENSION_NAME=torchvision._C -std=c++11                                                  
In file included from /home/schauhan/vision/env/lib/python3.6/site-packages/torch/lib/include/pybind11/pytypes.h:12:0,                                                                                      
                 from /home/schauhan/vision/env/lib/python3.6/site-packages/torch/lib/include/pybind11/cast.h:13,                                                                                           
                 from /home/schauhan/vision/env/lib/python3.6/site-packages/torch/lib/include/pybind11/attr.h:13,                                                                                           
                 from /home/schauhan/vision/env/lib/python3.6/site-packages/torch/lib/include/pybind11/pybind11.h:43,                                                                                       
                 from /home/schauhan/vision/env/lib/python3.6/site-packages/torch/lib/include/torch/torch.h:6,                                                                                              
                 from /home/schauhan/vision/torchvision/csrc/cpu/vision.h:2,                          
                 from /home/schauhan/vision/torchvision/csrc/nms.h:2,                                 
                 from /home/schauhan/vision/torchvision/csrc/vision.cpp:1:
<command-line>:0:33: error: expected initializer before ‘.’ token
/home/schauhan/vision/env/lib/python3.6/site-packages/torch/lib/include/pybind11/detail/common.h:212:47: note: in definition of macro ‘PYBIND11_CONCAT’
 #define PYBIND11_CONCAT(first, second) first##second
                                               ^~~~~~
/home/schauhan/vision/torchvision/csrc/vision.cpp:6:1: note: in expansion of macro ‘PYBIND11_MODULE’
 PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
 ^~~~~~~~~~~~~~~
/home/schauhan/vision/torchvision/csrc/vision.cpp:6:17: note: in expansion of macro ‘TORCH_EXTENSION_NAME’
 PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
                 ^~~~~~~~~~~~~~~~~~~~
<command-line>:0:33: error: expected initializer before ‘.’ token
/home/schauhan/vision/env/lib/python3.6/site-packages/torch/lib/include/pybind11/detail/common.h:171:51: note: in definition of macro ‘PYBIND11_PLUGIN_IMPL’
     extern "C" PYBIND11_EXPORT PyObject *PyInit_##name()
                                                   ^~~~
/home/schauhan/vision/torchvision/csrc/vision.cpp:6:1: note: in expansion of macro ‘PYBIND11_MODULE’
 PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
 ^~~~~~~~~~~~~~~
/home/schauhan/vision/torchvision/csrc/vision.cpp:6:17: note: in expansion of macro ‘TORCH_EXTENSION_NAME’
 PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
                 ^~~~~~~~~~~~~~~~~~~~
<command-line>:0:33: error: expected initializer before ‘.’ token
/home/schauhan/vision/env/lib/python3.6/site-packages/torch/lib/include/pybind11/detail/common.h:212:47: note: in definition of macro ‘PYBIND11_CONCAT’
 #define PYBIND11_CONCAT(first, second) first##second
                                               ^~~~~~
/home/schauhan/vision/torchvision/csrc/vision.cpp:6:1: note: in expansion of macro ‘PYBIND11_MODULE’
 PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
 ^~~~~~~~~~~~~~~
/home/schauhan/vision/torchvision/csrc/vision.cpp:6:17: note: in expansion of macro ‘TORCH_EXTENSION_NAME’
 PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
                 ^~~~~~~~~~~~~~~~~~~~
error: command 'gcc' failed with exit status 1   

This stops me from building the extension, so I cannot access roi_align_backwards from the Python interface.
How can I fix this?

@fmassa
Member

fmassa commented May 22, 2018

This is the error you get before even applying your patch, is that right?

@xssChauhan
Author

Yes. I initially thought that I had introduced it, but then I tried the layers branch on its own and found the same issue.

@fmassa
Member

fmassa commented May 22, 2018

Weird. Let me try compiling it (it was working the last time I checked).

@fmassa
Member

fmassa commented May 22, 2018

Ok, I know what's going on.

You need to install PyTorch from source. The bug you are facing was fixed in pytorch/pytorch#6986
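
For context, the failing compiler command in the log above passes `-DTORCH_EXTENSION_NAME=torchvision._C`, and pybind11 pastes that value into C++ identifiers (`PyInit_##name`, `first##second`), where a dot is not allowed. The later build log in this thread, produced against a source install, passes `-DTORCH_EXTENSION_NAME=_C` instead. The sketch below only illustrates that difference. It assumes the fix amounts to using the last dotted component of the module name, which has not been checked against the linked PyTorch change, and `extension_macro_name` and `define_flag` are hypothetical helpers, not PyTorch functions.

```python
def extension_macro_name(qualified_name):
    """Return the last dotted component of a module name, e.g. 'pkg._C' -> '_C'."""
    name = qualified_name.split(".")[-1]
    # str.isidentifier() checks Python identifier rules; it is used here only
    # as a rough proxy for "can be pasted into a C++ identifier".
    if not name.isidentifier():
        raise ValueError(f"{name!r} cannot be pasted into a C++ identifier")
    return name


def define_flag(qualified_name):
    """Build the -D compiler flag for a (possibly nested) extension module."""
    return "-DTORCH_EXTENSION_NAME=" + extension_macro_name(qualified_name)


flag = define_flag("torchvision._C")
```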

@xssChauhan
Author

Understood. I will install PyTorch from source.
I was using the pip installation until now. Thank you :)

@xssChauhan
Author

@fmassa I installed PyTorch from source and then tried compiling the layers branch.
Here's the error that I got:

running install
running bdist_egg
running egg_info
writing torchvision.egg-info/PKG-INFO
writing dependency_links to torchvision.egg-info/dependency_links.txt
writing requirements to torchvision.egg-info/requires.txt
writing top-level names to torchvision.egg-info/top_level.txt
reading manifest file 'torchvision.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '__pycache__' found under directory '*'
warning: no previously-included files matching '*.py[co]' found under directory '*'
writing manifest file 'torchvision.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'torchvision._C' extension
gcc -pthread -B /home/shikhar/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/shikhar/Documents/vision-org/torchvision/csrc -I/home/shikhar/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/home/shikhar/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/home/shikhar/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/home/shikhar/anaconda3/include/python3.6m -c /home/shikhar/Documents/vision-org/torchvision/csrc/vision.cpp -o build/temp.linux-x86_64-3.6/home/shikhar/Documents/vision-org/torchvision/csrc/vision.o -DTORCH_EXTENSION_NAME=_C -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
gcc -pthread -B /home/shikhar/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/shikhar/Documents/vision-org/torchvision/csrc -I/home/shikhar/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/home/shikhar/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/home/shikhar/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/home/shikhar/anaconda3/include/python3.6m -c /home/shikhar/Documents/vision-org/torchvision/csrc/cpu/ROIAlign_cpu.cpp -o build/temp.linux-x86_64-3.6/home/shikhar/Documents/vision-org/torchvision/csrc/cpu/ROIAlign_cpu.o -DTORCH_EXTENSION_NAME=_C -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/home/shikhar/Documents/vision-org/torchvision/csrc/cpu/ROIAlign_cpu.cpp:226:66: error: macro "AT_ASSERT" passed 2 arguments, but takes just 1
   AT_ASSERT(!input.type().is_cuda(), "input must be a CPU tensor");
                                                                  ^
/home/shikhar/Documents/vision-org/torchvision/csrc/cpu/ROIAlign_cpu.cpp:227:64: error: macro "AT_ASSERT" passed 2 arguments, but takes just 1
   AT_ASSERT(!rois.type().is_cuda(), "rois must be a CPU tensor");
                                                                ^
/home/shikhar/Documents/vision-org/torchvision/csrc/cpu/ROIAlign_cpu.cpp: In function ‘at::Tensor ROIAlign_forward_cpu(const at::Tensor&, const at::Tensor&, float, int, int, int)’:
/home/shikhar/Documents/vision-org/torchvision/csrc/cpu/ROIAlign_cpu.cpp:226:3: error: ‘AT_ASSERT’ was not declared in this scope
   AT_ASSERT(!input.type().is_cuda(), "input must be a CPU tensor");
   ^
error: command 'gcc' failed with exit status 1

@fmassa
Member

fmassa commented May 24, 2018

Ok, this is due to a recent change in PyTorch that modified the behavior of AT_ASSERT as you can see in pytorch/pytorch#7104

I believe you can replace AT_ASSERT with AT_CHECK and it should compile.

And sorry for the trouble getting this branch to compile. As you can see, PyTorch is evolving quite fast!
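
The AT_ASSERT to AT_CHECK replacement suggested above can be done by hand or with a short script. A minimal sketch, assuming a plain textual rename is enough: `replace_at_assert` is a hypothetical helper, and the sample line is copied from the compiler error for ROIAlign_cpu.cpp.

```python
import re


def replace_at_assert(source):
    """Rename AT_ASSERT to AT_CHECK without touching longer identifiers."""
    # \b keeps names such as AT_ASSERTM or MY_AT_ASSERT unchanged, because
    # letters, digits and underscores all count as word characters.
    return re.sub(r"\bAT_ASSERT\b", "AT_CHECK", source)


sample = 'AT_ASSERT(!input.type().is_cuda(), "input must be a CPU tensor");'
patched = replace_at_assert(sample)
```

To apply it to a file on disk, the source can be read and written back with `pathlib.Path.read_text` and `pathlib.Path.write_text`.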

@xssChauhan
Author

xssChauhan commented May 24, 2018

Thank you. I'll do the changes.
Quite exciting to see PyTorch grow so fast and learn from it!

@xssChauhan
Author

Hey @fmassa
torchvision now compiles successfully, but here's the issue that I am facing now:

Even after a successful compilation, the layers and _C modules do not seem to be added to torchvision.

Here are the steps that I took:

  • Install PyTorch from source using the master branch, without CUDA support.
  • Replace all occurrences of AT_ASSERT with AT_CHECK in nms_cpu.cpp and ROIAlign_cpu.cpp.
  • Install torchvision from source using the layers branch.

The build process throws no error.
In the shell:
image

Here are a couple of implementation details:

  • Using a conda environment

  • The build folder seems to have the compiled extension and the layers module
    image

  • I don't know if it is relevant, but installing PyTorch and torchvision from source has different effects
    image

  • I have tried manually copying the contents of the build folder to the environment. The _C extension is thus present in the environment's package, but it still does not appear in the terminal.

How can I fix this? I would be grateful for the help.

@fmassa
Member

fmassa commented May 28, 2018

Hi,

So, I believe copying the _C file should fix all (or almost all) of the issues.
I've installed torchvision using python setup.py build develop, so that we have a symlink to the torchvision directory.

Can you try doing

from torchvision import layers

If that doesn't work, one possibility is that you need to uninstall your previous torchvision installation before installing the new one using python setup.py install.

Could you try doing conda uninstall torchvision, check in your distribution that torchvision is not in the site-packages folder, and then try installing it again using python setup.py build develop?
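
One way to check for the stale-install situation described above is to ask Python where it would import the package from. A minimal sketch using the standard library's `importlib.util.find_spec`: `module_origin` and `report` are hypothetical helpers, and `json` is included only so that the snippet produces a result when torchvision is not installed.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def report(names):
    """Map each top-level module name to the path it resolves to."""
    return {name: module_origin(name) for name in names}


origins = report(["torchvision", "json"])
for name, origin in origins.items():
    print(name, "->", origin)
```

If the path printed for torchvision lies under site-packages rather than in the source checkout, an older installation is probably shadowing the one built with python setup.py build develop.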

@xssChauhan
Author

@fmassa My bad. from torchvision import pytorch seemed to do the trick. Thank you.

@fmassa
Member

fmassa commented Jun 19, 2018

@xssChauhan is this ready for review? Did you manage to perform the gradcheck?

@xssChauhan
Author

@fmassa The branch is currently not passing the gradcheck. I've been working on finding the issue.

My code is based on Caffe2's and torchvision's implementations of the same operation.
I recently came across this implementation, which references Caffe2 as well. I am using the same parameters as in the link to run the gradcheck.

@varunagrawal
Contributor

Since this PR seems to be abandoned, I have taken the liberty of adding the backward pass for CPU in #630.

@fmassa
Member

fmassa commented Oct 17, 2018

Closing in favor of #630

@fmassa fmassa closed this Oct 17, 2018