[RFC] Encapsulate and standardize C++ Ops methods #2956

Closed
datumbox opened this issue Nov 3, 2020 · 3 comments · Fixed by #3097
@datumbox
Contributor

datumbox commented Nov 3, 2020

🚀 Feature

Currently, all the internal methods of the C++ Ops are public, which can lead to signature conflicts. To work around this, we often use slightly different names for the same functionality across devices (nms_cpu_kernel vs nms_kernel). A better approach would be to encapsulate the internal methods and standardize the naming conventions across operators. Doing so will enforce good engineering standards, make the code easier for first-time contributors to understand, and potentially make it easier for us in the future to reduce the amount of duplicate/boilerplate code across Ops.

Motivation

While porting the C++ ops to use the dispatcher and autocast (#2796, #2797), I found it frustrating that the codebase looks substantially different from one operator to another, that the naming conventions are not standardized, and that there is no separation between internal and public API methods. Moreover, while making the changes, I experienced some strange errors related to method signature conflicts.

Pitch

Naming Conventions:

  • Make the name of all operators lowercase to match the names of the Python Ops. For example: files, headers and methods containing DeformConv2d should become deform_conv2d
  • Ensure naming consistency within operators. Example: DeformConv.h should be deform_conv2d.h
  • Standardize the names of the public functions that the ops need to implement and make them consistent across devices. For example: *_forward_*, *_backward_* (partially done in previous PRs). A before/after sketch follows this list.
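
For illustration, a rough before/after sketch of the proposed renames (hypothetical, simplified signatures; the real ops take more parameters):

```cpp
#include <ATen/ATen.h>

// Before: CamelCase names and ad-hoc per-device suffixes, e.g. in DeformConv.h:
//   at::Tensor DeformConv2d_forward_cpu(const at::Tensor& input);
//   at::Tensor nms_cpu_kernel(const at::Tensor& dets, const at::Tensor& scores);

// After: lowercase snake_case matching the Python op names, with a uniform
// *_forward_* / *_backward_* scheme across devices, e.g. in deform_conv2d.h:
at::Tensor deform_conv2d_forward_cpu(const at::Tensor& input);
at::Tensor deform_conv2d_forward_cuda(const at::Tensor& input);
at::Tensor deform_conv2d_backward_cpu(const at::Tensor& grad);
```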

Encapsulation:

  • Use anonymous namespaces to hide the internal methods of the ops from the public methods (see the sketch below).
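
A minimal sketch of what this could look like in one file, assuming simplified signatures and a hypothetical file layout:

```cpp
// nms_cpu.cpp -- sketch only; the real kernel body is elided.
#include <ATen/ATen.h>

namespace {

// Internal linkage: this symbol never leaves the translation unit, so an
// identically named nms_kernel in the CUDA file cannot collide with it.
at::Tensor nms_kernel(
    const at::Tensor& dets,
    const at::Tensor& scores,
    double iou_threshold) {
  // ... actual NMS computation elided ...
  return at::empty({0}, dets.options().dtype(at::kLong));
}

} // namespace

// Public, device-specific entry point; it can still call the hidden kernel
// because both live in the same translation unit.
at::Tensor nms_cpu(
    const at::Tensor& dets,
    const at::Tensor& scores,
    double iou_threshold) {
  return nms_kernel(dets, scores, iou_threshold);
}
```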

Alternatives

An alternative to anonymous namespaces would be refactoring the code to use classes and access modifiers. This would help us enforce the naming conventions more naturally (using abstract classes); nevertheless, it is likely to increase complexity, because each operator receives different parameters and thus multiple classes might be necessary per operator. Finally, an OO approach might raise performance considerations.

Note: The changes introduced earlier for adopting the new PyTorch Dispatcher and Autocast included some BC-breaking changes in the C++ methods. Though these methods are not considered part of the public API, it's probably worth making all the necessary changes to the code before the release of the next stable version, to ensure we won't affect our users twice.

cc @fmassa @cpuhrsch @vfdev-5 @mthrok

@mthrok
Contributor

mthrok commented Nov 3, 2020

I agree with the general direction the proposal is going. I do not have enough context to fully evaluate the pitches (because I am not familiar with vision's C++ codebase), but here are my thoughts.

Pitch

Naming Conventions:

  • Make the name of all operators lowercase to match the names of the Python Ops. For example: files, headers and methods containing DeformConv2d should become deform_conv2d
  • Ensure naming consistency within operators. Example: DeformConv.h should be deform_conv2d.h

Having a standard sounds good. When it comes to picking one: looking at the PyTorch codebase, they use CamelCase for headers and classes and snake_case for functions/methods, so aligning with PyTorch would perhaps make it easier for people who work on PyTorch to work on the torchvision codebase.

  • Standardize the names of the public functions that the ops need to implement and make them consistent across devices. For example: *_forward_*, *_backward_* (partially done on previous PRs).

Idea-wise, this sounds good, but I do not know enough about forward/backward implementations, so I suggest, as with the previous point, studying the convention of PyTorch core and adopting the same one.

Encapsulation:

  • Use anonymous namespaces to hide internal ops methods from the public methods.

I think this is a good thing, but I lack the context: is this a "must have" for the dispatcher adoption or just a "good to have" for better codebase quality? I support the move either way.

Regarding the signature conflict, it is not immediately clear to me why we would have a conflict. Is it because, once you standardize the naming convention, there will be identically named implementations for different devices (like forward of an operation for CUDA and CPU)? But if they are in an anonymous namespace, they are not public API; will the dispatcher still be able to use these non-public functions?

I do not oppose this, but we might want to be careful about it, as there are some build environments we do not (cannot) test with our CI systems.

Alternatives

An alternative to anonymous namespaces would be refactoring the code to use classes and access modifiers.

I think an anonymous namespace would do just fine.

@datumbox
Contributor Author

datumbox commented Nov 4, 2020

they use CamelCase for headers and classes and snake_case for functions/methods

CamelCase for headers and classes is indeed a very common choice. @fmassa proposed in the past to prefer snake_case in order to keep Python and C++ consistent. Personally, I just advocate consistent naming; I don't have a preference about which convention we use.

Is this "must have" for the dispatcher adaptation or just a "good to have" for better code base quality?

This is not a "must have" for adopting the dispatcher; it's the latter.

Regarding the signature conflict, it is not immediately clear to me why we would have a conflict.

During my final cleanups of the dispatcher work, I tried to align some of the functions, but this led to problems. As far as I understand, all methods of an operator across devices end up being included together at some point. If the method signature (name + exact params) matches exactly across the two devices, then you get a conflict (so this one fails while this one does not).

But if they are in an anonymous namespace, they are not public API; will the dispatcher still be able to use these non-public functions?

By putting the internal methods of the ops (everything other than forward/backward) in an anonymous namespace, you hide them from all files except the one where they are defined. So the forward/backward of each device will still be able to call them, but they should not conflict when they are included together by other files.
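
To make the failure mode concrete, a hedged two-file illustration (hypothetical op name): with external linkage, the linker sees the same symbol exported twice; with internal linkage, each file keeps its own private copy.

```cpp
// Before -- both object files export the same symbol, so linking them
// together fails with a duplicate-symbol error:
//
//   // roi_align_cpu.cpp
//   at::Tensor roi_align_kernel(const at::Tensor& input) { /* ... */ }
//   // roi_align_cuda.cu
//   at::Tensor roi_align_kernel(const at::Tensor& input) { /* ... */ }

// After -- each definition gets internal linkage, so the names can match
// across devices without clashing:
#include <ATen/ATen.h>

namespace {
at::Tensor roi_align_kernel(const at::Tensor& input) {
  return input.contiguous();  // placeholder body
}
} // namespace

// Exported per-device wrapper; it can still call the hidden kernel because
// both live in the same translation unit.
at::Tensor roi_align_forward_cpu(const at::Tensor& input) {
  return roi_align_kernel(input);
}
```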

I do not oppose this, but we might want to be careful about it, as there are some build environments we do not (cannot) test with our CI systems.

Thanks for flagging. Could you elaborate on which environments we should have in mind?

@mthrok
Contributor

mthrok commented Nov 9, 2020

Regarding the signature conflict, it is not immediately clear to me why we would have a conflict.

During my final cleanups of the dispatcher work, I tried to align some of the functions, but this led to problems. As far as I understand, all methods of an operator across devices end up being included together at some point. If the method signature (name + exact params) matches exactly across the two devices, then you get a conflict (so this one fails while this one does not).

But if they are in an anonymous namespace, they are not public API; will the dispatcher still be able to use these non-public functions?

By putting the internal methods of the ops (everything other than forward/backward) in an anonymous namespace, you hide them from all files except the one where they are defined. So the forward/backward of each device will still be able to call them, but they should not conflict when they are included together by other files.

So my understanding is that the top-level functions called by the dispatcher will be public and have different names (like XXX_forward_CPU and XXX_forward_CUDA), and the helper functions used by these top-level functions will go into an anonymous namespace, possibly with the same function names. I think that's a fair approach. One thing I wonder is whether it is a good idea to impose the constraint of having the same names for helper functions. I am not familiar with CUDA, but I hear its programming paradigm is very different from the CPU one, so trying to align the signatures of helper functions across devices might not yield much benefit.
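
For context, a sketch of the registration side under that reading, using hypothetical entry-point names and assuming the "nms" schema is declared elsewhere via TORCH_LIBRARY/m.def:

```cpp
#include <ATen/ATen.h>
#include <torch/library.h>

// Hypothetical per-device entry points; in the real layout each would live in
// its own file, with any same-named helpers hidden in anonymous namespaces.
at::Tensor nms_cpu(const at::Tensor& dets, const at::Tensor& scores, double iou_threshold) {
  return at::empty({0}, dets.options().dtype(at::kLong));  // stub body
}
at::Tensor nms_cuda(const at::Tensor& dets, const at::Tensor& scores, double iou_threshold) {
  return at::empty({0}, dets.options().dtype(at::kLong));  // stub body
}

// The dispatcher routes torchvision::nms to the right backend at runtime.
TORCH_LIBRARY_IMPL(torchvision, CPU, m) {
  m.impl("nms", nms_cpu);
}
TORCH_LIBRARY_IMPL(torchvision, CUDA, m) {
  m.impl("nms", nms_cuda);
}
```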

I do not oppose this, but we might want to be careful about it, as there are some build environments we do not (cannot) test with our CI systems.

Thanks for flagging. Could you elaborate on which environments we should have in mind?

I hear torchvision has mobile use cases, though I have no idea how those are built and run. I do not think our CI has coverage for mobile, so we might want to do some research to ensure this won't break something like that. If they simply rely on the TorchScript runtime, then we can assume that working TorchScript would be sufficient for the mobile workflow.
