[RFC] Encapsulate and standardize C++ Ops methods #2956
I agree with the general direction of the proposal. I do not have enough context to fully evaluate the pitches (I am not familiar with vision's C++ codebase), but here are my thoughts.
Having a standard sounds good. When it comes to picking one: looking at the PyTorch codebase, they use CamelCase for headers and classes and snake_case for functions/methods, so aligning with PyTorch would make it easier for people who work on PyTorch to work on the torchvision codebase.
Idea-wise this sounds good, but I do not know enough about the forward/backward implementations, so, as with the previous point, I suggest studying the convention of PyTorch core and adopting it.
I think this is a good thing, but I lack the context. Is this a "must have" for the dispatcher adoption or just a "nice to have" for better codebase quality? I support the move either way. Regarding the signature conflict, it is not immediately clear to me why we would have a conflict. Is it because, once you standardize the naming convention, there will be identically named implementations for different devices (like forward for an operation on CUDA and CPU)? But if they are in an anonymous namespace, they are not public API — will the dispatcher still be able to use these non-public functions?
I do not oppose this, but we might want to be careful about it, as there are some build environments we do not (or cannot) test with our CI systems.
I think an anonymous namespace would do.
CamelCase for headers and classes is indeed a very common choice. @fmassa proposed in the past to prefer snake_case in order to keep Python and C++ consistent. Personally I just advocate consistent naming; I don't have a preference over what we use.
This is not a "must have" for adopting the dispatcher. It's the latter.
During my final cleanups of the dispatcher work, I tried to align some of the functions, but this led to problems. As far as I understand, all methods of an operator across devices end up being imported together at some point. If the method signature (name + exact params) matches exactly across the two devices, then you get a conflict (so this one fails while this one does not).
By putting the internal methods of ops (everything other than forward/backward) in an anonymous namespace, you hide them from all files except the one where they are defined. The forward/backward of each device will still be able to call them, but they will not conflict when they are included together by other files.
Thanks for flagging. Could you elaborate on which environments we should have in mind?
So my understanding is that the top-level functions called by the dispatcher will be public and have different names (like
I hear torchvision has mobile use cases, but I have no idea how those are built and run. I do not think our CI has coverage for mobile, so we might want to conduct some research to ensure that this won't break something there. If they simply rely on the TorchScript runtime, then we can assume that working TorchScript would be sufficient for the mobile workflow.
🚀 Feature
Currently all the internal methods of the C++ Ops are public and this can lead to signature conflicts. To avoid the problem, we often use slightly different names for the same functionality across devices (`nms_cpu_kernel` vs `nms_kernel`). A better approach would be to encapsulate the methods and standardize the naming conventions across operators. Doing so will enforce good engineering standards, make it easier for first-time contributors to understand the code and potentially make it easier for us in the future to reduce the amount of duplicate/boilerplate code across Ops.

Motivation
While porting the C++ ops to use the dispatcher and autocast (#2796, #2797) I found it frustrating that the codebase looks substantially different from one operator to the other, the naming conventions are not standardized and that there is no separation between internal and public API methods. Moreover while making the changes, I've experienced some strange errors related to method signature conflicts.
Pitch
Naming Conventions:
- `DeformConv2d` should become `deform_conv2d`
- `DeformConv.h` should be `deform_conv2d.h`
- `*_forward_*`, `*_backward_*` (partially done on previous PRs)

Encapsulation:
Alternatives
An alternative to anonymous namespaces could be refactoring the code to use classes and access modifiers. This would help us enforce the naming conventions more naturally (using abstract classes); nevertheless, it is likely to increase complexity, because each operator receives different parameters and thus multiple classes might be necessary per operator. Finally, with an OO approach we might have performance considerations.
Note: The changes introduced earlier for adopting the new PyTorch Dispatcher and Autocast were BC-breaking for the C++ methods. Though these methods are not considered part of the public API, it's probably worth making all the necessary changes before the release of the next stable version, to ensure we won't affect our users twice.
cc @fmassa @cpuhrsch @vfdev-5 @mthrok