-
Notifications
You must be signed in to change notification settings - Fork 6.8k
MXNet Extensions enhancements2 #19016
MXNet Extensions enhancements2 #19016
Conversation
Hey @samskalicky , Thanks for submitting the PR
CI supported jobs: [edge, clang, centos-cpu, sanity, unix-cpu, windows-gpu, website, windows-cpu, unix-gpu, centos-gpu, miscellaneous] Note: |
@leezu for review of the cmake config file changes and @mseth10 and @rondogency for review of everything else |
add_library(customop_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/gemm_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc) | ||
add_library(transposecsr_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposecsr_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc) | ||
add_library(transposerowsp_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_custom_op/transposerowsp_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc) | ||
add_library(subgraph_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_subgraph/subgraph_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc) | ||
add_library(pass_lib SHARED ${CMAKE_CURRENT_SOURCE_DIR}/example/extensions/lib_pass/pass_lib.cc ${CMAKE_CURRENT_SOURCE_DIR}/src/lib_api.cc) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Offline discussion with @leezu, in another PR we should move this build code into CMakeLists.txt for each example and use add_subdirectory
to include it and replace the current Makefiles so theres only 1 set of build steps for each example.
LibraryInitializer::~LibraryInitializer() { | ||
close_open_libs(); | ||
} | ||
LibraryInitializer::~LibraryInitializer() = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add an assertion which checks if all handles have been closed during shutdown of mxnet? That could allow to catch a possible leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was removed since there are cases where (for external ops for example) an object is registered in an MXNet data structure like the operator registry and during shutdown the object is attempted to be destructed but its pointing to an object in the loaded library. This ended up causing a segfault. Without closing open handles, we let loaded libraries live longer than libmxnet.so and allow it to shutdown cleanly.
This is why we changed MXLoadLib
to return the handle to the library and call dlclose
on the handle in Python.
However, this still isnt an issue since on process exit the library will be closed by the OS anyway when it cleans up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but hence an assertion or some other debug log to make the user aware that there are still unclosed handles. The idea is to give a hint during shutdown. If somebody then sees hundreds of unclosed handles, that could be a strong indicator of something being wrong - I'm not talking about automatic closing of the handle, just a message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cant check if handles are closed while libmxnet.so is closing. It would have to be checked elsewhere so the loaded libraries can live during libmxnet.so shutdown. If we put in a check it will print out to the user every time since the expectation is that the handles are closed after libmxnet.so
We're not talking about hundreds of handles, we're talking about one or maybe two libraries loaded by the user explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I was not aware that the goal was that these libraries have a longer lifecycle than libmxnet.so itself. Thanks for elaborating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
.setBackward(backwardGPU, "gpu"); | ||
|
||
MXReturnValue initialize(int version) { | ||
if (version >= 20000) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you sure about this? since gemm_lib is still 10700
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should update all the examples to 20000 on master. ill do that in the next PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline and we will change example corresponding to master
@@ -54,25 +54,25 @@ | |||
print("indices:", c.indices.asnumpy()) | |||
print("indptr:", c.indptr.asnumpy()) | |||
|
|||
print("--------start symbolic compute--------") | |||
print("--------start Gluon compute--------") | |||
d = mx.sym.Variable('d') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this sym also get deprecated then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we still use Symbol class, but we removed all the usages like bind
self.handle = handle | ||
def __del__(self): | ||
libdl = ctypes.CDLL("libdl.so") | ||
libdl.dlclose(self.handle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it is to elaborate libdl.so lives longer than libmxnet.so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is how we close the library we loaded before, by using libdl.so
to call dlcose
/*! | ||
* Copyright (c) 2019 by Contributors | ||
* \file lib_api.cc | ||
* \brief APIs to interact with libraries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we say "extension API to load dynamic loaded custom libraries", and also say it will depend on lib_api.h (2-stop file instead of 1-stop header file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, lets work on what the new text should be and update it in another PR.
* initial commit * fixed c++17 downgrade * fixed stringstream * fixed cast * changed to use pointers for stringstream since not copyable * fixed includes * fixed makefile includes * skipped lint for malloc/free for passing across C ABI Co-authored-by: Ubuntu <ubuntu@ip-172-31-6-220.us-west-2.compute.internal>
Hi @samskalicky , it seems that now we need |
Hi @ZiyueHuang the lib_api.cc file can be accessed by downloading directly from github: |
If the user install mxnet via pip, only the header files in |
Hi @ZiyueHuang thanks for that feedback. Initially I was thinking the same way, we only had the So in 1.8 we split the code in The pip wheel is organized as the easy path to install in order to run Python programs using MXNet. But any C/C++ compilation for MXNet requires cloning the whole repo. Building a library for custom ops only requires the |
@ZiyueHuang in #19393 i propose adding the |
Description
This PR contains a few enhancements for MXNet extensions:
MXLoadLib
to return the handle to the loaded library and expects the user (or language binding) to close the library later.Other changes
Improves the instructions in config/linux_gpu.cmake for building for specific GPU architectures.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.