Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RUNTIME][REFACTOR] Use new to avoid exit-time de-allocation order #6292

Merged
merged 1 commit into from
Aug 18, 2020

Conversation

tqchen
Copy link
Member

@tqchen tqchen commented Aug 17, 2020

It is hard to control deallocation order of TVM and other frameworks. So just keep the global state active and rely on OS to recycle the resources

@tqchen
Copy link
Member Author

tqchen commented Aug 17, 2020

Copy link
Member

@junrushao junrushao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Ideally all singletons should be created this way.

Copy link
Contributor

@tmoreau89 tmoreau89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@samskalicky
Copy link
Contributor

Thanks @tqchen! I will try building this into my code tonight and get back to you tomorrow

@tqchen tqchen force-pushed the runtime branch 2 times, most recently from b68f759 to 260fb36 Compare August 18, 2020 01:37
@samskalicky
Copy link
Contributor

samskalicky commented Aug 18, 2020

This still doesnt fix my problem in #5986. But I think its going in the right direction. Now we have a long-lived device object that wont be destroyed. But we cannot access it anymore after the DeviceAPIManager and its child DeviceAPI objects have already been deleted in the global scope when attempting to access from the NDArray deleter DefaultDeleter. In the deleter its calling tvm::runtime::DeviceAPI::Get to get the DeviceAPI pointer.

Instead, what if we store the DeviceAPI pointer in the NDArray during creation (or whenever we set/change the context) and then use that pointer inside the deleter? The actual DeviceAPI object on the heap will still be around, so that should work.

@tqchen tqchen force-pushed the runtime branch 2 times, most recently from 27a810c to 22b3e7b Compare August 18, 2020 14:06
@tqchen
Copy link
Member Author

tqchen commented Aug 18, 2020

@samskalicky In tihs case the deleter is already long-lived. If the deletion happens during the process of already unloading of vtable or other global objects, then there is little that we can do to solve the problem. Again, in the cases where multiple dlls are involved, the best way is to avoid retaining global objects in python.

@samskalicky
Copy link
Contributor

@samskalicky In tihs case the deleter is already long-lived. If the deletion happens during the process of already unloading of vtable or other global objects, then there is little that we can do to solve the problem. Again, in the cases where multiple dlls are involved, the best way is to avoid retaining global objects in python.

Right, the problem could be that the DeviceAPIManager and its child DeviceAPI objects are shorter lived than the deleter. If we remove the dependency from the deleter to get the DeviceAPI through the (potentially destructed) DeviceAPIManager then the deleter doesnt depend on the lifetime of the DeviceAPIManager since it already has a pointer to the heap allocated DeviceAPI object.

@samskalicky
Copy link
Contributor

Since this is definitely something out of our control (shutdown order, varies depending on application usage) and we've decided to rely on the process exiting to free any memory, can we consider clearing the pointers in DeviceAPIManager like:

  // destructor
  ~DeviceAPIManager() { std::fill(api_.begin(), api_.end(), nullptr); }

And then just make sure that wherever we try and get the DeviceAPI that we check to see if the pointer is null? Then we just skip freeing any memory and dont have to worry about objects destructed out of order

@tqchen
Copy link
Member Author

tqchen commented Aug 18, 2020

NOTE that right now the DeviceAPIManager singleton is long-lived now, the value won't be cleared. Even if we choose to instead clear these values, there is no guarantee wrt to the order. So the best way is still to avoid global object in the code(if there are, avoid deleting them, like in this PR)

@samskalicky
Copy link
Contributor

NOTE that right now the DeviceAPIManager singleton is long-lived now, the value won't be cleared. Even if we choose to instead clear these values, there is no guarantee wrt to the order. So the best way is still to avoid global object in the code(if there are, avoid deleting them, like in this PR)

Agreed, but thats not always possible given other framework requirements/architecture. Shouldnt we at least be able to guarantee an orderly shutdown for TVM components (runtime/ndarrays/etc) regardless?

@tqchen
Copy link
Member Author

tqchen commented Aug 18, 2020

To be clear, I don't think it is the problem of shutdown order(as the objects are now long-lived). It seems that the function is called after the dll get unloaded, so there is little we can do.

@kparzysz-quic
Copy link
Contributor

This will not invoke any destructors on exit! What problem is this trying to solve?

@tqchen
Copy link
Member Author

tqchen commented Aug 18, 2020

@kparzysz-quic The main problem it tries to resolve is to avoid the problems that can be caused by differnt order destruction of static objects.

For example, if a static object takes NDArray, and it get destructed after the DeviceAPI get destructed, then there will be an error.

The resources(e.g. in the case of memory etc) will be recycled as OS recycles the process after unloading.

@kparzysz-quic
Copy link
Contributor

This will likely cause errors with address sanitizer.

We really need a better solution in the long term.

@kparzysz-quic
Copy link
Contributor

I'm not against it as a temporary solution.

@tqchen
Copy link
Member Author

tqchen commented Aug 18, 2020

I see, not freeing the static global singleton is a common pattern though. Is it possible to somehow provide hints to ASAN to disable the warnings?

@kparzysz-quic
Copy link
Contributor

There is a way to turn off the leak sanitizer: define a function int __lsan_is_turned_off(void); and make it return non-zero.

Here's the description (from compiler-rt/include/sanitizer/lsan_interface.h):

  // The user may optionally provide this function to disallow leak checking
  // for the program it is linked into (if the return value is non-zero). This
  // function must be defined as returning a constant value; any behavior beyond
  // that is unsupported.
  // To avoid dead stripping, you may need to define this function with
  // __attribute__((used))
  int __lsan_is_turned_off(void);

Btw, for static singleton, the C++ compiler will generate calls to their destructors when main terminates.

@kparzysz-quic
Copy link
Contributor

This is how the leak-suppression function is used in one of the source files in LLVM (in TableGen utility):

#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__) ||       \
    __has_feature(leak_sanitizer)

#include <sanitizer/lsan_interface.h>
// Disable LeakSanitizer for this binary as it has too many leaks that are not
// very interesting to fix. See compiler-rt/include/sanitizer/lsan_interface.h .
LLVM_ATTRIBUTE_USED int __lsan_is_turned_off() { return 1; }

#endif

@kparzysz-quic
Copy link
Contributor

Sorry, tried to clear the comment window and hit the wrong button. :(

@tqchen
Copy link
Member Author

tqchen commented Aug 18, 2020

OK, seems the concensus is to keep this solution for now. Would be great to have more conversations. @kparzysz-quic no worries, glad to see that you now have the committer perm setup correctly :)

@tqchen tqchen merged commit 233b8cc into apache:master Aug 18, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Sep 2, 2020
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Sep 3, 2020
@tqchen tqchen deleted the runtime branch February 26, 2023 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants