Informing Node of our memory usage #601
Comments
I was actually working on this just yesterday. You can see what I did in my fork here: I was planning to put together a pull request after I've had a chance to test it a bit more, but it looks like it works. There are a number of other bugs related to garbage collection and memory leaks that are actually this issue. I believe that #101, #167, #209, and #411 would all be resolved or dramatically improved by this. |
I was able to run some tests in a memory constrained environment, including running some of the examples with all the calls to |
Thanks dstark; this has been bugging me for a while :). It may be worth adding some prominent notes that the memory usage will be over-estimated when real Mat memory is referenced by multiple matrices. That should be rare, but it could trip someone up if not highlighted. I will test over the next couple of weeks and comment here, if Peter does not get there first....
Yes, I thought about that, because there are a couple of places in this library where a new Matrix is clearly instantiated with a reference to some existing memory, but I couldn't see a way to easily and reliably determine whether the memory being assigned or destroyed is just changing the reference count or whether it is a standalone object. So, at least in that case, you are correct that it will overstate the external memory usage by the size of the overlap. If there's an easy and reliable way to determine the state of the memory for the Mat, this should be pretty easy to fix, particularly because this change centralizes most of the wrapper object creation. But I couldn't see an easy way to do that. For example, I don't know OpenCV's internals all that well, but it looks like not every reference to a portion of memory has to be the same size, so if you only look at the reference count, it wasn't clear whether you could determine anything about the relative size of the other things sharing that memory. (Consider as an example the ROI function, where the returned matrix points at the original matrix but potentially has a different size). In writing this, I wanted to make sure that every amount that I added to the external memory count would also be removed when the object went away, because the consequences of getting that wrong are more likely to cause problems than temporarily overstating the external memory usage. |
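The invariant described above (every amount added to the external memory count is removed again when the object goes away) can be sketched with a small RAII wrapper. This is only an illustration: `TrackedBuffer`, `adjust_external_memory` and `g_external_bytes` are hypothetical names, and the global counter stands in for the value a real addon would update through `Nan::AdjustExternalMemory()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the external-memory total that V8 keeps for the process.
// Hypothetical: a real addon would call Nan::AdjustExternalMemory() instead.
static std::int64_t g_external_bytes = 0;

static void adjust_external_memory(std::int64_t delta) {
  g_external_bytes += delta;
  assert(g_external_bytes >= 0);  // going negative means an unmatched subtraction
}

// Hypothetical wrapper that reports its allocation once and remembers the amount.
class TrackedBuffer {
 public:
  explicit TrackedBuffer(std::size_t bytes)
      : data_(bytes), reported_(static_cast<std::int64_t>(bytes)) {
    adjust_external_memory(reported_);
  }

  // Subtract exactly what the constructor added, even if the buffer was
  // resized or shared in the meantime.
  ~TrackedBuffer() { adjust_external_memory(-reported_); }

  // Copying would duplicate the reported amount, so it is disabled here.
  TrackedBuffer(const TrackedBuffer&) = delete;
  TrackedBuffer& operator=(const TrackedBuffer&) = delete;

  std::size_t size() const { return data_.size(); }

 private:
  std::vector<std::uint8_t> data_;
  std::int64_t reported_;
};
```

Storing the reported amount, rather than recomputing the size in the destructor, is what keeps the additions and subtractions symmetric. The cost is the over-estimate discussed above when two wrappers share the same memory.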
So I looked into this a bit more today. It appears that we can determine a more accurate complete size of the referenced data by looking at A simple test using the ROI function shows that this could potentially correctly track the actual amount of external data being used when there are multiple references to the same There are potentially some weird edge cases with doing this that I'm not aware of though. |
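A rough sketch of reference-count-based tracking follows. It is not OpenCV code: a `std::shared_ptr` buffer plays the role of the Mat's shared data, and `TrackedView`, `allocate`, `roi` and the global counter are hypothetical names. The whole buffer is counted when the first wrapper appears and released when the last one goes away, so an ROI view adds nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

static std::int64_t g_external_bytes = 0;  // stand-in for V8's external counter

static void adjust_external_memory(std::int64_t delta) {
  g_external_bytes += delta;
  assert(g_external_bytes >= 0);
}

using Buffer = std::vector<std::uint8_t>;

// Hypothetical wrapper around a (possibly shared) buffer plus a window into it.
class TrackedView {
 public:
  static TrackedView allocate(std::size_t bytes) {
    return TrackedView(std::make_shared<Buffer>(bytes), 0, bytes);
  }

  // Moving transfers the reference; no accounting change is needed.
  TrackedView(TrackedView&& other) noexcept
      : buffer_(std::move(other.buffer_)),
        offset_(other.offset_),
        length_(other.length_) {}
  TrackedView(const TrackedView&) = delete;
  TrackedView& operator=(const TrackedView&) = delete;

  ~TrackedView() {
    // Last holder: give back the size of the whole underlying buffer.
    if (buffer_ && buffer_.use_count() == 1) {
      adjust_external_memory(-static_cast<std::int64_t>(buffer_->size()));
    }
  }

  // A sub-view shares the buffer, like an ROI on a matrix.
  TrackedView roi(std::size_t offset, std::size_t length) const {
    assert(offset + length <= length_);
    return TrackedView(buffer_, offset_ + offset, length);
  }

  std::size_t offset() const { return offset_; }
  std::size_t length() const { return length_; }

 private:
  TrackedView(std::shared_ptr<Buffer> buffer, std::size_t offset,
              std::size_t length)
      : buffer_(std::move(buffer)), offset_(offset), length_(length) {
    // First holder: count the whole underlying buffer exactly once.
    if (buffer_.use_count() == 1) {
      adjust_external_memory(static_cast<std::int64_t>(buffer_->size()));
    }
  }

  std::shared_ptr<Buffer> buffer_;
  std::size_t offset_;
  std::size_t length_;
};
```

The scheme only balances if the tracked wrappers are the sole holders of the buffer. If anything else still holds a reference when the last wrapper is destroyed, the subtraction is skipped and the count drifts.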
I found one of those weird edge cases. The GMG background subtractor apparently retains a reference to its output matrix for some time past the end of the function call. Even if I change the place where our wrapper Matrix is constructed to correctly handle this situation, it doesn't appear that there's any way to guarantee that the reference will have been released before the Matrix is destroyed, so the result is inconsistent tracking of the memory. It seems likely that there are other places where similar things could happen. I haven't checked too many places at this point. |
Interesting; I've got a problem with the MOG2 subtractor where it seems to take 1 CPU forever, but this sometimes happens only every other run if run synchronously (only checked on PC without the 'extras'). It could be that there is a generic fault in the OpenCV implementation. Ahh.... Maybe I see the issue; pls confirm. which should be released on ~AsyncBackgroundSubtractorWorker However, maybe 1) the destructor is not called promptly (after all, it's likely wrapped in a JS variable), or 2) it is not called at all. Looks like I introduced the bug when taking refs on async calls. There will be others like this. :(
I don't think that's the problem I'm seeing because the problem affects both the synchronous and asynchronous versions, and the affected variable was the The issue you point out might also be a problem though. If we are using the reference counts to track the references from our Javascript objects, then it is important that no other objects retain references to the tracked objects when our tracked Javascript objects are released, or our counting system will not work correctly. The problem you are pointing out is in that category-- it looks like the input image may be retained at least during the scope when the Javascript callback is executed. If the input image were released in that scope, then that would cause a problem for our external memory tracker. I actually found a related issue in VideoCaptureWrap where the new image was retained an extra time during the async callback and then released after, which I can resolve by calling None of this really mattered before because no memory was being leaked. Now, though, we need to make sure that the matrices that are wrapped in At this point, I have a lot of local work on this revised implementation of the external memory counting, but I'm not comfortable replacing the current PR with this implementation yet because it needs to be thoroughly checked out for this class of error. The approach in the PR is simpler and safer, but might significantly overestimate memory usage. This version is more accurate, but introduces some new requirements on how the internals of this library behave, which could be tricky to get right and easy to mess up in the future. I'll try to post it somewhere you can see it later today, and I'll work on some more testing to uncover these types of problems. |
detailed thoughts.... |
just came to me ... there is a bigger issue here with BG subtraction. This kind of indicates that the only completely reliable solution would be to look at the OpenCV allocator; unless we estimate the memory allocation of specific 'long run' objects and just add them to our usage. One off-the-wall idea. Node DOES know how much process memory is in use (i.e. we can read the process). A rough estimate could be made of how much a particular object used by measuring before and after (and trying to make sense of the JS memory counters may help). Then 'take' that memory against that object, and return it on the JS release. (I'm not a fan of this approach; just an idea...) |
To the first comment, that's a good point. The easiest way to handle that would be to make sure that instead of storing Mat references directly for async functions, we store a reference using our wrapper Matrix objects, which handle the external memory calculation correctly when deallocated. It might be hard to test those cases, but that should work correctly, and the external memory count would be decremented whenever the later of the two Matrix objects is destroyed. Hanging on to the passed-in Matrix object itself would have worked if not for the fact that the user could call For the BG subtractor, it's not that big of a deal; we just need to isolate the objects used internally from the ones passed back into JavaScript. I was able to deal with the output matrix by cloning it and returning the clone. For practical purposes here, we only need to track the amount of memory that the Node garbage collector can do anything about, and for now I've only focused on the Matrix objects. Potentially, background subtractor implementations could issue similar external memory adjustment calls, but I don't think that's necessary unless the memory usage is going to persist for a long time and be resolvable by triggering a Node GC run. Finally, Node's |
So it turns out that it is easy to produce a unit test that releases the input image before the body of the asynchronous call executes, so I have been able to confirm that the strategy I described above for handling async calls will work. |
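The idea behind that test can be sketched without OpenCV or Node. In this sketch the worker takes its own reference to the pixel data when it is queued, so the caller releasing its handle before the body runs does no harm. `Image`, `SumWorker` and `execute` are hypothetical stand-ins, not the library's classes.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <numeric>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Hypothetical stand-in for the user-visible matrix handle.
struct Image {
  std::shared_ptr<Buffer> data;

  // What a user-facing release() would do: drop this handle's reference.
  void release() { data.reset(); }
  bool empty() const { return !data; }
};

// Hypothetical async worker. It copies the handle at queue time, which
// shares the pixels but is independent of the caller's handle.
class SumWorker {
 public:
  explicit SumWorker(const Image& input) : input_(input) {}

  // Runs later, possibly after the caller has released its own handle.
  std::uint64_t execute() const {
    assert(input_.data && "worker must still hold a live reference");
    return std::accumulate(input_.data->begin(), input_.data->end(),
                           std::uint64_t{0});
  }

 private:
  Image input_;
};
```

A test in the spirit of the one described would construct the worker, call `release()` on the caller's image, and only then run `execute()`.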
I've published this ongoing work into a branch, which you can look at here: |
Notes from looking at the mods: I was not sure about the use of clone(): in the case of a bg output, it's a monochrome mat, so not that large; however, clone() does copy all of the data, which is a performance hit. I'd do anything to avoid it :) Ref calling release() before the async process has run: I deliberately do this all the time, so I don't need to worry about the object I should have just 'consumed'. Of course, with node tracking the objects correctly, I should no longer need to, but lots of legacy code will... and the only harm should be over-estimation of memory use? One concern about keeping the JS object - e.g. in bg (this probably exposes my lack of node knowledge): finally; did you look at a custom matrix allocator? |
I wasn't sure about the use of Regarding cloning the output of bg, in order for this strategy of using the reference counts to track whether an object is in use, we need to make sure that only our Matrix objects retain references to the underlying Mats that we want to track. If the background subtractor is holding a reference to the object we want to return, we have to clone it. I suppose we could check the reference count at return time and only clone it if we see a reference being held. Calling The Matrix objects are not JS objects, they're C++ objects, and they're basically a wrapper class with an underlying cv::Mat. There's no garbage collector there. Calling delete will result in the destructor being called, which handles the external memory tracking. Because the Matrix objects make calls to v8 to adjust external memory, I'm not sure it is safe to create or destroy them on a separate thread. That's why the delete is in the closing handler and not in the execute body. There's a little about that pattern I used for async that I want to rework, but the version I have right now does the right thing when everything goes correctly. I saw that the custom allocators exist, but I'm not sure I know enough about OpenCV internals to implement one correctly, and I'm not sure that it's safe to call the v8 methods from everywhere. |
thanks for the explanations :). I think most is clear. ref "I'm not sure that it's safe to call the v8 methods from everywhere" - certainly not from another thread; was thinking about this after I wrote; so if we did this, we'd need to track the memory in mutex protected C++ code, and correct node's view from the JS thread - e.g. keeping a 'count of mem in use by mats' plus a 'count of mem known by JS', and fix them up every time we create or destroy one of our mat objects. |
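The two-counter idea might look roughly like the sketch below. It assumes the JS thread's id is captured at module load. `ExternalMemoryLedger`, `record` and `reconcile` are hypothetical names, and the point where a real addon would call `Nan::AdjustExternalMemory()` is marked with a comment rather than an actual call.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical ledger: 'in_use' is what the mats really hold, 'reported' is
// what the JS runtime has been told so far.
class ExternalMemoryLedger {
 public:
  explicit ExternalMemoryLedger(std::thread::id js_thread)
      : js_thread_(js_thread) {}

  // Safe from any thread. Only the JS thread pushes the change through.
  void record(std::int64_t delta) {
    std::lock_guard<std::mutex> lock(mutex_);
    in_use_ += delta;
    assert(in_use_ >= 0);
    if (std::this_thread::get_id() == js_thread_) flush_locked();
  }

  // JS thread only: report whatever worker threads accumulated meanwhile.
  void reconcile() {
    assert(std::this_thread::get_id() == js_thread_);
    std::lock_guard<std::mutex> lock(mutex_);
    flush_locked();
  }

  std::int64_t in_use() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return in_use_;
  }

  std::int64_t reported() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return reported_;
  }

 private:
  void flush_locked() {
    const std::int64_t pending = in_use_ - reported_;
    // A real addon would pass `pending` to Nan::AdjustExternalMemory() here.
    reported_ += pending;
  }

  const std::thread::id js_thread_;
  mutable std::mutex mutex_;
  std::int64_t in_use_ = 0;
  std::int64_t reported_ = 0;
};
```

With this shape, allocations made on worker threads leave the runtime's view stale until the next create, destroy or `reconcile()` on the JS thread. That is the fix-up step described above.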
Thank you for all the comments. They've pointed out a bunch of cases I might not have otherwise handled correctly. I just posted an update to my branch that slightly improves the async pattern to make memory handling a little clearer. There are some other classes I haven't looked at yet that have AsyncWorkers that will probably have to be modified as well.
hi David, ok, now is later. The above branch now has a functional custom allocator, and i've used it to instrument the AdjustExternalMemory (only in Matrix.cc). first run output doing bg subtract (Mog2) is here: (note: I've done no research except on ocv 3.3) |
David, updated my branch with proposed use of custom allocator for 3.0-3.3, fallback to manual adjust for 2.4 - please take a look and comment. |
Thanks for looking into this. In general, I like the idea of the custom allocator approach if it can be made to work, but I think this implementation has a number of issues. First, this might be simpler if the two approaches were not intertwined. If replacing the allocator works, then that should be all you need, and the Another concern might be whether when objects are released normally, Node is informed at the right time. I'm not sure in the 3.x implementation how (or if) the deallocator determines that it is on the correct thread to call My last concern is that this approach doesn't save us any work, because it seems like the full, more complex solution is required to support 2.4. So that needs to be correct, and all the constraints on coding that it imposes need to be followed, or things break when built against 2.4. The advantages of switching to the custom allocator approach should be that it is simpler, more reliable, and easier to code against. If the manual adjustment method is going to proceed, there are still some other classes (like FaceRecognizer) that need a bunch of work to bring them in line, so there's still more to do with my original approach. |
on versions; it's so confusing. I downloaded a selection of opencv... Ref deallocate; the GC will call ~Matrix, so the memory should be released on time... but you are correct in that a delete of Matrix done in a separate thread would require us to NOT call the V8 fn. I agree, I've made a rather mixed solution; but I started with #ifdefs everywhere, before deciding that one function would do; so adopted yours. The requirement to pass in the anticipated size (which would not be used in 3+) does seem a little imposition. 'commit' is currently not used; intended to be used (=false) from allocations done in threads, where the V8 fn can't be called. A couple more commits added - mainly for version compatibility, but also added mat.release() in ~Matrix, else size will not be correct at the check. |
You've attached the custom allocator to What I'd recommend is to make |
yes, in 3+ (or rather 3.1+), the counting of how much memory we've used (for cv::mat) is totally automagic. But unless we know we're on the right thread, we can't call the Nan fn directly. Suggestion: Then in the memory tracking code, we detect the thread, and call the Nan fn if we're safe, else just change the counters. Let me know your thoughts and if you will contribute the next round of code :). I started an implementation, but 'real work' beckons in an 'against the wall' kind of way; so I'll do nothing else until I hear from you. |
It seems like the work I've been doing to track memory usage by Matrix object is required, at least for 2.x. It also has the advantage of keeping the amount of external memory Node knows about as close as we can to the amount of memory actually being kept alive by our JS objects. That's important because the external memory limit is relatively small, and when it is hit it triggers a full GC. So, to start with, my plan is to try to finish that (unless you think the custom allocator work could be made to work correctly against all OpenCV versions). I think I've finished with The custom allocator approach is clever, and if it can be made to work correctly and accurately, then I think it's a good idea to switch to it eventually, especially because it will be easier to work with and future development is less likely to mess it up. There's still a little bit of thought that needs to go into it, though, because I don't think the current approach is tracking memory accurately. At the very least, it is difficult to test that it works correctly. It potentially tracks a lot more memory than just the memory being kept alive by our JS objects, so it's important that it be as accurate as possible in tracking, to avoid unnecessary GCs. Similar to my original implementation, it sacrifices some accuracy for developer safety. In addition, if it only works for >3.1, then we still need to do the work for the 2.x Matrix tracking, so the custom allocator adds a lot of work without saving us anything right now. I would keep the custom allocator code entirely separate from the Matrix memory tracking, though obviously only one can be in effect at a time. If it is working correctly, it should do the right thing without requiring any modifications to Matrix (aside from removing the other memory tracking approach). Since it doesn't actually depend on Matrix at all, it does make sense that it could be moved into a separate file. I don't know if OpenCV.cc is the right place for it. 
We could create a new file for memory management for >3.1 only and put it all there. Regarding the thread detection, I'll do some more research there. It doesn't look easy with the current Nan api, but there's some current discussion about this kind of need in the Nan project. There's also a post-GC callback hook where it looks like we could fix up the memory, but I'd really rather avoid the GC than try to fix it up afterwards. Also, if the solution requires some sort of manual reconcile, then there's something else wrong with our approach, because the memory management should be entirely internal to the module. |
I just updated some additional classes with memory tracking in my branch. I think at this point I've covered everything we need to cover, although some of it I'm not sure how to test. Should I update PR #602 with these changes? I think the coverage of this version is more complete and as we've discussed, more accurate than the proposal that is currently there. I have some more investigation to do on the custom allocator strategy, but I think this work is independent of that. |
Give me a few days; I downloaded all those versions of ocv, so may as well put them to use. I'll pull the branch down locally and check it over, at least from the perspective of ocv versions, and let you know.
Hi David, |
I think I see what you want here, and I'll look at it some more. I'm not sure a global Since the scope of the memory adjustment is only for Matrix objects (currently), we could create a new static method for Matrix (something like |
I was thinking a global define would be most efficient long term; i.e. if disabled, it would compile to nothing? |
I think the performance difference would be minimal, and to me it seems clearer and more difficult to misuse. However, at the moment, there are only 3 calls to With that done, the memory tracking would be entirely an implementation detail of Matrix, and wouldn't need to be exposed in any public way. I think that also allows us to defer the work of turning that behavior on or off with a flag to when we add the custom allocator. I'll make that adjustment first. |
Ok, I just checked in changes on my branch that consolidate all the calls to The remaining calls are in constructors or destructors, or in a method that explicitly changes the size of a matrix in place (merge, color conversion, image pyramid upscale/downscale) and so doesn't fit neatly into one of the common patterns. |
At this point, I'd like to update PR #602 with this version of the code. I think it's strictly better than the earlier version I submitted. It should correctly track all the cases that that code does, plus it doesn't double count memory referenced from more than one JS object. In addition, while testing things, I've found and fixed a number of cases that my original implementation did not handle correctly. How does that sound? |
I dropped the ball on this. Is #602 ready to be merged?
I think it is. There are potentially some future enhancements that we've talked about in this thread that could be done separately later, but the current PR is complete and working. |
Currently, node-opencv does not tell Node.js about its memory usage.
On low-memory systems (e.g. if node has been told to limit its RAM to 256 MB on an RPi), it is very easy to blow the RAM budget with a live stream unless Matrix objects are explicitly released manually.
Investigation has led to AdjustAmountOfExternalAllocatedMemory -
This is used in node-canvas and node-mapnik to inform the V8 engine of memory usage, although in node-mapnik's case, the discussions are not all positive.
The implied advantage of using this function to tell node how much memory we have used is that IF node hits a point where it could be exceeding its allowed memory constraints (default 1.5GBytes, or constrained by command line/environment option), a major GC will be triggered, saving the process from killing the platform or being killed by the kernel. The major GC will destroy any unused Matrix objects, freeing memory.
In our case, it is (probably) always better to manage the memory manually from a performance perspective (a major GC is expensive, and especially in a live stream case you don't really want 100ms 'holes' in your stream). But it is also not acceptable for our processes to die because we forgot to release a few mats every frame.
I've raised this issue here as informational; I'm not sure I have all the skills (or time) to do a good implementation using AdjustAmountOfExternalAllocatedMemory, or even to analyse whether it is beneficial, but I hope others will add their thoughts/experiences to help us know if we should do this.