Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Informing Node of our memory usage #601

Closed
btsimonh opened this issue Dec 23, 2017 · 34 comments · Fixed by #602
Closed

Informing Node of our memory usage #601

btsimonh opened this issue Dec 23, 2017 · 34 comments · Fixed by #602

Comments

@btsimonh
Copy link
Collaborator

Currently, node-opencv does not tell nodejs about it's memory usage.
On low memory footprint systems (e.g. if node has been told to limit it's ram to 256mbytes on an RPi), it is very easy to blow the RAM budget with a live stream unless Matrix objects are explicitly manually released.

Investigation has lead to AdjustAmountOfExternalAllocatedMemory -
This is used in node-canvas and node-mapnik to inform the V8 engine of memory usage, although in node-mapnik's case, the discussions are not all positive.

The implied advantage of using this function to tell node how much memory we have used is that IF node hits a point where it could be exceeding it's allow memory constraints (default 1.5GBytes, or constrained by command line/environment option), a major GC will be triggered so saving the process from killing the platform or being killed by the kernel. The major GC will destroy any unused Matrix objects, so freeing memory.
In our case, it is (probably) Always better to manage the memory manually from a performance perspective (as a major GC is expensive, and especially in a live stream case, you don't really want the 100ms time 'holes' in your stream), But, it is also not acceptable for our processes to die because we forgot to release a few mats every frame.

I've raised this issue here as informational; I'm not sure I have all the skills (or time) to do a good implementation of use of AdjustAmountOfExternalAllocatedMemory, or even analyse if this is beneficial, but hope others will add their thoughts/experiences to help us know if we should do this.

@dstarke
Copy link
Contributor

dstarke commented Dec 23, 2017

I was actually working on this just yesterday. You can see what I did in my fork here:
https://github.com/Atollogy/node-opencv/tree/memory-fix

I was planning to put together a pull request after I've had a chance to test it a bit more, but it looks like it works.

There are a number of other bugs related to garbage collection and memory leaks that are actually this issue. I believe that #101, #167, #209, and #411 would all be resolved or dramatically improved by this.

@dstarke
Copy link
Contributor

dstarke commented Jan 3, 2018

I was able to run some tests in a memory constrained environment, including running some of the examples with all the calls to Matrix.release() commented out, and things seem to be working, so I've submitted the pull request. More technical details about the changes are available in the description of the pull request.

@btsimonh
Copy link
Collaborator Author

btsimonh commented Jan 3, 2018

thanks dstark; this has been bugging me for a while :).
One comment: mat is a reference to some internally allocated memory - and as such, (in a very quick review), this mod will note the mat size for every mat, and reduce the size every release(). Is it practical to know (and increase the mem usage) when a mat create actually allocated memory, and when it only incremented the ref count (and not increase the mem)?
And the other way around, only reduce memory when the ref count means that the release actually released memory?
The ideal would be to intercept the OpenCV allocator, but looking into this it seems like a) it's not the same in 2.4/3.0/3.3, and b) may require a special OpenCV build :(. So I don't think this is an option.

Maybe some heavy notes that the memory usage would be over-estimated in the case where real mat memory is referenced by multiple matrices, which should be rare but could trip someone up if not highlighted.

I will test over the next couple of weeks, and comment here, if peter does not get there first....

@dstarke
Copy link
Contributor

dstarke commented Jan 3, 2018

Yes, I thought about that, because there are a couple of places in this library where a new Matrix is clearly instantiated with a reference to some existing memory, but I couldn't see a way to easily and reliably determine whether the memory being assigned or destroyed is just changing the reference count or whether it is a standalone object. So, at least in that case, you are correct that it will overstate the external memory usage by the size of the overlap.

If there's an easy and reliable way to determine the state of the memory for the Mat, this should be pretty easy to fix, particularly because this change centralizes most of the wrapper object creation.

But I couldn't see an easy way to do that. For example, I don't know OpenCV's internals all that well, but it looks like not every reference to a portion of memory has to be the same size, so if you only look at the reference count, it wasn't clear whether you could determine anything about the relative size of the other things sharing that memory. (Consider as an example the ROI function, where the returned matrix points at the original matrix but potentially has a different size).

In writing this, I wanted to make sure that every amount that I added to the external memory count would also be removed when the object went away, because the consequences of getting that wrong are more likely to cause problems than temporarily overstating the external memory usage.

@dstarke
Copy link
Contributor

dstarke commented Jan 3, 2018

So I looked into this a bit more today. It appears that we can determine a more accurate complete size of the referenced data by looking at (mat.dataend - mat.datastart) instead of multiplying the rows columns and element size. Combining this with the reference count of the mat might allow us to figure out if we need to track additional external data (or untrack it on release).

A simple test using the ROI function shows that this could potentially correctly track the actual amount of external data being used when there are multiple references to the same Mat object.

There are potentially some weird edge cases with doing this that I'm not aware of though.

@dstarke
Copy link
Contributor

dstarke commented Jan 3, 2018

I found one of those weird edge cases. The GMG background subtractor apparently retains a reference to its output matrix for some time past the end of the function call. Even if I change the place where our wrapper Matrix is constructed to correctly handle this situation, it doesn't appear that there's any way to guarantee that the reference will have been released before the Matrix is destroyed, so the result is inconsistent tracking of the memory.

It seems likely that there are other places where similar things could happen. I haven't checked too many places at this point.

@btsimonh
Copy link
Collaborator Author

btsimonh commented Jan 4, 2018

interesting; I've got a problem with the MOG2 subtractor that it seems to take 1 CPU forever, but this sometimes happens only every other run if run synchronously (only checked on PC without the 'extras'). It could be that there is a generic fault in the opencv implementation.

Ahh.... Maybe I see the issue; pls confirm.
If my comments in the code are correct, in AsyncBackgroundSubtractorWorker, the constructor increments the ref by:
img_mat(img_mat)

which should be released on ~AsyncBackgroundSubtractorWorker

However, maybe 1) the destructor is not called promptly (after all, it's likely wrapped in a js variable) 2) is not called at all.
But it should be safe to call img_mat.release() in HandleOKCallback, which would drop the ref count as soon as the mat was no longer in use?

looks like I introduced the bug in my taking of refs on async calls. There will be others like this. :(.
And I will need to learn to be more explicit....
s

@dstarke
Copy link
Contributor

dstarke commented Jan 4, 2018

I don't think that's the problem I'm seeing because the problem affects both the synchronous and asynchronous versions, and the affected variable was the fgmat return value, not the image being processed. Also, the problem I'm seeing only affects the GMG implementation and no others. I suspect that something internal to that implementation retains a reference to the fgmat variable for a short time after. I can work around it by returning a clone of the fgmat instead.

The issue you point out might also be a problem though. If we are using the reference counts to track the references from our Javascript objects, then it is important that no other objects retain references to the tracked objects when our tracked Javascript objects are released, or our counting system will not work correctly. The problem you are pointing out is in that category-- it looks like the input image may be retained at least during the scope when the Javascript callback is executed. If the input image were released in that scope, then that would cause a problem for our external memory tracker.

I actually found a related issue in VideoCaptureWrap where the new image was retained an extra time during the async callback and then released after, which I can resolve by calling release() before the callback is issued.

None of this really mattered before because no memory was being leaked. Now, though, we need to make sure that the matrices that are wrapped in Matrix objects are not retained anywhere else, because we need to make sure that one of our Matrix objects will be the last one to release its inner Mat.

At this point, I have a lot of local work on this revised implementation of the external memory counting, but I'm not comfortable replacing the current PR with this implementation yet because it needs to be thoroughly checked out for this class of error. The approach in the PR is simpler and safer, but might significantly overestimate memory usage. This version is more accurate, but introduces some new requirements on how the internals of this library behave, which could be tricky to get right and easy to mess up in the future. I'll try to post it somewhere you can see it later today, and I'll work on some more testing to uncover these types of problems.

@btsimonh
Copy link
Collaborator Author

btsimonh commented Jan 4, 2018

detailed thoughts....
Reference keeping a ref to the input mat whilst our JS releases the object. (e.g. in async BG call). Previously, (maybe not in this fn, but certainly in others), a raw pointer to the input mat was kept for the execute function to work on. This was very bad news, as the input mat could be released before the execute fn even ran -> big trouble.
Now the situation is that our JS may release the JS object, so telling node that we're using less memory than we actually are. (but still mildly better than the situation we had previously where the JS didn't have any idea about the memory used, but not erring on caution of overestimating the memory used. I.e. if node only GCs when memory is exhausted, we could still run out of RAM on a live running system -> bad).
I'm not so good with NaN and the Node API, but it does seem like we would need to hang on to the JS object as well as taking a ref on the mat. I'm guessing that there is a function to do this.....
Then, the JS would not attempt to release the ram (and not manage to do so, but think it did) because we'd still be holding a ref on the JS object?

@btsimonh
Copy link
Collaborator Author

btsimonh commented Jan 4, 2018

just came to me ... there is a bigger issue here with BG subtraction.
I can only imagine how it works.. but certainly the BG subtractor itself allocates some storage whilst it exists. We can't account for this... what if it stores the last 20 images? - it's a massive amount of memory...
In terms of GMG; it may indeed store the last image it output - indeed this may be the way GMG works; the previous output image(s) are almost certainly able to aid the next calculations....

This kind of indicates that the only completely reliable solution would be to look at the OpenCV allocator; unless we estimate the memory allocation of specific 'long run' objects and just add them to our usage.

One off-the-wall idea. Node DOES know how much process memory is in use (i.e. we can read the process). A rough estimate could be made of how much a particular object used by measuring before and after (and trying to make sense of the JS memory counters may help). Then 'take' that memory against that object, and return it on the JS release. (I'm not a fan of this approach; just an idea...)

@dstarke
Copy link
Contributor

dstarke commented Jan 4, 2018

To the first comment, that's a good point. The easiest way to handle that would be to make sure that instead of storing Mat references directly for Async functions, we store a reference using our wrapper Matrix objects, which when deallocated handle the external memory calculation correctly. It might be hard to test those cases, but that should work correctly, and the external memory count would be decremented whenever the later of the two Matrix objects is deconstructed.

Hanging on the the passed in Matrix object itself would have worked if not for the fact that the user could call release() at any time even though they shouldn't while an async method is in flight. I suppose another option would be to have the async operations mark the Matrix objects they are using and then haverelease() throw an exception if there's an in-flight async operation using that Matrix. I'm not sure that's a great idea, though.

For the BG subtractor, it's not that big of a deal, we just need to isolate the objects used internally from the ones passed back into javascript. I was able to deal with the output matrix by cloning it and returning the clone.

For practical purposes here, we only need to track the amount of memory that the node garbage collector can do anything about, and for now I've only focused on the Matrix objects. Potentially, background subtractor implementations could issue similar external memory adjustment calls, but I don't think that's necessary unless the memory usage is going to persist for a long time and be resolvable by triggering a Node GC run.

Finally, Node's process.memoryUsage() does not know about the external memory usage except what it has been specifically told by something like AdjustAmountOfExternalAllocatedMemory(). I'm not sure if there's somewhere else to look, but I haven't seen anything. If there were some way on the C++ side to check how much memory was in use, then we could get a baseline measurement at the beginning and then periodically adjust the amount based on the difference between the previous and current total allocated amount. That number could be very wrong at any instant though for a whole bunch of reasons, and it's also preferable to have the amount we are reporting to Node be the amount of memory (or close to the amount of memory) that the garbage collector could actually do anything about.

@dstarke
Copy link
Contributor

dstarke commented Jan 5, 2018

So it turns out that it is easy to produce a unit test that releases the input image before the body of the asynchronous call executes, so I have been able to confirm that the strategy I described above for handling async calls will work.

@dstarke
Copy link
Contributor

dstarke commented Jan 5, 2018

I've published this ongoing work into a branch, which you can look at here:
https://github.com/Atollogy/node-opencv/tree/memory-fix-2

@btsimonh
Copy link
Collaborator Author

btsimonh commented Jan 5, 2018

Notes from looking at the mods:

I was not sure about
Nan::AdjustExternalMemory(img->mat.dataend - img->mat.datastart);
but there is a good explanation here, and I believe you usage is correct:
opencv/opencv#8304
datalimit would be more exact for sizing?

use of clone(): in the case of a bg output, it's a monochrome mat, so not that large, however clone() does copy all of the data, which is a performance hit. I'd do anything to avoid it :)

ref calling release() before async process has run; I deliberately do this all the time, so i don't need to worry about the object I should have just 'consumed'. If course, with node tracking the objects correctly, I should no longer need to, but lots of legacy code will... and the only harm should be over-estimation of memory use?

One concern about keeping the JS object - e.g. in bg (this probably exposes my lack of node knowledge):
I assume new Matrix::Matrix(_img) creates a new JS object with a new mat which references the original, and which node knows about.
So when it is removed:
delete matrix;
does this call release() right away, or just de-reference for GC to take care of it later?
Is it safe to pass a pointer to a JS object to another thread (i.e. can node move JS objects?)

finally; did you look at a custom matrix allocator?
(e.g. https://gist.github.com/kazuki-ma/1901180)
there is little documentation, but these lines in mat.hpp point in a direction:
//! custom allocator
MatAllocator* allocator;
//! and the standard allocator
static MatAllocator* getStdAllocator();
static MatAllocator* getDefaultAllocator();
static void setDefaultAllocator(MatAllocator* allocator);

@dstarke
Copy link
Contributor

dstarke commented Jan 5, 2018

I wasn't sure about the use of datalimit. It means that there are potentially some extra bytes that we're not tracking, but I'm not sure that's a problem. Using datastart and dataend made it easy to test that a predictable amount of memory was being allocated and deallocated at the right times.

Regarding cloning the output of bg, in order for this strategy of using the reference counts to track whether an object is in use, we need to make sure that only our Matrix objects retain references to the underlying Mats that we want to track. If the background subtractor is holding a reference to the object we want to return, we have to clone it. I suppose we could check the reference count at return time and only clone it if we see a reference being held.

Calling release() while an async method is in flight is an odd case. It's not going to reduce your memory consumption while that async function is still running. In any event, in the test/memory.js file you can see all the test cases I wrote, and for each of the async functions, I have a test case that calls release before the body of the function executes.

The Matrix objects are not JS objects, they're C++ objects, and they're basically a wrapper class with an underlying cv::Mat. There's no garbage collector there. Calling delete will result in the destructor being called, which handles the external memory tracking.

Because the Matrix objects make calls to v8 to adjust external memory, I'm not sure it is safe to create or destroy them on a separate thread. That's why the delete is in the closing handler and not in the execute body.

There's a little about that pattern I used for async that I want to rework, but the version I have right now does the right thing when everything goes correctly.

I saw that the custom allocators exist, but I'm not sure I know enough about OpenCV internals to implement one correctly, and I'm not sure that it's safe to call the v8 methods from everywhere.

@btsimonh
Copy link
Collaborator Author

btsimonh commented Jan 5, 2018

thanks for the explanations :). I think most is clear. ref "I'm not sure that it's safe to call the v8 methods from everywhere" - certainly not from another thread; was thinking about this after I wrote; so if we did this, we'd need to track the memory in mutex protected C++ code, and correct node's view from the JS thread - e.g. keeping a 'count of mem in use by mats' plus a 'count of mem known by JS', and fix them up every time we create or destroy one of our mat objects.

@dstarke
Copy link
Contributor

dstarke commented Jan 5, 2018

Thank you for all the comments. It's pointed out a bunch of cases I might not have otherwise handled correctly. I just posted an update to my branch that slightly improves the async pattern to make memory handling a little more clear.

There are some other classes I haven't looked at yet that have AsyncWorkers that will probably have to be modified as well.

@btsimonh
Copy link
Collaborator Author

btsimonh commented Jan 6, 2018

hi David,
building on your stuff, I tried out a custom allocator. However, although it gets allocations, it does NOT get deallocations; so this may not work for us. source is here:
https://github.com/btsimonh/node-opencv-nr/tree/CustomAllocator
All it does is note the numbers, and add access methods to read them (through a mat for the moment).
s
hold that thought; found the fault. more later.

ok, now is later. The above branch now has a functional custom allocator, and i've used it to instrument the AdjustExternalMemory (only in Matrix.cc).

first run output doing bg subtract (Mog2) is here:
https://gist.github.com/btsimonh/12ed006fc83fd4fd81240169dac8ce69
One output per frame, you can clearly see the GC occurring (look for where memNumDeAlloc jumps to almost meet memNumAlloc). In theory the mem and memJS should match, but clearly we're losing some. (but mainly at the beginning? the FIRST output is {"memNumAlloc":12,"memNumDeAlloc":0,"mem":39014400,"memJS":7680000})

(note: I've done no research except on ocv 3.3)
(researching 2.4 - seems that you must set the allocator on every mat, not able to set a default allocator. So seems for 2.4 support, your counting is better? But for 3.3, the custom allocator should be perfect?)

@btsimonh
Copy link
Collaborator Author

btsimonh commented Jan 7, 2018

David, updated my branch with proposed use of custom allocator for 3.0-3.3, fallback to manual adjust for 2.4 - please take a look and comment.
p.s. found my bg cpu issue -> OpenCL; disabled now for my windows use.

@dstarke
Copy link
Contributor

dstarke commented Jan 7, 2018

Thanks for looking into this. In general, I like the idea of the custom allocator approach if it can be made to work, but I think this implementation has a number of issues.

First, this might be simpler if the two approaches were not intertwined. If replacing the allocator works, then that should be all you need, and the AdjustExternalMemory() method doesn't need to do anything. That eliminates the need for the extra commit parameter. If 2.x doesn't use the custom allocator, then it probably shouldn't call into the custom allocator code. The two implementations look similar in that they call the same API, but the logic for calculating the memory adjustment amount is very different.

Another concern might be whether when objects are released normally, Node is informed at the right time. I'm not sure in the 3.x implementation how (or if) the deallocator determines that it is on the correct thread to call Nan::AdjustExternalMemory(). If it is already in the JS thread (and I believe it would be during garbage collection), then it can go ahead and call that, but if the object is being released somewhere else, like in an AsyncWorker, it might not be ok. We currently only have Matrix objects being released on the JS thread, but I'm not sure about other cv::Mat objects. It seems like there should be some sort of mechanism for checking if you're on the right thread, and if not, scheduling an update for later to execute on the right thread (if there isn't one already scheduled). Right now it looks like it won't fix the memory until the next call of AdjustExternalMemory(), which won't happen until we create another object (which might not happen for a while), change an object (which might not happen at all), or destroy an object (which might not happen until the JS garbage collector runs again, which could be relatively far in the future). AdjustExternalMemory() is really part of the 2.x solution, so it feels a bit strange for the custom allocator solution to depend on it.

My last concern is that this approach doesn't save us any work, because it seems like the full, more complex solution is required to support 2.4. So that needs to be correct, and all the constraints on coding that it imposes need to be followed, or things break when built against 2.4. The advantages of switching to the custom allocator approach should be that it is simpler, more reliable, and easier to code against.

If the manual adjustment method is going to proceed, there are still some other classes (like FaceRecognizer) that need a bunch of work to bring them in line, so there's still more to do with my original approach.

@btsimonh
Copy link
Collaborator Author

btsimonh commented Jan 7, 2018

on versions; it's so confusing. I downloaded a selection of opencv...
I just tested on
2.4.13.5: all good
3.0.0 - crashes on toBuffer() (? i think)
3.1.0 - no VideoCapture
3.3.0 - all good
3.3.1 - all good
3.4.0 - all good
Not saying it's our fault :).

Ref deallocate; the GC will call ~Matrix, so the memory should be released on time... but you are correct in that a delete of Matrix done in a separate thread would require us to NOT call the V8 fn.
I did think about snooping the threadid, but not sure on cross-platform aspects....

I agree, I've made a rather mixed solution; but I started with #ifdefs everywhere, before deciding that one function would do; so adopted yours. The requirement to pass in the anticipated size (which would not be used in 3+) does seem a little imposition. 'commit' is currently not used; intended to be used (=false) from allocations done in threads, where the V8 fn can't be called.

A couple more commits added - mainly for version compatibility, but also added mat.release() in ~Matrix, else size will not be correct at the check.

@dstarke
Copy link
Contributor

dstarke commented Jan 8, 2018

You've attached the custom allocator to cv:Mat, so what ~Matrix does becomes irrelevant. Your allocator gets called for every cv:Mat alloc and dealloc, but we know that not all of those happen on the thread we want. That could cause crashes (I think toBufferAsync is one place it could crash, for example-- the way it works right now, the Mat definitely can get reallocated on the background thread there). I'm pretty sure the logic for this right now is not quite correct, and the overlap between the two implementations makes it even more confusing. Needing to call mat.release() in the Matrix destructor is a further sign that this is not quite right, since that is about to happen automatically. If the custom allocator strategy is implemented correctly, it should require no changes to an implementation of Matrix that isn't itself adjusting the external memory size. It particularly shouldn't require you to release something that is about to be released anyway, since the entire point of hooking into the deallocator was to handle that case.

What I'd recommend is to make AdjustExternalMemory() call Nan::AdjustExternalMemory() directly with the supplied difference in 2.x, and make it do absolutely nothing for 3.x. In 3.x, I'd add in all your custom allocator code, and then work on getting the scheduling of `Nan:AdjustExternalMemory()' correct without overloading our 2.x implementation with that responsibility. It should be possible to make the custom allocator code all in one block, since the whole idea is to not alter Matrix, but essentially inject that code into OpenCV. There should not be #ifdefs everywhere, only two blocks.

@btsimonh
Copy link
Collaborator Author

btsimonh commented Jan 8, 2018

yes, in 3+ (or rather 3.1+), the counting of how much memory we've used (for cv::mat) is totally automagic. But unless we know we're on the right thread, we can't call the Nan fn directly.
maybe
https://stackoverflow.com/questions/28236481/detect-whether-current-thread-is-main-thread-of-the-libuv-default-event-loop
may help....
(But in the back of my mind I do worry about things like https://www.npmjs.com/package/webworker-threads
which is already under serious consideration by the nodjs team nodejs/worker#4
I'd love to have that functionality! - but we can push it aside for the moment).

Suggestion:
We move all the memory tracking out of Matrix/customallocator and into class OpenCV - after all, we may want to add tracking of non Matrix memory too, and Opencv.h is included everywhere.
Then the customallocator can call the fns there.
And the 2.4-3.1 implementation also.
(I'd like to retain the metrics, working for both)
The mem tracking becomes the common item, the two implementations using it separately.
Matrix Mem tracking for 2.4 can be a #define, so it literally is not compiled for 3.1+.
We can add a 2nd define for 2.4-3.4 for non-matrix memory.

Then in the memory tracking code, we detect the thread, and call the Nan fn if we're safe, else just change the counters.
Initially, I think it's ok for the Nan fn to only be called by the allocations/deallocations in the main thread; in most scenarios, these will be often enough. We could supply a JS fn to cause a 'reconcile' of memory for people who come across edge cases?

Let me know your thoughts and if you will contribute the next round of code :). I started an implementation, but 'real work' beckons in an 'against the wall' kind of way; so I'll do nothing else until I hear from you.

@dstarke
Copy link
Contributor

dstarke commented Jan 8, 2018

It seems like the work I've been doing to track memory usage by Matrix object is required, at least for 2.x. It also has the advantage of keeping the amount of external memory Node knows about as close as we can to the amount of memory actually being kept alive by our JS objects. That's important because the external memory limit is relatively small, and when it is hit it triggers a full GC.

So, to start with, my plan is to try to finish that (unless you think the custom allocator work could be made to work correctly against all OpenCV versions). I think I've finished with Matrix and BackgroundSubtractor, including an extensive suite of tests to check that I've done it correctly, but there are a few other classes I need to look at, including FaceRecognizer.

The custom allocator approach is clever, and if it can be made to work correctly and accurately, then I think it's a good idea to switch to it eventually, especially because it will be easier to work with and future development is less likely to mess it up. There's still a little bit of thought that needs to go into it, though, because I don't think the current approach is tracking memory accurately. At the very least, it is difficult to test that it works correctly. It potentially tracks a lot more memory than just the memory being kept alive by our JS objects, so it's important that it be as accurate as possible in tracking, to avoid unnecessary GCs. Similar to my original implementation, it sacrifices some accuracy for developer safety.

In addition, if it only works for >3.1, then we still need to do the work for the 2.x Matrix tracking, so the custom allocator adds a lot of work without saving us anything right now.

I would keep the custom allocator code entirely separate from the Matrix memory tracking, though obviously only one can be in effect at a time. If it is working correctly, it should do the right thing without requiring any modifications to Matrix (aside from removing the other memory tracking approach). Since it doesn't actually depend on Matrix at all, it does make sense that it could be moved into a separate file. I don't know if OpenCV.cc is the right place for it. We could create a new file for memory management for >3.1 only and put it all there.

Regarding the thread detection, I'll do some more research there. It doesn't look easy with the current Nan api, but there's some current discussion about this kind of need in the Nan project. There's also a post-GC callback hook where it looks like we could fix up the memory, but I'd really rather avoid the GC than try to fix it up afterwards. Also, if the solution requires some sort of manual reconcile, then there's something else wrong with our approach, because the memory management should be entirely internal to the module.

@dstarke
Copy link
Contributor

dstarke commented Jan 12, 2018

I just updated some additional classes with memory tracking in my branch. I think at this point I've covered everything we need to cover, although some of it I'm not sure how to test.

Should I update PR #602 with these changes? I think the coverage of this version is more complete and as we've discussed, more accurate than the proposal that is currently there.

I have some more investigation to do on the custom allocator strategy, but I think this work is independent of that.

@btsimonh
Copy link
Collaborator Author

give me a few days; I downloaded all those version of ocv, so may as well put them to use. I'll pull the branch down locally and check it over; at least from a perspective of ocv versions, and let you know.

@btsimonh
Copy link
Collaborator Author

Hi David,
could I ask for the calls to Nan::AdjustExternalMemory() to be a #define, e.g. to make it fairly obvious something like NAN_ADJUSTEXTERNALMEMORY().
I'm very keen on the custom allocator approach for 3.1+, and this would make adding the custom allocator PR change very few files.... rather than touchiing every file in this PR.
I too am unsure how to do a complete test. Maybe we consult with Peter on making a separate branch for the moment, and ask for feedback?
br,
s

@dstarke
Copy link
Contributor

dstarke commented Jan 17, 2018

I think I see what you want here, and I'll look at it some more. I'm not sure a global #define is the way to do it, but I can try to centralized the calls to Nan:AdjustExternalMemory() some more.

Since the scope of the memory adjustment is only for Matrix objects (currently), we could create a new static method for Matrix (something like Matrix::AdjustExternalMemory()) and have most of the existing locations use that call instead (including all the places outside of Matrix.cc). That way, when we want to swap in the custom allocator approach, we'd only need to touch a few locations in Matrix.cc to disable it.

@btsimonh
Copy link
Collaborator Author

I was thinking a global define would be most efficient long term; i.e. if disabled, it would compile to nothing?

@dstarke
Copy link
Contributor

dstarke commented Jan 17, 2018

I think the performance difference would be minimal, and to me it seems clearer and more difficult to misuse. However, at the moment, there are only 3 calls to Nan:AdjustExternalMemory() outside of Matrix.cc. Two in FaceRecognizer.cc (1 for reading in a new image into a Matrix, and one for a color conversion), and one in OpenCV.cc (reading in a new image). I left those in because they were tricky to get rid of without restructuring the code, but it is possible with a little bit of work to move all the memory adjustment into Matrix.cc.

With that done, the memory tracking would be entirely an implementation detail of Matrix, and wouldn't need to be exposed in any public way. I think that also allows us to defer the work of turning that behavior on or off with a flag to when we add the custom allocator.

I'll make that adjustment first.

@dstarke
Copy link
Contributor

dstarke commented Jan 17, 2018

Ok, I just checked in changes on my branch that consolidate all the calls to Nan::AdjustExternalMemory() into Matrix.cc. I also consolidated the patterns for working with cv:Mat objects in a Matrix so that fewer places need to explicitly adjust the memory within Matrix.cc by making more places use one of the established patterns that correctly tracks the memory.

The remaining calls are in constructors or destructors, or in a method that explicitly changes the size of a matrix in place (merge, color conversion, image pyramid upscale/downscale) and so doesn't fit neatly into one of the common patterns.

@dstarke
Copy link
Contributor

dstarke commented Jan 17, 2018

At this point, I'd like to update PR #602 with this version of the code. I think it's strictly better than the earlier version I submitted. It should correctly track all the cases that that code does, plus it doesn't double count memory referenced from more than one JS object.

In addition, while testing things, I've found and fixed a number of cases that my original implementation did not handle correctly.

How does that sound?

@peterbraden
Copy link
Owner

I dropped the ball on this, is #602 ready to be merged?

@dstarke
Copy link
Contributor

dstarke commented Apr 3, 2018

I think it is. There are potentially some future enhancements that we've talked about in this thread that could be done separately later, but the current PR is complete and working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants