Resource de/allocation refactor #467

tianjunwork · 2020-02-07T17:47:08Z

Signed-off-by: Jun Tian jun.tian@intel.com
Signed-off-by: Xu Guangxin guangxin.xu@intel.com

Signed-off-by: Jun Tian <jun.tian@intel.com>

Caused by MotionCompensationPredictionContextDctor EB_DELETE(obj->localReferenceBlockL0); Pointer is changed in EbCodingLoop.c EncodePassPreFetchRef and EbInterPrediction.c EncodePassInterPrediction16bit contextPtr->mcpContext->localReferenceBlockL0->bufferY = refPicList0->bufferY + lumaOffSet; Signed-off-by: Jun Tian <jun.tian@intel.com>

It seems to be a heap exception. Use Application verifier to get backtrack of writing to invalid address. Signed-off-by: Jun Tian <jun.tian@intel.com>

Compile warning mem leak of unpack Crash of OpenVisualCloud#457 Signed-off-by: Jun Tian <jun.tian@intel.com>

Signed-off-by: Jun Tian <jun.tian@intel.com>

tianjunwork · 2020-02-08T06:16:48Z

Source/Lib/Codec/EbPictureBufferDesc.c

-    }
-    else {
-        pictureBufferDescPtr->bufferY   = 0;
+        EB_CALLOC_ALIGNED_ARRAY(pictureBufferDescPtr->bufferY, pictureBufferDescPtr->lumaSize * bytesPerPixel);


@hassount, the old size is
pictureBufferDescPtr->lumaSize * bytesPerPixel + (pictureBufferDescPtr->width + 1) * 2 * bytesPerPixel, and it is not aligned mem.
New one keeps sync with AV1, which I thinks makes more sense.
Which should I use for recon buffer?

Signed-off-by: Jun Tian <jun.tian@intel.com>

tianjunwork · 2020-02-13T01:09:03Z

Hi @xuguangxin , hope you are doing well. I wonder if you are interested to review these results :)

tianjunwork · 2020-02-13T01:33:59Z

CI fast test result with 2 failed buffered_test test cases. Log and bitstream seems fine. Not sure why they are marked as fail.

   1 ---------------------------------------------------------
   2 Test Begin...
   3 Total Number of Tests: 2685
   4 Total Passed: 2683
   5 Percentage Passed: 99.9255121043%
   6 Time Elapsed: 7 hour(s), 13 minute(s), 13 second(s)
   7 ---------------------------------------------------------

xuguangxin · 2020-02-13T05:58:37Z

Hi @xuguangxin , hope you are doing well. I wonder if you are interested to review these results :)

sure I can help. But it may be a little slower since it's a large commit. Hope I can finish it before mid of next week.
thanks

tianjunwork · 2020-02-13T18:00:49Z

Hi @xuguangxin , yeah, this PR is to much to review. I was asking if you could review the test result above, the committed memory and valgrind. Did you see similar trend with your PR to AV1?

tianjunwork · 2020-02-13T22:48:24Z

Briefly tested below parameters with different combinations, bitstream has no R2R issue compared with master.
CBR, -nb, -hierarchical-levels, -pred-struct, -sao
VBR R2R issue on master is submitted.

tianjunwork · 2020-02-14T00:03:55Z

Speed testing with default parameters:
First two columns are with this PR, last two are from master tip.
Regression for both, but mainly of over all executing time. Will look into this.

input	time	fps	time	fps
1920x1080	11.831s	184.28	11.184s	189.41
7680x3840	1m33.284s	27.46	1m25.442s	27.22
7680x4320 10bit	2m47.448s	18.22	2m9.507s	17.08

Signed-off-by: Jun Tian <jun.tian@intel.com>

Source/Lib/Codec/EbEncHandle.c

Signed-off-by: Jun Tian <jun.tian@intel.com>

Source/Lib/Codec/EbThreads.c

Source/Lib/Codec/EbObject.h

Source/Lib/Codec/EbMalloc.h

Austin-Hu · 2020-02-24T21:55:52Z

Source/Lib/Codec/EbMalloc.h

+        EB_MALLOC_ARRAY(p2d, width); \
+        EB_MALLOC_ARRAY(p2d[0], width * height); \
+        for (uint32_t w = 1; w < width; w++) { \
+            p2d[w] = p2d[0] + w * height; \


It's a little tricky here, to allocate the big (width * height * sizeof(*p2d[0])) memory, and then assign each element ofp2d pointer array with the address of equal size memory sequentially...
It requires that the virtual memory allocator must have continuous linear space with that size, although allocation failure almost wouldn't happen. Not a good implementation.

This a tread off, if we do not do this, we will have huge malloc times. suppose we have a 1920x1080 it will allocate 2M times.
If we do think this a problem, at least we need to allocate a line for each time.
The malloc times will drop to 2M to 1080.

According to malloc man (section NOTES) and the MM related Q&A, "The Linux kernel uses lazy (on-demand) allocation of physical pages, deferring the allocation until necessary and avoiding allocating physical pages which will never actually be used." Windows should have the similar allocation strategy. malloc wouldn't take much time, except for setting up the corresponding page table. And page fault handler will allocate physical memory and map the pages with virtuall address, once you really need to read, write or map it.

So there wouldn't be big time consuming difference between: 1) malloc memories for each entry of the pointer array; AND 2) as implemented here, malloc all the memory with 1 time, then assign each entry of the pointer array with the equally spaced virtural addresses.

With current EB_MALLOC_2D implementation, free any entry (except for p2d[0]) of the pointer array will fail, when those memories are not needed any more. All of them needs to wait for freeing p2d[0] to be freed.

As I mentioned before, such implementation requires that the allocator must have continuous (no hole) linear space with that "big" size, although the allocation failure almost wouldn't happen. :-)

bind phisical memory is not cost one, How to find the right size for malloc and how to split large memory to meet our requirement will use most of cpu time. Usally, if you do not do any optimization, for a decoder, 2030 cpu time will used by malloc. Encoder may less than this, but it still a noticalbe one.

we never remove any entries, svt alwasy allocate system at start, and free it at end.

it's just virtual address not physical, For 64 bits system, it usally has 48bits address space, whichi is 281TB, so you always can find virtual addreess for our allocation.

Maybe we can do some profile to see the time of multipile malloc vs. one malloc.

1 & 2. I noticed that in user space, multiple mallocs (for the same total size) take more time due to syscalls (brk or mmap) and allocation algorithms done in glibc or kernel (Linux slab) allocator. So in theory, the EB_MALLOC_2D could consume less time, especially when width or height is very big.
3. It can't address the separate free issue, unless there is additional resource management such as refcount. Otherwise EB_MALLOC_2D can only be used in the case that the memory resouces would be freed together.
4. Yes, I was talking about virtual address (space).

Source/Lib/Codec/EbMalloc.h

Source/Lib/Codec/EbMalloc.c

tianjunwork · 2020-02-25T21:53:17Z

From below vtune data, EbInitEncoder(EbPictureBufferDescCtor->memset) causes the perf regression. CPU time of other functions keep about the same.
With this PR:

HEAD:

Signed-off-by: Jun Tian <jun.tian@intel.com>

tianjunwork · 2020-02-26T02:16:59Z

With the fix to the regression, init time is back to and better than original.
Thx @Austin-Hu and @lijing0010.

HEAD:

xuguangxin · 2020-02-26T02:42:18Z

With the fix to the regression, init time is back to and better than original.
Thx @Austin-Hu and @lijing0010.

HEAD:

@tianjunwork
we use EB_CALLOC_ARRAY in SVT AV1 because it will memset the memory after malloc https://github.com/OpenVisualCloud/SVT-AV1/pull/314/files#diff-40daf4c2999b357c7617eeb0d545b007L99
Could we apply your patch to SVT AV1 too?

thanks

tianjunwork · 2020-02-26T05:11:23Z

@xuguangxin I see, thx for the explanation. SVT-HEVC has no memset after EB_ALLIGN_MALLOC on HEAD. I will make a patch to AV1. But need someone who added memset to review in case it is to fix some issue that I don't know?

tianjunwork · 2020-02-26T20:07:26Z

Speed testing with default parameters. -n 1000 x 5 times.
First two columns are with this PR, last two are from master tip.
No perf regression. Total execution time is 4.4% less then master.

input	time	fps	time	fps
7680x4320 10bit	1m10.605s	16.2	1m13.865s	16.26

mem leak is not handled in the calling function itself. But some source code error is fixed, so that any throw will cause EbInitEncoder return EB_ErrorInsufficientResources to the application. Application should terminate encoder in this case. This makes it easier to handle many possible mem leaks because of throw. Signed-off-by: Jun Tian <jun.tian@intel.com>

tianjunwork · 2020-02-26T23:12:35Z

Hi @xuguangxin , on your comments, mem leak is not handled in the calling function itself. But some source code error is fixed, so that any throw will cause EbInitEncoder return EB_ErrorInsufficientResources to the application.
Application should terminate encoder in this case.
This makes it easier to handle many possible mem leaks because of throw.
Tested with manually adding return EB_ErrorInsufficientResources to simulate throw. EbInitEncoder can return EB_ErrorInsufficientResources now with my fix.

tianjunwork · 2020-02-26T23:15:16Z

Hi, @Austin-Hu , I didn't change malloc with EB_MALLOC, using malloc gives the opportunity to free previously allocated buffer in this function. EB_MALLOC will return inside the macro, making it hard to handle mem leak.

tianjunwork · 2020-02-26T23:20:38Z

Source/Lib/Codec/EbSystemResourceManager.c

@@ -427,12 +427,12 @@ static EB_ERRORTYPE EBObjectWrapperCtor(EbObjectWrapper_t* wrapper,
    EB_ERRORTYPE ret;

    wrapper->dctor = EBObjectWrapperDctor;
-    ret = objectCreator(&wrapper->objectPtr, objectInitDataPtr);
-    if (ret != EB_ErrorNone)
-        return ret;
    wrapper->releaseEnable = EB_TRUE;


If objectCreator return error, function returns. objectDestroyer has no chance to get assigned. So crash happens when EBObjectWrapperDctor is called, because it goes to:

else { //hack....

Signed-off-by: Jun Tian <jun.tian@intel.com>

tianjunwork · 2020-03-01T02:14:24Z

Hi All, all the comments are address, please help to review the last round. Thank you again for the help:)

xuguangxin · 2020-03-02T10:19:56Z

thans Jun, for addressed all my comments.

tianjunwork · 2020-03-02T19:00:36Z

thans Jun, for addressed all my comments.

Thank you again for helping on this PR.

tianjunwork added 7 commits February 7, 2020 09:40

refactor resource allocation

5ecf1a1

Signed-off-by: Jun Tian <jun.tian@intel.com>

Fix crash in Dctor

c132e3a

Signed-off-by: Jun Tian <jun.tian@intel.com>

fix mem leak and compile warning

95863cc

Signed-off-by: Jun Tian <jun.tian@intel.com>

clean up and 444 mem leak

fc0ac40

Signed-off-by: Jun Tian <jun.tian@intel.com>

fix 8bit crash at the end of encoding

112d0c8

Signed-off-by: Jun Tian <jun.tian@intel.com>

Fix crash of 10bit with sao enabled

365a30e

It seems to be a heap exception. Use Application verifier to get backtrack of writing to invalid address. Signed-off-by: Jun Tian <jun.tian@intel.com>

tianjunwork added the Clean up A cleaner implementation or improved functionality label Feb 7, 2020

tianjunwork added 2 commits February 7, 2020 11:15

Fixes

8ff140d

Compile warning mem leak of unpack Crash of OpenVisualCloud#457 Signed-off-by: Jun Tian <jun.tian@intel.com>

Fix release and Ubuntu build

234977c

Signed-off-by: Jun Tian <jun.tian@intel.com>

tianjunwork marked this pull request as ready for review February 7, 2020 23:23

tianjunwork requested review from Austin-Hu, hassount and intelmark February 7, 2020 23:24

Fix app mem leak

20f842c

Signed-off-by: Jun Tian <jun.tian@intel.com>

tianjunwork mentioned this pull request Feb 8, 2020

Valgrind reports of memory issues #461

Closed

tianjunwork commented Feb 8, 2020

View reviewed changes

Austin-Hu mentioned this pull request Feb 10, 2020

Fixed the memory map management issue with multi-instance. #446

Closed

Revert back of allocation of recon buffer

16f1409

Signed-off-by: Jun Tian <jun.tian@intel.com>

This comment has been minimized.

Sign in to view

This was referenced Feb 14, 2020

Unpack fixes #476

Closed

SVT-HEVC multi-instance are blocked when calling deinit by multi-thread #357

Closed

Fix jump depends on uninitialised value

bd45628

Signed-off-by: Jun Tian <jun.tian@intel.com>

Austin-Hu reviewed Feb 24, 2020

View reviewed changes

Source/Lib/Codec/EbEncHandle.c Show resolved Hide resolved

Austin-Hu reviewed Feb 24, 2020

View reviewed changes

Source/Lib/Codec/EbEncHandle.c Show resolved Hide resolved

Austin-Hu reviewed Feb 24, 2020

View reviewed changes

Source/Lib/Codec/EbEncHandle.c Show resolved Hide resolved

Austin-Hu reviewed Feb 24, 2020

View reviewed changes

Source/Lib/Codec/EbEncHandle.c Show resolved Hide resolved

Austin-Hu reviewed Feb 24, 2020

View reviewed changes

Source/Lib/Codec/EbEncHandle.c Show resolved Hide resolved

check pointer/handle when destroy

5d247b4

Signed-off-by: Jun Tian <jun.tian@intel.com>

Austin-Hu reviewed Feb 24, 2020

View reviewed changes

tianjunwork added 2 commits February 25, 2020 16:03

Fix regression of unnecessary memset from calloc

a30cdad

Signed-off-by: Jun Tian <jun.tian@intel.com>

Use malloc for recon buffer too

14e9429

Signed-off-by: Jun Tian <jun.tian@intel.com>

This comment has been minimized.

Sign in to view

Austin-Hu approved these changes Feb 26, 2020

View reviewed changes

tianjunwork commented Feb 26, 2020

View reviewed changes

Add back kernel quit logic

ab4849c

Signed-off-by: Jun Tian <jun.tian@intel.com>

xuguangxin approved these changes Mar 2, 2020

View reviewed changes

Austin-Hu approved these changes Mar 2, 2020

View reviewed changes

tianjunwork merged commit 0daec1a into OpenVisualCloud:master Mar 2, 2020

tianjunwork deleted the res_refactor branch March 2, 2020 19:10

tianjunwork mentioned this pull request Mar 5, 2020

Clean up useless function #487

Merged

yurikoles mentioned this pull request Mar 15, 2020

[macOS]Could not allocate enough memory for channel 1 #495

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Resource de/allocation refactor #467

Resource de/allocation refactor #467

tianjunwork commented Feb 7, 2020 •

edited

Loading

tianjunwork Feb 8, 2020

This comment has been minimized.

This comment has been minimized.

tianjunwork commented Feb 13, 2020

tianjunwork commented Feb 13, 2020 •

edited

Loading

xuguangxin commented Feb 13, 2020

tianjunwork commented Feb 13, 2020

tianjunwork commented Feb 13, 2020 •

edited

Loading

tianjunwork commented Feb 14, 2020 •

edited

Loading

Austin-Hu Feb 24, 2020

xuguangxin Feb 26, 2020

Austin-Hu Feb 26, 2020

xuguangxin Mar 2, 2020

Austin-Hu Mar 2, 2020 •

edited

Loading

tianjunwork commented Feb 25, 2020 •

edited

Loading

tianjunwork commented Feb 26, 2020

xuguangxin commented Feb 26, 2020

This comment has been minimized.

tianjunwork commented Feb 26, 2020

tianjunwork commented Feb 26, 2020

tianjunwork commented Feb 26, 2020

tianjunwork commented Feb 26, 2020

tianjunwork Feb 26, 2020 •

edited

Loading

tianjunwork commented Mar 1, 2020 •

edited

Loading

xuguangxin commented Mar 2, 2020

tianjunwork commented Mar 2, 2020

Resource de/allocation refactor #467

Resource de/allocation refactor #467

Conversation

tianjunwork commented Feb 7, 2020 • edited Loading

tianjunwork Feb 8, 2020

Choose a reason for hiding this comment

This comment has been minimized.

This comment has been minimized.

tianjunwork commented Feb 13, 2020

tianjunwork commented Feb 13, 2020 • edited Loading

xuguangxin commented Feb 13, 2020

tianjunwork commented Feb 13, 2020

tianjunwork commented Feb 13, 2020 • edited Loading

tianjunwork commented Feb 14, 2020 • edited Loading

Austin-Hu Feb 24, 2020

Choose a reason for hiding this comment

xuguangxin Feb 26, 2020

Choose a reason for hiding this comment

Austin-Hu Feb 26, 2020

Choose a reason for hiding this comment

xuguangxin Mar 2, 2020

Choose a reason for hiding this comment

Austin-Hu Mar 2, 2020 • edited Loading

Choose a reason for hiding this comment

tianjunwork commented Feb 25, 2020 • edited Loading

tianjunwork commented Feb 26, 2020

xuguangxin commented Feb 26, 2020

This comment has been minimized.

tianjunwork commented Feb 26, 2020

tianjunwork commented Feb 26, 2020

tianjunwork commented Feb 26, 2020

tianjunwork commented Feb 26, 2020

tianjunwork Feb 26, 2020 • edited Loading

Choose a reason for hiding this comment

tianjunwork commented Mar 1, 2020 • edited Loading

xuguangxin commented Mar 2, 2020

tianjunwork commented Mar 2, 2020

tianjunwork commented Feb 7, 2020 •

edited

Loading

tianjunwork commented Feb 13, 2020 •

edited

Loading

tianjunwork commented Feb 13, 2020 •

edited

Loading

tianjunwork commented Feb 14, 2020 •

edited

Loading

Austin-Hu Mar 2, 2020 •

edited

Loading

tianjunwork commented Feb 25, 2020 •

edited

Loading

tianjunwork Feb 26, 2020 •

edited

Loading

tianjunwork commented Mar 1, 2020 •

edited

Loading