Signed-off-by: Jun Tian <jun.tian@intel.com>
Caused by MotionCompensationPredictionContextDctor calling EB_DELETE(obj->localReferenceBlockL0); the pointer is changed in EncodePassPreFetchRef (EbCodingLoop.c) and EncodePassInterPrediction16bit (EbInterPrediction.c): contextPtr->mcpContext->localReferenceBlockL0->bufferY = refPicList0->bufferY + lumaOffSet; Signed-off-by: Jun Tian <jun.tian@intel.com>
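To illustrate the class of bug described in this commit message, here is a minimal sketch, assuming a simplified picture descriptor. All names (`SketchPicture`, `owned_y`, `view_y`, and the helper functions) are hypothetical and are not the encoder's types; this is one common way to keep a repointed buffer pointer out of the destructor, not necessarily what this PR does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor: the pointer that is freed is kept separate
 * from the pointer that the coding loop is allowed to repoint. */
typedef struct SketchPicture {
    uint8_t *owned_y; /* base address returned by malloc; the only one freed */
    uint8_t *view_y;  /* may point into another picture; never freed */
} SketchPicture;

static SketchPicture *sketch_picture_new(size_t luma_bytes) {
    SketchPicture *pic = malloc(sizeof(*pic));
    if (!pic) return NULL;
    pic->owned_y = malloc(luma_bytes);
    if (!pic->owned_y) {
        free(pic);
        return NULL;
    }
    pic->view_y = pic->owned_y;
    return pic;
}

/* Mirrors "bufferY = refPicList0->bufferY + lumaOffSet": only the view moves. */
static void sketch_picture_point_at(SketchPicture *local,
                                    const SketchPicture *ref,
                                    size_t luma_offset) {
    assert(local != NULL && ref != NULL);
    local->view_y = ref->owned_y + luma_offset;
}

/* The destructor frees the original allocation, not the repointed view.
 * Freeing view_y here would pass an interior pointer of another buffer
 * to free(), which corrupts the heap. */
static void sketch_picture_delete(SketchPicture *pic) {
    if (!pic) return;
    free(pic->owned_y);
    free(pic);
}
```

With this split, the destructor stays correct no matter how often the view pointer is redirected during encoding.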
It seems to be a heap exception. Use Application Verifier to get a backtrace of the write to an invalid address. Signed-off-by: Jun Tian <jun.tian@intel.com>
Compile warning; mem leak of unpack; crash of OpenVisualCloud#457. Signed-off-by: Jun Tian <jun.tian@intel.com>
}
else {
    pictureBufferDescPtr->bufferY = 0;
    EB_CALLOC_ALIGNED_ARRAY(pictureBufferDescPtr->bufferY, pictureBufferDescPtr->lumaSize * bytesPerPixel);
@hassount, the old size is
pictureBufferDescPtr->lumaSize * bytesPerPixel + (pictureBufferDescPtr->width + 1) * 2 * bytesPerPixel, and the memory is not aligned.
The new one stays in sync with AV1, which I think makes more sense.
Which should I use for the recon buffer?
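For readers unfamiliar with the distinction being discussed, an aligned, zero-initialized allocation can be sketched as below. This is a hedged illustration, not the project's `EB_CALLOC_ALIGNED_ARRAY` macro: `sketch_calloc_aligned` and the 64-byte `SKETCH_ALIGNMENT` are assumptions, and it relies on C11 `aligned_alloc`, which is not available with MSVC (where `_aligned_malloc` would be the usual substitute).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ALIGNMENT 64u /* assumed alignment, chosen for illustration */

/* Hypothetical helper: returns zeroed memory aligned to SKETCH_ALIGNMENT,
 * or NULL on failure. Release the result with free(). */
static void *sketch_calloc_aligned(size_t count, size_t elem_size) {
    assert(elem_size != 0);
    if (count > SIZE_MAX / elem_size) return NULL; /* count * elem_size overflows */
    size_t bytes = count * elem_size;
    if (bytes > SIZE_MAX - (SKETCH_ALIGNMENT - 1)) return NULL;
    /* C11 aligned_alloc expects the size to be a multiple of the alignment. */
    size_t padded = (bytes + SKETCH_ALIGNMENT - 1) / SKETCH_ALIGNMENT * SKETCH_ALIGNMENT;
    if (padded == 0) padded = SKETCH_ALIGNMENT;
    void *p = aligned_alloc(SKETCH_ALIGNMENT, padded);
    if (p) memset(p, 0, padded); /* the "calloc" part: clear the buffer */
    return p;
}
```

A luma plane of `lumaSize * bytesPerPixel` bytes would be requested as `sketch_calloc_aligned(lumaSize, bytesPerPixel)` in this sketch; whether the extra `(width + 1) * 2 * bytesPerPixel` bytes of the old size are still needed is exactly the open question in the comment above.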
Hi @xuguangxin, hope you are doing well. I wonder if you are interested in reviewing these results :)
CI fast test result with 2 failed buffered_test test cases. The log and bitstream seem fine. Not sure why they are marked as failed.
Sure, I can help. But it may be a little slow since it's a large commit. I hope I can finish it before the middle of next week.
Hi @xuguangxin, yeah, this PR is too much to review. I was asking if you could review the test results above: the committed memory and valgrind. Did you see a similar trend with your PR to AV1?
Briefly tested the parameters below in different combinations; the bitstream has no R2R issue compared with master.
Speed testing with default parameters:
EB_MALLOC_ARRAY(p2d, width); \
EB_MALLOC_ARRAY(p2d[0], width * height); \
for (uint32_t w = 1; w < width; w++) { \
    p2d[w] = p2d[0] + w * height; \
It's a little tricky here: one big block of (width * height * sizeof(*p2d[0])) bytes is allocated, and then each element of the p2d pointer array is assigned the address of an equally sized slice of it, in sequence.
This requires the virtual memory allocator to have a contiguous linear range of that size, although an allocation failure is very unlikely. Not a good implementation.
This is a trade-off. If we do not do this, we will have a huge number of malloc calls. Suppose we have a 1920x1080 picture: it would call malloc about 2M times.
If we do think this is a problem, we should at least allocate one line at a time.
The number of malloc calls would then drop from 2M to 1080.
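The single-block layout under discussion can be sketched as a plain function rather than a macro. This is a minimal illustration with hypothetical names (`sketch_alloc_2d`, `sketch_free_2d`) and a fixed `int32_t` element type; it mirrors the indexing of the diff above (`p2d[w] = p2d[0] + w * height`) but is not the project's `EB_MALLOC_2D`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Two allocations in total: one pointer table and one contiguous data block. */
static int32_t **sketch_alloc_2d(size_t width, size_t height) {
    if (width == 0 || height == 0) return NULL;
    if (width > SIZE_MAX / sizeof(int32_t *)) return NULL;
    if (height > SIZE_MAX / sizeof(int32_t) / width) return NULL;

    int32_t **rows = malloc(width * sizeof(*rows));
    if (!rows) return NULL;

    rows[0] = malloc(width * height * sizeof(**rows));
    if (!rows[0]) {
        free(rows);
        return NULL;
    }
    /* Every other entry points into the same block, height elements apart. */
    for (size_t w = 1; w < width; w++)
        rows[w] = rows[0] + w * height;
    return rows;
}

/* Only rows[0] and the table itself came from malloc, so only those are
 * freed. Calling free(rows[w]) for w > 0 would be invalid. */
static void sketch_free_2d(int32_t **rows) {
    if (!rows) return;
    free(rows[0]);
    free(rows);
}
```

The cost of this design is that the slices share one lifetime: they can only be released together through `sketch_free_2d`.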
- According to the malloc man page (section NOTES) and the MM-related Q&A, "The Linux kernel uses lazy (on-demand) allocation of physical pages, deferring the allocation until necessary and avoiding allocating physical pages which will never actually be used." Windows should have a similar allocation strategy. malloc itself shouldn't take much time, apart from setting up the corresponding page table. The page fault handler allocates physical memory and maps the pages to virtual addresses once you actually read, write or map them.
- So there shouldn't be a big difference in time between: 1) calling malloc for each entry of the pointer array; and 2) as implemented here, calling malloc once for all the memory, then assigning equally spaced virtual addresses to the entries of the pointer array.
- With the current EB_MALLOC_2D implementation, freeing any entry of the pointer array other than p2d[0] will fail when that memory is no longer needed. All entries have to wait until p2d[0] is freed.
- As I mentioned before, such an implementation requires the allocator to have a contiguous (no holes) linear range of that "big" size, although an allocation failure is very unlikely. :-)
- Binding physical memory is not the costly part. Finding a block of the right size for malloc, and splitting large blocks to meet the request, uses most of the CPU time. Usually, if you do not do any optimization, for a decoder, 20-30% of CPU time will be used by malloc. An encoder may use less than this, but it is still noticeable.
- We never remove any entries. SVT always allocates everything at start and frees it at the end.
- It's just virtual address space, not physical. A 64-bit system usually has a 48-bit address space, which is 281 TB, so you can always find a virtual address range for our allocation.
Maybe we can do some profiling to compare the time of multiple mallocs vs. one malloc.
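A rough way to run the comparison suggested here is sketched below. The function names are hypothetical, the measurement uses the coarse CPU-time `clock()` from the C standard library, and only allocation (not freeing) is timed, so treat any numbers it produces as indicative at best. An optimizing compiler may also elide some allocations, so building without optimization is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Times `rows` separate malloc calls of row_bytes each.
 * Returns CPU seconds, or -1.0 if an allocation failed. */
static double sketch_time_many_mallocs(size_t rows, size_t row_bytes) {
    assert(rows > 0 && row_bytes > 0);
    unsigned char **p = malloc(rows * sizeof(*p));
    if (!p) return -1.0;

    size_t done = 0;
    clock_t start = clock();
    for (; done < rows; done++) {
        p[done] = malloc(row_bytes);
        if (!p[done]) break;
        p[done][0] = 1; /* touch the memory so the page is really mapped */
    }
    clock_t end = clock();

    int ok = (done == rows);
    for (size_t i = 0; i < done; i++) free(p[i]);
    free(p);
    return ok ? (double)(end - start) / CLOCKS_PER_SEC : -1.0;
}

/* Times one malloc call covering the same total size. */
static double sketch_time_one_malloc(size_t rows, size_t row_bytes) {
    assert(rows > 0 && row_bytes > 0);
    if (row_bytes > SIZE_MAX / rows) return -1.0;

    clock_t start = clock();
    unsigned char *block = malloc(rows * row_bytes);
    if (!block) return -1.0;
    for (size_t i = 0; i < rows; i++)
        block[i * row_bytes] = 1; /* touch each slice, as above */
    clock_t end = clock();

    free(block);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling both with, for example, `rows = 1080` and `row_bytes = 1920 * sizeof(int32_t)`, and repeating the measurement several times, would give a first impression of the difference on a given platform.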
1 & 2. I noticed that in user space, multiple mallocs (for the same total size) take more time due to syscalls (brk or mmap) and the allocation algorithms in the glibc or kernel (Linux slab) allocator. So in theory, EB_MALLOC_2D could take less time, especially when width or height is very big.
3. It can't address the separate-free issue unless there is additional resource management such as a refcount. Otherwise EB_MALLOC_2D can only be used when the memory resources are freed together.
4. Yes, I was talking about virtual address (space).
With the fix to the regression, init time is back to, and better than, the original.
@tianjunwork thanks
@xuguangxin I see, thanks for the explanation. SVT-HEVC has no memset after EB_ALLIGN_MALLOC on HEAD. I will make a patch for AV1, but it needs a review from whoever added the memset, in case it fixes an issue that I don't know about.
Speed testing with default parameters. -n 1000 x 5 times.
Mem leaks are not handled in the calling function itself. But some source code errors are fixed, so that any throw will cause EbInitEncoder to return EB_ErrorInsufficientResources to the application. The application should terminate the encoder in this case. This makes it easier to handle the many possible mem leaks caused by a throw. Signed-off-by: Jun Tian <jun.tian@intel.com>
Hi @xuguangxin, regarding your comments: mem leaks are not handled in the calling function itself. But some source code errors are fixed, so that any throw will cause EbInitEncoder to return EB_ErrorInsufficientResources to the application.
Hi @Austin-Hu, I didn't change
@@ -427,12 +427,12 @@ static EB_ERRORTYPE EBObjectWrapperCtor(EbObjectWrapper_t* wrapper,
    EB_ERRORTYPE ret;

    wrapper->dctor = EBObjectWrapperDctor;
    ret = objectCreator(&wrapper->objectPtr, objectInitDataPtr);
    if (ret != EB_ErrorNone)
        return ret;
    wrapper->releaseEnable = EB_TRUE;
If objectCreator returns an error, the function returns and objectDestroyer never gets assigned. So a crash happens when EBObjectWrapperDctor
is called, because it goes to:
else {
//hack....
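The ordering problem described in this comment can be shown with a small sketch. Everything here is hypothetical (`SketchWrapper`, `sketch_wrapper_ctor`, the creator and destroyer signatures) and simplified from the names in the diff; it shows one way to make the destructor safe after a failed creator, under the assumption that the destroyer is known before the creator runs.

```c
#include <assert.h>
#include <stdlib.h>

typedef int sketch_err; /* 0 means success, anything else is an error */

typedef struct SketchWrapper {
    void (*dctor)(struct SketchWrapper *);
    void *object;
    void (*object_destroyer)(void *);
} SketchWrapper;

/* Safe on a half-constructed wrapper: does nothing if no object exists. */
static void sketch_wrapper_dctor(SketchWrapper *w) {
    assert(w != NULL);
    if (w->object && w->object_destroyer)
        w->object_destroyer(w->object);
    w->object = NULL;
}

static sketch_err sketch_wrapper_ctor(SketchWrapper *w,
                                      sketch_err (*creator)(void **, void *),
                                      void *init_data,
                                      void (*destroyer)(void *)) {
    assert(w != NULL && creator != NULL);
    /* Set every field the destructor reads BEFORE anything can fail. */
    w->object = NULL;
    w->object_destroyer = destroyer;
    w->dctor = sketch_wrapper_dctor;
    return creator(&w->object, init_data); /* an early error leaves a safe state */
}

/* Two toy creators for demonstration only. */
static sketch_err sketch_failing_creator(void **out, void *init_data) {
    (void)init_data;
    *out = NULL;
    return 1;
}

static sketch_err sketch_ok_creator(void **out, void *init_data) {
    (void)init_data;
    *out = malloc(sizeof(int));
    return *out ? 0 : 1;
}
```

In this sketch, a caller that receives an error from `sketch_wrapper_ctor` can still invoke `w->dctor(w)` without crashing, which is the property the comment says was missing.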
Hi all, all the comments are addressed, please help to review the last round. Thank you again for the help :)
Thanks Jun, for addressing all my comments.
Thank you again for helping on this PR.
Signed-off-by: Jun Tian jun.tian@intel.com
Signed-off-by: Xu Guangxin guangxin.xu@intel.com