fixed so that batch norm layer does not convert tensors to FP16 #32

paveltc · 2019-01-25T00:56:46Z

Fixed a bug caused by using FP16 tensors with batch norm.
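As a rough illustration of the idea in the title, the sketch below converts a set of tensors to FP16 while leaving batch norm parameters in FP32. It is not the code from this PR: the tensor names, the `BATCH_NORM_PARAMS` tuple, and the `convert_tensors` helper are all hypothetical, and the half-precision rounding is simulated with Python's standard `struct` module.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Hypothetical parameter names; the real tensor names are not given in this PR.
BATCH_NORM_PARAMS = ("bn_scale", "bn_bias", "bn_mean", "bn_variance")


def convert_tensors(tensors, keep_fp32=BATCH_NORM_PARAMS):
    """Convert every tensor to FP16 except those listed in keep_fp32."""
    converted = {}
    for name, values in tensors.items():
        if name in keep_fp32:
            # Assumption: batch norm parameters stay in full precision.
            converted[name] = ("FP32", list(values))
        else:
            converted[name] = ("FP16", [to_fp16(v) for v in values])
    return converted


tensors = {
    "conv_weights": [0.1, 0.2],
    "bn_variance": [1e-8, 0.5],
}
result = convert_tensors(tensors)
for name, (dtype, values) in result.items():
    print(name, dtype, values)
```

The `bn_variance` entry shows one reason such an exclusion can matter: a value like `1e-8` lies below the smallest half-precision subnormal and would round to zero if it were converted. Whether that was the actual failure behind this PR is not stated in the description.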

* changing buffers to reflect right values
* getting correct pointers for ROI
* buffer values for multiple plane images
* fixes 12/39 swapImageHandle cases
* fixes all swapImageHandle errors

* OpenVX 1.3 Headers Added * Temp Changes to build MIVisionX with OpenVX 1.3 Headers * VX NN 1.3 Port * Lk/openvx port 1.3 (#2) * changes for object-array * threshold functions & adding objectarray to ago_util * remap functions * advanced array functions * vxCreateMatrixFromPatternAndOrigin * setImagePixelValues * OpenVX 1.3 Conformance build * createvirtualscalar * vxCreateVirtualConvolution * syntax fix * vxCreateVirtiaulDistribution, scalar syntax fix * vxCreateVirtiaulDistribution, scalar syntax fix * vxCreateVirtiaulMatrix * vxWeightedAverageNode vxuWeightedAverage * vxNonLinearFilterNode, vxuNonLinearFilter * vxLaplacianPyramidNode, vxuLaplacianPyramid * vxLaplacianReconstructNode, vxuLaplacianReconstruct * type fix * latest changes (#5) * createvirtualscalar * vxCreateVirtualConvolution * syntax fix * vxCreateVirtiaulDistribution, scalar syntax fix * vxCreateVirtiaulDistribution, scalar syntax fix * vxCreateVirtiaulMatrix * vxWeightedAverageNode vxuWeightedAverage * vxNonLinearFilterNode, vxuNonLinearFilter * vxLaplacianPyramidNode, vxuLaplacianPyramid * vxLaplacianReconstructNode, vxuLaplacianReconstruct * type fix * fixes vxGetUserStructNameByEnm and EnumByName * fixes vxCreateObjectArray and vxCreateVirtualObjectArray * conformance test nodes * bug ifx * declaration added * threshold functions * fixes matrix functions * fixes vxSetImagePixelValues * fixes remap functions * threshold typo fix * changed all of objectarray to replicate delay * threshold kernels * weighted average invalid format fix * fixes vxCreateVirtualScalar * fixes vxCReateVirtualConvolution * fixes vxCopyLUT & vxMapLUT * fixing * weightedaverage passed * fix comment * graph/context/refernce+ fixes * target base passes all - vx_context * waitGraph and verifyGraphBase * undo verifyGraphBase * nonlinearfilter passing * latest chanegs * Update CMakeLists.txt * Update ago_platform.h * Update vx_api.cpp * moving #def from VX/include to vx_ext_amd.h * moving #def from VX/include to 
vx_ext_amd.h * adjusting spaces Co-authored-by: Hansel Yang <hanselyang123@gmail.com> * Lk/port1.3 fix new (#6) * createvirtualscalar * vxCreateVirtualConvolution * syntax fix * vxCreateVirtiaulDistribution, scalar syntax fix * vxCreateVirtiaulDistribution, scalar syntax fix * vxCreateVirtiaulMatrix * vxWeightedAverageNode vxuWeightedAverage * vxNonLinearFilterNode, vxuNonLinearFilter * vxLaplacianPyramidNode, vxuLaplacianPyramid * vxLaplacianReconstructNode, vxuLaplacianReconstruct * type fix * fixes vxGetUserStructNameByEnm and EnumByName * fixes vxCreateObjectArray and vxCreateVirtualObjectArray * conformance test nodes * bug ifx * declaration added * threshold functions * fixes matrix functions * fixes vxSetImagePixelValues * fixes remap functions * threshold typo fix * changed all of objectarray to replicate delay * threshold kernels * weighted average invalid format fix * fixes vxCreateVirtualScalar * fixes vxCReateVirtualConvolution * fixes vxCopyLUT & vxMapLUT * fixing * weightedaverage passed * fix comment * graph/context/refernce+ fixes * target base passes all - vx_context * waitGraph and verifyGraphBase * undo verifyGraphBase * nonlinearfilter passing * latest chanegs * Update CMakeLists.txt * Update ago_platform.h * Update vx_api.cpp * moving #def from VX/include to vx_ext_amd.h * moving #def from VX/include to vx_ext_amd.h * adjusting spaces * moving things from openvx/include to ext_amd * laplacian update * disable opencl * Update CMakeLists.txt * Update ago_interface.cpp * Update vx_api.cpp * LaplacianReconstruct pass * bug fix * bracket fix * remove debug comments * Update ago_haf_cpu_generic_functions.cpp Co-authored-by: Hansel Yang <hanselyang123@gmail.com> * changes from Hansel and threshold changes (#7) * remove debug comments * Update ago_haf_cpu_generic_functions.cpp * changes to threshold Co-authored-by: Hansel Yang <hanselyang123@gmail.com> * OpenVX 1.3 port - Laplacian Pyramid (#8) * Laplacian pyramid fix (#9) * LaplacianPyramid Pass * 
CMake fix * Revert change Co-authored-by: Hansel Yang <hanselyang123@gmail.com> * OpenVX Port - fix build failures (#10) * Revert change * opencl changes to threshold * threshold with C code Co-authored-by: Hansel Yang <hanselyang123@gmail.com> * mean std deviation fix (#11) * fixes meanstddev Co-authored-by: Hansel Yang <hanselyang123@gmail.com> * user node 20/74 tests passes (#12) * usernode 20 out of 74 tests pass * removing printf statements * spacing fix Co-authored-by: Hansel Yang <hanselyang123@gmail.com> * USerNode: All tests pass (#13) * spacing fix * adding verification path to fix user node * fixes all tests of the user node Co-authored-by: Hansel Yang <hanselyang123@gmail.com> * SwapImageHandle roi=false cases passes (#14) * fixes all tests of userNode * swapImageHandle - roi=false cases Co-authored-by: Hansel Yang <hanselyang123@gmail.com> * Fixes ReplicateNode-ObjectArray (#17) * swapImageHandle - roi=false cases * fixes replicateNode object array Co-authored-by: Hansel Yang <hanselyang123@gmail.com> * fixes LUT type=S16 failures (#18) * fixes LUT failures Co-authored-by: Hansel Yang <hanselyang123@gmail.com> * H/openvx 1.3 port new (#19) * wait graph fixed * deadlock debug * deadlock fixed * cmake fix * comments added Co-authored-by: LakshmiKumar23 <kumar.lakshmi1994@gmail.com> * OpenVX 1.3 Fix * weighted average validation fix * graphstate fixed * check fix * bug fix * SmokeTestBase.vxReleaseReferenceBase fix (#21) * SmokeTestBase.vxReleaseReferenceBase fix * SmokeTestBase.vxSetReferenceName fix * Graph State - Fix & code cleanup (#22) * Graph Tests - Fix (#23) * code cleanup * user node fixes * graph state fixes * smoketestbase - vxRetainReferenceBase fix (#24) * smoketestbase-vxRetainReferenceBase fix * histogram - Fix CTS Errors(#25) * debugging histogram * histogram fix * code cleanup * Resource Release Fix * OpenVX 1.3 - half scale gaussian (#27) * halfscalegaussian fix * code cleanup * OpenVX 1.3 - smokeTest.vxRetainReference & 
smokeTestBase.vxSetReferenceName fix (#26) * vxRetainRef line 371 fix * vxSetReferenceName fix * pyramid fixes * smoke test fixes for dangling references * resolves all dangling reference issues * vxUnloadKernels Fix (#28) * CTS - graph delay with pyramid fix (#29) * Canny - CTS bug fix (#30) * divide by 4 when grad size 7 * canny fix * merge * canny fix * code cleanup * Run OpenVX 1.3 CPU Conformance * Travis Fix - OpenVX 1.3 * Travis Cleanup * OpenVX 1.3 - Harris corner CTS Fix (#31) * CMake * harris fix * code cleanup * swapImageHandle (#32) * changing buffers to reflect right values * getting correct pointers for ROI * buffer values for multiple plane images * fixes 12/39 swapImageHandle cases * fixes all swapImageHandle errors * fixes color convert errors (#33) * fixes conversion from RGBX to NV12&IYUV * fixes conversion from UYVY to NV12&IYUV * fixes conversion from YUYV to NV12&IYUV * fixes converion from NV12,NV21&IYUV to RGBX * fixes conversion from NV12,NV21&IYUV to RGB * warp affine - fix (#34) * warp affine fix * code cleanup * Travis - Trace Error * Travis - Check VGA * Travis Updates * Fix - variable scope * CXX Flags & OpenVX Version Update * changed Threshold to support new OpenVX 1.3 format (#38) Co-authored-by: paveltc <pavel.tcherniaev@amd.com> * Threshold - Update to 1.3 * Jenkins - Check Build & Artifacts * Tests - Fix platform name * GPU Fix - multiply gpu (#39) * CMake * multiply fix * code cleanup * GPU Flow - Canny Fix (#36) * CMake * canny fix * code cleanup * GPU Flow - Bug Fixes (#35) * fixes GraphROI.Simple & vxMapRemapPatch.MapRandomRemap * Graph.GraphState * fixes Threshold.OnRandom/4/Graph/BINARY/U8/U8 * removing unwanted commits * fixes Threshold.OnRandom/5/Graph/BINARY/S16/U8 * fixes Threshold.OnRandom/7/Graph/RANGE/S16/U8 * removing unnecessary changes * GPU Flow - fixes LUT for data type s16 (#40) * GPU Flow - channel combine (#41) * channel combine fix * merge fix Co-authored-by: LakshmiKumar23 <kumar.lakshmi1994@gmail.com> 
Co-authored-by: Hansel Yang <hanselyang123@gmail.com> Co-authored-by: Kiriti Gowda <kiriti@Kiritis-MacBook-Pro.local> Co-authored-by: LakshmiKumar23 <lakshmi.kumar@amd.com> Co-authored-by: Hansel Yang <hansyang@amd.com> Co-authored-by: Kiriti Nagesh Gowda <kiritigowda@Kiritis-MacBook-Pro.local> Co-authored-by: Pavel Tcherniaev <ptcherni@amd.com> Co-authored-by: paveltc <pavel.tcherniaev@amd.com>

* Add support for agoKernel_HarrisScore_HVC_HG3_5x5 * Add initial support for integral image * Histogram Node Test Suite Changes & some minor changes in test suite * Minor Changes for Color Convert Cases 128 & 129 * Add test case for integral, fix 7x3 case * Fix integral image * Merge branch 'hip-porting' of https://github.com/MCW-Dev/MIVISION into hip-porting * fixes magnitude kernel * fixes case 128&129 * fixes 119&120 * pass a stream for launching hip arithmetic kernels * fix some formatting issues * remove hip events from some vision kernels * use hipStreamSynchronize and cpu wall time to measure time/wait for launching/completion of hip kernels * Add basic profiling scripts * threshold verifyGraph fix * Add stream parameter to all Hip Kernels Fix formatting issues * Minor Magnitude and Color convert kernel fix * minor modification of runVisionTests script to group nodes for better comparison * Fix Threshold U1 HIP kernel and test suite * Fix - variable scope * Automate OCL/HIP rocprof with runvx * Minor Changes * Add host test cases for Dilate and Erode * Add profiling option param to script * Optimize scale, warpaffine, warpperspective, lut * Optimize filters - sobel, median, erode, dilate, box * cherry-pick "Build Fix - Release/Debug (#423)" from MIVisionX/master branch * Release/Debug Build Fix * CMakeList.txt cleanup * Readme Updates * cmake clean up for hip * CXX Flags & OpenVX Version Update * Add support for HarrisScore_HVC_HG3_7x7 * Add lut and convolve memory support in HIP * optimize float4_to_s16s function for arithmetic kernels - use vector data type for writting to oa buffer for better performance compared to pixel by pixel write * use make_short4 * optimize s16s_to_float4_ungrouped function to use vector read for s16 data type * Optimized Color Convert kernels * Modifiied LUT kernel * Modifiied LUT kernel * update node names in VisionTests script * optimize ColorDepth kernels * Add new coding style for arithmetic/logical/color hip kernels * Merge 
pull request #32 from asalmanp/as/hip_kernels_style Add new coding style for arithmetic/logical/color hip kernels * Add auto OCL dump generator script * Add gdfs for arithmetic, logical, color kernels * Modify arithmetic kernels as per new std * Add the missing buffer_offset to the hip_memory * Arithmetic kernels fixes * Modify logical kernels as per new std * Revert to previous min max impl * changed Threshold to support new OpenVX 1.3 format (#38) Co-authored-by: paveltc <pavel.tcherniaev@amd.com> * add the optimized ChannelExtract_U8_U32_Pos0 and ChannelExtract_U8_U24_Pos0 color kernels * Threshold - Update to 1.3 * Add new gdfs and modify generator script * Jenkins - Check Build & Artifacts * Tests - Fix platform name * Modify generator script for ocl/hip dumps and fixes for gdfs * Add optimized box filter * Modify kernelGDFs, automate script for OCL/HIP bin dumps for different image sizes * Optimize phase, magnitude, weighted average and remove trailing spaces * Optimize magnitude, phase, weighted_average, Minor fix * Formatting fixes * Formatting changes * modify hip pack_ function to fix SAT issue in some kernels * Place kernelGDFs in independent folders * Fix runvxTestAllScript, readme and Modify gitignore * Revert "Optimize phase, magnitude, weighted average and remove trailing spaces" This reverts commit ae97d35. 
* Move all common types/device codes into a new header * GPU Fix - multiply gpu (#39) * CMake * multiply fix * code cleanup * GPU Flow - Canny Fix (#36) * CMake * canny fix * code cleanup * optimize hip_clamp function * Partial changes to color kernels * Optimize color kernels * Cleanup * Change typecast float to make_float4() * Add UYVY/YUYV options for ChannelExtract * Modify globalThreads_x and globalThreads_y * Kernel GDF modifications * Script enhancements - add support for single kernel testing, optional build * Edit script readme * minor optimization for Phase kernel * fix comment * GPU Flow - Bug Fixes (#35) * fixes GraphROI.Simple & vxMapRemapPatch.MapRandomRemap * Graph.GraphState * fixes Threshold.OnRandom/4/Graph/BINARY/U8/U8 * removing unwanted commits * fixes Threshold.OnRandom/5/Graph/BINARY/S16/U8 * fixes Threshold.OnRandom/7/Graph/RANGE/S16/U8 * removing unnecessary changes * Add filter kernel GDFs * Add test script support for filter kernel diff checks * Optimizations for filter kernels - initial commit * Optimize ScaleGaussianHalf, other minor fixes * Correct some test names in runVisionTests script * Disable ScaleGaussianHalf temporarily * Optimize Median3_/min3_/max3_ * Fix convolotion issue for hip * fix seg fault for ScaleGaussian * Add support for channelCopy and Lut * Minor change * Optimize statistical kernels * Optimize UV12/UV/IUV and ScaleUp2x2 * Minor change * Add kernelGDFs for IUV/UV12/UV converts, threshold, convolve * Update runVisionTests.py and runvxTestAllScript.sh to run with arithmetic/logical/color/filter/statistical kernels * Add uniform-image inputs with hex pixel values * Remove all U1 kernel testing * Test script mods * Uncomment all kernels except geometric/vision * Minor fix * Optimize geometric kernels - initial commit * Minor changes * Mods to use floorf, mul24, mad24, Scale_U8_U8_Area * ScaleImage_U8_U8_Area fixes and Remap initial commit * Remove #defines for remap * Pass hip_memory for remap * Enable scale, 
warpAffine, warpPerspective testing * Add kernelGDFs for geometric functions, runvxTestAllScript.sh update * Fix the bug for ScaleImage_Bilinear_Constant and ScaleImage_Bilinear_Replicate * GDF and test script corrections * Disable kernels with attr * Disable UV12/UV/IUV converts and ScaleUp2x2 * Add vision kernelGDFs * Vision kernels - initial commit * Modify helpers to use hip built in functions * Remove code used for testing * Minor changes * use consistent device function names and code clean up * remove extra semicolon * switch to builtin functions for hip_lerp * Formatting fixes * minor cmake change to print HIP path/version correctly * Modify harris corners * Test script mod * cmake file changes for building GPU backends and CPU properly * code clean up to make it more readable that there will be a fatal error if OPENCL or HIP not found in the case of the default GPU_SUPPORT=ON * Remove samples/hip_samples, Add openvx_runvx_tests * Enhance runvxTestAllScript, Change ReadMe * Formatting fixes, Code cleanup * Rename openvx_runvx_tests to openvx_node_tests * fix a seg fault for Canny node * remove unused parameter from CannySuppThreshold * Delete vision_tests outer folder * Enhancements to runVisionTests.py * Remove blank lines * Vision kernel mods * Formatting fix * Codacy fixes 1 * Codacy fixes 2 * Codacy fixes 3 * fix cmake * Make pandas optional * Code cleanup * Codacy issue fix * Codacy issue fix * Codacy issue fix * Codacy issue fix * Codacy issue fix * Codacy issue fix * Add backend_type OCL Co-authored-by: Swetha B S <swetha@multicorewareinc.com> Co-authored-by: Abishek <52214183+r-abishekmcw@users.noreply.github.com> Co-authored-by: kiritigowda <kiritigowda@gmail.com> Co-authored-by: Kiriti Nagesh Gowda <kiritigowda@Kiritis-MacBook-Pro.local> Co-authored-by: LakshmiKumar23 <kumar.lakshmi1994@gmail.com> Co-authored-by: Aryan Salmanpour <aryan.salmanpour@amd.com> Co-authored-by: fiona-gladwin <fionagladwin@multicorewareinc.com> Co-authored-by: Kiriti 
Gowda <kiriti.nageshgowda@amd.com> Co-authored-by: rrawther <Rajy.MeeyakhanRawther@amd.com> Co-authored-by: Ulagammai <ulagammai@multicorewareinc.com> Co-authored-by: Ulagammai <--local> Co-authored-by: Pavel Tcherniaev <ptcherni@amd.com> Co-authored-by: paveltc <pavel.tcherniaev@amd.com> Co-authored-by: Hansel Yang <hansyang@amd.com> Co-authored-by: LakshmiKumar23 <lakshmi.kumar@amd.com>

* Fix CMake issues for HIP backend build. Fix issues caused by merge. * Add support for HIP backend. * add support for VX_DIRECTIVE_AMD_COPY_TO_HIPMEM * Add HIP backend support for Resize crop function. 
Modify unittest to save all images in local folder (test HIP support). * Fix minor issues in HIP backend. * Fix rocAL Pybind build issue. Update rocAL README.md for TurboJpeg installation. * Fix brightness updation issue. Set random seed in paramter factory constructor. * Fix issue with CMake to work for OCL and HIP backend. * Fix requested deviceID not found error. * Fix issue with HIP load routine. * Rename rali to rocAL. * Fix merge issues. * Fix build issue for rocAL pybind module. (cherry picked from commit 0e1a43a) * Add prefetching support in RALI pipeline. (cherry picked from commit 0d5cf66) * Fix build warnings. (cherry picked from commit b063ca6) * Fix warnings. * Clean up. * Fix merge issues. * Made suggested PR changes. * Fix build error. * set correct affinity in amd_rpp * Add CMake changes and fix codacy warnings. * Fix core dump issue in rali unittest. * Fix build issue. * cmake cleanup * fix for review comments and unit_test change * fix build error for OpenCL backend Co-authored-by: Kiriti Nagesh Gowda <kiritigowda@gmail.com> Co-authored-by: r-abishekmcw <abishek@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiriti.nageshgowda@amd.com> Co-authored-by: Abishek <52214183+r-abishekmcw@users.noreply.github.com> Co-authored-by: Aryan Salmanpour <aryan.salmanpour@amd.com> Co-authored-by: Swetha B S <swetha@multicorewareinc.com> Co-authored-by: Ulagammai <ulagammai@multicorewareinc.com> Co-authored-by: fiona-gladwin <fionagladwin@multicorewareinc.com> Co-authored-by: Ulagammai <--local> Co-authored-by: Pavel Tcherniaev <ptcherni@amd.com> Co-authored-by: paveltc <pavel.tcherniaev@amd.com> Co-authored-by: Hansel Yang <hansyang@amd.com> Co-authored-by: LakshmiKumar23 <lakshmi.kumar@amd.com> Co-authored-by: shobana-mcw <shobana@multicorewareinc.com>

* Added HIP functionality to AbsoluteDifference * added HIP support for some functions * Added HIP support for another batch of functions * Add HIP supprt for last batch of functions * Set correct affinity to the below amd_rpp nodes. 1. AbsoluteDifference 2. AccumulateSquared 3. AccumulateWeighted 4. Accumulate 5. Add * Set correct affinity to the below amd_rpp nodes. 1. BilateralFilter 2. BitwiseAND 3. BitwiseNOT 4. Blend 5. Blur 6. BoxFilter 7. Brightness * Set correct affinity to the below amd_rpp nodes. 1. CannyEdgeDetector. 2. ChannelCombine. 3. ChannelExtract. 4. ColorTemperature. 5. ColorTwist. 6. Contrast. 7. ControlFlow. 8. CropMirrorNormalize. 9. Crop. 10. CustomConvolution. * Set correct affinity to the below amd_rpp nodes. 1. DataObjectCopy. 2. Dilate. 3. Erode. 4. ExclusiveOR. 5. Exposure. * Set correct affinity to the below amd_rpp nodes. 1. FastCornerDetector. 2. Fisheye. 3. Flip. 4. Fog. 5. GammaCorrection. 6. GaussianFilter. 7. GaussianImagePyramid. * Set correct affinity to the below amd_rpp nodes. 1. HarrisCornerDetector 2. Histogram 3. HistogramBalance 4. Hue 5. WarpPerspective * Set correct affinity to the below amd_rpp nodes. 1. InclusiveOR 2. Jitter 3. LaplacianImagePyramid 4. LensCorrection 5. LocalBinaryPattern 6. LookUpTable * Set correct affinity to the below amd_rpp nodes. 1. Magnitude 2. Max 3. MeanStddev 4. MedianFilter 5. MinMaxLoc 6. Min 7. Multiply * Set correct affinity to the below amd_rpp nodes. 1. 
Noise 2. NonLinearFilter 3. NonMaxSupression 4. nop 5. Occlusion 6. Phase 7. Pixelate * Set correct affinity to the below amd_rpp nodes. 1. Rain 2. RandomCropLetterBox 3. RandomShadow 4. Remap 5. ResizeCropMirror 6. ResizeCrop 7. Rotate * Set correct affinity to the below amd_rpp nodes. 1. Saturation 2. Scale 3. Snow 4. Sobel 5. Subtract 6. TensorAdd * Set correct affinity to the below amd_rpp nodes. 1. TensorLookup 2. TensorMatrixMultiply 3. TensorMultiply 4. TensorSubtract 5. Thresholding 6. Vignette 7. WarpAffine * Clean up by reducing the variants from 4 -> 1 in amd_rpp. 1. Retain only batchPD variant and delete all the single, batchPS and batchPDROID variants. 2. Remove the support in header and other files. * Set affinity to CPU for OCL backend for all nodes in amd_rpp to run without codegen. * Fix issue with rocAL pybind installation. * Fix indendation issue with nodes in amd_rpp. * Add HIP backend support for single nodes in amd_rpp * Code clean up for amd_rpp nodes. 1. Move memory allocations to initialize function. 2. Add calls to free up memory in uninitialze function. 3. Remove unused declarations. 4. Move batchsize querying to initialize. * Error handling in amd_rpp nodes. Add return error status for functions which do not have GPU support in RPP. * Fix formatting for all amd_rpp nodes. * Fix codacy issue. Change copy_status to STATUS_ERROR_CHECK. Co-authored-by: Kiriti Nagesh Gowda <kiritigowda@gmail.com> Co-authored-by: Aryan Salmanpour <aryan.salmanpour@amd.com> Co-authored-by: Abishek <52214183+r-abishekmcw@users.noreply.github.com> Co-authored-by: r-abishekmcw <abishek@multicorewareinc.com> Co-authored-by: Pavel Tcherniaev <ptcherni@amd.com> Co-authored-by: paveltc <pavel.tcherniaev@amd.com> Co-authored-by: Hansel Yang <hansyang@amd.com> Co-authored-by: LakshmiKumar23 <lakshmi.kumar@amd.com>

* rocal_pybind - fix package link error (#564) * rocAL_pybind - CMakeList Add Final Install Path (#567) * MIVisionX Backends - Expanded Support (#569) * Backend Support - AMD EXT Expanded Support * MIVisionX Backend - Cleanup * rocAL CMakeList - Fix MSG * CPU Backend - Fix CMakeList * AMD RPP - Warning MSG for CPU Backend * rocAL Pybind - Fix Link (#574) * VX_NN - adding validate support (#570) * rocAL - MCW changes (#562) * optimize ColorDepth kernels * Add new coding style for arithmetic/logical/color hip kernels * Merge pull request #32 from asalmanp/as/hip_kernels_style Add new coding style for arithmetic/logical/color hip kernels * Add auto OCL dump generator script * Add gdfs for arithmetic, logical, color kernels * Modify arithmetic kernels as per new std * Add the missing buffer_offset to the hip_memory * Arithmetic kernels fixes * Modify logical kernels as per new std * Revert to previous min max impl * changed Threshold to support new OpenVX 1.3 format (#38) Co-authored-by: paveltc <pavel.tcherniaev@amd.com> * add the optimized ChannelExtract_U8_U32_Pos0 and ChannelExtract_U8_U24_Pos0 color kernels * Threshold - Update to 1.3 * Add new gdfs and modify generator script * Jenkins - Check Build & Artifacts * Tests - Fix platform name * Modify generator script for ocl/hip dumps and fixes for gdfs * Add optimized box filter * Modify kernelGDFs, automate script for OCL/HIP bin dumps for different image sizes * Optimize phase, magnitude, weighted average and remove trailing spaces * Optimize magnitude, phase, weighted_average, Minor fix * Formatting fixes * Formatting changes * modify hip pack_ function to fix SAT issue in some kernels * Place kernelGDFs in independent folders * Fix runvxTestAllScript, readme and Modify gitignore * Revert "Optimize phase, magnitude, weighted average and remove trailing spaces" This reverts commit ae97d35. 
* Move all common types/device codes into a new header * GPU Fix - multiply gpu (#39) * CMake * multiply fix * code cleanup * GPU Flow - Canny Fix (#36) * CMake * canny fix * code cleanup * optimize hip_clamp function * Partial changes to color kernels * Optimize color kernels * Cleanup * Change typecast float to make_float4() * Add UYVY/YUYV options for ChannelExtract * Modify globalThreads_x and globalThreads_y * Kernel GDF modifications * Script enhancements - add support for single kernel testing, optional build * Edit script readme * minor optimization for Phase kernel * fix comment * GPU Flow - Bug Fixes (#35) * fixes GraphROI.Simple & vxMapRemapPatch.MapRandomRemap * Graph.GraphState * fixes Threshold.OnRandom/4/Graph/BINARY/U8/U8 * removing unwanted commits * fixes Threshold.OnRandom/5/Graph/BINARY/S16/U8 * fixes Threshold.OnRandom/7/Graph/RANGE/S16/U8 * removing unnecessary changes * Add filter kernel GDFs * Add test script support for filter kernel diff checks * Optimizations for filter kernels - initial commit * Optimize ScaleGaussianHalf, other minor fixes * Correct some test names in runVisionTests script * Disable ScaleGaussianHalf temporarily * Optimize Median3_/min3_/max3_ * Fix convolotion issue for hip * fix seg fault for ScaleGaussian * Add support for channelCopy and Lut * Minor change * Optimize statistical kernels * Optimize UV12/UV/IUV and ScaleUp2x2 * Minor change * Add kernelGDFs for IUV/UV12/UV converts, threshold, convolve * Update runVisionTests.py and runvxTestAllScript.sh to run with arithmetic/logical/color/filter/statistical kernels * Add uniform-image inputs with hex pixel values * Remove all U1 kernel testing * Test script mods * Uncomment all kernels except geometric/vision * Minor fix * Optimize geometric kernels - initial commit * Minor changes * Mods to use floorf, mul24, mad24, Scale_U8_U8_Area * ScaleImage_U8_U8_Area fixes and Remap initial commit * Remove #defines for remap * Pass hip_memory for remap * Enable scale, 
warpAffine, warpPerspective testing * Add kernelGDFs for geometric functions, runvxTestAllScript.sh update * Fix the bug for ScaleImage_Bilinear_Constant and ScaleImage_Bilinear_Replicate * GDF and test script corrections * Disable kernels with attr * Disable UV12/UV/IUV converts and ScaleUp2x2 * Add vision kernelGDFs * Vision kernels - initial commit * Modify helpers to use hip built in functions * Remove code used for testing * Minor changes * use consistent device function names and code clean up * remove extra semicolon * switch to builtin functions for hip_lerp * Formatting fixes * minor cmake change to print HIP path/version correctly * Modify harris corners * Test script mod * cmake file changes for building GPU backends and CPU properly * code clean up to make it more readable that there will be a fatal error if OPENCL or HIP not found in the case of the default GPU_SUPPORT=ON * Remove samples/hip_samples, Add openvx_runvx_tests * Enhance runvxTestAllScript, Change ReadMe * Formatting fixes, Code cleanup * Rename openvx_runvx_tests to openvx_node_tests * fix a seg fault for Canny node * remove unused parameter from CannySuppThreshold * Delete vision_tests outer folder * Enhancements to runVisionTests.py * Remove blank lines * Vision kernel mods * Formatting fix * Codacy fixes 1 * Codacy fixes 2 * Codacy fixes 3 * fix cmake * Make pandas optional * Code cleanup * Codacy issue fix * Codacy issue fix * Codacy issue fix * Codacy issue fix * Codacy issue fix * Codacy issue fix * Add backend_type OCL * Fix CMake issues for HIP backend build. Fix issues caused by merge. * Add support for HIP backend. * add support for VX_DIRECTIVE_AMD_COPY_TO_HIPMEM * Add HIP backend support for Resize crop function. Modify unittest to save all images in local folder (test HIP support). * Fix minor issues in HIP backend. * Fix rocAL Pybind build issue. Update rocAL README.md for TurboJpeg installation. * Fix brightness updation issue. 
Set random seed in parameter factory constructor. * Fix issue with CMake to work for OCL and HIP backend. * Fix requested deviceID not found error. * Fix issue with HIP load routine. * Rename rali to rocAL. * Fix merge issues. * Fix build issue for rocAL pybind module. (cherry picked from commit 0e1a43a) * Add prefetching support in RALI pipeline. (cherry picked from commit 0d5cf66) * Fix build warnings. (cherry picked from commit b063ca6) * Fix warnings. * Clean up. * Fix merge issues. * Made suggested PR changes. * Fix build error. * Added HIP functionality to AbsoluteDifference * added HIP support for some functions * Added HIP support for another batch of functions * Add HIP support for last batch of functions * Set correct affinity to the below amd_rpp nodes. 1. AbsoluteDifference 2. AccumulateSquared 3. AccumulateWeighted 4. Accumulate 5. Add * Set correct affinity to the below amd_rpp nodes. 1. BilateralFilter 2. BitwiseAND 3. BitwiseNOT 4. Blend 5. Blur 6. BoxFilter 7. Brightness * Set correct affinity to the below amd_rpp nodes. 1. CannyEdgeDetector. 2. ChannelCombine. 3. ChannelExtract. 4. ColorTemperature. 5. ColorTwist. 6. Contrast. 7. ControlFlow. 8. CropMirrorNormalize. 9. Crop. 10. CustomConvolution. * Set correct affinity to the below amd_rpp nodes. 1. DataObjectCopy. 2. Dilate. 3. Erode. 4. ExclusiveOR. 5. Exposure. * Set correct affinity to the below amd_rpp nodes. 1. FastCornerDetector. 2. Fisheye. 3. Flip. 4. Fog. 5. GammaCorrection. 6. GaussianFilter. 7. GaussianImagePyramid. * Set correct affinity to the below amd_rpp nodes. 1. HarrisCornerDetector 2. Histogram 3. HistogramBalance 4. Hue 5. WarpPerspective * Set correct affinity to the below amd_rpp nodes. 1. InclusiveOR 2. Jitter 3. LaplacianImagePyramid 4. LensCorrection 5. LocalBinaryPattern 6. LookUpTable * Set correct affinity to the below amd_rpp nodes. 1. Magnitude 2. Max 3. MeanStddev 4. MedianFilter 5. MinMaxLoc 6. Min 7. Multiply * Set correct affinity to the below amd_rpp nodes. 1. 
Noise 2. NonLinearFilter 3. NonMaxSupression 4. nop 5. Occlusion 6. Phase 7. Pixelate * Set correct affinity to the below amd_rpp nodes. 1. Rain 2. RandomCropLetterBox 3. RandomShadow 4. Remap 5. ResizeCropMirror 6. ResizeCrop 7. Rotate * Set correct affinity to the below amd_rpp nodes. 1. Saturation 2. Scale 3. Snow 4. Sobel 5. Subtract 6. TensorAdd * Set correct affinity to the below amd_rpp nodes. 1. TensorLookup 2. TensorMatrixMultiply 3. TensorMultiply 4. TensorSubtract 5. Thresholding 6. Vignette 7. WarpAffine * Clean up by reducing the variants from 4 -> 1 in amd_rpp. 1. Retain only batchPD variant and delete all the single, batchPS and batchPDROID variants. 2. Remove the support in header and other files. * Set affinity to CPU for OCL backend for all nodes in amd_rpp to run without codegen. * Fix issue with rocAL pybind installation. * Fix indentation issue with nodes in amd_rpp. * Add HIP backend support for single nodes in amd_rpp * Code clean up for amd_rpp nodes. 1. Move memory allocations to initialize function. 2. Add calls to free up memory in uninitialize function. 3. Remove unused declarations. 4. Move batchsize querying to initialize. * Error handling in amd_rpp nodes. Add return error status for functions which do not have GPU support in RPP. * Fix formatting for all amd_rpp nodes. * Fix codacy issue. Change copy_status to STATUS_ERROR_CHECK. 
Co-authored-by: Kiriti Nagesh Gowda <kiritigowda@gmail.com> Co-authored-by: Aryan Salmanpour <aryan.salmanpour@amd.com> Co-authored-by: Abishek <52214183+r-abishekmcw@users.noreply.github.com> Co-authored-by: r-abishekmcw <abishek@multicorewareinc.com> Co-authored-by: Pavel Tcherniaev <ptcherni@amd.com> Co-authored-by: paveltc <pavel.tcherniaev@amd.com> Co-authored-by: Hansel Yang <hansyang@amd.com> Co-authored-by: LakshmiKumar23 <lakshmi.kumar@amd.com> * Neural Network Extension - CMake support for HIP GPU backend (#568) * Neural Network Extension - add CMake support for HIP GPU backend - This PR also adds initial HIP kernel support for the gather layer. - The support for executing the gather layer with HIP GPU backend will be added in the next PR. * add backend type(OpenCL/HIP) in the message * Docker Updates (#581) * CMakeLists - Set all warnings RED * Docker - Updates * Readme - Docker Updates * Docker - Fix RPP install Location * Pytorch Docker - RPP Location * VX_NN - 1.3 BatchNorm Fix (#582) * rocAL - coco meta_data_reader update for SSD (#579) * coco meta_reader optimization to remove reading metadata many times * fix build error * fix bug in checking reader type * adjust spacing * Neural Network Extension - HIP GPU backend: support for gather layer (#575) * Neural Network Extension, HIP GPU backend - add support for gather layer * Fix a bug for registering NN kernels for OCL backend * Neural Network Extension - HIP backend: activation layer support (#576) * MV_Deploy - Fix hard coded link path (#584) * Neural Network extension - HIP GPU backend - add support for tile layer (#587) * Docker - CentOS 7 Fix (#590) * CentOS 7 - L3 fix * CentOS 7 - L4 Fix * CentOS 7 - MIVisionX Docker * OpenVX: GPU OCL&HIP Backend - Color Convert Bug Fix (#589) * fixes RGBX to NV12 and IYUV for OCL * fixes RGBX to NV12 and IYUV for HIP * fixes RGBX to iYUV HIP * Runvx - bug fix for topK layer (#588) * Docker - CentOS 7/8 MIVisionX Support Updates (#591) * Docker - CentOS 7/8 
Updates * CentOS 8 - Fix ROCm install * CentOS 8 - Docker updates * Docker - Typo Fix * Issue 558 - Fix (#596) * OpenVX GPU backend - fix a regression for HarrisCorner node introduced by PR#596 (#598) * OpenVX GPU backend - fix a regression for HarrisCorner node introduced by PR#596 * set the environment variable only if the target is GPU * add ENABLE_OPENCL/ENABLE_HIP guards for HarrisCorner workaround * set the AGO_DEFAULT_TARGET if it is not set by the user * Neural Network Extension - HIP GPU backend - add support for cast layer (#595) * Neural Network Extension - add support for cast layer * fix a typo * add missing return * MIVisionX - Fix build issue (#599) * Code clean up * Fix codacy issue * Fix build issues Modify few function calls and its arguments to match with latest RPP changes Remove Bilateral filter Co-authored-by: r-abishekmcw <abishek@multicorewareinc.com> * CI : OpenVX 1.3 Updates (#594) * OpenVX 1.3 - No Test Filters * HIP - Backend Install Path * AMD EXT - Fix OpenCL Flow * AMD RPP - CMakeLists Updates * CI:Jenkins - OpenVX 1.3 CTS Artifacts (#603) * Jenkins - OpenVX 1.3 CTS Logs to artifacts * CTS LOG - Name Fix * OpenVX HIP GPU backend - turns on warnings for HIP kernels compilation (#601) * OpenVX HIP GPU backend - code clean up for filter kernels (#602) * OpenVX HIP GPU backend - code clean up for filter kernels * remove extra space * VX_NN - Link OpenCL Lib (#608) * Neural Netwrok Extension - HIP backend - add support for image to tensor and tensor to image layers (#605) * CI:Jenkins - Code Coverage (#606) * CI Code Coverage * CI - Upgrade to Python3 * Neural Network Extension - add support for finding the installed MIOPEN backend (#610) * Neural Network Extension - add support for finding the installed MIOPEN backend * add missing check for miopen before finding it's backend type * Docker - Updates & Bug Fix (#611) * Docker - Install MIVisionX for all Levels * Docker - CentOS 7/8 Updates * Docker - U18/20 HIP updates * Docker - U18/20 
MIVisionX * SLES Support - Setup & CI (#612) * Neural Network extension - find OCL/HIP packegs before finding MIOPEN package (#613) * Setup - Updates (#614) * Setup - Add Functionality * Setup - Format Fix * Readme - Update Setup Info * Setup - Updates & Fixes * Setup - Bug Fix * CI - Use Python * Setup - Platform Info Fix * Setup - RPP Backend Info Added * Issue #585: VX_NN - Link With MIOpen & MIOpenGEMM -- Fix (#615) * CMakeList - Updates & Cleanup * Jenkins - CI: Remove dead code * SLES - NN Flow Fix (#616) * VX_NN - Model Compiler OpenCL Find Fix * Setup - SLES Updates * Readme -Updates * Neural Network Extension - find/link MIOpenGEMM only for OCL GPU backend (#617) * Issue #458 Fix - GNU 9.3.0: Warnings & Error Handling (#618) * Loom - Fix Warning & Handle Error * OpenVX - Fix U20 Warnings * VX_NN - Fix Buffer Read Warning * OpenVX HIP GPU backend - fix failures reported in issue #566 for HIP backend (#620) * OpenVX HIP GPU backend - fix failures reported in issue #566 for HIP backend * Extend the same fix to remap kernels * ADAT - Classification Sample Added (#623) * OpenVX1.3: HIP GPU Backend - Fix for vxReplicateNode (#604) * replicate node fix * code cleanup * warp affine bugfix * merge fix * warp affine fix for hip (#11) * Laplacian * merge conflict fix * code cleanup * OpenVX HIP backend - bug fix - RGB to YUV4 &RGBX to YUV4 (#624) * HIP- fixes RGB to YUV4 and RGBX to YUV4 * formatting changes * OpenVX - Documentation (#625) * OpenVX HIP GPU backend - fix an uninitialized variable warning (#626) * rocAL - python package improvisation wrt installation (#622) * Code clean up * Fix codacy issue * Fix build issues Modify few function calls and its arguments to match with latest RPP changes Remove Bilateral filter * Modified run.sh script * Updated Readme Co-authored-by: r-abishekmcw <abishek@multicorewareinc.com> Co-authored-by: shobana-mcw <> * Setup Script - Python3 & Python2 Support (#628) * Setup Script - Python3 & Python2 Support * Setup - Updates * 
Neural Network Extension, HIP GPU backend - add support for layers that use MIOpen (#619) * Neural Network Extension, HIP GPU backend - add support for layers that use MIOpen * remove extra hipSetDevice call * switch to void* for LocalData struct * remove including ago_internal header * fix codacy issue * aadd support for sclae layer if third parameter exists * code clean up * OpenVX 1.3 - fixes harris corner failure - hip (#630) * OpenVX - U8 for uniformImage (#627) * Fix Windows build (#633) * fix null -> NULL * Add ago_haf_cpu_generic_functions.cpp to VS build Co-authored-by: Liam Wrubleski <lwrubles@amd.com> * Jenkins - MIVisionX HIP Backend on Ubuntu20 & CentOS 8 (#629) * Jenkins - HIP Backend on Ubuntu20 * Jenkins - HIP Backend on CentOS 8 * Setup Script - RPP Updates * Setup - Remove RPP Legacy instructions * Setup - Add rocBLAS for MIOpen HIP * Setup - RPP Upgraded to 0.9 * Setup - RPP Version Updates * add support for detecting RPP backend type Co-authored-by: Aryan Salmanpour <aryan.salmanpour@amd.com> * rename VX_TENSOR_STRIDE_OPENCL/VX_TENSOR_OFFSET_OPENCL to VX_TENSOR_STRIDE_GPU/VX_TENSOR_OFFSET_GPU (#634) * rename VX_TENSOR_STRIDE_OPENCL/VX_TENSOR_OFFSET_OPENCL to VX_TENSOR_STRIDE_GPU/VX_TENSOR_OFFSET_GPU * code clean up * AMD_VX_NN - Minor change for Tensor Min/Max nodes (#631) * remove the policy from max/min * minor fix * code cleanup * MIVisionX Package - Set dependency on rocm-core (#637) * Set package to depend on rocm-core package This will set dependency to rocm-core package if ROCM_DEP_ROCMCORE flag is set to ON. 
* CMakeList - Updates Co-authored-by: Kiriti Gowda <kiriti.nageshgowda@amd.com> * VX_NN - Link rocblas for HIP Backend (#638) * VX_NN - Link roc::rocblas * CI - use HIP install * Library Tests - HIP & OCL Backend SUpport * Jenkins CI - Save HIP Lib Report * Jenkins - Setup Updates * CI - NN Tests Updates * OpenVX HIP GPU backend - force creation of the HIP stream used in the graph n the context initialization (#641) * Setup & Dockers - FFMPEG updates (#642) * Docker Script - FFMPEG Updates * Setup - Updates to FFMPEG * Library test - Fix find exe * OpenVX 1.3 - Laplacian Pyramid Node fix - GPU OpenCL (#636) * pad first & last row for hip kernel * kernel optimization * code cleanup * initialize to 0 * OpenVX 1.3 - GPU backends - bug fix - vxImageContainmentRelationship (#643) * add U8 for uniformImage * fixes image containment relationship failure * code clean up * adding comments * HIP fix for image Containment * MIVisionX - Logo Update (#645) * rocAL - Box encoder changes (#632) * Add label_map support * Add support for Box Encoder in rocAL * Code Clean up * Box Encoder returns bounding boxes in xc,yc,w,h format * Fix Bug in Box Encoder * Add openmp pragma support in Box Encoder - Fixing segmentation fault YTD (Bug Fix): Number of bounding boxes is less than the actual number * Fix the bug in box encoder optimization leading to lesser number of bboxes then the actual number * Remove Python Post Processing (.view() used instead of .reshape()) * Minor changes * Add support for Image ID in rocAL YTD: To give display support with recent changes * Add support for encoded offset in box encoder * Add rapidJson support for COCO meta data reader. Remove json cpp support. Update in setup and Readme. * Minor change * Change name from jsoncpp to RapidJson in ReadMe. * Add rapidjson include files in rocAL/third_party. Remove rapidjson install instructions in Readme and MIVisionX-setup files. 
* Code clean up in coco_pipeline.py * Minor changes * Remove unwanted comments * add openMP for copy_out_tensor * Add xcycwh meta data structure * Cast the anchor pointer to BoundingBoxCoords & avoid extra copy * Provide Variable desciption for Box Encoder * Remove _ from varibale names * Minor Changes * Removing Line Breaks , Commented code in coco_piepeline.py * Add Formula for Offset calculation in box encoder implementation * Reduce back & forth copying of data in function update_box_encoder_meta_data * box_encoder changes * Add reference for Offset Calculation * Remove Extra Copies in the meta data updation * add openmp pragma for copy to output- from rajys commit * Correct the issue with Box encoder claculations * Minor change - from rajys commit * Remove tellg for getting the filesize * Add copyright for thirdparty rapidjson files * Add DBG INFO in CMakeLists.txt * Resolve PR Comments * Comment out print statement * Keep the original license info for third_party files * Remove unused import * Add process times to include the meta data time * Resolve the PR Comments * Add openMP Pragma to raliCopyEncodedBoxesAndLables Co-authored-by: root <root@ixt-rack-32.local.lan> Co-authored-by: shobana-mcw <shobana@multicorewareinc.com> Co-authored-by: root <root@jenkins-worker-rocm-amd-104.local.lan> Co-authored-by: rrawther <Rajy.MeeyakhanRawther@amd.com> Co-authored-by: root <root@gb-sjc2-03.local.lan> Co-authored-by: root <root@ixt-rack-164.local.lan> Co-authored-by: root <root@IXT-RACK-43.local.lan> * OpenVX - fix a bug in vxMapRemapPatch API (#647) - return the stride_y and the correct buffer for the remap object - this fixes the MapRandomRemap random failure issue in CTS 1.3 * Apps - mivisionx_openvx_classifier - Bug Fix + Adding Logos (#649) * CMakeList fix - apps build to /opt/rocm/mivisionx/bin * fixing bugs in apps and adding logos * adding logos and chaning path * OpenVX HIP GPU backend - add HIP support for some vx APIs (#650) * image augmentation - add 
logos (#652) * Rr/fused decoder update (#653) * remove unnecessary memcpy * use fstream file file reading * remove print statement * code clean_up and merge with upstream * add openmp pragma for copy to output * revert fused_crop_decoder changes * fused crop decoder changes to avoid extra memcpy * adding logo to inference analyzer app (#654) * Neural Network Extension, HIP GPU backend - use object library for the HIP backend (#651) * OpenVX - Add missing API (#655) * OpenVX HIP GPU backend - use object library for the HIP kernels (#656) * OpenVX HIP GPU backend - use object library for the HIP kernels * disable codecov for HIP backend as it's not working for the object libraries * dont report code coverage for the HIP backend as it is temporarily disabled * OpenVX 1.3 - laplacian fix (#657) * initialize output image buffer with zero * code cleanup * laplacian fix * add log entry * fix log entry Co-authored-by: LakshmiKumar23 <lakshmi.kumar@amd.com> * rocAL HIP GPU backend- switch to object library for the HIP rocAL kernels (#660) * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Review Comments Updated * Review Comments Updated * Review Comments Updated * Review Comments Updated * Review Comments Updated * Include Path Reorg Update * Neural Network extension - check if the MIOpen's config file exists before reading it (#661) * Neural Network extension - check if the MIOpen's config file exists before reading it * clean up * rocAL - fix for some codacy warnings (#662) * fix for some codacy warnings * get rid of codacy bug in context null ref checking * Neural Network Extension HIP GPU backend - add support for the permute/Tensor_log/Tensor_exp layers (#664) * rocAL - Video pipeline Support (#640) * Fix a few bugs * caffe and 
caffe2 changes for optimization * removed commented code * Fix issue with SSD meta node. * Add support for original width and height for TF Detection * Add change required to reflect mean and std values in an images pixel value * Clean codes * Hardcoding the key values for tf detection and classification * Release RingBuffer memory * Modify RALI API's to return the bbox coords and bbox labels for all images in the output batch * Update rali_unittest.cpp * Clean repo * Add batching support in PYTHON API for labels,bboxes,img_sizes excluding image_names * Add support for image names for batching support * Merge RALI_Upgrade * Update raliunittest.cpp * Add support for bytes instead of str in rali pybind * Code clean Up * Fix codacy issues * Fix codacy issues * Fix indendation error * Fix codacy warnings * Fix trailing spaces warnings * Fix scope of the variable can be reduced warning * Fix errors in the RALI API * Fix Codacy warnings * Remove extra empty lines * Add support for Box Encoder in coco_pipeline.py * Add codes to retrieve meta data information in loader module * Add support for One Hot Labels for all classification based Readers * Add codes to retrieve meta data information in decoder module * Introduce data loader for coco reader using partail decoder * Add casting Support for Encoded Labels * Add support for RandomBBoxCrop augmenation * Clean codes * Introduce RandomBBoxCrop_MetaData Reader to store the crop params returned by RandomBBoxCrop function * Update RandomBBoxCrop_MetaData Reader * Add meta data update support for both vertical and horizontal flip. * Add RandomBBox support. Introduced map to store image name and crop generated by randomBbox. Look up to fetch CropCordsBatch. Get functionality to get crop wrt to image_name as key. Fixed the seg fault. * Add support for BBFlip * Add support for Random BBox Crop Reader to be part of load routine. Fetches the crop of the image to be decodes and does partial decoding for crop part. * Fix the warnings. 
* Fix issue with the meta data updation in master graph. * Add API changes. Fix issues with RandomBBoxCrop algorithm. * Add support for Random BBox Crop & ImageDecoderSlice * Small changes in Box Encoder * Small Change in Anchor boxes input comment * Add changes for Crop Dim exceding Image Dim * Minor Changes for RandomBboxCrop * Fix issues with RandomBboxCrop. * Fix Reader seg fault issue. * Add minor Changes for fp16 * Minor Changes * Clean Codes * Minor Changes for Random BBox Crop * Add support for Multi-GPU * Minor Changes for Multi-GPU support in COCO file souce partial * Remove unwanted code. * Minor Changes in RandomBBoxCrop * Fix Random BBox Crop * Comment out the print statements * Minor Changes for RBBOX * Code clean up. * Fix warnings wrt Ubuntu 20.04 * Resolve codacy warnings * Resolve codacy warnings * Fix PR issues. * Revert back Slice to Crop * Resolve Codacy warnings * Resolve codacy issues * Resolve Minor codacy issue * Fix issue to make branch compatible with AMDRPP master TOT. * Fix the crop_x difference in partial decoding image crop * Fix issue with crop fixed. * Crop_x & crop_y value fixed. * Check bounday conditions and update crop params. * Fix the crop_width difference in partial decoding image crop * Change wrt invalid_bboxes. * Rewrite RandomBBoxCrop Code * Fix issue with random generation. Used randomdevice for seed when initializing random param. * Fix Key value zero error. Error introduced by partial decoding crop correction. Fixed by adjusting the calculation of top and right. 
* Add codes for Video Reader and Loader module (cherry picked from commit d97ab927e9a6597fde666f7dec50ba987259dccf) * Changes in image_augmentation library (cherry picked from commit 153f59bd377cf7cbd87ecf58d722cd88ba79ff22) * Fix Build issues (cherry picked from commit e023b104c9a220e4745b83693b6c38a7411b54fc) * Fix Build errors (cherry picked from commit 908c599fe7b300dfd837a87215b33295bc2752a9) * Add reader type for video reader (cherry picked from commit 843a6cd6a6436513b4c000813a48b66461768ecc) * Adding codes to decode video input file (cherry picked from commit 2c249d7312e0c746fac600b5e38c5b4cb16f1910) * Introduce Video Decoder module to decode video files (cherry picked from commit 4d18ea384a2599aed3b0a0c975d7cd0a343720d2) * Add decoder functions in FFMPEG_VIDEO_DECODER (cherry picked from commit 02224f9601ee4269c68f6725a19468a171718276) * Clean Video Decoder codes (cherry picked from commit 8699179b282aa7870f60ca670a19b512ca45d9ac) * Clean codes to remove build issues (cherry picked from commit 39c5f45ff875d111f2977af0487e9cf8d22174c2) * Clean codes * Initial changes for video reader pipeline. [NBC] * To handle sequence length. * To handle shuffle. * Temp local changes. [NYC] * Changes in the video reader pipeline. [NWC] * Video Reader changes * Fix the segmentation fault in the video reader pipeline * Add support to save decoded output frames in video decoder * Working Pipeline - Single Video file input Add support to modify internal and user batch size in master graph Add ffmpeg seek operation * Minor Changes * Add support for decoding multiple video files and shuffle * Add support to initialize ffmpeg context for each video decoder instance * Code cleanup * Fix issue in Shuffling the images in video reader * Add seek_frame function in video decoder * Code clean up * Update rali_unittest * Add folder based label meta data reader for video reader * Add support for Sequence Reader in RALI * Fix codacy issue * Add Sequence Rearrange initial setup. 
Works only for sequence length equal to video reader. Introduce ovx node sequence rearrange to support. Introduce API in rali_api_augmentations. * Fix issue with Sequence Rearrange with different sequence length. * Introduce raliVideoFileResize node in RALI to fuse video decoding and resize * Add new_sequence_length parameter to sequence rearrange * Add sequence rearrange algorithm for RGB images * Add support for Sequence Reader in RALI * Fix random shuffling of sequences in video reader * Add support for folder based reader and label support for video decoder and labels. * Clean codes * Fix issue in raliVideoReaderResize * Code clean up. * set batchsize to internal batch size in video pipeline loaders. * Add flag in master graph to switch between video and image pipelines. * Add step and stride parameter to VideoReader and SequenceReader * Fix issue with the sequence rearrange. * Adjust remaining image count in master graph wrt sequence rearrange. * Add meta data support for video reader folder based. * Update decode image info name according to stride * Minor bug fix * Add support for text file input Add support to fetch video properties from text file Modify reader to read from the start to end frame specified in text file Add meta data support for text file input to the video reader * Add support to process repeated file inputs in text file * Add meta data reader support to parse timestamps from text file Introduce enable_timestamps parameter and set_timestamps_bool to the meta data readers * Add rali_video_unittests Video Reader Vidoe Reader Resize Sequence Reader Sequence Rearramge * Code clean up * Fix maximum limit for decoder instance creation. Check if instance is there for the video file if not initialize one using previously created instance. * Fix warnings. 
* Minor fix * Add file_list_frame_num parameter To switch between timestamp or frame number input passed with text file * Add data samples for testing Add video samples Add coco sample data with 10 images for train and val * Add support to generate frame number and timestamps output * Fix multiple video file input to video pipeline * Add labelled video folder samples * Modified test suite Modified rali_video_unittests.cpp Add testScript.sh to build and execute rali_video_unittests Remove video pipeline tests from rali_unittests.cpp * Code clean up * Modify frame_rate variable * Add step and stride parameters to SequenceReaderSingleSharded * Minor fix * Modify ffmpeg video decoder functions Initialize the ffmpeg context once for each video file * Fix ffmpeg deprecation warnings * Modify ffmpeg video decoder Add width, height, stride and pixel format paramters to Decode * Code clean up Change Video label reader folders to Video label reader * Remove text file input parameter to dataloader * Add support to check variable frame rate videos * Minor changes * Minor fix * Code clean up * Code clean up * Change rali to rocAL * Merge branch 'AMD-Master' into video_devel * Resolve build issues Code clean up * Fix bug with Sequence Rearrange * Add sharding support to Video Reader * Add sharding support for Sequence Reader * Introduce decoder mode parameter * Add U8 support for Sequence Rearrange Minor changes * Add SingleShard API for video readers * Add support to decode more than one sequence Modify the load routine to decode more than one sequence Add sequence count parameter to Sequence rearrange * Merge branch 'video_devel_PR' of https://github.com/MCW-Dev/MIVISION into video_devel_PR * Fix SequenceReader and SequenceReaderSingleShard * Resolve merge conflicts * Minor fix * Add codes for multithreading * Fix build isssue with HIP backend * Fix warnings * Resolve codacy issues Remove blank lines Adjust spacing * Resolve codacy issues * Modify the sequence reader arguments 
of the ImageLoaderNode * Remove rocAL sample data * Minor changes Add RALI_VIDEO flag to few files * Add seperate VideoReader Introduce VideoFilesourceReader and VideoReaderConfig * Introduce SequenceInfo struct Minor changes * Fix codacy issues in video unit test testScript.sh * Minor fix * Minor bug fix * Introduce the latest FFmpeg API in ffmpeg_video_decoder.cpp * Merge branch 'PR_changes' of https://github.com/fiona-gladwin/MIVisionX into video_devel_PR * Video Pipeline changes * Batch size changes for Video Reader * Video Pipeline Meta data reader changes to store the meta data for each sequence and not for every frame in the sequence * Code cleanup - Video Reader changes * Batch size variable changes Change batch size and internal batch size variables to constant. Introduce batch size and batch ratio variables for the Sequence Reader in master graph. * Change datatype of frame_rate to float * Enable HIP Backend support for video pipeline * Add codes to dump the images in each batch as AVI video file * Minor change in video unit tests * Add HIP backend support for sequence rearrange * Add OpenCL backend support for sequence rearrange * Add condtion to disable ResizeNode update in raliVideoFileResize if resize width and height is same as the videos * PR changes * Fix single folder of images issue in Sequence Reader * Fix codacy issues * PR changes * API changes * Introduce separate output routine for the video pipeline * PR changes * PR changes * Fix for codacy issues Co-authored-by: LokeshBonta <lokeswara@multicorewareinc.com> Co-authored-by: Swetha B S <swetha@multicorewareinc.com> Co-authored-by: shobana-mcw <shobana@multicorewareinc.com> Co-authored-by: fionagladwin <fionagladwin@multicorewareinc.com> Co-authored-by: r-abishekmcw <abishek@multicorewareinc.com> * Docker - Ubuntu Updates (#667) * Docker - RPP Update Version (#668) * Windows - README updates (#671) * Update README.md * Update README.md * Update CMakeLists.txt Include Path Reorg Changes - 
moved to include/<compnm>/ Co-authored-by: Rajy Rawther <Rajy.MeeyakhanRawther@amd.com> Co-authored-by: Kiriti Gowda <kiriti.nageshgowda@amd.com> Co-authored-by: LakshmiKumar23 <lakshmi.kumar@amd.com> Co-authored-by: shobana-mcw <shobana@multicorewareinc.com> Co-authored-by: Kiriti Nagesh Gowda <kiritigowda@gmail.com> Co-authored-by: Aryan Salmanpour <aryan.salmanpour@amd.com> Co-authored-by: Abishek <52214183+r-abishekmcw@users.noreply.github.com> Co-authored-by: r-abishekmcw <abishek@multicorewareinc.com> Co-authored-by: Pavel Tcherniaev <ptcherni@amd.com> Co-authored-by: paveltc <pavel.tcherniaev@amd.com> Co-authored-by: Hansel Yang <hansyang@amd.com> Co-authored-by: Fiona-MCW <70996026+fiona-gladwin@users.noreply.github.com> Co-authored-by: Liam Wrubleski <Liam.Wrubleski@amd.com> Co-authored-by: Liam Wrubleski <lwrubles@amd.com> Co-authored-by: Hansel Yang <hanselyang123@gmail.com> Co-authored-by: frepaul <71665912+frepaul@users.noreply.github.com> Co-authored-by: swetha097 <59434434+swetha097@users.noreply.github.com> Co-authored-by: root <root@ixt-rack-32.local.lan> Co-authored-by: root <root@jenkins-worker-rocm-amd-104.local.lan> Co-authored-by: root <root@gb-sjc2-03.local.lan> Co-authored-by: root <root@ixt-rack-164.local.lan> Co-authored-by: root <root@IXT-RACK-43.local.lan> Co-authored-by: Indumathi31 <59440990+Indumathi31@users.noreply.github.com> Co-authored-by: LokeshBonta <lokeswara@multicorewareinc.com> Co-authored-by: Swetha B S <swetha@multicorewareinc.com> Co-authored-by: fionagladwin <fionagladwin@multicorewareinc.com> Co-authored-by: LakshmiKumar23 <kumar.lakshmi1994@gmail.com>

* rocal_pybind - fix package link error (#564) * rocAL_pybind - CMakeList Add Final Install Path (#567) * MIVisionX Backends - Expanded Support (#569) * Backend Support - AMD EXT Expanded Support * MIVisionX Backend - Cleanup * rocAL CMakeList - Fix MSG * CPU Backend - Fix CMakeList * AMD RPP - Warning MSG for CPU Backend * rocAL Pybind - Fix Link (#574) * VX_NN - adding validate support (#570) * rocAL - MCW changes (#562) * optimize ColorDepth kernels * Add new coding style for arithmetic/logical/color hip kernels * Merge pull request #32 from asalmanp/as/hip_kernels_style Add new coding style for arithmetic/logical/color hip kernels * Add auto OCL dump generator script * Add gdfs for arithmetic, logical, color kernels * Modify arithmetic kernels as per new std * Add the missing buffer_offset to the hip_memory * Arithmetic kernels fixes * Modify logical kernels as per new std * Revert to previous min max impl * changed Threshold to support new OpenVX 1.3 format (#38) Co-authored-by: paveltc <pavel.tcherniaev@amd.com> * add the optimized ChannelExtract_U8_U32_Pos0 and ChannelExtract_U8_U24_Pos0 color kernels * Threshold - Update to 1.3 * Add new gdfs and modify generator script * Jenkins - Check Build & Artifacts * Tests - Fix platform name * Modify generator script for ocl/hip dumps and fixes for gdfs * Add optimized box filter * Modify kernelGDFs, automate script for OCL/HIP bin dumps for different image sizes * Optimize phase, magnitude, weighted average and remove trailing spaces * Optimize magnitude, phase, weighted_average, Minor fix * Formatting fixes * Formatting changes * modify hip pack_ function to fix SAT issue in some kernels * Place kernelGDFs in independent folders * Fix runvxTestAllScript, readme and Modify gitignore * Revert "Optimize phase, magnitude, weighted average and remove trailing spaces" This reverts commit ae97d35. 
* Move all common types/device codes into a new header * GPU Fix - multiply gpu (#39) * CMake * multiply fix * code cleanup * GPU Flow - Canny Fix (#36) * CMake * canny fix * code cleanup * optimize hip_clamp function * Partial changes to color kernels * Optimize color kernels * Cleanup * Change typecast float to make_float4() * Add UYVY/YUYV options for ChannelExtract * Modify globalThreads_x and globalThreads_y * Kernel GDF modifications * Script enhancements - add support for single kernel testing, optional build * Edit script readme * minor optimization for Phase kernel * fix comment * GPU Flow - Bug Fixes (#35) * fixes GraphROI.Simple & vxMapRemapPatch.MapRandomRemap * Graph.GraphState * fixes Threshold.OnRandom/4/Graph/BINARY/U8/U8 * removing unwanted commits * fixes Threshold.OnRandom/5/Graph/BINARY/S16/U8 * fixes Threshold.OnRandom/7/Graph/RANGE/S16/U8 * removing unnecessary changes * Add filter kernel GDFs * Add test script support for filter kernel diff checks * Optimizations for filter kernels - initial commit * Optimize ScaleGaussianHalf, other minor fixes * Correct some test names in runVisionTests script * Disable ScaleGaussianHalf temporarily * Optimize Median3_/min3_/max3_ * Fix convolotion issue for hip * fix seg fault for ScaleGaussian * Add support for channelCopy and Lut * Minor change * Optimize statistical kernels * Optimize UV12/UV/IUV and ScaleUp2x2 * Minor change * Add kernelGDFs for IUV/UV12/UV converts, threshold, convolve * Update runVisionTests.py and runvxTestAllScript.sh to run with arithmetic/logical/color/filter/statistical kernels * Add uniform-image inputs with hex pixel values * Remove all U1 kernel testing * Test script mods * Uncomment all kernels except geometric/vision * Minor fix * Optimize geometric kernels - initial commit * Minor changes * Mods to use floorf, mul24, mad24, Scale_U8_U8_Area * ScaleImage_U8_U8_Area fixes and Remap initial commit * Remove #defines for remap * Pass hip_memory for remap * Enable scale, 
warpAffine, warpPerspective testing * Add kernelGDFs for geometric functions, runvxTestAllScript.sh update * Fix the bug for ScaleImage_Bilinear_Constant and ScaleImage_Bilinear_Replicate * GDF and test script corrections * Disable kernels with attr * Disable UV12/UV/IUV converts and ScaleUp2x2 * Add vision kernelGDFs * Vision kernels - initial commit * Modify helpers to use hip built in functions * Remove code used for testing * Minor changes * use consistent device function names and code clean up * remove extra semicolon * switch to builtin functions for hip_lerp * Formatting fixes * minor cmake change to print HIP path/version correctly * Modify harris corners * Test script mod * cmake file changes for building GPU backends and CPU properly * code clean up to make it more readable that there will be a fatal error if OPENCL or HIP not found in the case of the default GPU_SUPPORT=ON * Remove samples/hip_samples, Add openvx_runvx_tests * Enhance runvxTestAllScript, Change ReadMe * Formatting fixes, Code cleanup * Rename openvx_runvx_tests to openvx_node_tests * fix a seg fault for Canny node * remove unused parameter from CannySuppThreshold * Delete vision_tests outer folder * Enhancements to runVisionTests.py * Remove blank lines * Vision kernel mods * Formatting fix * Codacy fixes 1 * Codacy fixes 2 * Codacy fixes 3 * fix cmake * Make pandas optional * Code cleanup * Codacy issue fix * Codacy issue fix * Codacy issue fix * Codacy issue fix * Codacy issue fix * Codacy issue fix * Add backend_type OCL * Fix CMake issues for HIP backend build. Fix issues caused by merge. * Add support for HIP backend. * add support for VX_DIRECTIVE_AMD_COPY_TO_HIPMEM * Add HIP backend support for Resize crop function. Modify unittest to save all images in local folder (test HIP support). * Fix minor issues in HIP backend. * Fix rocAL Pybind build issue. Update rocAL README.md for TurboJpeg installation. * Fix brightness updation issue. 
Set random seed in paramter factory constructor. * Fix issue with CMake to work for OCL and HIP backend. * Fix requested deviceID not found error. * Fix issue with HIP load routine. * Rename rali to rocAL. * Fix merge issues. * Fix build issue for rocAL pybind module. (cherry picked from commit 0e1a43a) * Add prefetching support in RALI pipeline. (cherry picked from commit 0d5cf66) * Fix build warnings. (cherry picked from commit b063ca6) * Fix warnings. * Clean up. * Fix merge issues. * Made suggested PR changes. * Fix build error. * Added HIP functionality to AbsoluteDifference * added HIP support for some functions * Added HIP support for another batch of functions * Add HIP supprt for last batch of functions * Set correct affinity to the below amd_rpp nodes. 1. AbsoluteDifference 2. AccumulateSquared 3. AccumulateWeighted 4. Accumulate 5. Add * Set correct affinity to the below amd_rpp nodes. 1. BilateralFilter 2. BitwiseAND 3. BitwiseNOT 4. Blend 5. Blur 6. BoxFilter 7. Brightness * Set correct affinity to the below amd_rpp nodes. 1. CannyEdgeDetector. 2. ChannelCombine. 3. ChannelExtract. 4. ColorTemperature. 5. ColorTwist. 6. Contrast. 7. ControlFlow. 8. CropMirrorNormalize. 9. Crop. 10. CustomConvolution. * Set correct affinity to the below amd_rpp nodes. 1. DataObjectCopy. 2. Dilate. 3. Erode. 4. ExclusiveOR. 5. Exposure. * Set correct affinity to the below amd_rpp nodes. 1. FastCornerDetector. 2. Fisheye. 3. Flip. 4. Fog. 5. GammaCorrection. 6. GaussianFilter. 7. GaussianImagePyramid. * Set correct affinity to the below amd_rpp nodes. 1. HarrisCornerDetector 2. Histogram 3. HistogramBalance 4. Hue 5. WarpPerspective * Set correct affinity to the below amd_rpp nodes. 1. InclusiveOR 2. Jitter 3. LaplacianImagePyramid 4. LensCorrection 5. LocalBinaryPattern 6. LookUpTable * Set correct affinity to the below amd_rpp nodes. 1. Magnitude 2. Max 3. MeanStddev 4. MedianFilter 5. MinMaxLoc 6. Min 7. Multiply * Set correct affinity to the below amd_rpp nodes. 1. 
Noise 2. NonLinearFilter 3. NonMaxSupression 4. nop 5. Occlusion 6. Phase 7. Pixelate * Set correct affinity to the below amd_rpp nodes. 1. Rain 2. RandomCropLetterBox 3. RandomShadow 4. Remap 5. ResizeCropMirror 6. ResizeCrop 7. Rotate * Set correct affinity to the below amd_rpp nodes. 1. Saturation 2. Scale 3. Snow 4. Sobel 5. Subtract 6. TensorAdd * Set correct affinity to the below amd_rpp nodes. 1. TensorLookup 2. TensorMatrixMultiply 3. TensorMultiply 4. TensorSubtract 5. Thresholding 6. Vignette 7. WarpAffine * Clean up by reducing the variants from 4 -> 1 in amd_rpp. 1. Retain only batchPD variant and delete all the single, batchPS and batchPDROID variants. 2. Remove the support in header and other files. * Set affinity to CPU for OCL backend for all nodes in amd_rpp to run without codegen. * Fix issue with rocAL pybind installation. * Fix indendation issue with nodes in amd_rpp. * Add HIP backend support for single nodes in amd_rpp * Code clean up for amd_rpp nodes. 1. Move memory allocations to initialize function. 2. Add calls to free up memory in uninitialze function. 3. Remove unused declarations. 4. Move batchsize querying to initialize. * Error handling in amd_rpp nodes. Add return error status for functions which do not have GPU support in RPP. * Fix formatting for all amd_rpp nodes. * Fix codacy issue. Change copy_status to STATUS_ERROR_CHECK. 
Co-authored-by: Kiriti Nagesh Gowda <kiritigowda@gmail.com> Co-authored-by: Aryan Salmanpour <aryan.salmanpour@amd.com> Co-authored-by: Abishek <52214183+r-abishekmcw@users.noreply.github.com> Co-authored-by: r-abishekmcw <abishek@multicorewareinc.com> Co-authored-by: Pavel Tcherniaev <ptcherni@amd.com> Co-authored-by: paveltc <pavel.tcherniaev@amd.com> Co-authored-by: Hansel Yang <hansyang@amd.com> Co-authored-by: LakshmiKumar23 <lakshmi.kumar@amd.com> * Neural Network Extension - CMake support for HIP GPU backend (#568) * Neural Network Extension - add CMake support for HIP GPU backend - This PR also adds initial HIP kernel support for the gather layer. - The support for executing the gather layer with HIP GPU backend will be added in the next PR. * add backend type(OpenCL/HIP) in the message * Docker Updates (#581) * CMakeLists - Set all warnings RED * Docker - Updates * Readme - Docker Updates * Docker - Fix RPP install Location * Pytorch Docker - RPP Location * VX_NN - 1.3 BatchNorm Fix (#582) * rocAL - coco meta_data_reader update for SSD (#579) * coco meta_reader optimization to remove reading metadata many times * fix build error * fix bug in checking reader type * adjust spacing * Neural Network Extension - HIP GPU backend: support for gather layer (#575) * Neural Network Extension, HIP GPU backend - add support for gather layer * Fix a bug for registering NN kernels for OCL backend * Neural Network Extension - HIP backend: activation layer support (#576) * MV_Deploy - Fix hard coded link path (#584) * Neural Network extension - HIP GPU backend - add support for tile layer (#587) * Docker - CentOS 7 Fix (#590) * CentOS 7 - L3 fix * CentOS 7 - L4 Fix * CentOS 7 - MIVisionX Docker * OpenVX: GPU OCL&HIP Backend - Color Convert Bug Fix (#589) * fixes RGBX to NV12 and IYUV for OCL * fixes RGBX to NV12 and IYUV for HIP * fixes RGBX to iYUV HIP * Runvx - bug fix for topK layer (#588) * Docker - CentOS 7/8 MIVisionX Support Updates (#591) * Docker - CentOS 7/8 
Updates * CentOS 8 - Fix ROCm install * CentOS 8 - Docker updates * Docker - Typo Fix * Issue 558 - Fix (#596) * OpenVX GPU backend - fix a regression for HarrisCorner node introduced by PR#596 (#598) * OpenVX GPU backend - fix a regression for HarrisCorner node introduced by PR#596 * set the environment variable only if the target is GPU * add ENABLE_OPENCL/ENABLE_HIP guards for HarrisCorner workaround * set the AGO_DEFAULT_TARGET if it is not set by the user * Neural Network Extension - HIP GPU backend - add support for cast layer (#595) * Neural Network Extension - add support for cast layer * fix a typo * add missing return * MIVisionX - Fix build issue (#599) * Code clean up * Fix codacy issue * Fix build issues Modify few function calls and its arguments to match with latest RPP changes Remove Bilateral filter Co-authored-by: r-abishekmcw <abishek@multicorewareinc.com> * CI : OpenVX 1.3 Updates (#594) * OpenVX 1.3 - No Test Filters * HIP - Backend Install Path * AMD EXT - Fix OpenCL Flow * AMD RPP - CMakeLists Updates * CI:Jenkins - OpenVX 1.3 CTS Artifacts (#603) * Jenkins - OpenVX 1.3 CTS Logs to artifacts * CTS LOG - Name Fix * OpenVX HIP GPU backend - turns on warnings for HIP kernels compilation (#601) * OpenVX HIP GPU backend - code clean up for filter kernels (#602) * OpenVX HIP GPU backend - code clean up for filter kernels * remove extra space * VX_NN - Link OpenCL Lib (#608) * Neural Netwrok Extension - HIP backend - add support for image to tensor and tensor to image layers (#605) * CI:Jenkins - Code Coverage (#606) * CI Code Coverage * CI - Upgrade to Python3 * Neural Network Extension - add support for finding the installed MIOPEN backend (#610) * Neural Network Extension - add support for finding the installed MIOPEN backend * add missing check for miopen before finding it's backend type * Docker - Updates & Bug Fix (#611) * Docker - Install MIVisionX for all Levels * Docker - CentOS 7/8 Updates * Docker - U18/20 HIP updates * Docker - U18/20 
MIVisionX * SLES Support - Setup & CI (#612) * Neural Network extension - find OCL/HIP packegs before finding MIOPEN package (#613) * Setup - Updates (#614) * Setup - Add Functionality * Setup - Format Fix * Readme - Update Setup Info * Setup - Updates & Fixes * Setup - Bug Fix * CI - Use Python * Setup - Platform Info Fix * Setup - RPP Backend Info Added * Issue #585: VX_NN - Link With MIOpen & MIOpenGEMM -- Fix (#615) * CMakeList - Updates & Cleanup * Jenkins - CI: Remove dead code * SLES - NN Flow Fix (#616) * VX_NN - Model Compiler OpenCL Find Fix * Setup - SLES Updates * Readme -Updates * Neural Network Extension - find/link MIOpenGEMM only for OCL GPU backend (#617) * Issue #458 Fix - GNU 9.3.0: Warnings & Error Handling (#618) * Loom - Fix Warning & Handle Error * OpenVX - Fix U20 Warnings * VX_NN - Fix Buffer Read Warning * OpenVX HIP GPU backend - fix failures reported in issue #566 for HIP backend (#620) * OpenVX HIP GPU backend - fix failures reported in issue #566 for HIP backend * Extend the same fix to remap kernels * ADAT - Classification Sample Added (#623) * OpenVX1.3: HIP GPU Backend - Fix for vxReplicateNode (#604) * replicate node fix * code cleanup * warp affine bugfix * merge fix * warp affine fix for hip (#11) * Laplacian * merge conflict fix * code cleanup * OpenVX HIP backend - bug fix - RGB to YUV4 &RGBX to YUV4 (#624) * HIP- fixes RGB to YUV4 and RGBX to YUV4 * formatting changes * OpenVX - Documentation (#625) * OpenVX HIP GPU backend - fix an uninitialized variable warning (#626) * rocAL - python package improvisation wrt installation (#622) * Code clean up * Fix codacy issue * Fix build issues Modify few function calls and its arguments to match with latest RPP changes Remove Bilateral filter * Modified run.sh script * Updated Readme Co-authored-by: r-abishekmcw <abishek@multicorewareinc.com> Co-authored-by: shobana-mcw <> * Setup Script - Python3 & Python2 Support (#628) * Setup Script - Python3 & Python2 Support * Setup - Updates * 
Neural Network Extension, HIP GPU backend - add support for layers that use MIOpen (#619) * Neural Network Extension, HIP GPU backend - add support for layers that use MIOpen * remove extra hipSetDevice call * switch to void* for LocalData struct * remove including ago_internal header * fix codacy issue * aadd support for sclae layer if third parameter exists * code clean up * OpenVX 1.3 - fixes harris corner failure - hip (#630) * OpenVX - U8 for uniformImage (#627) * Fix Windows build (#633) * fix null -> NULL * Add ago_haf_cpu_generic_functions.cpp to VS build Co-authored-by: Liam Wrubleski <lwrubles@amd.com> * Jenkins - MIVisionX HIP Backend on Ubuntu20 & CentOS 8 (#629) * Jenkins - HIP Backend on Ubuntu20 * Jenkins - HIP Backend on CentOS 8 * Setup Script - RPP Updates * Setup - Remove RPP Legacy instructions * Setup - Add rocBLAS for MIOpen HIP * Setup - RPP Upgraded to 0.9 * Setup - RPP Version Updates * add support for detecting RPP backend type Co-authored-by: Aryan Salmanpour <aryan.salmanpour@amd.com> * rename VX_TENSOR_STRIDE_OPENCL/VX_TENSOR_OFFSET_OPENCL to VX_TENSOR_STRIDE_GPU/VX_TENSOR_OFFSET_GPU (#634) * rename VX_TENSOR_STRIDE_OPENCL/VX_TENSOR_OFFSET_OPENCL to VX_TENSOR_STRIDE_GPU/VX_TENSOR_OFFSET_GPU * code clean up * AMD_VX_NN - Minor change for Tensor Min/Max nodes (#631) * remove the policy from max/min * minor fix * code cleanup * MIVisionX Package - Set dependency on rocm-core (#637) * Set package to depend on rocm-core package This will set dependency to rocm-core package if ROCM_DEP_ROCMCORE flag is set to ON. 
* CMakeList - Updates Co-authored-by: Kiriti Gowda <kiriti.nageshgowda@amd.com> * VX_NN - Link rocblas for HIP Backend (#638) * VX_NN - Link roc::rocblas * CI - use HIP install * Library Tests - HIP & OCL Backend SUpport * Jenkins CI - Save HIP Lib Report * Jenkins - Setup Updates * CI - NN Tests Updates * OpenVX HIP GPU backend - force creation of the HIP stream used in the graph n the context initialization (#641) * Setup & Dockers - FFMPEG updates (#642) * Docker Script - FFMPEG Updates * Setup - Updates to FFMPEG * Library test - Fix find exe * OpenVX 1.3 - Laplacian Pyramid Node fix - GPU OpenCL (#636) * pad first & last row for hip kernel * kernel optimization * code cleanup * initialize to 0 * OpenVX 1.3 - GPU backends - bug fix - vxImageContainmentRelationship (#643) * add U8 for uniformImage * fixes image containment relationship failure * code clean up * adding comments * HIP fix for image Containment * MIVisionX - Logo Update (#645) * rocAL - Box encoder changes (#632) * Add label_map support * Add support for Box Encoder in rocAL * Code Clean up * Box Encoder returns bounding boxes in xc,yc,w,h format * Fix Bug in Box Encoder * Add openmp pragma support in Box Encoder - Fixing segmentation fault YTD (Bug Fix): Number of bounding boxes is less than the actual number * Fix the bug in box encoder optimization leading to lesser number of bboxes then the actual number * Remove Python Post Processing (.view() used instead of .reshape()) * Minor changes * Add support for Image ID in rocAL YTD: To give display support with recent changes * Add support for encoded offset in box encoder * Add rapidJson support for COCO meta data reader. Remove json cpp support. Update in setup and Readme. * Minor change * Change name from jsoncpp to RapidJson in ReadMe. * Add rapidjson include files in rocAL/third_party. Remove rapidjson install instructions in Readme and MIVisionX-setup files. 
* Code clean up in coco_pipeline.py * Minor changes * Remove unwanted comments * add openMP for copy_out_tensor * Add xcycwh meta data structure * Cast the anchor pointer to BoundingBoxCoords & avoid extra copy * Provide Variable desciption for Box Encoder * Remove _ from varibale names * Minor Changes * Removing Line Breaks , Commented code in coco_piepeline.py * Add Formula for Offset calculation in box encoder implementation * Reduce back & forth copying of data in function update_box_encoder_meta_data * box_encoder changes * Add reference for Offset Calculation * Remove Extra Copies in the meta data updation * add openmp pragma for copy to output- from rajys commit * Correct the issue with Box encoder claculations * Minor change - from rajys commit * Remove tellg for getting the filesize * Add copyright for thirdparty rapidjson files * Add DBG INFO in CMakeLists.txt * Resolve PR Comments * Comment out print statement * Keep the original license info for third_party files * Remove unused import * Add process times to include the meta data time * Resolve the PR Comments * Add openMP Pragma to raliCopyEncodedBoxesAndLables Co-authored-by: root <root@ixt-rack-32.local.lan> Co-authored-by: shobana-mcw <shobana@multicorewareinc.com> Co-authored-by: root <root@jenkins-worker-rocm-amd-104.local.lan> Co-authored-by: rrawther <Rajy.MeeyakhanRawther@amd.com> Co-authored-by: root <root@gb-sjc2-03.local.lan> Co-authored-by: root <root@ixt-rack-164.local.lan> Co-authored-by: root <root@IXT-RACK-43.local.lan> * OpenVX - fix a bug in vxMapRemapPatch API (#647) - return the stride_y and the correct buffer for the remap object - this fixes the MapRandomRemap random failure issue in CTS 1.3 * Apps - mivisionx_openvx_classifier - Bug Fix + Adding Logos (#649) * CMakeList fix - apps build to /opt/rocm/mivisionx/bin * fixing bugs in apps and adding logos * adding logos and chaning path * OpenVX HIP GPU backend - add HIP support for some vx APIs (#650) * image augmentation - add 
logos (#652) * Rr/fused decoder update (#653) * remove unnecessary memcpy * use fstream file file reading * remove print statement * code clean_up and merge with upstream * add openmp pragma for copy to output * revert fused_crop_decoder changes * fused crop decoder changes to avoid extra memcpy * adding logo to inference analyzer app (#654) * Neural Network Extension, HIP GPU backend - use object library for the HIP backend (#651) * OpenVX - Add missing API (#655) * OpenVX HIP GPU backend - use object library for the HIP kernels (#656) * OpenVX HIP GPU backend - use object library for the HIP kernels * disable codecov for HIP backend as it's not working for the object libraries * dont report code coverage for the HIP backend as it is temporarily disabled * OpenVX 1.3 - laplacian fix (#657) * initialize output image buffer with zero * code cleanup * laplacian fix * add log entry * fix log entry Co-authored-by: LakshmiKumar23 <lakshmi.kumar@amd.com> * rocAL HIP GPU backend- switch to object library for the HIP rocAL kernels (#660) * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Update CMakeLists.txt * Review Comments Updated * Review Comments Updated * Review Comments Updated * Review Comments Updated * Review Comments Updated * Include Path Reorg Update * Neural Network extension - check if the MIOpen's config file exists before reading it (#661) * Neural Network extension - check if the MIOpen's config file exists before reading it * clean up * rocAL - fix for some codacy warnings (#662) * fix for some codacy warnings * get rid of codacy bug in context null ref checking * Neural Network Extension HIP GPU backend - add support for the permute/Tensor_log/Tensor_exp layers (#664) * rocAL - Video pipeline Support (#640) * Fix a few bugs * caffe and 
caffe2 changes for optimization * removed commented code * Fix issue with SSD meta node. * Add support for original width and height for TF Detection * Add change required to reflect mean and std values in an images pixel value * Clean codes * Hardcoding the key values for tf detection and classification * Release RingBuffer memory * Modify RALI API's to return the bbox coords and bbox labels for all images in the output batch * Update rali_unittest.cpp * Clean repo * Add batching support in PYTHON API for labels,bboxes,img_sizes excluding image_names * Add support for image names for batching support * Merge RALI_Upgrade * Update raliunittest.cpp * Add support for bytes instead of str in rali pybind * Code clean Up * Fix codacy issues * Fix codacy issues * Fix indendation error * Fix codacy warnings * Fix trailing spaces warnings * Fix scope of the variable can be reduced warning * Fix errors in the RALI API * Fix Codacy warnings * Remove extra empty lines * Add support for Box Encoder in coco_pipeline.py * Add codes to retrieve meta data information in loader module * Add support for One Hot Labels for all classification based Readers * Add codes to retrieve meta data information in decoder module * Introduce data loader for coco reader using partail decoder * Add casting Support for Encoded Labels * Add support for RandomBBoxCrop augmenation * Clean codes * Introduce RandomBBoxCrop_MetaData Reader to store the crop params returned by RandomBBoxCrop function * Update RandomBBoxCrop_MetaData Reader * Add meta data update support for both vertical and horizontal flip. * Add RandomBBox support. Introduced map to store image name and crop generated by randomBbox. Look up to fetch CropCordsBatch. Get functionality to get crop wrt to image_name as key. Fixed the seg fault. * Add support for BBFlip * Add support for Random BBox Crop Reader to be part of load routine. Fetches the crop of the image to be decodes and does partial decoding for crop part. * Fix the warnings. 
* Fix issue with the meta data updation in master graph. * Add API changes. Fix issues with RandomBBoxCrop algorithm. * Add support for Random BBox Crop & ImageDecoderSlice * Small changes in Box Encoder * Small Change in Anchor boxes input comment * Add changes for Crop Dim exceding Image Dim * Minor Changes for RandomBboxCrop * Fix issues with RandomBboxCrop. * Fix Reader seg fault issue. * Add minor Changes for fp16 * Minor Changes * Clean Codes * Minor Changes for Random BBox Crop * Add support for Multi-GPU * Minor Changes for Multi-GPU support in COCO file souce partial * Remove unwanted code. * Minor Changes in RandomBBoxCrop * Fix Random BBox Crop * Comment out the print statements * Minor Changes for RBBOX * Code clean up. * Fix warnings wrt Ubuntu 20.04 * Resolve codacy warnings * Resolve codacy warnings * Fix PR issues. * Revert back Slice to Crop * Resolve Codacy warnings * Resolve codacy issues * Resolve Minor codacy issue * Fix issue to make branch compatible with AMDRPP master TOT. * Fix the crop_x difference in partial decoding image crop * Fix issue with crop fixed. * Crop_x & crop_y value fixed. * Check bounday conditions and update crop params. * Fix the crop_width difference in partial decoding image crop * Change wrt invalid_bboxes. * Rewrite RandomBBoxCrop Code * Fix issue with random generation. Used randomdevice for seed when initializing random param. * Fix Key value zero error. Error introduced by partial decoding crop correction. Fixed by adjusting the calculation of top and right. 
* Add codes for Video Reader and Loader module (cherry picked from commit d97ab927e9a6597fde666f7dec50ba987259dccf) * Changes in image_augmentation library (cherry picked from commit 153f59bd377cf7cbd87ecf58d722cd88ba79ff22) * Fix Build issues (cherry picked from commit e023b104c9a220e4745b83693b6c38a7411b54fc) * Fix Build errors (cherry picked from commit 908c599fe7b300dfd837a87215b33295bc2752a9) * Add reader type for video reader (cherry picked from commit 843a6cd6a6436513b4c000813a48b66461768ecc) * Adding codes to decode video input file (cherry picked from commit 2c249d7312e0c746fac600b5e38c5b4cb16f1910) * Introduce Video Decoder module to decode video files (cherry picked from commit 4d18ea384a2599aed3b0a0c975d7cd0a343720d2) * Add decoder functions in FFMPEG_VIDEO_DECODER (cherry picked from commit 02224f9601ee4269c68f6725a19468a171718276) * Clean Video Decoder codes (cherry picked from commit 8699179b282aa7870f60ca670a19b512ca45d9ac) * Clean codes to remove build issues (cherry picked from commit 39c5f45ff875d111f2977af0487e9cf8d22174c2) * Clean codes * Initial changes for video reader pipeline. [NBC] * To handle sequence length. * To handle shuffle. * Temp local changes. [NYC] * Changes in the video reader pipeline. [NWC] * Video Reader changes * Fix the segmentation fault in the video reader pipeline * Add support to save decoded output frames in video decoder * Working Pipeline - Single Video file input Add support to modify internal and user batch size in master graph Add ffmpeg seek operation * Minor Changes * Add support for decoding multiple video files and shuffle * Add support to initialize ffmpeg context for each video decoder instance * Code cleanup * Fix issue in Shuffling the images in video reader * Add seek_frame function in video decoder * Code clean up * Update rali_unittest * Add folder based label meta data reader for video reader * Add support for Sequence Reader in RALI * Fix codacy issue * Add Sequence Rearrange initial setup. 
Works only for sequence length equal to video reader. Introduce ovx node sequence rearrange to support. Introduce API in rali_api_augmentations. * Fix issue with Sequence Rearrange with different sequence length. * Introduce raliVideoFileResize node in RALI to fuse video decoding and resize * Add new_sequence_length parameter to sequence rearrange * Add sequence rearrange algorithm for RGB images * Add support for Sequence Reader in RALI * Fix random shuffling of sequences in video reader * Add support for folder based reader and label support for video decoder and labels. * Clean codes * Fix issue in raliVideoReaderResize * Code clean up. * set batchsize to internal batch size in video pipeline loaders. * Add flag in master graph to switch between video and image pipelines. * Add step and stride parameter to VideoReader and SequenceReader * Fix issue with the sequence rearrange. * Adjust remaining image count in master graph wrt sequence rearrange. * Add meta data support for video reader folder based. * Update decode image info name according to stride * Minor bug fix * Add support for text file input Add support to fetch video properties from text file Modify reader to read from the start to end frame specified in text file Add meta data support for text file input to the video reader * Add support to process repeated file inputs in text file * Add meta data reader support to parse timestamps from text file Introduce enable_timestamps parameter and set_timestamps_bool to the meta data readers * Add rali_video_unittests Video Reader Vidoe Reader Resize Sequence Reader Sequence Rearramge * Code clean up * Fix maximum limit for decoder instance creation. Check if instance is there for the video file if not initialize one using previously created instance. * Fix warnings. 
* Minor fix * Add file_list_frame_num parameter To switch between timestamp or frame number input passed with text file * Add data samples for testing Add video samples Add coco sample data with 10 images for train and val * Add support to generate frame number and timestamps output * Fix multiple video file input to video pipeline * Add labelled video folder samples * Modified test suite Modified rali_video_unittests.cpp Add testScript.sh to build and execute rali_video_unittests Remove video pipeline tests from rali_unittests.cpp * Code clean up * Modify frame_rate variable * Add step and stride parameters to SequenceReaderSingleSharded * Minor fix * Modify ffmpeg video decoder functions Initialize the ffmpeg context once for each video file * Fix ffmpeg deprecation warnings * Modify ffmpeg video decoder Add width, height, stride and pixel format paramters to Decode * Code clean up Change Video label reader folders to Video label reader * Remove text file input parameter to dataloader * Add support to check variable frame rate videos * Minor changes * Minor fix * Code clean up * Code clean up * Change rali to rocAL * Merge branch 'AMD-Master' into video_devel * Resolve build issues Code clean up * Fix bug with Sequence Rearrange * Add sharding support to Video Reader * Add sharding support for Sequence Reader * Introduce decoder mode parameter * Add U8 support for Sequence Rearrange Minor changes * Add SingleShard API for video readers * Add support to decode more than one sequence Modify the load routine to decode more than one sequence Add sequence count parameter to Sequence rearrange * Merge branch 'video_devel_PR' of https://github.com/MCW-Dev/MIVISION into video_devel_PR * Fix SequenceReader and SequenceReaderSingleShard * Resolve merge conflicts * Minor fix * Add codes for multithreading * Fix build isssue with HIP backend * Fix warnings * Resolve codacy issues Remove blank lines Adjust spacing * Resolve codacy issues * Modify the sequence reader arguments 
of the ImageLoaderNode * Remove rocAL sample data * Minor changes Add RALI_VIDEO flag to few files * Add seperate VideoReader Introduce VideoFilesourceReader and VideoReaderConfig * Introduce SequenceInfo struct Minor changes * Fix codacy issues in video unit test testScript.sh * Minor fix * Minor bug fix * Introduce the latest FFmpeg API in ffmpeg_video_decoder.cpp * Merge branch 'PR_changes' of https://github.com/fiona-gladwin/MIVisionX into video_devel_PR * Video Pipeline changes * Batch size changes for Video Reader * Video Pipeline Meta data reader changes to store the meta data for each sequence and not for every frame in the sequence * Code cleanup - Video Reader changes * Batch size variable changes Change batch size and internal batch size variables to constant. Introduce batch size and batch ratio variables for the Sequence Reader in master graph. * Change datatype of frame_rate to float * Enable HIP Backend support for video pipeline * Add codes to dump the images in each batch as AVI video file * Minor change in video unit tests * Add HIP backend support for sequence rearrange * Add OpenCL backend support for sequence rearrange * Add condtion to disable ResizeNode update in raliVideoFileResize if resize width and height is same as the videos * PR changes * Fix single folder of images issue in Sequence Reader * Fix codacy issues * PR changes * API changes * Introduce separate output routine for the video pipeline * PR changes * PR changes * Fix for codacy issues Co-authored-by: LokeshBonta <lokeswara@multicorewareinc.com> Co-authored-by: Swetha B S <swetha@multicorewareinc.com> Co-authored-by: shobana-mcw <shobana@multicorewareinc.com> Co-authored-by: fionagladwin <fionagladwin@multicorewareinc.com> Co-authored-by: r-abishekmcw <abishek@multicorewareinc.com> * Docker - Ubuntu Updates (#667) * Docker - RPP Update Version (#668) * Windows - README updates (#671) * Update README.md * Update README.md * Update CMakeLists.txt Include Path Reorg Changes - 
moved to include/<compnm>/ * Update CMakeLists.txt * Update CMakeLists.txt Co-authored-by: Rajy Rawther <Rajy.MeeyakhanRawther@amd.com> Co-authored-by: Kiriti Gowda <kiriti.nageshgowda@amd.com> Co-authored-by: LakshmiKumar23 <lakshmi.kumar@amd.com> Co-authored-by: shobana-mcw <shobana@multicorewareinc.com> Co-authored-by: Kiriti Nagesh Gowda <kiritigowda@gmail.com> Co-authored-by: Aryan Salmanpour <aryan.salmanpour@amd.com> Co-authored-by: Abishek <52214183+r-abishekmcw@users.noreply.github.com> Co-authored-by: r-abishekmcw <abishek@multicorewareinc.com> Co-authored-by: Pavel Tcherniaev <ptcherni@amd.com> Co-authored-by: paveltc <pavel.tcherniaev@amd.com> Co-authored-by: Hansel Yang <hansyang@amd.com> Co-authored-by: Fiona-MCW <70996026+fiona-gladwin@users.noreply.github.com> Co-authored-by: Liam Wrubleski <Liam.Wrubleski@amd.com> Co-authored-by: Liam Wrubleski <lwrubles@amd.com> Co-authored-by: Hansel Yang <hanselyang123@gmail.com> Co-authored-by: frepaul <71665912+frepaul@users.noreply.github.com> Co-authored-by: swetha097 <59434434+swetha097@users.noreply.github.com> Co-authored-by: root <root@ixt-rack-32.local.lan> Co-authored-by: root <root@jenkins-worker-rocm-amd-104.local.lan> Co-authored-by: root <root@gb-sjc2-03.local.lan> Co-authored-by: root <root@ixt-rack-164.local.lan> Co-authored-by: root <root@IXT-RACK-43.local.lan> Co-authored-by: Indumathi31 <59440990+Indumathi31@users.noreply.github.com> Co-authored-by: LokeshBonta <lokeswara@multicorewareinc.com> Co-authored-by: Swetha B S <swetha@multicorewareinc.com> Co-authored-by: fionagladwin <fionagladwin@multicorewareinc.com> Co-authored-by: LakshmiKumar23 <kumar.lakshmi1994@gmail.com>

…w#32) * Changed Channel extract and channel combine function call * updated erode dilate kernals [OCL] * Non Working [FULLY BUILD] code for min_max_loc and mean_stddev * Updated Rain GPU kernel for multiple destination image calls [OCL] * Updated Median, Non Max and Histogram and added support for mean * Updated tensor [OCL] * updated table lookup [OCL] * small updates in mean and stddev [OCL] * Full functioning code for mean and standard deviation [OCL] * Added Support to Min Max Location [OCL] * Added support for gaussian_image_pyramid [OCL] * Added support for laplacian_image_pyramid [OCL] * small modification in LIP [OCL] * small modification in Min Max Location and Mean stddev [OCL] * box filter hisEq [OCL] * Added support for gaussian filter * Added support for bin in Histogram [OCL] * updated sobel [OCL] * Update in Temperature [CPU] * FIX SNP CPU half noise issue [OCL] * fin small change in Absolute difference [OCL] * Small changes in Custom convolution and table lookup [OCL} * Fix regressions due to scripting [cl & CPU]. * fix histogram [OCL] * Updated snow [OCL] * updated snow [CPU] * small update in Snow [OCL] * Modify filter_operations to add gaussian_filter with same backend as blur * Fix issue with rain Grey Scale [OCL] * Fix Rain GPU Transparancy [OCL] * Add Kernel Caching using Map/Kernelmanger * Resolved histogram grayscale issue in GPU * Resolved histogram grayscale issue in GPU * Fix the bug in warp affine planar call * Fix issue with resize crop validation [cl & CPU]. * Cl_enque_buffer, the argument is set to CL_FALSE * Fix Gamma correction [OCL] * minor changes to gamma_correction, vignette commons, flip functionalities * Fix Jitter with new Implementation [CPU & OCL] * Modify brightness bug that gave patches in output * Fix the buy with Lens correction [OCL]. 
* Fix Median filter issue * Modify rotate to match GPU functionality * Fix median filter * merge abi-dev-host-ms4 to main-hipcl-dev * Fix a round about fix for Hue and Saturation Shift * Modify scale to match GPU functionality * Fix a round about fix for Hue and Saturation Shift * Fix syntax error in hsvkernel * changed CL_False to CL_True in minmax location * Resolve merg * Modify warp affine to match GPU - inversion exists * Add validation for Warp Affine Matrix * changes in Warp Affine * Add Blocking calls [CL_TRUE flag is on] * Removed validation printf statements in the library * Removed a syntax error * Add extra validation for contrast * Fix issue with rain [CPU] * Added support to new Pixelate [OCL & CPU] * Fix issue with Fish eye [OCL] * Modify Histogram Implementation * Histogram Balance Fix * Update Readme.md Amended the list * Update Readme.md * Fix Histogram Planar Version * Add new support to Histogram [OCL] * Remove all files to include batch version * Move Mem-Mgmt_HIP branch files to master * Update Readme.md * Put all the recent changes or RPP here * Fix Border issues in crop mirror normalize and crop * Fix Crop mirror normalize border issue * Add RPP UnitTests * Add f32 support for crop_mirror_normalize * Add f32 support for crop * Add f32 support for resize_crop_mirror * Add f32 support for resize and resize_crop * Add f32 support for color_twist * Correct blur * Add f32 support for rotate * Add f16 host support for rotate, resize, resize_crop, crop, resize_crop_mirror, crop_mirror_normalize, color_twist * Major changes to host test suite * Separate host test suites for pkd3 and pln1 * modify rpp_unittests host * correct additional folder creation and readme * Minor correction in pln1/pkd3 host test scripts * Add basic float tensor support * Add FP32 and FP16 support for Crop function * Fix bug in crop * crop mirror normalize report * Float Support for Rotate GPU * Add Kernel Support in OCL for colorTwist and resize funtionalities * Add float 
support for ColorTwist and Resize Crop Mirror - FP16 and FP32 * Code Refactoring and Rotate Support for FP16 and FP32 * Fix Rotate Float issue * Fix FP32 Rotate Issue * Add Resize Function * Add Resize Crop Mirror in GPU OCL * Fix Typo * Add Resize Crop GPU FP16 and FP32 support * Update rppdefs.h * Crop Mirror Normalize Support is added * Support for ColorTwist in Float space * Update Colortwwist.cl - temp * Remove MIOPEN dependency in RPP build set-up * Update colortwist.cl * Fix Bug in ColorTwist * Fix Bug in ColorTwist (shobana-mcw#6) * API refactoring for fused_functions * Fix make_data-type bug and code formatting * Testsuite for Float Support Functions * Removed the brace in switchcase * Add free statements for unreleased memomry and f16 fix for colortwist * rename folders * Fix Resize for U8 case * minor change in BatchPD host * Fix type error in resize.cl * Fix float errors for resize fucntions * foramt file * Fix Bug in ColorTwist (shobana-mcw#6) (shobana-mcw#8) * Fix Bug in ColorTwist (shobana-mcw#6) (shobana-mcw#8) (shobana-mcw#9) * Update * update (shobana-mcw#10) * Fix Bug in ColorTwist (shobana-mcw#6) * Fix Bug in ColorTwist (shobana-mcw#6) (shobana-mcw#8) (shobana-mcw#9) * Update * Format files * New Changes (shobana-mcw#11) * Fix Bug in ColorTwist (shobana-mcw#6) * Fix Bug in ColorTwist (shobana-mcw#6) (shobana-mcw#8) (shobana-mcw#9) * Update * Format files * Correct f16 color twist host bug * Change test suite to input 0-1 normalized values for all f16/f32 functionalities * Refactor API code for geometry_transforms * Added Testsuite for Float Functions in OCL * AMD Docs * Create install.rst * Update index.rst * Add host support for u8->f16 and u8->f32 for resize, crop, crop_mirror_normalize * Add host support in test suite for u8->f16 and u8->f32 for resize, crop, crop_mirror_normalize * Add host support for i8 in resize, crop, cmn, rotate, resize_crop, resize_crop_mirror and color_twist * Add host test suite support for i8 * Add host support for 
u8->i8 in crop, resize, crop_mirror_normalize * modify test suite * Add host plan1 test suite to SOW3_HOST * crop mirror normalize full support in w.r.t type change and layout change * Add API calls for CMN function for new set of variations * Fix bug with respect to I8 * change type info in kernels * Fix cmn bub * Support I8 for Rotate * Int 8 support for colortwist and code refactoring * Add int8 support for resize crop mirror function * resize crop mirror int8 support is added * Crop various variations are added * Add crop support for all the conversions * Add host support for resize outputFormatToggle * Add host support for crop outputFormatToggle * Add host support for rotate outputFormatToggle * Add host support for resize_crop outputFormatToggle * Add host support for resize_crop_mirror outputFormatToggle * Add host support for crop_mirror_normalize outputFormatToggle * Add host support for color_twist outputFormatToggle and all other pln->pkd support * Add missing pln3 API for crop host * Major modifications in test suite and ReadMe for pkd3, pln3 and pln1 inputs for host * Modify resize kernel * Add outputtoggle in the API and functions * Add new changes to all the fused function w.r.t to outputFormatToggle * Add pln3 api for Crop on GPU * add missing API for resize cro * Fix compilation bugs * Remove unnecessary functions and fix build bug * Add ocl testing framework * Fix bug in rotate helper * Minor temp changes in test code to accomodate PKD3 input U8 cases with toggle format * Correct resize_u8_i8_pkd * Fix resize kenel issues for output toogle change * colortwist bug fix * Fix colortwist bug * resize tensor fix * Minor mods to both pln3 and pkd3 test suite to accomodate CMN's ability to do U8 format toggles * Corrections in PLN3 input funcitons for host * Fix bugs in Fused function new code * Add changes relatedd to planar format in padded * Fix issues with pln3 colortwist * Fix issue with test suite * Add pln3 testing and fix issues * Modify a few 
things in test script * Fix pln3 issue for FP16 for Rotate * Fix index issues with Test suit * Add output layout toggle for host API * ix pln3 issues in test suite Fix pln1 issues in testsuite Fix other minor bugs * Change paramerter order in resize pd pln host * remove print statements * Update README.MD * Codacy issues corrections in utilities/rpp-unittests * Codacy issues corrections for resize kernel * Codacy issues corrections in utilities/rpp-unittests OCL/HIP * Codacy issues corrections in utilities/rpp-unittests * Codacy issues corrections in utilities/rpp-unittests * Fix some codecy issues * Remove some Codecy issues in rpp unnittests * Remove a few codecy issues * Remove Print statements Co-authored-by: Muthukumaravel <muthukumaravel@multicorewareinc.com> Co-authored-by: shobana-mcw <shobana@multicorewareinc.com> Co-authored-by: r-abishekmcw <abishek@multicorewareinc.com> Co-authored-by: LokeshBonta <you@example.com> Co-authored-by: Reza <Seyedreza.Najafi@amd.com> Co-authored-by: Swetha B S <swetha@multicorewareinc.com>

* All tensor functions in vx_rpp

fixed so that batch norm layer does not convert tensors to FP16

8dcb6eb

kiritigowda requested review from hansely and LakshmiKumar23 January 25, 2019 01:08

kiritigowda added the bugfix Bug fixes to existing features label Jan 25, 2019

kiritigowda assigned paveltc Jan 28, 2019

LakshmiKumar23 approved these changes Jan 29, 2019

kiritigowda removed the request for review from hansely January 30, 2019 18:01

kiritigowda approved these changes Jan 30, 2019

kiritigowda merged commit aa5d3ab into ROCm:master Jan 30, 2019

hansely pushed a commit to hansely/MIVisionX that referenced this pull request Aug 12, 2019

Logo Replaced (ROCm#32)

d1689ba

swetha097 referenced this pull request in swetha097/MIVisionX Aug 21, 2023

VX_RPP Extend OpenVX tensor support for augmentations in RPP (#32)

5bbd4bd

* All tensor functions in vx_rpp

paveltc commented Jan 25, 2019