rocAL - PR 529 + updates (#539)

* Optimize scale, warpaffine, warpperspective, lut * Optimize filters - sobel, median, erode, dilate, box * cherry-pick "Build Fix - Release/Debug (#423)" from MIVisionX/master branch * Release/Debug Build Fix * CMakeList.txt cleanup * Readme Updates * cmake clean up for hip * CXX Flags & OpenVX Version Update * Add support for HarrisScore_HVC_HG3_7x7 * Add lut and convolve memory support in HIP * optimize float4_to_s16s function for arithmetic kernels - use vector data type for writting to oa buffer for better performance compared to pixel by pixel write * use make_short4 * optimize s16s_to_float4_ungrouped function to use vector read for s16 data type * Optimized Color Convert kernels * Modifiied LUT kernel * Modifiied LUT kernel * update node names in VisionTests script * optimize ColorDepth kernels * Add new coding style for arithmetic/logical/color hip kernels * Merge pull request #32 from asalmanp/as/hip_kernels_style Add new coding style for arithmetic/logical/color hip kernels * Add auto OCL dump generator script * Add gdfs for arithmetic, logical, color kernels * Modify arithmetic kernels as per new std * Add the missing buffer_offset to the hip_memory * Arithmetic kernels fixes * Modify logical kernels as per new std * Revert to previous min max impl * changed Threshold to support new OpenVX 1.3 format (#38) Co-authored-by: paveltc <pavel.tcherniaev@amd.com> * add the optimized ChannelExtract_U8_U32_Pos0 and ChannelExtract_U8_U24_Pos0 color kernels * Threshold - Update to 1.3 * Add new gdfs and modify generator script * Jenkins - Check Build & Artifacts * Tests - Fix platform name * Modify generator script for ocl/hip dumps and fixes for gdfs * Add optimized box filter * Modify kernelGDFs, automate script for OCL/HIP bin dumps for different image sizes * Optimize phase, magnitude, weighted average and remove trailing spaces * Optimize magnitude, phase, weighted_average, Minor fix * Formatting fixes * Formatting changes * modify hip pack_ function to fix SAT issue in some kernels * Place kernelGDFs in independent folders * Fix runvxTestAllScript, readme and Modify gitignore * Revert "Optimize phase, magnitude, weighted average and remove trailing spaces" This reverts commit ae97d35. * Move all common types/device codes into a new header * GPU Fix - multiply gpu (#39) * CMake * multiply fix * code cleanup * GPU Flow - Canny Fix (#36) * CMake * canny fix * code cleanup * optimize hip_clamp function * Partial changes to color kernels * Optimize color kernels * Cleanup * Change typecast float to make_float4() * Add UYVY/YUYV options for ChannelExtract * Modify globalThreads_x and globalThreads_y * Kernel GDF modifications * Script enhancements - add support for single kernel testing, optional build * Edit script readme * minor optimization for Phase kernel * fix comment * GPU Flow - Bug Fixes (#35) * fixes GraphROI.Simple & vxMapRemapPatch.MapRandomRemap * Graph.GraphState * fixes Threshold.OnRandom/4/Graph/BINARY/U8/U8 * removing unwanted commits * fixes Threshold.OnRandom/5/Graph/BINARY/S16/U8 * fixes Threshold.OnRandom/7/Graph/RANGE/S16/U8 * removing unnecessary changes * Add filter kernel GDFs * Add test script support for filter kernel diff checks * Optimizations for filter kernels - initial commit * Optimize ScaleGaussianHalf, other minor fixes * Correct some test names in runVisionTests script * Disable ScaleGaussianHalf temporarily * Optimize Median3_/min3_/max3_ * Fix convolotion issue for hip * fix seg fault for ScaleGaussian * Add support for channelCopy and Lut * Minor change * Optimize statistical kernels * Optimize UV12/UV/IUV and ScaleUp2x2 * Minor change * Add kernelGDFs for IUV/UV12/UV converts, threshold, convolve * Update runVisionTests.py and runvxTestAllScript.sh to run with arithmetic/logical/color/filter/statistical kernels * Add uniform-image inputs with hex pixel values * Remove all U1 kernel testing * Test script mods * Uncomment all kernels except geometric/vision * Minor fix * Optimize geometric kernels - initial commit * Minor changes * Mods to use floorf, mul24, mad24, Scale_U8_U8_Area * ScaleImage_U8_U8_Area fixes and Remap initial commit * Remove #defines for remap * Pass hip_memory for remap * Enable scale, warpAffine, warpPerspective testing * Add kernelGDFs for geometric functions, runvxTestAllScript.sh update * Fix the bug for ScaleImage_Bilinear_Constant and ScaleImage_Bilinear_Replicate * GDF and test script corrections * Disable kernels with attr * Disable UV12/UV/IUV converts and ScaleUp2x2 * Add vision kernelGDFs * Vision kernels - initial commit * Modify helpers to use hip built in functions * Remove code used for testing * Minor changes * use consistent device function names and code clean up * remove extra semicolon * switch to builtin functions for hip_lerp * Formatting fixes * minor cmake change to print HIP path/version correctly * Modify harris corners * Test script mod * cmake file changes for building GPU backends and CPU properly * code clean up to make it more readable that there will be a fatal error if OPENCL or HIP not found in the case of the default GPU_SUPPORT=ON * Remove samples/hip_samples, Add openvx_runvx_tests * Enhance runvxTestAllScript, Change ReadMe * Formatting fixes, Code cleanup * Rename openvx_runvx_tests to openvx_node_tests * fix a seg fault for Canny node * remove unused parameter from CannySuppThreshold * Delete vision_tests outer folder * Enhancements to runVisionTests.py * Remove blank lines * Vision kernel mods * Formatting fix * Codacy fixes 1 * Codacy fixes 2 * Codacy fixes 3 * fix cmake * Make pandas optional * Code cleanup * Codacy issue fix * Codacy issue fix * Codacy issue fix * Codacy issue fix * Codacy issue fix * Codacy issue fix * Add backend_type OCL * Fix CMake issues for HIP backend build. Fix issues caused by merge. * Add support for HIP backend. * add support for VX_DIRECTIVE_AMD_COPY_TO_HIPMEM * Add HIP backend support for Resize crop function. Modify unittest to save all images in local folder (test HIP support). * Fix minor issues in HIP backend. * Fix rocAL Pybind build issue. Update rocAL README.md for TurboJpeg installation. * Fix brightness updation issue. Set random seed in paramter factory constructor. * Fix issue with CMake to work for OCL and HIP backend. * Fix requested deviceID not found error. * Fix issue with HIP load routine. * Rename rali to rocAL. * Fix merge issues. * Fix build issue for rocAL pybind module. (cherry picked from commit 0e1a43a) * Add prefetching support in RALI pipeline. (cherry picked from commit 0d5cf66) * Fix build warnings. (cherry picked from commit b063ca6) * Fix warnings. * Clean up. * Fix merge issues. * Made suggested PR changes. * Fix build error. * set correct affinity in amd_rpp * Add CMake changes and fix codacy warnings. * Fix core dump issue in rali unittest. * Fix build issue. * cmake cleanup * fix for review comments and unit_test change * fix build error for OpenCL backend Co-authored-by: Kiriti Nagesh Gowda <kiritigowda@gmail.com> Co-authored-by: r-abishekmcw <abishek@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiriti.nageshgowda@amd.com> Co-authored-by: Abishek <52214183+r-abishekmcw@users.noreply.github.com> Co-authored-by: Aryan Salmanpour <aryan.salmanpour@amd.com> Co-authored-by: Swetha B S <swetha@multicorewareinc.com> Co-authored-by: Ulagammai <ulagammai@multicorewareinc.com> Co-authored-by: fiona-gladwin <fionagladwin@multicorewareinc.com> Co-authored-by: Ulagammai <--local> Co-authored-by: Pavel Tcherniaev <ptcherni@amd.com> Co-authored-by: paveltc <pavel.tcherniaev@amd.com> Co-authored-by: Hansel Yang <hansyang@amd.com> Co-authored-by: LakshmiKumar23 <lakshmi.kumar@amd.com> Co-authored-by: shobana-mcw <shobana@multicorewareinc.com>
ROCm · Jun 29, 2021 · 5f6ab34 · 5f6ab34
1 parent f2a616b
commit 5f6ab34
Show file tree

Hide file tree

Showing 87 changed files with 2,233 additions and 572 deletions.
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -1,15 +1,15 @@
 # Copyright (c) 2015 - 2020 Advanced Micro Devices, Inc. All rights reserved.
-# 
+#
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the "Software"), to deal
 # in the Software without restriction, including without limitation the rights
 # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 # copies of the Software, and to permit persons to whom the Software is
 # furnished to do so, subject to the following conditions:
-#  
+#
 # The above copyright notice and this permission notice shall be included in
 # all copies or substantial portions of the Software.
-#  
+#
 # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE

diff --git a/amd_openvx/openvx/ago/ago_drama_alloc.cpp b/amd_openvx/openvx/ago/ago_drama_alloc.cpp
@@ -1,16 +1,16 @@
-/* 
+/*
 Copyright (c) 2015 - 2020 Advanced Micro Devices, Inc. All rights reserved.
- 
+
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
- 
+
 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.
- 
+
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
@@ -592,7 +592,6 @@ static int agoOptimizeDramaAllocSetDefaultTargets(AgoGraph * agraph)
                 agoAddLogEntry(&node->akernel->ref, status, "ERROR: kernel %s: query_target_support_f(*,*,%d,*) => %d\n", node->akernel->name, vx_false_e, status);
                 return -1;
             }
-            supported_target_affinity &= ~AGO_KERNEL_FLAG_DEVICE_GPU;
 #endif
             node->target_support_flags = 0;
             if (supported_target_affinity & AGO_KERNEL_FLAG_DEVICE_CPU) {
@@ -762,7 +761,7 @@ static int agoOptimizeDramaAllocMergeSuperNodes(AgoGraph * graph)
     }
     // perform  one hierarchical level at a time
     for (auto enode = graph->nodeList.head; enode;) {
-        // get snode..enode with next hierarchical_level 
+        // get snode..enode with next hierarchical_level
         auto hierarchical_level = enode->hierarchical_level;
         auto snode = enode; enode = enode->next;
         while (enode && enode->hierarchical_level == hierarchical_level)
@@ -891,14 +890,14 @@ int agoOptimizeDramaAlloc(AgoGraph * agraph)
     // make sure all buffers are allocated and initialized
     for (AgoData * adata = agraph->dataList.head; adata; adata = adata->next) {
         if (agoAllocData(adata)) {
-            vx_char name[256]; agoGetDataName(name, adata); 
+            vx_char name[256]; agoGetDataName(name, adata);
             agoAddLogEntry(&adata->ref, VX_FAILURE, "ERROR: agoOptimizeDramaAlloc: data allocation failed for %s\n", name);
             return -1;
         }
     }
     for (AgoData * adata = agraph->ref.context->dataList.head; adata; adata = adata->next) {
         if (agoAllocData(adata)) {
-            vx_char name[256]; agoGetDataName(name, adata); 
+            vx_char name[256]; agoGetDataName(name, adata);
             agoAddLogEntry(&adata->ref, VX_FAILURE, "ERROR: agoOptimizeDramaAlloc: data allocation failed for %s\n", name);
             return -1;
         }

diff --git a/amd_openvx/openvx/ago/ago_kernel_api.cpp b/amd_openvx/openvx/ago/ago_kernel_api.cpp
@@ -5288,7 +5288,7 @@ int agoKernel_And_U8_U1U8(AgoNode * node, AgoKernelCommand cmd)
             status = VX_FAILURE;
         }
     }
-#endif
+	#endif
     return status;
 }
 
@@ -22611,4 +22611,4 @@ int agoKernel_LaplacianReconstruct_DATA_DATA_DATA(AgoNode * node, AgoKernelComma
         status = VX_SUCCESS;
     }
     return status;
-}
+}
diff --git a/amd_openvx/openvx/api/vx_api.cpp b/amd_openvx/openvx/api/vx_api.cpp
@@ -407,6 +407,18 @@ VX_API_ENTRY vx_status VX_API_CALL vxSetContextAttribute(vx_context context, vx_
                 status = VX_SUCCESS;
             }
             break;
+#elif ENABLE_HIP
+        case VX_CONTEXT_ATTRIBUTE_AMD_HIP_DEVICE:
+            if (size == sizeof(hipDevice_t)) {
+                if (context->hip_device < 0 && agoGpuHipCreateContext(context, context->hip_device) != VX_SUCCESS) {
+                    status = VX_FAILURE;
+                }
+                else {
+                    *(int *)ptr = context->hip_device_id;
+                    status = VX_SUCCESS;
+                }
+            }
+            break;
 #endif
         default:
             status = VX_ERROR_NOT_SUPPORTED;
@@ -3360,6 +3372,14 @@ VX_API_ENTRY vx_status VX_API_CALL vxQueryNode(vx_node node, vx_enum attribute,
                     status = VX_SUCCESS;
                 }
                 break;
+#elif ENABLE_HIP
+            case VX_NODE_ATTRIBUTE_AMD_HIP_STREAM:
+                if (size == sizeof(hipStream_t)){
+                    AgoGraph * graph = (AgoGraph *)node->ref.scope;
+                    *(hipStream_t *)ptr = graph->hip_stream0;
+                    status = VX_SUCCESS;
+            }
+            break;
 #endif
             default:
                 status = VX_ERROR_NOT_SUPPORTED;

diff --git a/amd_openvx/openvx/include/vx_ext_amd.h b/amd_openvx/openvx/include/vx_ext_amd.h
@@ -1,16 +1,16 @@
-/* 
+/*
 Copyright (c) 2015 - 2020 Advanced Micro Devices, Inc. All rights reserved.
- 
+
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
- 
+
 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.
- 
+
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
@@ -47,7 +47,7 @@ THE SOFTWARE.
 #define AGO_TARGET_AFFINITY_GPU_INFO_SVM_AS_CLMEM      0x20
 #define AGO_TARGET_AFFINITY_GPU_INFO_SVM_NO_FGS        0x40
 
-/*! \brief Maximum size of scalar string buffer. The local buffers used for accessing scalar strings 
+/*! \brief Maximum size of scalar string buffer. The local buffers used for accessing scalar strings
 * should be of size VX_MAX_STRING_BUFFER_SIZE_AMD and the maximum allowed string length is
 * VX_MAX_STRING_BUFFER_SIZE_AMD-1.
 * \ingroup group_scalar
@@ -156,6 +156,7 @@ enum vx_node_attribute_amd_e {
     VX_NODE_ATTRIBUTE_AMD_AFFINITY                      = VX_ATTRIBUTE_BASE(VX_ID_AMD, VX_TYPE_NODE) + 0x01,
     /*! \brief OpenCL command queue. Use a <tt>\ref cl_command_queue</tt> parameter.*/
     VX_NODE_ATTRIBUTE_AMD_OPENCL_COMMAND_QUEUE          = VX_ATTRIBUTE_BASE(VX_ID_AMD, VX_TYPE_NODE) + 0x02,
+    VX_NODE_ATTRIBUTE_AMD_HIP_STREAM           = VX_ATTRIBUTE_BASE(VX_ID_AMD, VX_TYPE_NODE) + 0x03
 };
 
 /*! \brief The AMD image attributes list.

diff --git a/amd_openvx_extensions/CMakeLists.txt b/amd_openvx_extensions/CMakeLists.txt
@@ -24,49 +24,85 @@ project(amd_openvx_extensions)
 list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/../amd_openvx/cmake)
 
 if(GPU_SUPPORT)
+    find_package(AMDRPP QUIET)
     if("${BACKEND}" STREQUAL "OPENCL")
         find_package(OpenCL QUIET)
         if(OpenCL_FOUND)
             SET(BUILD_OPENCL True)
             message("-- ${Blue}amd_openvx_extensions BACKEND set to ${BACKEND}${ColourReset}")
             find_package(OpenCV QUIET)
             find_package(FFmpeg QUIET)
-            find_package(AMDRPP QUIET)
             find_package(miopengemm PATHS ${ROCM_PATH} QUIET)
             find_package(miopen     PATHS ${ROCM_PATH} QUIET)
         else()
             set(GPU_SUPPORT OFF)
             set(BUILD_OPENCL False)
         endif()
+    elseif("${BACKEND}" STREQUAL "HIP")
+        if(NOT DEFINED HIP_PATH)
+            if(NOT DEFINED ENV{HIP_PATH})
+                set(HIP_PATH ${ROCM_PATH}/hip CACHE PATH "Path to which HIP has been installed")
+            else()
+                set(HIP_PATH $ENV{HIP_PATH} CACHE PATH "Path to which HIP has been installed")
+            endif()
+        endif()
+        list(APPEND CMAKE_MODULE_PATH ${HIP_PATH}/cmake)
+        # HSA_PATH
+        IF(NOT DEFINED ENV{HSA_PATH})
+            SET(HSA_PATH ${ROCM_PATH}/hsa)
+        ELSE()
+            SET(HSA_PATH $ENV{HSA_PATH})
+        ENDIF()
+        find_package(HIP QUIET REQUIRED)
     else()
         SET(BUILD_OPENCL False)
-        message("-- ${Blue}amd_openvx_extensions is currenlty not supported with BACKEND set to ${BACKEND}${ColourReset}")
+        set(GPU_SUPPORT OFF)
+        message("-- ${BoldBlue}amd_openvx_extensions is currenlty not supported with BACKEND set to ${BACKEND}")
     endif()
 endif()
 
-if(GPU_SUPPORT AND OpenCL_FOUND AND BUILD_OPENCL)
-    if(AMDRPP_FOUND)
-        add_subdirectory(amd_rpp)
-    else()
-        message("-- ${Red}WARNING:AMDRPP Not Found -- amd_rpp module excluded${ColourReset}")
-    endif()
-    if(LOOM)
-        add_subdirectory(amd_loomsl)
-        message("-- ${Green}AMD OpenVX Loom Stich Library Extension -- amd_loomsl module added${ColourReset}")
+if(GPU_SUPPORT) #AND OpenCL_FOUND AND BUILD_OPENCL)
+    if (OpenCL_FOUND AND BUILD_OPENCL)
+        if(AMDRPP_FOUND)
+            add_subdirectory(amd_rpp)
+        else()
+            message("-- ${Red}WARNING:AMDRPP Not Found -- amd_rpp module excluded${ColourReset}")
+        endif()
+        if(LOOM)
+            add_subdirectory(amd_loomsl)
+            message("-- ${Green}AMD OpenVX Loom Stich Library Extension -- amd_loomsl module added${ColourReset}")
+        else()
+            message("-- ${Cyan}LOOM Module turned OFF by user option -D LOOM=OFF ${ColourReset}")
+        endif()
+        if(FFMPEG_FOUND)
+            add_subdirectory(amd_media)
+            message("-- ${Green}AMD OpenVX Media Extension -- amd_media module added${ColourReset}")
+        else()
+            message("-- ${Red}WARNING:FFMPEG Not Found -- amd_media module excluded${ColourReset}")
+        endif(FFMPEG_FOUND)
+    elseif("${BACKEND}" STREQUAL "HIP" AND HIP_FOUND)
+        if(AMDRPP_FOUND)
+          add_subdirectory(amd_rpp)
+        else()
+          message("-- ${Red}WARNING:AMDRPP Not Found -- amd_rpp module excluded${ColourReset}")
+        endif()
+        if(HIP_COMPILER STREQUAL clang)
+              set(hip_library_name amdhip64)
+        else()
+              message(FATAL_ERROR "Unsupported HIP compiler")
+        endif()
+        if(FFMPEG_FOUND)
+            add_subdirectory(amd_media)
+            message("-- ${Green}AMD OpenVX Media Extension -- amd_media module added${ColourReset}")
+        else()
+            message("-- ${Red}WARNING:FFMPEG Not Found -- amd_media module excluded${ColourReset}")
+        endif(FFMPEG_FOUND)
     else()
-        message("-- ${Cyan}LOOM Module turned OFF by user option -D LOOM=OFF ${ColourReset}")
+      message("-- ${Red}WARNING:GPU support/OpenCL/HIP/FFMPEGNot Found -- amd_loomsl, & amd_media modules excluded${ColourReset}")
     endif()
-    if(FFMPEG_FOUND)
-        add_subdirectory(amd_media)
-        message("-- ${Green}AMD OpenVX Media Extension -- amd_media module added${ColourReset}")
-    else()
-        message("-- ${Red}WARNING:FFMPEG Not Found -- amd_media module excluded${ColourReset}")
-    endif(FFMPEG_FOUND)
-else()
-    message("-- ${Red}WARNING:OpenCL Not Found -- amd_rpp, amd_loomsl, & amd_media modules excluded${ColourReset}")
-endif()
+endif(GPU_SUPPORT)
 
-if(NEURAL_NET)
+if (NEURAL_NET)
     if(GPU_SUPPORT AND OpenCL_FOUND AND BUILD_OPENCL AND miopengemm_FOUND AND miopen_FOUND)
         add_subdirectory(amd_nn)
         message("-- ${Green}AMD OpenVX Neural Network Extension -- amd_nn module added${ColourReset}")