
[OpenCLML] CLML Profiling fixes corresponding to OpenCL Timer recent changes #12711

Merged: 3 commits merged into apache:main on Sep 8, 2022

Conversation

@srkreddy1238 (Contributor, Author)

Dynamic creation of the command queue (#11180) breaks OpenCLML profiling because OpenCLML stores a reference to the command queue and reuses it later.

Now we always access the command queue from the workspace rather than storing it internally.
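A minimal sketch of the pattern this change moves to (the workspace and tentry members follow the diff discussed below; the surrounding scaffolding is illustrative, not the exact patch):

// Instead of caching a cl_command_queue at init time, look it up from the
// OpenCL workspace on every invocation, so a queue recreated by the workspace
// (e.g. by the OpenCL timer) is always the one actually used.
void Run() {
  cl_command_queue queue = workspace->GetQueue(tentry->device);
  std::vector<cl_event>* evts = &(workspace->GetEventQueue(tentry->device));
  // ... enqueue CLML ops on `queue`, recording completion events into `evts` ...
}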


@masahi (Member) commented Sep 6, 2022

cc @echuraev

}
ICHECK(queue_found != false) << "Device queue not found in OpenCL Workspace";

this->queue = GetCommadQueue();
@echuraev (Contributor) commented Sep 6, 2022

Sorry, I have probably missed something; I have never looked at the OpenCLML part of TVM before. Could you please explain why you didn't reuse code from the OpenCL runtime? Wouldn't this pseudocode work the same way?

Suggested change (replacing this->queue = GetCommadQueue();):

void InitCLML() {
  // Setup CLML Context
  cl_int result = 0;
  // Initialize Context and Command Queue
  OpenCLWorkspace* workspace = OpenCLWorkspace::Global();
  workspace->Init();
  OpenCLThreadEntry* t = workspace->GetThreadEntry();
  cl_device_id did = workspace->devices[t->device.device_id];
  if (!ExtensionStringPresent(did)) {
    LOG(WARNING) << "CLML Runtime Init: Qualcomm extn not present.\n";
    return;
  }
  this->queue = workspace->GetQueue(t->device);

If it works, it won't be necessary to implement the GetCommadQueue function.

@srkreddy1238 (Contributor, Author)

Both approaches lead to the same issue.

this->queue = workspace->GetQueue(t->device);

The queue referenced here is getting recreated at https://github.com/apache/tvm/pull/11180/files#diff-783a8419915dfdfe242ef786bce4252f60f9d0decebea66d72af63d9cf9d0058R472, and the OpenCLML runtime is unaware of this recreation, so it continues to use a released command queue.
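To make the failure mode concrete, a simplified sketch (the local and member names are illustrative):

// At init time the CLML runtime caches the workspace's current queue:
this->queue = workspace->GetQueue(t->device);
// Later, the OpenCL timer (#11180) recreates the workspace queue with
// profiling enabled and releases the old one; the cached handle now dangles.
// Any subsequent enqueue on it targets a released command queue:
result = h_ClmlIntf->clEnqueueMLOpQCOM(this->queue, this->layer_.function[i],
                                       this->layer_.descriptorSet, 0, NULL, evt);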

@echuraev (Contributor)

Yes, I agree with you. What I wanted to say is that we could probably unify and reuse the existing code instead of writing new, similar functions.

Regarding the recreation: in recreateCommandQueue a new queue is created and stored in the vector of queues, and when we then call GetQueue, the new (recreated) queue is returned. I don't see any reason why we can't call cl::OpenCLWorkspace::Global()->GetQueue(device) in every place where we want to use the command queue, so that we always get the current queue. What do you think about that?

@srkreddy1238 (Contributor, Author)

Agreed. Let me clean this up a bit by holding a reference to the workspace instead of the context and queues. That keeps CLML unaffected by any changes to the workspace internals.

cl_command_queue queue = NULL;
std::vector<cl_event>* evts;
cl::OpenCLWorkspace* workspace = NULL;
cl::OpenCLThreadEntry* tentry = NULL;
@echuraev (Contributor)

nit: Do you really need a separate variable for ThreadEntry? You can get it anytime from OpenCLWorkspace.

@srkreddy1238 (Contributor, Author)

Just for convenience (it won't change anyway), to avoid the long path for accessing the queue.
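For illustration (a small sketch, using the members shown above):

// With the cached thread-entry pointer, a queue access is short:
cl_command_queue queue = workspace->GetQueue(tentry->device);
// Without it, every call spells out the full path through the workspace:
cl_command_queue queue2 = workspace->GetQueue(workspace->GetThreadEntry()->device);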

Comment on lines 489 to 497
cl_platform_id platform;
cl_device_id device_id;
result = clGetPlatformIDs(1, &platform, NULL);
ICHECK(result == CL_SUCCESS) << "clGetPlatformIDs:" << result;
uint32_t num_devices = 0;
result = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &num_devices);
ICHECK(result == CL_SUCCESS && num_devices == 1) << "clGetDeviceIDs:" << result;
result = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);
ICHECK(device_id && result == CL_SUCCESS) << "clGetDeviceIDs:" << result;
@echuraev (Contributor)

Why do you need this code? You can get the device_id from the workspace:

cl_device_id did = workspace->devices[workspace->GetThreadEntry()->device.device_id];

I think the best solution is to extend the OpenCLWorkspace class with a method cl_device_id GetClDeviceId(Device dev). This method would also do all the necessary checks; I expect its implementation would be similar to that of GetQueue.

After that you'd be able to get the cl_device_id by calling this new method.
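For illustration only, a hypothetical GetClDeviceId modeled on the existing GetQueue (this method was ultimately not added by the PR; the exact checks, and the assumption that OpenCLWorkspace exposes Init, IsOpenCLDevice, and the devices vector this way, are not from the source):

// Hypothetical helper on OpenCLWorkspace, mirroring GetQueue:
cl_device_id OpenCLWorkspace::GetClDeviceId(Device dev) {
  this->Init();
  ICHECK(IsOpenCLDevice(dev)) << "Expected an OpenCL device, got device_type " << dev.device_type;
  ICHECK_LT(static_cast<size_t>(dev.device_id), devices.size())
      << "Invalid OpenCL device_id=" << dev.device_id;
  return devices[dev.device_id];
}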

@srkreddy1238 (Contributor, Author)

The device_id from the workspace is a sequence number; in our case it's always 0. What we need here is the real id from the driver.

@echuraev (Contributor)

The device_id from the workspace is a sequence number; in our case it's always 0.

I don't agree. You can also get the real id from the driver: in your code you query the number of available GPU devices and then get the id of one GPU device.

Please take a look at the Init method. We do exactly the same thing there and build a vector of real device ids from the driver, which we then store in the devices member, so you can extract the real device id from that vector.

When you call workspace->Init(), it also creates an OpenCL workspace for the GPU device. If no GPU devices are found, it prints a warning message. If you need only a GPU device, you can add a check/assert for it in the CLML-specific code.

The only reason I suggested adding a new GetClDeviceId method is that it could do all the necessary boundary and other checks.

@srkreddy1238 (Contributor, Author)

Yep, the workspace does hold the real id. Earlier I was looking at ThreadEntry->Device, which is a DLDevice and is always zero.
We always use the first available OpenCL device, so a new GetClDeviceId API may not be required at the moment.

@echuraev (Contributor)

I agree with you. I've added a new comment below; let's discuss this point there.

@@ -253,6 +230,8 @@ class CLMLRuntime : public JSONRuntimeBase {
*/
void Run() override {
cl_int result = 0;
cl_command_queue queue = workspace->GetQueue(tentry->device);
std::vector<cl_event>* evts = &(workspace->GetEventQueue(tentry->device));
@echuraev (Contributor)

Why do you need a pointer to the vector of events instead of a reference?

cl_event* evt = &(this->evts->back());
result = h_ClmlIntf->clEnqueueMLOpQCOM(queue, this->layer_.function[i],
                                       this->layer_.descriptorSet, 0, NULL, evt);
if (getenv("CLML_PROFILING")) {
@echuraev (Contributor)

What if we used the IsProfiling method instead of this environment variable? Or is it possible to have an OpenCL queue created with the profiling-enable option while we don't want to profile CLML?

@srkreddy1238 (Contributor, Author)

CLML profiling is about profiling the ML ops within the CLML subgraph (within BYOC). IsProfiling is controlled by OpenCLTimer whenever someone wants to profile OpenCL kernels (generated by the OpenCL codegen). CLML doesn't have any kernels (there are no clEnqueueNDRangeKernel calls here); instead it uses the extension API.

More details:

Ideally, CLML could have its own workspace (context & queue) and operate independently. The only dependency on TVM's OpenCL workspace is having the buffers allocated on the same queue, so that we can do a hardware-level copy when switching context from TVM's OpenCL subgraph to the CLML subgraph. Too tight an integration here could lead to unexpected functional breakage, since those who enhance the OpenCL runtime may not pay attention to CLML component dependencies.
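As a rough illustration of what CLML_PROFILING relies on (a sketch only, not the CLML runtime's actual reporting code; it assumes the queue was created with CL_QUEUE_PROFILING_ENABLE), the per-op time can be read back from the event recorded for each clEnqueueMLOpQCOM call using standard OpenCL event profiling:

#include <CL/cl.h>

// Sketch: nanoseconds spent executing one CLML op, read from its cl_event.
static cl_ulong OpDurationNs(cl_event evt) {
  cl_ulong start = 0, end = 0;
  clWaitForEvents(1, &evt);  // ensure the op has finished
  clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START, sizeof(start), &start, NULL);
  clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END, sizeof(end), &end, NULL);
  return end - start;
}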

@echuraev (Contributor)

Thank you for the clarification. In that case I agree with you; it makes no sense to use IsProfiling here.

@echuraev (Contributor) left a review comment

In general LGTM. One point to discuss below.

size_t reqd_size = 0;
result = clGetDeviceInfo(device_id, CL_DEVICE_EXTENSIONS, 0, NULL, &reqd_size);
result = clGetDeviceInfo(workspace->devices[0], CL_DEVICE_EXTENSIONS, 0, NULL, &reqd_size);
@echuraev (Contributor) commented Sep 7, 2022

I agree that in the current implementation device_id is always zero, so we can just take the first device in the vector.

But one point for discussion: I'm afraid that if we later change something in OpenCLWorkspace related to the list of devices, this piece of code could break. That's why I prefer to get the device as workspace->devices[workspace->GetThreadEntry()->device.device_id]. To make it safer and more readable, I proposed introducing a new method. What do you think?

@srkreddy1238 (Contributor, Author)

It's very unlikely for Snapdragon to have multiple GPUs (at least in the near future). Anyway, I've modified it to be generic.

@echuraev (Contributor) left a review comment

LGTM. Thanks!

@srkreddy1238 (Contributor, Author)

@masahi can you take a look and merge it?

@masahi merged commit 6be04d7 into apache:main on Sep 8, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request on Nov 25, 2022: [OpenCLML] CLML Profiling fixes corresponding to OpenCL Timer recent changes (apache#12711)

* [OpenCLML] CLML Profiling fixes corresponding to OpenCL Timer recent changes.
* [OpenCLML] Review comments.
* Review comment.