[feat] Query Memory Consumption and expanded profiler with memory consumption #448

Merged
jjfumero merged 13 commits into beehive-lab:develop from feat/query/memory
Jun 12, 2024

Conversation

jjfumero
Member

@jjfumero jjfumero commented Jun 11, 2024

Description

This PR extends the profiler and the TornadoVM Execution Plan API to report the current memory consumption per task graph, the total number of bytes transferred to and from the device per execution plan, and the total device memory usage. This feature was requested by the GAIA project.

a) getTotalBytesTransferred

This extends the profiler to report the total number of bytes transferred in each execution plan launch. The number of bytes transferred depends on which data buffers are READ_ONLY, WRITE_ONLY, or READ_WRITE.

try (TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph)) {
    TornadoExecutionResult executionResult = executionPlan.execute();
    long totalBytesTransferred = executionResult.getProfilerResult().getTotalBytesTransferred();
    long copyInBytes = executionResult.getProfilerResult().getTotalBytesCopyIn();
    long copyOutBytes = executionResult.getProfilerResult().getTotalBytesCopyOut();
    assertEquals(copyInBytes + copyOutBytes, totalBytesTransferred);
}
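
The relationship between access modes and transferred bytes can be sketched in plain Java. This is a hypothetical model, not TornadoVM code: the `TransferModel` class, its `Buffer` record, and the rule that readable buffers are copied in while writable buffers are copied out are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical accounting model (not part of TornadoVM).
final class TransferModel {
    enum Access { READ_ONLY, WRITE_ONLY, READ_WRITE }

    // One entry per data buffer used by a task graph.
    record Buffer(long sizeInBytes, Access access) { }

    // Assumption: every buffer the kernel may read is copied host -> device.
    static long copyInBytes(List<Buffer> buffers) {
        return buffers.stream()
                .filter(b -> b.access() != Access.WRITE_ONLY)
                .mapToLong(Buffer::sizeInBytes)
                .sum();
    }

    // Assumption: every buffer the kernel may write is copied device -> host.
    static long copyOutBytes(List<Buffer> buffers) {
        return buffers.stream()
                .filter(b -> b.access() != Access.READ_ONLY)
                .mapToLong(Buffer::sizeInBytes)
                .sum();
    }

    // Mirrors the invariant checked by the assertEquals in the snippet above.
    static long totalBytesTransferred(List<Buffer> buffers) {
        return copyInBytes(buffers) + copyOutBytes(buffers);
    }
}
```

Under this model, two READ_ONLY inputs and one WRITE_ONLY output of equal size would transfer three buffers' worth of bytes per launch. The real numbers depend on how the task graph declares its transfers.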

b) getTotalDeviceMemoryUsage

The total device memory usage accounts for all allocations needed to run an execution plan. This value is queried through the TornadoVM profiler result.

try (TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph)) {
    TornadoExecutionResult executionResult = executionPlan.execute();
    long totalMemoryUsedInBytes = executionResult.getProfilerResult().getTotalDeviceMemoryUsage();

    // 3 Arrays
    final long sizeAllocated = a.getNumBytesOfSegmentWithHeader() * 3;
    assertEquals(sizeAllocated, totalMemoryUsedInBytes);
}
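
The expected value in the test above is simple arithmetic: the per-array segment size (payload plus header) multiplied by the number of arrays. A minimal sketch of that calculation follows. The class name and the 24-byte header used in the usage note are placeholders, not values taken from this PR; in the test, the real size comes from `getNumBytesOfSegmentWithHeader()`.

```java
// Hypothetical helper that mirrors the arithmetic in the test above.
final class DeviceMemoryEstimate {

    // Size of one array segment: header plus element payload.
    // headerBytes is an input here because the real header size is
    // defined by the array type, not by this sketch.
    static long segmentBytesWithHeader(long headerBytes, long numElements, long bytesPerElement) {
        return Math.addExact(headerBytes, Math.multiplyExact(numElements, bytesPerElement));
    }

    // Expected device usage when every array has the same segment size.
    static long totalForArrays(int numArrays, long segmentBytes) {
        return Math.multiplyExact((long) numArrays, segmentBytes);
    }
}
```

For example, assuming a 24-byte header and 1024 four-byte elements, one segment is 4120 bytes and three arrays would account for 12360 bytes.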

c) getCurrentMemoryUsage

This value is queried on the execution plan itself, rather than through the profiler result, and reports the actual memory usage at any point during execution.

try (TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph)) {
    executionPlan.execute();
    long currentMemoryUsageInBytes = executionPlan.getCurrentMemoryUsage();
    final long sizeAllocated = a.getNumBytesOfSegmentWithHeader() * 3;
    assertEquals(sizeAllocated, currentMemoryUsageInBytes);
}
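
One way to picture a "current usage" counter is a tracker that adds on allocation and subtracts on deallocation. The sketch below is a simplified stand-in under that assumption, not the TornadoVM implementation: `MemoryUsageTracker` and its string buffer ids are hypothetical, and only the idea of `allocate`/`deallocate` reporting byte counts as `long` is borrowed from the diff in this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker (not TornadoVM code).
final class MemoryUsageTracker {
    private final Map<String, Long> liveBuffers = new HashMap<>();
    private long currentUsage;

    // Returns the bytes newly allocated, or 0 if the buffer is already resident.
    synchronized long allocate(String bufferId, long sizeInBytes) {
        if (liveBuffers.putIfAbsent(bufferId, sizeInBytes) != null) {
            return 0;
        }
        currentUsage += sizeInBytes;
        return sizeInBytes;
    }

    // Returns the bytes released, or 0 if the buffer was not resident.
    synchronized long deallocate(String bufferId) {
        Long deallocatedSpace = liveBuffers.remove(bufferId);
        if (deallocatedSpace == null) {
            return 0;
        }
        currentUsage -= deallocatedSpace;
        return deallocatedSpace;
    }

    // Bytes held by buffers that are resident at the time of the call.
    synchronized long getCurrentMemoryUsage() {
        return currentUsage;
    }
}
```

With three equally sized buffers allocated and none freed, the counter equals three times the segment size, which is the situation the test above asserts.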

Problem description

N/A.

Backend/s tested

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • macOS
  • Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

make BACKEND=opencl
tornado-test --enableProfiler console -V uk.ac.manchester.tornado.unittests.memory.MemoryConsumptionTest

make BACKEND=ptx
tornado-test --enableProfiler console -V uk.ac.manchester.tornado.unittests.memory.MemoryConsumptionTest

make BACKEND=spirv
tornado-test --enableProfiler console -V uk.ac.manchester.tornado.unittests.memory.MemoryConsumptionTest

@jjfumero jjfumero added API feature New feature proposal labels Jun 11, 2024
@jjfumero jjfumero self-assigned this Jun 11, 2024
Comment on lines 204 to 214
public long getTotalBytesTransferred() {
return taskGraph.getTotalBytesTransferred();
}

public long getTotalDeviceMemoryUsage() {
return taskGraph.getTotalDeviceMemoryUsage();
}

public long getCurrentMemoryUsage() {
return taskGraph.getCurrentMemoryUsage();
}
Collaborator

Let's remove the public access modifier from these methods, since we do not want to expose them from the TaskGraph.

Comment on lines 925 to 935
public long getTotalBytesTransferred() {
return taskGraphImpl.getTotalBytesTransferred();
}

public long getTotalDeviceMemoryUsage() {
return taskGraphImpl.getTotalDeviceMemoryUsage();
}

public long getCurrentMemoryUsage() {
return taskGraphImpl.getCurrentMemoryUsage();
}
Collaborator

Same here: let's remove the public modifier from these methods.

@@ -514,6 +524,14 @@ long getDeviceKernelTime() {
return immutableTaskGraphList.stream().map(ImmutableTaskGraph::getDeviceKernelTime).mapToLong(Long::longValue).sum();
}

long getTotalBytesCopyIn() {
Collaborator

Let's add Javadoc describing what each method does. The description should make clear when each method should be used.
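
As an illustration of the kind of Javadoc being requested, here is a sketch on a stand-in interface. The interface name `MemoryMetricsSketch` and the Javadoc wording are hypothetical; only the method names and the copy-in plus copy-out relationship come from the PR description.

```java
// Hypothetical interface used only to illustrate the Javadoc style.
interface MemoryMetricsSketch {

    /**
     * Returns the number of bytes copied from the host to the device.
     * Use it to inspect the input-transfer volume of one execution plan launch.
     *
     * @return bytes copied in, as a non-negative value
     */
    long getTotalBytesCopyIn();

    /**
     * Returns the number of bytes copied from the device back to the host.
     * Use it to inspect the output-transfer volume of one execution plan launch.
     *
     * @return bytes copied out, as a non-negative value
     */
    long getTotalBytesCopyOut();

    /**
     * Returns the sum of the copy-in and copy-out byte counts.
     * Use it when only the overall transfer volume matters.
     *
     * @return total bytes transferred in both directions
     */
    default long getTotalBytesTransferred() {
        return Math.addExact(getTotalBytesCopyIn(), getTotalBytesCopyOut());
    }
}
```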

@@ -124,6 +124,7 @@ __TEST_THE_WORLD__ = [
TestEntry("uk.ac.manchester.tornado.unittests.api.TestDevices"),
TestEntry("uk.ac.manchester.tornado.unittests.tensors.TestTensorTypes"),
TestEntry("uk.ac.manchester.tornado.unittests.tensors.TestTensorAPIWithOnnx"),
TestEntry("uk.ac.manchester.tornado.unittests.memory.TestStressDeviceMemory"),
Collaborator

Shouldn't we add the new unit-test?

MemoryConsumptionTest

Member Author

Yes

}

@Override
- public synchronized int deallocate(DeviceBufferState deviceBufferState) {
+ public synchronized long deallocate(DeviceBufferState deviceBufferState) {
long memoryRegionDellocated = 0;
Collaborator

Rename the variable to deallocatedSpace for consistency with the allocate method. Also, let's use the same variable names across all backends: PTXTornadoDevice, SPIRVTornadoDevice.
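
The diff under review also widens the return type of `deallocate` from `int` to `long`. The PR does not state the reason; one plausible motivation, offered here as an assumption, is that byte counts accumulated in an `int` wrap around once they exceed about 2 GiB. The class below is a hypothetical demonstration of that general Java behavior, not code from this PR.

```java
// Hypothetical demonstration of int overflow when summing byte counts.
final class ByteCountWidth {

    // Accumulates in an int: silently wraps past Integer.MAX_VALUE.
    static int sumAsInt(int[] sizesInBytes) {
        int total = 0;
        for (int size : sizesInBytes) {
            total += size;
        }
        return total;
    }

    // Accumulates in a long: each int is widened before the addition.
    static long sumAsLong(int[] sizesInBytes) {
        long total = 0;
        for (int size : sizesInBytes) {
            total += size;
        }
        return total;
    }
}
```

For three 1 GiB buffers, the `int` version yields a negative number while the `long` version yields the correct 3221225472 bytes.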

}

@Override
- public synchronized int deallocate(DeviceBufferState deviceBufferState) {
+ public synchronized long deallocate(DeviceBufferState deviceBufferState) {
long spaceDeallocated = 0;
Collaborator

Rename the variable to deallocatedSpace.

}

@Override
- public synchronized int deallocate(DeviceBufferState deviceBufferState) {
+ public synchronized long deallocate(DeviceBufferState deviceBufferState) {
long spaceDeallocated = 0;
Collaborator

Rename the variable to deallocatedSpace.

jjfumero and others added 3 commits June 12, 2024 11:54
…r/ProfilerType.java

Co-authored-by: Thanos Stratikopoulos <34061419+stratika@users.noreply.github.com>
@jjfumero
Member Author

All comments applied

Collaborator

@stratika stratika left a comment

Thanks @jjfumero. LGTM

@jjfumero jjfumero merged commit dd4bba7 into beehive-lab:develop Jun 12, 2024
2 checks passed
@jjfumero jjfumero deleted the feat/query/memory branch June 12, 2024 10:18
Labels
API feature New feature proposal