Add a stack to the statistics resource #1563

Merged

Commits (50)
6cf4ee7
use std::shared_mutex
madsbk May 16, 2024
d13ee6b
clean up
madsbk May 16, 2024
25ff814
impl. push_counters() and pop_counters()
madsbk May 16, 2024
b453393
python bindings
madsbk May 16, 2024
1f7daa1
python tests
madsbk May 16, 2024
d6fd147
doc
madsbk May 17, 2024
fc49fe9
test_statistics
madsbk May 17, 2024
b9d57db
current allocation statistics
madsbk May 17, 2024
0e2f19c
clean up
madsbk May 17, 2024
7f7f940
Context to enable allocation statistics
madsbk May 17, 2024
68fcd08
Apply suggestions from code review
madsbk May 23, 2024
0254b73
doc
madsbk May 23, 2024
e6dd682
add_counters_from_tracked_sub_block
madsbk May 23, 2024
bf49dab
c++ tests
madsbk May 23, 2024
aef3b9f
Merge branch 'branch-24.06' into statistics_resource_counters_stack
madsbk May 23, 2024
145cb96
Merge branch 'branch-24.06' of github.com:rapidsai/rmm into statistic…
madsbk May 24, 2024
9bd1c2e
Merge branch 'branch-24.08' of github.com:rapidsai/rmm into statistic…
madsbk May 24, 2024
cef74e5
Merge branch 'branch-24.08' of github.com:rapidsai/rmm into statistic…
madsbk May 27, 2024
04b39cb
use dataclass Statistics
madsbk May 27, 2024
1dbd022
memory profiler
madsbk May 27, 2024
badbb56
clean up
madsbk May 27, 2024
6f35d23
fix typo
madsbk May 27, 2024
d8ee633
descriptive name
madsbk May 27, 2024
2df77d1
default_profiler_records
madsbk May 27, 2024
89827ad
tracking_resource_adaptor: use std::shared_mutex
madsbk May 28, 2024
24157b5
fix pytorch test
madsbk May 28, 2024
8df2d0a
pretty_print: added memory units
madsbk May 28, 2024
a82a7b6
doc
madsbk May 28, 2024
2b4b7d3
profiler: accept name argument
madsbk May 28, 2024
0067c05
profiler: now also a context manager
madsbk May 28, 2024
ab97d2a
cleanup
madsbk May 28, 2024
189ca30
pretty_print: output format
madsbk May 28, 2024
6debd83
fix doc build
madsbk May 28, 2024
796e159
Apply suggestions from code review
madsbk May 29, 2024
d2d64a1
style clean up
madsbk May 29, 2024
394d39f
doc
madsbk May 29, 2024
499c173
rename Data => MemoryRecord
madsbk May 29, 2024
463172d
rename pretty_print => report
madsbk May 29, 2024
c11b1c5
ruff check --fix --select D400
madsbk May 29, 2024
3d929d6
report: style
madsbk May 29, 2024
62a3870
doc
madsbk May 29, 2024
8d71415
spelling
madsbk May 30, 2024
8b8176b
Merge branch 'branch-24.08' of github.com:rapidsai/rmm into statistic…
madsbk May 30, 2024
a230794
style
madsbk May 30, 2024
17d9fd9
doc
madsbk May 30, 2024
23eb075
doc
madsbk Jun 4, 2024
9e92414
doc
madsbk Jun 4, 2024
42fb6c7
Merge branch 'branch-24.08' of github.com:rapidsai/rmm into statistic…
madsbk Jun 5, 2024
8b52c83
Update python/rmm/docs/guide.md
madsbk Jun 6, 2024
0b59246
Merge branch 'branch-24.08' into statistics_resource_counters_stack
madsbk Jun 6, 2024
91 changes: 76 additions & 15 deletions include/rmm/mr/device/statistics_resource_adaptor.hpp
@@ -21,6 +21,7 @@
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <stack>

namespace rmm::mr {
/**
@@ -36,20 +37,27 @@ namespace rmm::mr {
* resource in order to satisfy allocation requests, but any existing
* allocations will be untracked. Tracking statistics stores the current, peak
* and total memory allocations for both the number of bytes and number of calls
* to the memory resource. `statistics_resource_adaptor` is intended as a debug
* adaptor and shouldn't be used in performance-sensitive code.
* to the memory resource.
*
* This resource supports nested statistics, which makes it possible to track statistics
* of a code block. Use `.push_counters()` to start tracking statistics on a code block
* and use `.pop_counters()` to stop the tracking. The nested statistics are cascading
* such that the statistics tracked by a code block include the statistics tracked in
* all its tracked sub code blocks.
*
* `statistics_resource_adaptor` is intended as a debug adaptor and shouldn't be
* used in performance-sensitive code.
*
* @tparam Upstream Type of the upstream resource used for
* allocation/deallocation.
*/
template <typename Upstream>
class statistics_resource_adaptor final : public device_memory_resource {
public:
// can be a std::shared_mutex once C++17 is adopted
using read_lock_t =
std::shared_lock<std::shared_timed_mutex>; ///< Type of lock used to synchronize read access
std::shared_lock<std::shared_mutex>; ///< Type of lock used to synchronize read access
using write_lock_t =
std::unique_lock<std::shared_timed_mutex>; ///< Type of lock used to synchronize write access
std::unique_lock<std::shared_mutex>; ///< Type of lock used to synchronize write access
/**
* @brief Utility struct for counting the current, peak, and total value of a number
*/
@@ -83,6 +91,24 @@ class statistics_resource_adaptor final : public device_memory_resource {
value -= val;
return *this;
}

/**
* @brief Add `val` to the current value and update the peak value if necessary
*
* When updating the peak value, we assume that `val` is tracking a code block inside the
* code block tracked by `this`. Because nested statistics are cascading, we have to convert
* `val.peak` to the peak it would have been if it was part of the statistics tracked by `this`.
* We do this by adding the current value that was active when `val` started tracking such that
* we get `std::max(value + val.peak, peak)`.
*
* @param val Value to add
*/
void add_counters_from_tracked_sub_block(const counter& val)
{
peak = std::max(value + val.peak, peak);
value += val.value;
total += val.total;
}
};

/**
@@ -96,6 +122,8 @@ class statistics_resource_adaptor final : public device_memory_resource {
statistics_resource_adaptor(Upstream* upstream) : upstream_{upstream}
{
RMM_EXPECTS(nullptr != upstream, "Unexpected null upstream resource pointer.");
// Initially, we push a single counter pair on the stack
push_counters();
}

statistics_resource_adaptor() = delete;
Expand Down Expand Up @@ -131,7 +159,7 @@ class statistics_resource_adaptor final : public device_memory_resource {
{
read_lock_t lock(mtx_);

return bytes_;
return counter_stack_.top().first;
}

/**
@@ -145,7 +173,40 @@
{
read_lock_t lock(mtx_);

return allocations_;
return counter_stack_.top().second;
}

/**
* @brief Push a pair of zero counters on the stack, which becomes the new
* counters returned by `get_bytes_counter()` and `get_allocations_counter()`
*
* @return top pair of counters <bytes, allocations> from the stack _before_
* the push
*/
std::pair<counter, counter> push_counters()
{
write_lock_t lock(mtx_);
auto ret = counter_stack_.top();
counter_stack_.push(std::make_pair(counter{}, counter{}));
return ret;
}

/**
* @brief Pop a pair of counters from the stack
*
* @return top pair of counters <bytes, allocations> from the stack _before_
* the pop
*/
std::pair<counter, counter> pop_counters()
{
write_lock_t lock(mtx_);
if (counter_stack_.size() < 2) { throw std::out_of_range("cannot pop the last counter pair"); }
auto ret = counter_stack_.top();
counter_stack_.pop();
// Update the new top pair of counters
counter_stack_.top().first.add_counters_from_tracked_sub_block(ret.first);
counter_stack_.top().second.add_counters_from_tracked_sub_block(ret.second);
return ret;
}

private:
@@ -171,8 +232,8 @@ class statistics_resource_adaptor final : public device_memory_resource {
write_lock_t lock(mtx_);

// Increment the allocation_count_ while we have the lock
bytes_ += bytes;
allocations_ += 1;
counter_stack_.top().first += bytes;
counter_stack_.top().second += 1;
}

return ptr;
@@ -193,8 +254,8 @@
write_lock_t lock(mtx_);

// Decrement the current allocated counts.
bytes_ -= bytes;
allocations_ -= 1;
counter_stack_.top().first -= bytes;
counter_stack_.top().second -= 1;
}
}

@@ -213,10 +274,10 @@
return get_upstream_resource() == cast->get_upstream_resource();
}

counter bytes_; // peak, current and total allocated bytes
counter allocations_; // peak, current and total allocation count
std::shared_timed_mutex mutable mtx_; // mutex for thread safe access to allocations_
Upstream* upstream_; // the upstream resource used for satisfying allocation requests
// Stack of counter pairs <bytes, allocations>
std::stack<std::pair<counter, counter>> counter_stack_;
std::shared_mutex mutable mtx_; // mutex for thread safe access to allocations_
Upstream* upstream_; // the upstream resource used for satisfying allocation requests
};

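For reference, here is a minimal sketch (not part of the diff) of how the stacked counters added above might be used from C++. It assumes the adaptor wraps a plain `rmm::mr::cuda_memory_resource` and that allocations go through the adaptor's `allocate()`/`deallocate()`; the numbers in the comments follow the merge rule in `add_counters_from_tracked_sub_block`, i.e. `peak = std::max(value + val.peak, peak)`.

```cpp
// Sketch only: assumes a CUDA device is available and the headers from this PR.
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/statistics_resource_adaptor.hpp>

int main()
{
  rmm::mr::cuda_memory_resource upstream;
  rmm::mr::statistics_resource_adaptor<rmm::mr::cuda_memory_resource> mr{&upstream};

  void* a = mr.allocate(1024);  // outer block: current=1024, peak=1024, total=1024

  mr.push_counters();           // start a nested block with zeroed counters
  void* b = mr.allocate(2048);  // nested block: current=2048, peak=2048
  mr.deallocate(b, 2048);       // nested block: current=0, peak stays 2048

  // pop_counters() returns the nested counters and merges them into the outer
  // block: peak = max(1024 + 2048, 1024) = 3072, total = 1024 + 2048 = 3072.
  auto nested      = mr.pop_counters();
  auto outer_bytes = mr.get_bytes_counter();  // current=1024, peak=3072, total=3072

  mr.deallocate(a, 1024);
  return 0;
}
```
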
7 changes: 3 additions & 4 deletions include/rmm/mr/device/tracking_resource_adaptor.hpp
@@ -53,11 +53,10 @@ namespace rmm::mr {
template <typename Upstream>
class tracking_resource_adaptor final : public device_memory_resource {
public:
// can be a std::shared_mutex once C++17 is adopted
using read_lock_t =
std::shared_lock<std::shared_timed_mutex>; ///< Type of lock used to synchronize read access
std::shared_lock<std::shared_mutex>; ///< Type of lock used to synchronize read access
using write_lock_t =
std::unique_lock<std::shared_timed_mutex>; ///< Type of lock used to synchronize write access
std::unique_lock<std::shared_mutex>; ///< Type of lock used to synchronize write access
/**
* @brief Information stored about an allocation. Includes the size
* and a stack trace if the `tracking_resource_adaptor` was initialized
@@ -271,7 +270,7 @@ class tracking_resource_adaptor final : public device_memory_resource {
bool capture_stacks_; // whether or not to capture call stacks
std::map<void*, allocation_info> allocations_; // map of active allocations
std::atomic<std::size_t> allocated_bytes_; // number of bytes currently allocated
std::shared_timed_mutex mutable mtx_; // mutex for thread safe access to allocations_
std::shared_mutex mutable mtx_; // mutex for thread safe access to allocations_
Upstream* upstream_; // the upstream resource used for satisfying allocation requests
};

103 changes: 103 additions & 0 deletions python/rmm/docs/guide.md
@@ -187,3 +187,106 @@ allocator.

>>> torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
```



## Memory statistics and profiling

RMM can profile memory usage and track memory statistics in either of the following ways:
- Use the context manager `rmm.statistics.statistics()` to enable statistics tracking for a specific code block.
- Call `rmm.statistics.enable_statistics()` to enable statistics tracking globally.

Both approaches modify the currently active RMM memory resource: the current device resource is wrapped with a `StatisticsResourceAdaptor`, which must remain the topmost resource throughout the statistics tracking:
```python
>>> import rmm
>>> import rmm.statistics

>>> # We start with the default cuda memory resource
>>> rmm.mr.get_current_device_resource()
<rmm._lib.memory_resource.CudaMemoryResource at 0x7f7e6c0a1ce0>

>>> # Inside the context, the current resource is a StatisticsResourceAdaptor
>>> with rmm.statistics.statistics():
... rmm.mr.get_current_device_resource()
<rmm._lib.memory_resource.StatisticsResourceAdaptor at 0x7f7e6c524900>

>>> # We can also enable statistics globally
>>> rmm.statistics.enable_statistics()
>>> print(rmm.mr.get_current_device_resource())
<rmm._lib.memory_resource.StatisticsResourceAdaptor at 0x7f662c2bb3c0>
```

With statistics enabled, you can query the current, peak, and total number of bytes and allocations performed by the current RMM memory resource:
```python
>>> buf = rmm.DeviceBuffer(size=10)
>>> rmm.statistics.get_statistics()
Statistics(current_bytes=16, current_count=1, peak_bytes=16, peak_count=1, total_bytes=16, total_count=1)
```
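
The stacked counters can also be manipulated directly from Python. The sketch below is illustrative only: the `push_statistics()`/`pop_statistics()` helper names are assumed here (not taken from this diff) and mirror the C++ `push_counters()`/`pop_counters()`; the printed values simply follow the 16-byte allocation shown above.

```python
>>> # Helper names below are assumed; they mirror the C++ push_counters()/pop_counters().
>>> rmm.statistics.enable_statistics()
>>> buf1 = rmm.DeviceBuffer(size=10)

>>> pushed = rmm.statistics.push_statistics()  # start a nested, zeroed counter block
>>> buf2 = rmm.DeviceBuffer(size=10)
>>> rmm.statistics.get_statistics()             # only the nested block's allocations
Statistics(current_bytes=16, current_count=1, peak_bytes=16, peak_count=1, total_bytes=16, total_count=1)

>>> popped = rmm.statistics.pop_statistics()    # merge the nested counters into the outer block
>>> rmm.statistics.get_statistics()
Statistics(current_bytes=32, current_count=2, peak_bytes=32, peak_count=2, total_bytes=32, total_count=2)
```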

### Memory Profiler
To profile a specific block of code, first enable memory statistics by calling `rmm.statistics.enable_statistics()`. To profile a function, use `profiler` as a function decorator:
```python
>>> @rmm.statistics.profiler()
... def f(size):
... rmm.DeviceBuffer(size=size)
>>> f(1000)

>>> # By default, the profiler writes to rmm.statistics.default_profiler_records
>>> print(rmm.statistics.default_profiler_records.report())
Memory Profiling
================

Legends:
ncalls - number of times the function or code block was called
memory_peak - peak memory allocated in function or code block (in bytes)
memory_total - total memory allocated in function or code block (in bytes)

Ordered by: memory_peak

ncalls memory_peak memory_total filename:lineno(function)
1 1,008 1,008 <ipython-input-11-5fc63161ac29>:1(f)
```

To profile a code block, use `profiler` as a context manager:
```python
>>> with rmm.statistics.profiler(name="my code block"):
... rmm.DeviceBuffer(size=20)
>>> print(rmm.statistics.default_profiler_records.report())
Memory Profiling
================

Legends:
ncalls - number of times the function or code block was called
memory_peak - peak memory allocated in function or code block (in bytes)
memory_total - total memory allocated in function or code block (in bytes)

Ordered by: memory_peak

ncalls memory_peak memory_total filename:lineno(function)
1 1,008 1,008 <ipython-input-11-5fc63161ac29>:1(f)
1 32 32 my code block
```

The `profiler` supports nesting:
```python
>>> with rmm.statistics.profiler(name="outer"):
... buf1 = rmm.DeviceBuffer(size=10)
... with rmm.statistics.profiler(name="inner"):
... buf2 = rmm.DeviceBuffer(size=10)
>>> print(rmm.statistics.default_profiler_records.report())
Memory Profiling
================

Legends:
ncalls - number of times the function or code block was called
memory_peak - peak memory allocated in function or code block (in bytes)
memory_total - total memory allocated in function or code block (in bytes)

Ordered by: memory_peak

ncalls memory_peak memory_total filename:lineno(function)
1 1,008 1,008 <ipython-input-4-865fbe04e29f>:1(f)
1 32 32 my code block
1 32 32 outer
1 16 16 inner
```
9 changes: 9 additions & 0 deletions python/rmm/docs/python_api.rst
@@ -37,3 +37,12 @@ Memory Allocators
:members:
:undoc-members:
:show-inheritance:

Memory Statistics
-----------------

.. automodule:: rmm.statistics
:members:
:inherited-members:
:undoc-members:
:show-inheritance: