Dynamic & stream-aware scratchpad #3667
CI MESSAGE: [3926424]: BUILD STARTED
std::vector<double> alloc_times[nkinds];
std::vector<double> destroy_times;
for (auto &v : alloc_times)
  v.reserve(max_attempts*10);
- v.reserve(max_attempts*10);
+ v.reserve(max_attempts*1024);
I think we do on average 1024 allocations per attempt.
I think now we do up to 100. 1024 is the size, in bytes.
Yes, 100. I confused it with size_dist. So average would be like 50 and 100 max.
std::sort(v.begin(), v.end());
double sum = std::accumulate(v.begin(), v.end(), 0);
auto b98 = v.begin() + v.size()/100;
auto e98 = v.end() - v.size()/100;
double sum98 = std::accumulate(b98, e98, 0);
std::cout << "Allocation performance for " << names[k] << " memory.\n"
          << "Median time: " << v[v.size()/2] << " ns\n"
          << "90th percentile: " << v[v.size()*90/100] << " ns\n"
          << "99th percentile: " << v[v.size()*99/100] << " ns\n"
          << "Mean time: " << sum/v.size() << " ns\n"
          << "Mean time (middle 98%): " << sum98/(e98-b98) << " ns\n";
I guess this could be a function.
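One way the extraction suggested here could look is sketched below. The names `TimingStats`, `summarize_times` and `print_perf` are hypothetical and do not come from the PR. The sketch computes the same quantities as the reviewed snippet, with one deliberate difference: it passes `0.0` rather than `0` to `std::accumulate`, because an `int` initial value makes the sum accumulate in `int` and truncates fractional timings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical helper types and names; not taken from the PR.
struct TimingStats {
  double median, p90, p99, mean, mean_middle98;
};

// Takes the samples by value so the caller's vector is left unsorted.
TimingStats summarize_times(std::vector<double> v) {
  assert(!v.empty() && "need at least one sample");
  std::sort(v.begin(), v.end());
  const std::size_t n = v.size();
  // 0.0 (not 0) keeps the accumulation in double precision.
  double sum = std::accumulate(v.begin(), v.end(), 0.0);
  // Drop the lowest and highest 1% of samples for the trimmed mean.
  auto b98 = v.begin() + n / 100;
  auto e98 = v.end() - n / 100;
  double sum98 = std::accumulate(b98, e98, 0.0);
  return {v[n / 2], v[n * 90 / 100], v[n * 99 / 100],
          sum / n, sum98 / (e98 - b98)};
}

// Prints the statistics in the same layout as the reviewed snippet.
void print_perf(std::ostream &os, const std::string &name, const TimingStats &s) {
  os << "Allocation performance for " << name << " memory.\n"
     << "Median time: " << s.median << " ns\n"
     << "90th percentile: " << s.p90 << " ns\n"
     << "99th percentile: " << s.p99 << " ns\n"
     << "Mean time: " << s.mean << " ns\n"
     << "Mean time (middle 98%): " << s.mean_middle98 << " ns\n";
}
```

A caller would then write something like `print_perf(std::cout, names[k], summarize_times(alloc_times[k]));` once per memory kind.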
dali/kernels/dynamic_scratchpad.h
Outdated
template <typename T, typename... Ts>
struct index_in_pack;
Why does it repeat L36-L37?
I'll remove.
CI MESSAGE: [3926424]: BUILD FAILED
dali/kernels/dynamic_scratchpad.h
Outdated
class DynamicScratchpadImplT {
 protected:
  template <typename Kind>
  void set_upstream_resrouce(mm::memory_resource<Kind> *rsrc) {
- void set_upstream_resrouce(mm::memory_resource<Kind> *rsrc) {
+ void set_upstream_resource(mm::memory_resource<Kind> *rsrc) {
dali/kernels/dynamic_scratchpad.h
Outdated
}

template <typename Kind>
void set_upstream_resrouce(mm::async_memory_resource<Kind> *rsrc,
- void set_upstream_resrouce(mm::async_memory_resource<Kind> *rsrc,
+ void set_upstream_resource(mm::async_memory_resource<Kind> *rsrc,
  if (was_running && !running)
    break;
}
if (!was_running)
How about ASSERT_TRUE(was_running)?
I don't want it to just fail. On some machines it might be impossible to reach this kind of concurrency, e.g. due to CPU load.
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
- extract perf printing to a function
- remove duplicate forward-declaration
- fix typos
- properly reserve vectors for perf results
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
8337aaf to 57aa697
CI MESSAGE: [3933823]: BUILD STARTED
CI MESSAGE: [3933823]: BUILD PASSED
* Fix monotonic resource with 0 initial size.
* Add dynamic scratchpad with tests and benchmarks.
* Add fixed_order_memory_resource - a wrapper which exposes a streamless interface for stream-ordered resources
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Category:
New feature (non-breaking change which adds functionality)
Description:
Dynamic scratchpad is an implementation of the Scratchpad interface which uses built-in monotonic resources instead of preallocated buffers.
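The difference between a monotonic resource and a preallocated buffer can be illustrated with the C++17 standard library. The sketch below is only an analogy: it uses `std::pmr::monotonic_buffer_resource` as a stand-in for DALI's own `mm::` resources, handles host memory only, and has no notion of stream order. `CountingResource`, `ScratchDemoResult` and `run_scratch_demo` are hypothetical names introduced for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>

// Hypothetical upstream resource that counts how often it is asked for memory.
class CountingResource : public std::pmr::memory_resource {
 public:
  std::size_t allocations = 0;
  std::size_t deallocations = 0;

 private:
  void *do_allocate(std::size_t bytes, std::size_t align) override {
    ++allocations;
    return std::pmr::new_delete_resource()->allocate(bytes, align);
  }
  void do_deallocate(void *p, std::size_t bytes, std::size_t align) override {
    ++deallocations;
    std::pmr::new_delete_resource()->deallocate(p, bytes, align);
  }
  bool do_is_equal(const std::pmr::memory_resource &other) const noexcept override {
    return this == &other;
  }
};

struct ScratchDemoResult {
  std::size_t upstream_allocs;
  std::size_t upstream_deallocs_before_release;
  std::size_t upstream_deallocs_after_release;
};

// Serves many small requests from a monotonic resource that starts empty.
ScratchDemoResult run_scratch_demo(int num_requests, std::size_t bytes_each) {
  CountingResource upstream;
  ScratchDemoResult result{};
  {
    // No preallocated buffer: blocks are fetched from upstream on demand.
    std::pmr::monotonic_buffer_resource scratch(&upstream);
    for (int i = 0; i < num_requests; i++) {
      void *p = scratch.allocate(bytes_each, alignof(std::max_align_t));
      // Deallocation is a no-op for a monotonic resource.
      scratch.deallocate(p, bytes_each, alignof(std::max_align_t));
    }
    result.upstream_allocs = upstream.allocations;
    result.upstream_deallocs_before_release = upstream.deallocations;
  }  // The destructor returns all upstream blocks at once.
  result.upstream_deallocs_after_release = upstream.deallocations;
  return result;
}
```

In this sketch many small requests are served by a few upstream blocks, and nothing goes back upstream until the resource is destroyed. That bulk release of upstream blocks is the step that the dynamic scratchpad, per the description, performs in stream order.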
Another important feature is that device memory can be allocated and deallocated in stream order. Pinned host memory can be deallocated in stream order, too, which is essential for safe fire-and-forget H2D copying.
To facilitate stream-ordered deallocation of upstream blocks in monotonic resources, an adapter called fixed_ordered_memory_resource is added, which executes all allocations and deallocations in a predefined order (stream or host).
Additional information:
Affected modules and functionalities:
Monotonic buffer received minor modifications.
Key points relevant for the review:
N/A
Checklist
Tests
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-2449