Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic & stream-aware scratchpad #3667

Merged
merged 8 commits into from
Feb 10, 2022
Merged

Conversation

mzient
Copy link
Contributor

@mzient mzient commented Feb 9, 2022

Category:

New feature (non-breaking change which adds functionality)

Description:

Dynamic scratchpad is an implementation of the Scratchpad interface which uses built-in monotonic resources instead of preallocated buffers.
Another important feature is that device memory can be allocated and deallocated in steam order. Pinned host memory can be deallocated in stream order, too, which is essential for safe fire-and-forget H2D copying.
To facilitate stream-ordered deallocation of upstream blocks in monotonic resources, an adapter called fixed_ordered_memory_resource is added, which executes all allocations and deallocations in a predefined order (stream or host).

Additional information:

Affected modules and functionalities:

Monotonic buffer received minor modifications.

Key points relevant for the review:

N/A

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-2449

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3926424]: BUILD STARTED

std::vector<double> alloc_times[nkinds];
std::vector<double> destroy_times;
for (auto &v : alloc_times)
v.reserve(max_attempts*10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
v.reserve(max_attempts*10);
v.reserve(max_attempts*1024);

I think we do on average 1024 allocations per attempt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think now we do up to a 100. 1024 is the size, in bytes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, 100. I confused it with size_dist. So average would be like 50 and 100 max.

Comment on lines 129 to 140

std::sort(v.begin(), v.end());
double sum = std::accumulate(v.begin(), v.end(), 0);
auto b98 = v.begin() + v.size()/100;
auto e98 = v.end() - v.size()/100;
double sum98 = std::accumulate(b98, e98, 0);
std::cout << "Allocation performance for " << names[k] << " memory.\n"
<< "Median time: " << v[v.size()/2] << " ns\n"
<< "90th percentile: " << v[v.size()*90/100] << " ns\n"
<< "99th percentile: " << v[v.size()*99/100] << " ns\n"
<< "Mean time: " << sum/v.size() << " ns\n"
<< "Mean time (middle 98%): " << sum98/(e98-b98) << " ns\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this could be a function.

Comment on lines 46 to 47
template <typename T, typename... Ts>
struct index_in_pack;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it repeat L36-L37?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove.

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3926424]: BUILD FAILED

@JanuszL JanuszL self-assigned this Feb 9, 2022
@jantonguirao jantonguirao changed the title Dynamic & stream-aware sratchpad Dynamic & stream-aware scratchpad Feb 10, 2022
@jantonguirao jantonguirao self-assigned this Feb 10, 2022
class DynamicScratchpadImplT {
protected:
template <typename Kind>
void set_upstream_resrouce(mm::memory_resource<Kind> *rsrc) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void set_upstream_resrouce(mm::memory_resource<Kind> *rsrc) {
void set_upstream_resource(mm::memory_resource<Kind> *rsrc) {

}

template <typename Kind>
void set_upstream_resrouce(mm::async_memory_resource<Kind> *rsrc,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void set_upstream_resrouce(mm::async_memory_resource<Kind> *rsrc,
void set_upstream_resource(mm::async_memory_resource<Kind> *rsrc,

if (was_running && !running)
break;
}
if (!was_running)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about ASSERT_TRUE(was_running)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want it to just fail. On some machines it might be impossible to reach this kind of concurrency, e.g. due to CPU load.

mzient and others added 6 commits February 10, 2022 11:17
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
- extract perf printing to a function
- remove duplicate forward-declaration
- fix typos
- properly reserve vectors for perf results

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@NVIDIA NVIDIA deleted a comment from dali-automaton Feb 10, 2022
@NVIDIA NVIDIA deleted a comment from dali-automaton Feb 10, 2022
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3933823]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3933823]: BUILD PASSED

@mzient mzient merged commit bf16cc8 into NVIDIA:main Feb 10, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request Feb 21, 2022
* Fix monotonic resource with 0 initial size.
* Add dynamic scratchpad with tests and benchmarks.
* Add fixed_order_memory_resource - a wrapper which exposes a streamless interface for stream-ordered resources

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@JanuszL JanuszL mentioned this pull request Mar 30, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request May 13, 2022
* Fix monotonic resource with 0 initial size.
* Add dynamic scratchpad with tests and benchmarks.
* Add fixed_order_memory_resource - a wrapper which exposes a streamless interface for stream-ordered resources

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jun 7, 2022
* Fix monotonic resource with 0 initial size.
* Add dynamic scratchpad with tests and benchmarks.
* Add fixed_order_memory_resource - a wrapper which exposes a streamless interface for stream-ordered resources

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants