
Create a pool wrapper of DynamicSparseNumberArray #29

Closed
wants to merge 5 commits

Conversation

@lindsayad (Member) commented Jul 8, 2022

One of our tasks for NEAMS this year is to test pool-based AD

@lindsayad (Member Author)

Running:

lindad@pop-os:~/projects/moose/modules/tensor_mechanics/test/tests/ad_isotropic_elasticity_tensor(devel)$ ../../../tensor_mechanics-oprof -i 2D-axisymmetric_rz_test.i Mesh/uniform_refine=6

I get these profiles (click in to see them closely). The timings are:

  • PoolDynamicSparseNumberArray: 22 seconds
  • DynamicSparseNumberArray: 10 seconds (it seems like this has gotten a lot better?!)
  • SemiDynamicSparseNumberArray (our currently used container): 7 seconds

PoolDynamicSparseNumberArray

[profiling screenshot: testing-ad-pool]

DynamicSparseNumberArray

[profiling screenshot: dsna]

SemiDynamicSparseNumberArray

[profiling screenshot: classic]

@lindsayad (Member Author)

It looks like smart-pointer reference counting is the primary expense. Using this becomes trickier if we are running with multithreading; just making the pool a thread_local global isn't safe enough.
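For reference, a minimal sketch of the two pooling strategies in play; the names and structure here are illustrative, not the actual PR code:

```cpp
#include <memory>
#include <stack>
#include <vector>

// Illustrative stand-in for the pooled container's backing storage.
struct DSNAStorage
{
  std::vector<double> data;
  std::vector<unsigned int> indices;
};

// Strategy 1: hand out shared_ptr handles from a shared pool. Every copy
// and destruction of a handle bumps the atomic reference count, which is
// the "smart pointer counting" cost showing up in the profile.
struct RefCountedPool
{
  std::shared_ptr<DSNAStorage> acquire()
  {
    if (_free.empty())
      return std::make_shared<DSNAStorage>();
    auto storage = _free.top();
    _free.pop();
    return storage;
  }

  void release(std::shared_ptr<DSNAStorage> storage) { _free.push(std::move(storage)); }

  std::stack<std::shared_ptr<DSNAStorage>> _free;
};

// Strategy 2: a thread_local pool avoids both locking and atomic
// reference counting, but (as noted above) is not obviously safe under
// every threading model.
thread_local std::stack<DSNAStorage> tl_pool;
```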

@lindsayad (Member Author)

Hmm, actually, running the same test with the same refinement as above with --n-threads=4, I get segmentation faults with an OpenMP thread model and a thread_local pool, but a clean run with a pthread thread model. For the latter I ran with -fsanitize=thread and saw no errors. So maybe OpenMP doesn't work with thread_local?

@lindsayad (Member Author)

Anyway, maybe @permcody, @friedmud, @roystgnr, and I should meet and talk about whether we want to try to drill down on this PR more. I think I'd like to merge something, but what that is is up for discussion. (And then what the end result is will determine whether we push this downstream at all. As it stands now, we would not push this downstream.)

@lindsayad (Member Author)

@roystgnr what are your thoughts on this PR?

@roystgnr (Member)

For merging? IMHO adding this would be something in between a red herring and an outright trap for users; if we're 120% slower than non-pool DSNA then there's nothing to justify the extra complication.

For improvement? I'd try, instead of having a PDSNA that pulls from a pool of DSNAs and accesses the memory via pointers, a PDSNA that pulls from a pool of data vectors and a pool of index vectors and accesses the memory by std::move'ing it directly into (and back out of, when returning it to the pool) its own _data and _indices. I'm not confident that this would be faster than pure DSNA either, but it would at least get rid of the extra indirection layer that the PDSNA here requires.
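A rough sketch of that move-based layout (hypothetical names and types, not actual MetaPhysicL code; the real _data/_indices types would come from the DSNA template parameters):

```cpp
#include <stack>
#include <utility>
#include <vector>

// Hypothetical pool of raw vectors: acquire() hands a (possibly
// pre-grown) vector out by value via move, release() returns its
// capacity to the pool for reuse.
template <typename T>
struct VectorPool
{
  std::vector<T> acquire()
  {
    if (_free.empty())
      return {};
    std::vector<T> v = std::move(_free.top());
    _free.pop();
    v.clear(); // keep the capacity, drop the old contents
    return v;
  }

  void release(std::vector<T> && v) { _free.push(std::move(v)); }

  std::stack<std::vector<T>> _free;
};

// Sketch of a PDSNA that owns its storage directly: no pointer
// indirection, just vectors moved out of the pools on construction and
// moved back on destruction.
struct PoolBackedDSNA
{
  PoolBackedDSNA(VectorPool<double> & dpool, VectorPool<unsigned int> & ipool)
    : _dpool(dpool), _ipool(ipool), _data(_dpool.acquire()), _indices(_ipool.acquire())
  {
  }

  ~PoolBackedDSNA()
  {
    _dpool.release(std::move(_data));
    _ipool.release(std::move(_indices));
  }

  VectorPool<double> & _dpool;
  VectorPool<unsigned int> & _ipool;
  std::vector<double> _data;
  std::vector<unsigned int> _indices;
};
```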

@lindsayad lindsayad marked this pull request as draft July 19, 2022 16:45
@lindsayad (Member Author)

Looks like it's slower in the naive implementation I just tried (13 seconds):

[profiling screenshot: dsna-with-stack]

@lindsayad (Member Author)

With a vector backing the stack:

[profiling screenshot: dsna-with-vector-stack]

@lindsayad (Member Author)

Yeah, I think resize is a small enough contributor to the overall simulation time at this point that spending more development time trying to abate its effect seems like time poorly spent. Time would probably be better spent figuring out whether there are any possible algorithmic improvements in sparsity_union. I'm leaning towards closing this.

@roystgnr (Member)

11.88-12.02% in resize() and dependencies isn't trivial. We could probably cut that in half if we put our data and indices into a single vector<pair> so every two resizes becomes one. Not sure if that would be worth the pain, though.
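A sketch of that single-allocation layout (illustrative only, not the actual DSNA internals):

```cpp
#include <utility>
#include <vector>

// Instead of parallel _data and _indices vectors (two heap allocations
// per growth), store index/value pairs together so one resize covers
// both. The trade-off is that iterating over just the indices loses
// contiguity.
struct PairedSparseStorage
{
  std::vector<std::pair<unsigned int, double>> entries;

  void resize(std::size_t n) { entries.resize(n); }
};
```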

Or ... perhaps when we first find ourselves having to resize() above zero, we could first reserve() more than the std::vector default, to reduce the odds of having to hit the heap again later? If we were to bump the initial reserved size up high enough for most MOOSE users to never go beyond it, our speed would probably be within spitting distance of non-heap-allocated SDSNA, but then the exceptional MOOSE apps would just run a little slower when they went beyond the default max, rather than seeing an error and having to reconfigure and recompile.
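Roughly what the reserve-on-first-growth idea would look like (INITIAL_RESERVE is a made-up knob here; the point is to pick it large enough that typical MOOSE runs never reallocate again):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical default capacity for the first growth past zero.
constexpr std::size_t INITIAL_RESERVE = 64;

template <typename T>
void grow(std::vector<T> & v, std::size_t new_size)
{
  // The first time we grow past zero, reserve more than the std::vector
  // default so later resizes are less likely to hit the heap again.
  if (v.capacity() == 0 && new_size > 0)
    v.reserve(std::max(new_size, INITIAL_RESERVE));
  v.resize(new_size);
}
```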

The 15.48-15.84% in sparsity_union does look serious. I'd love to see line-by-line profiling of that. I wish there were some way to at least make the if(!unseen_indices) return; path faster, but I don't see one ... and us spending non-trivial time in resize() probably indicates that the early-return case isn't ubiquitous enough to focus on anyway.
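For context, a highly simplified sketch of the kind of sparsity-union logic being discussed, including the early-return path; this is not the actual MetaPhysicL implementation, just an illustration of where the cheap and expensive paths sit:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Merge another sparsity pattern's indices into ours (both sorted,
// duplicate-free). If nothing new would be added we return before doing
// any resizing; the profiling above suggests that cheap path is not hit
// often enough to be the thing to optimize.
void sparsity_union_sketch(std::vector<unsigned int> & ours,
                           const std::vector<unsigned int> & theirs)
{
  std::size_t unseen_indices = 0;
  auto it = ours.begin();
  for (const unsigned int idx : theirs)
  {
    while (it != ours.end() && *it < idx)
      ++it;
    if (it == ours.end() || *it != idx)
      ++unseen_indices;
  }

  if (!unseen_indices)
    return;

  // Otherwise grow and merge; this is where the resize/allocation cost lives.
  std::vector<unsigned int> merged;
  merged.reserve(ours.size() + unseen_indices);
  std::set_union(ours.begin(), ours.end(), theirs.begin(), theirs.end(),
                 std::back_inserter(merged));
  ours = std::move(merged);
}
```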

> I'm leaning towards closing this

Sad, but "we tried X and it didn't work" is a useful test outcome too.

@lindsayad (Member Author)

So I did some work on reserving up front for what the maximum sparsity pattern size might be in roystgnr#1 (comment), and at that time it was slower than vanilla DynamicSparseNumberArray by a significant margin.
