
Device Packing Optimization #1382

Merged
merged 4 commits into from
Apr 21, 2021

Conversation

wrtobin
Collaborator

@wrtobin wrtobin commented Apr 6, 2021

Improve CUDA buffer packing performance by using assignment-based packing on properly aligned memory.

This reduces our packing time by approximately one order of magnitude in observed cases. Further reduction should be possible through kernel register reduction and loop fusion (not in this PR).
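A minimal sketch of the idea described above, not the GEOSX implementation itself: when the raw byte buffer is known to be properly aligned for `T`, it can be reinterpreted as `T *` and packed with plain typed assignments instead of per-value byte copies. The function names here are hypothetical; only `buffer_unit_type` and the `reinterpret_cast` pattern appear in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

using buffer_unit_type = std::uint8_t;

// Byte-wise packing: copies each value with memcpy, safe for any alignment,
// but expensive inside a device kernel.
template< typename T >
void packBytewise( buffer_unit_type * buffer, T const * values, std::size_t n )
{
  for( std::size_t i = 0; i < n; ++i )
  {
    std::memcpy( buffer + i * sizeof( T ), &values[ i ], sizeof( T ) );
  }
}

// Assignment-based packing: valid only when `buffer` is properly aligned
// for T. A typed store per value is what the PR switches to for CUDA.
template< typename T >
void packByAssignment( buffer_unit_type * buffer, T const * values, std::size_t n )
{
  assert( reinterpret_cast< std::uintptr_t >( buffer ) % alignof( T ) == 0 );
  T * devBuffer = reinterpret_cast< T * >( buffer );
  for( std::size_t i = 0; i < n; ++i )
  {
    devBuffer[ i ] = values[ i ];  // one typed assignment per value
  }
}
```

Both routines produce byte-identical buffers; the assignment form simply lets the compiler emit full-width loads and stores instead of byte traffic.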

@wrtobin
Collaborator Author

wrtobin commented Apr 6, 2021

I've already run the integratedTests on both lassen and quartz, so as long as this passes the CI checks it should be good to go.

@wrtobin wrtobin changed the title Feature/wrtobin/packing opt CUDA Packing Optimization Apr 6, 2021
@wrtobin wrtobin changed the title CUDA Packing Optimization Device Packing Optimization Apr 6, 2021
if( DO_PACKING )
{
T * devBuffer = reinterpret_cast< T * >( buffer );
parallelDeviceStream stream;
Contributor

We should clean up this stream and events business later. We should not be creating a new stream for every field that we pack and we should probably do stream based synchronization instead of event based.
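A minimal C++ sketch of the cleanup the reviewer suggests, using a hypothetical `Stream` stand-in rather than the real `parallelDeviceStream` type: hoist the stream out of the per-field loop so one stream is created and synchronized once, instead of constructing a new stream for every packed field.

```cpp
#include <vector>

// Hypothetical stand-in for a device stream; counts constructions so the
// two patterns can be compared.
struct Stream
{
  static int created;
  Stream() { ++created; }
  void enqueuePack( int /*field*/ ) { /* packing kernel would launch here */ }
  void synchronize() { /* stream-based sync instead of per-event sync */ }
};
int Stream::created = 0;

// Current pattern criticized above: one stream per packed field.
void packPerFieldStream( std::vector< int > const & fields )
{
  for( int f : fields )
  {
    Stream s;          // constructed every iteration
    s.enqueuePack( f );
    s.synchronize();
  }
}

// Suggested pattern: a single stream reused for every field.
void packSharedStream( std::vector< int > const & fields )
{
  Stream s;            // constructed once
  for( int f : fields )
  {
    s.enqueuePack( f );
  }
  s.synchronize();     // one stream-wide synchronization at the end
}
```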

buffer_unit_type const * threadBuffer = devBuffer + i * unitSize;
LvArray::forValuesInSlice( var[ indices[ i ] ], [&threadBuffer] GEOSX_DEVICE ( T & value )
T const * threadBuffer = &devBuffer[ ii * sliceSize ];
LvArray::forValuesInSlice( var[ indices[ ii ] ], [&threadBuffer] GEOSX_DEVICE ( T & value )
Contributor

I was thinking about it later, and you can only get rid of forValuesInSlice if the slice is contiguous. If we don't care about the order it's packed in, as long as it's unpacked in a consistent manner, then we can probably do much better, but it would take some thought as to how best to pack things, and it would likely involve a single thread packing parts of multiple objects in some cases.
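A small sketch of the contiguity point made above, with a hypothetical 1-D `Slice` type (not LvArray's `ArraySlice`): per-value iteration in the style of `forValuesInSlice` works for any stride, but when the slice is contiguous the same pack reduces to a single bulk copy.

```cpp
#include <cstring>

// Hypothetical 1-D slice: contiguous iff stride == 1.
struct Slice
{
  double const * data;
  std::size_t size;
  std::size_t stride;
  bool isContiguous() const { return stride == 1; }
};

// Per-value packing handles any stride (what forValuesInSlice provides).
void packPerValue( double * dst, Slice const & s )
{
  for( std::size_t i = 0; i < s.size; ++i )
  {
    dst[ i ] = s.data[ i * s.stride ];
  }
}

// A contiguous slice can instead be packed with one bulk copy.
void packSlice( double * dst, Slice const & s )
{
  if( s.isContiguous() )
  {
    std::memcpy( dst, s.data, s.size * sizeof( double ) );
  }
  else
  {
    packPerValue( dst, s );  // strided slices still need element loops
  }
}
```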

Member

@rrsettgast rrsettgast left a comment

Looks good. I don't know if this is "outdated" but I still always pre-increment rather than post-increment.

@corbett5 can you merge this if the suggested changes are still worthwhile?

Co-authored-by: Randolph Settgast <settgast1@llnl.gov>
@wrtobin wrtobin added the "ci: run CUDA builds" label (allows triggering costly CUDA jobs) and removed the "flag: ready for review" label Apr 7, 2021
@rrsettgast rrsettgast merged commit 88b0759 into develop Apr 21, 2021
@rrsettgast rrsettgast deleted the feature/wrtobin/packing-opt branch April 21, 2021 04:07
Labels
ci: run CUDA builds (allows triggering costly CUDA jobs), type: optimization
3 participants