Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

702 make reduce return pending send #714

Merged
merged 25 commits into from
Aug 26, 2020

Conversation

nmm0
Copy link
Collaborator

@nmm0 nmm0 commented Feb 28, 2020

Fixes: #702

@codecov
Copy link

codecov bot commented Feb 28, 2020

Codecov Report

Merging #714 into develop will increase coverage by 0.12%.
The diff coverage is 92.46%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #714      +/-   ##
===========================================
+ Coverage    77.42%   77.54%   +0.12%     
===========================================
  Files          656      656              
  Lines        25144    25363     +219     
===========================================
+ Hits         19468    19668     +200     
- Misses        5676     5695      +19     
Impacted Files Coverage Δ
src/vt/collective/reduce/reduce.h 100.00% <ø> (ø)
src/vt/messaging/pending_send.h 100.00% <ø> (ø)
src/vt/objgroup/manager.h 100.00% <ø> (ø)
src/vt/objgroup/proxy/proxy_objgroup.h 100.00% <ø> (ø)
src/vt/vrt/collection/manager.h 100.00% <ø> (ø)
src/vt/vrt/collection/reducable/reducable.h 100.00% <ø> (ø)
tests/unit/termination/test_term_chaining.cc 88.63% <78.46%> (-9.88%) ⬇️
tests/unit/termination/test_term_dep_send_chain.cc 98.65% <95.41%> (-1.35%) ⬇️
src/vt/collective/reduce/reduce.impl.h 100.00% <100.00%> (ø)
src/vt/messaging/collection_chain_set.h 100.00% <100.00%> (ø)
... and 9 more

@lifflander
Copy link
Collaborator

@nmm0 Just heads up that PR #757 touches the reduce code and ignores the sequence ID due to lack of reasonable use.

@nmm0 nmm0 force-pushed the 702-make-reduce-return-PendingSend branch from ccb7957 to 24f0d39 Compare April 29, 2020 18:03
@nmm0 nmm0 force-pushed the 702-make-reduce-return-PendingSend branch from 66c170c to fe927c3 Compare July 15, 2020 18:10
@nmm0 nmm0 changed the title WIP: 702 make reduce return pending send 702 make reduce return pending send Jul 15, 2020
@nmm0 nmm0 marked this pull request as ready for review July 15, 2020 21:21
@nmm0
Copy link
Collaborator Author

nmm0 commented Jul 15, 2020

FYI the build test on mac os x is failing because of an annoying mac os x issue where the hosts file is configured in a certain way that breaks mpich: https://stackoverflow.com/questions/23112515/mpich2-gethostbyname-failed

@lifflander
Copy link
Collaborator

FYI the build test on mac os x is failing because of an annoying mac os x issue where the hosts file is configured in a certain way that breaks mpich: https://stackoverflow.com/questions/23112515/mpich2-gethostbyname-failed

I'm trying to debug the failure now. Something changed very recently in Github's configuration.

@nmm0
Copy link
Collaborator Author

nmm0 commented Jul 15, 2020

FYI the build test on mac os x is failing because of an annoying mac os x issue where the hosts file is configured in a certain way that breaks mpich: https://stackoverflow.com/questions/23112515/mpich2-gethostbyname-failed

I'm trying to debug the failure now. Something changed very recently in Github's configuration.

FWIW this happens to me every couple of months on my mac os machine; it seems to reset the hosts file

@lifflander
Copy link
Collaborator

FYI the build test on mac os x is failing because of an annoying mac os x issue where the hosts file is configured in a certain way that breaks mpich: https://stackoverflow.com/questions/23112515/mpich2-gethostbyname-failed

I'm trying to debug the failure now. Something changed very recently in Github's configuration.

FWIW this happens to me every couple of months on my mac os machine; it seems to reset the hosts file

Ok, I have a issue #931/PR #932 open to try to fix it. I'm planning to add some entries to the hosts file and possible explore other options also.

@lifflander
Copy link
Collaborator

@PhilMiller Can you review this?
@nmm0 Can you rebase so we get the CI fix?

@nmm0 nmm0 force-pushed the 702-make-reduce-return-PendingSend branch from fe927c3 to 951903b Compare July 16, 2020 17:51
}

static void test_handler_reduce(ChainReduceMsg *msg) {
if (msg->isRoot()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this actually assert that condition? The handler presumably shouldn't be getting run on anywhere other than the root

Copy link
Collaborator

@lifflander lifflander Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a bare reduce handler (the low-level interface) that is invoked by the newer higher level reduce handlers/callbacks. The low-level interface uses a single handler up the tree (including at the root). See default_op.impl.h for reference. This is the correct way to determine if the handler has made its way to the root

Anyway, @nmm0, we should switch this to a modern handler.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lifflander I wasn't aware there was a preferred handler -- which one should I use here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reduction in this case is vt::collective::None.

  1. Make the message ChainReduceMsg inherit from vt::collective::ReduceNoneMsg
  2. Create a callback to a lambda:
  auto cb = vt::theCB()->makeFunc<ChainReduceMsg>([](ChainReduceMsg*){
    // action to execute on root
  });
  1. Call the reduce like this:
vt::NodeType root = 0;
auto r = vt::theCollective()->global(); // if using the global reducer
r->reduce<vt::collective::None>(root, msg.get(), cb);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to the new interface with the latest commit

@PhilMiller
Copy link
Member

All of the titular stuff about making collectives return PendingSend seems fine.

I see nothing in the issue or PR that motivates or explains the merging bits, though. Their logic seems sound on their face, but I don't understand the use case they fulfill.

@nmm0 nmm0 force-pushed the 702-make-reduce-return-PendingSend branch from 951903b to 0f22df1 Compare July 28, 2020 17:59
@nmm0
Copy link
Collaborator Author

nmm0 commented Jul 28, 2020

All of the titular stuff about making collectives return PendingSend seems fine.

I see nothing in the issue or PR that motivates or explains the merging bits, though. Their logic seems sound on their face, but I don't understand the use case they fulfill.

In my particular use case (contact) they are required for the "CollisionObject" interface.

The main idea is that each object has its own CollectionChainSet representing the operations performed on that object. That includes converting local data to a vt collection, building trees, and evaluating collision. In the case of evaluating collision the idea is that both CollisionObject A and CollisionObject B must have completed their previous operations (i.e. building hierarchies and building the VT collections) in order for the collision to work. So the collision detection can be thought of as dependent on both A and B's chainsets.

@PhilMiller
Copy link
Member

All of the titular stuff about making collectives return PendingSend seems fine.
I see nothing in the issue or PR that motivates or explains the merging bits, though. Their logic seems sound on their face, but I don't understand the use case they fulfill.

In my particular use case (contact) they are required for the "CollisionObject" interface.

The main idea is that each object has its own CollectionChainSet representing the operations performed on that object. That includes converting local data to a vt collection, building trees, and evaluating collision. In the case of evaluating collision the idea is that both CollisionObject A and CollisionObject B must have completed their previous operations (i.e. building hierarchies and building the VT collections) in order for the collision to work. So the collision detection can be thought of as dependent on both A and B's chainsets.

Ok, cool, I'll buy that.

When you say "object" here, I assume you mean the physical bodies in the structures being simulated, through their dynamics and collisions?

@nmm0
Copy link
Collaborator Author

nmm0 commented Jul 28, 2020

All of the titular stuff about making collectives return PendingSend seems fine.
I see nothing in the issue or PR that motivates or explains the merging bits, though. Their logic seems sound on their face, but I don't understand the use case they fulfill.

In my particular use case (contact) they are required for the "CollisionObject" interface.
The main idea is that each object has its own CollectionChainSet representing the operations performed on that object. That includes converting local data to a vt collection, building trees, and evaluating collision. In the case of evaluating collision the idea is that both CollisionObject A and CollisionObject B must have completed their previous operations (i.e. building hierarchies and building the VT collections) in order for the collision to work. So the collision detection can be thought of as dependent on both A and B's chainsets.

Ok, cool, I'll buy that.

When you say "object" here, I assume you mean the physical bodies in the structures being simulated, through their dynamics and collisions?

Exactly -- though they are also referred to as "blocks". Right now in NimbleSM there are two objects, primary and secondary though we want to target arbitrary numbers of objects. Basically in this model there are no self-collision; i.e. elements in one block will not collide with other elements in the block.

@nmm0 nmm0 force-pushed the 702-make-reduce-return-PendingSend branch from 4480bba to f8d5895 Compare August 4, 2020 17:43
nmm0 added 25 commits August 25, 2020 15:20
…w constructor that allows construction with the new EpochActionType. Additionally move some messages that are not used externally to private to preserve class invariant. Move the definitions of some non-templated functions to the source file. Make the move constructor noexcept.
… collective reduce, collections, and objgroups

This makes returning a sequentialID impossible. In order for collections to order properly, add a reduceImmediate public overload set that takes the same parameters as reduce but returns the sequential id instead of a PendingSend. This is used internally by the new reduce and collection reduceMsgExprImpl, but can be called by the user.
…s to have a PendingSend be dependent on two chain sets
…epCollective that the index exists in both chain sets
…orm bcast/send is added add extra checks for combining those in the chainset.
…o reflect that the chain sets deal with indices, not necessarily all resident elements from the collection
@nmm0 nmm0 force-pushed the 702-make-reduce-return-PendingSend branch from a10af4b to beea5ea Compare August 25, 2020 22:25
@nmm0
Copy link
Collaborator Author

nmm0 commented Aug 25, 2020

Signed commits and rebased on develop

Copy link
Collaborator

Codacy Here is an overview of what got changed by this pull request:

Clones added
============
- tests/unit/termination/test_term_chaining.cc  4
- tests/unit/termination/test_term_dep_send_chain.cc  4
- src/vt/collective/reduce/reduce.h  2
- src/vt/vrt/collection/reducable/reducable.impl.h  2
- src/vt/objgroup/proxy/proxy_objgroup.impl.h  2
         

See the complete overview on Codacy

@PhilMiller PhilMiller merged commit c959dbc into develop Aug 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make reduce return PendingSend
3 participants