This repository was archived by the owner on Sep 30, 2022. It is now read-only.

jsquyres
Member

  • ensure all messages are sent on the data channel
  • fix wraparound sequence number issue
  • don't overrun fi_av_insert() EQ
  • better handling for AV EQ length default value

@bturrubiates Please review

@jsquyres jsquyres added the bug label Jan 30, 2016
@jsquyres jsquyres added this to the v1.10.3 milestone Jan 30, 2016
@jsquyres jsquyres mentioned this pull request Jan 30, 2016
@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1287/ for details.

@jsquyres jsquyres changed the title from "v1.10: 3 usnic fixes" to "v1.10: usnic fixes" Jan 30, 2016
@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1290/ for details.

So before this change, all sends were going over the priority channel (is that for acks?)

Member Author

Just small sends. See old line 1178 (a few lines above this one):

if (frag->sf_base.uf_type == OPAL_BTL_USNIC_FRAG_SMALL_SEND &&
             frag->sf_ack_bytes_left < module->max_tiny_payload &&
// ...etc.

@bturrubiates

👍

@ompiteam-bot

@bturrubiates: Sorry, only this repo's organization members can interact with this bot. If you're a member of this organization, make your membership public so that this bot can verify your membership (go to http://github.com/orgs/open-mpi/people, find yourself on that page, then change your membership from "Private" to "Public").

@jsquyres
Member Author

Ben gave a 👍

@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1291/ for details.

Messages should go on the data channel, even if they're short.  Only
ACKs go on the priority channel.

(cherry picked from commit open-mpi/ompi@4de4a26)
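The channel rule described in this commit can be sketched as follows. This is only an illustration: `channel_t`, `seg_kind_t`, `choose_channel_old`, and `choose_channel` are hypothetical names, not the BTL's actual types or functions, and the old rule is simplified from the partial condition quoted in the review comment above (the original had further clauses that were elided there).

```c
#include <stddef.h>

/* Hypothetical types for illustration; not taken from the usnic BTL. */
typedef enum { CHANNEL_PRIORITY, CHANNEL_DATA } channel_t;
typedef enum { SEG_ACK, SEG_SMALL_SEND, SEG_LARGE_SEND } seg_kind_t;

/* Old rule (simplified assumption): ACKs and sufficiently short
 * small sends both used the priority channel. */
static channel_t choose_channel_old(seg_kind_t kind, size_t payload_len,
                                    size_t max_tiny_payload)
{
    if (kind == SEG_ACK) {
        return CHANNEL_PRIORITY;
    }
    if (kind == SEG_SMALL_SEND && payload_len < max_tiny_payload) {
        return CHANNEL_PRIORITY;
    }
    return CHANNEL_DATA;
}

/* New rule from the commit message: only ACKs use the priority
 * channel; every message goes on the data channel, however short. */
static channel_t choose_channel(seg_kind_t kind)
{
    return (kind == SEG_ACK) ? CHANNEL_PRIORITY : CHANNEL_DATA;
}
```

Under this sketch, a short small send that the old rule would have routed to the priority channel is now routed to the data channel.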
Sequence numbers will wrap around; it is not sufficient to check for
(seq-1) -- must use the SEQ_DIFF macro to properly handle the
wraparound.

This bug wasn't serious; it just meant we might retransmit one or two
extra times when retransmits were triggered and the sequence numbers
wrapped around their sliding windows.

(cherry picked from commit open-mpi/ompi@d624e0d)
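To illustrate why a plain (seq-1) check fails at wraparound, here is a minimal sketch of a wraparound-safe difference. It assumes 16-bit sequence numbers, and the names `seq_t`, `seq_diff`, `seq_is_prev`, and `naive_is_prev` are hypothetical; it shows the general technique and is not the definition of the SEQ_DIFF macro itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16-bit sequence number type, for illustration only. */
typedef uint16_t seq_t;

/* Signed distance from b to a, computed modulo 2^16.
 * Meaningful while the two values are less than 2^15 apart.
 * The cast to int16_t relies on two's complement conversion, which
 * is what all common platforms provide. */
static inline int16_t seq_diff(seq_t a, seq_t b)
{
    return (int16_t)(seq_t)(a - b);
}

/* True if a is the sequence number immediately before b,
 * including the case where b has wrapped around to 0. */
static inline bool seq_is_prev(seq_t a, seq_t b)
{
    return seq_diff(b, a) == 1;
}

/* Naive check done in a wider type: wrong when b has wrapped,
 * because 0 - 1 is -1 rather than 65535. */
static inline bool naive_is_prev(seq_t a, seq_t b)
{
    return (int)a == (int)b - 1;
}
```

With these definitions, `seq_is_prev(65535, 0)` is true while `naive_is_prev(65535, 0)` is false, which is the kind of mismatch that could cause an unnecessary retransmit.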
Add endpoints in a blocked manner so that we don't overrun the
fi_av_insert() event queue.  Also make the AV EQ length an MCA param,
and report it in mca_btl_base_verbose >=5 output.

(cherry picked from commit open-mpi/ompi@db825ab)
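The blocked insertion described in this commit might look roughly like the sketch below. Everything here is an assumption for illustration: `insert_block_fn` is a hypothetical callback standing in for "queue one block of address insertions and wait for their completion events", and `block_len` is assumed to be no larger than the event queue can hold. It does not call libfabric and is not the BTL's actual code.

```c
#include <stddef.h>

/* Hypothetical callback: insert `count` addresses starting at index
 * `first`, wait for their completions, and return 0 on success. */
typedef int (*insert_block_fn)(size_t first, size_t count, void *ctx);

/* Insert num_addrs addresses in blocks of at most block_len, so that
 * no more than block_len completion events are outstanding at once.
 * Returns the number of blocks issued, or -1 on error. */
static long insert_in_blocks(size_t num_addrs, size_t block_len,
                             insert_block_fn insert, void *ctx)
{
    if (block_len == 0 || insert == NULL) {
        return -1;
    }
    /* Ceiling division: number of blocks in terms of block_len. */
    size_t num_blocks = (num_addrs + block_len - 1) / block_len;
    for (size_t i = 0; i < num_blocks; ++i) {
        size_t first = i * block_len;
        size_t count = num_addrs - first;
        if (count > block_len) {
            count = block_len;   /* every block except the last is full */
        }
        if (insert(first, count, ctx) != 0) {
            return -1;
        }
    }
    return (long)num_blocks;
}

/* Example callback that only records what it was asked to insert. */
struct insert_stats { size_t total; size_t max_block; };

static int record_insert(size_t first, size_t count, void *ctx)
{
    struct insert_stats *s = (struct insert_stats *)ctx;
    (void)first;
    s->total += count;
    if (count > s->max_block) {
        s->max_block = count;
    }
    return 0;
}
```

For example, inserting 10 addresses with a block length of 4 issues three blocks of 4, 4, and 2 addresses, so the recording callback never sees more than 4 insertions at a time.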
A bunch of empirical testing has shown that increasing the retransmit
timeout from 1ms to 5ms doesn't adversely affect performance, yet
decreases the number of gratuitous retransmissions.

(cherry picked from commit open-mpi/ompi@c2615a4)
@jsquyres jsquyres force-pushed the pr/v1.10/usnic-fixes branch from 5e721ad to 2715e03 February 1, 2016 12:24
@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1294/ for details.

jsquyres added a commit to open-mpi/ompi that referenced this pull request Feb 1, 2016
Three minor updates from the code review of
open-mpi/ompi-release#933:

* Remove an extra blank line in a show_help message
* We no longer allow -1 for the MCA param btl_usnic_av_eq_num, so
  change the flag to REGINT_GE_ONE
* Change "num_blocks" definition to be in terms of block_len (not
  eq_size)
jsquyres added a commit to jsquyres/ompi-release that referenced this pull request Feb 1, 2016
Three minor updates from the code review of
open-mpi#933:

* Remove an extra blank line in a show_help message
* We no longer allow -1 for the MCA param btl_usnic_av_eq_num, so
  change the flag to REGINT_GE_ONE
* Change "num_blocks" definition to be in terms of block_len (not
  eq_size)

(cherry picked from commit open-mpi/ompi@9f3ed00)
@jsquyres
Member Author

jsquyres commented Feb 1, 2016

Pushed one additional commit as a result of @bturrubiates' review.

@mellanox-github

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1296/ for details.

@rhc54

rhc54 commented Feb 1, 2016

@jsquyres any urgency to this?

@jsquyres
Member Author

jsquyres commented Feb 1, 2016

A release "sometime soon" would be nice -- I don't need an immediate release, however. Similar to other vendors, it's enough to know that it's actually merged in the community upstream.

rhc54 pushed a commit that referenced this pull request Feb 2, 2016
@rhc54 rhc54 merged commit e2b518e into open-mpi:v1.10 Feb 2, 2016
@jsquyres jsquyres deleted the pr/v1.10/usnic-fixes branch February 2, 2016 19:42
bosilca pushed a commit to bosilca/ompi that referenced this pull request Oct 3, 2016
Three minor updates from the code review of
open-mpi/ompi-release#933:

* Remove an extra blank line in a show_help message
* We no longer allow -1 for the MCA param btl_usnic_av_eq_num, so
  change the flag to REGINT_GE_ONE
* Change "num_blocks" definition to be in terms of block_len (not
  eq_size)