-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat rxd av #2
Feat rxd av #2
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First round of comments. There are a lot of indenting/spacing things to fix. I commented on a few but not all.
Main thing that I think is missing is where a peer gets initialized/looked up in the ep_rxd hash. These also need to be stored in an rxd_ep list of all initialized peers for cleanup when the endpoint is closed.
Overall, good first draft!
prov/rxd/src/rxd.h
Outdated
fi_addr_t fi_addr; | ||
fi_addr_t dg_addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indenting here
prov/rxd/src/rxd.h
Outdated
@@ -172,6 +171,7 @@ struct rxd_cq { | |||
enum rxd_pool_type { | |||
RXD_BUF_POOL_RX, | |||
RXD_BUF_POOL_TX, | |||
RXD_BUF_POOL_PEERS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indenting
prov/rxd/src/rxd_atomic.c
Outdated
HASH_FIND(hh, rxd_ep->fi_rxdaddr_hash, (void*)&addr, sizeof(fi_addr_t), entry); | ||
if (entry == NULL) | ||
goto out; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's create an inline function for this lookup. Maybe just return the rxd_addr and check for FI_ADDR_UNSPEC if we don't need the entry
prov/rxd/src/rxd_atomic.c
Outdated
peer_entry = ofi_bufpool_get_ibuf(rxd_ep->peer_pool.pool, rxd_addr); | ||
if (peer_entry->peer_addr == FI_ADDR_UNSPEC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here - let's create an inline function to check this and return the peer_addr so we don't need to save the peer_entry and can just check the peer_addr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some cases when we get the peer entry, we also use or manipulate other fields of the peer_entry struct. (for ex : rxd_cq.c rxd_ep_recv_data we use rx_seq_no and rx_window). should we then write inline getters and setters for every field that is being used or written to of the peer_entry? or should we do it only for peer_addr, because there are a few functions which only use the peer_addr.
prov/rxd/src/rxd_av.c
Outdated
int ret; | ||
size_t len; | ||
len = RXD_NAME_LENGTH; | ||
uint8_t tmp_addr[RXD_NAME_LENGTH]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things here -
You can just directly set the len instead of setting it on a separate line.
And the second is just a nit for me (I don't think this is actually a thing haha) but I like to organize longest to shortest variables so it looks cleaner in the declarations ie
uint8_t tmp_addr[RXD_NAME_LENGTH];
size_t len = RXD_NAME_LENGTH;
int ret;
prov/rxd/src/rxd_cq.c
Outdated
@@ -206,51 +209,59 @@ void rxd_ep_recv_data(struct rxd_ep *ep, struct rxd_x_entry *x_entry, | |||
static void rxd_verify_active(struct rxd_ep *ep, fi_addr_t addr, fi_addr_t peer_addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you didn't write this code haha but could you rename addr to be rxd_addr for clarity's sake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is changed to pass the peer entry instead since the function calling this was also using the same peer_entry.
rxd_verify_active(struct rxd_ep *ep, struct rxd_peer *peer_entry,
fi_addr_t peer_addr)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rxd_update_peer(struct rxd_ep *ep, fi_addr_t peer, fi_addr_t peer_addr) calls this function. so making this change in rxd_update_peer to use rxd_addr as argument name.
prov/rxd/src/rxd_cq.c
Outdated
@@ -329,8 +343,12 @@ void rxd_progress_tx_list(struct rxd_ep *ep, struct rxd_peer *peer) | |||
|
|||
static void rxd_update_peer(struct rxd_ep *ep, fi_addr_t peer, fi_addr_t peer_addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above - I know you didn't write this but let's update to rxd_peer instead of peer for consistency
prov/rxd/src/rxd_cq.c
Outdated
if (ret) | ||
return; | ||
} | ||
peer_entry = rxd_get_peer_entry(ep, pkt->source); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again - not sure where this is defined. I'm guessing this is part of the stuff you're working on. I understand now the addr/name being passed in but let's rename it to make it clear that we're looking up by name especially because at this point we're not guaranteed that the peer entry exists so it's more of a lookup than a get. If it's not there then we need to do some sort of initialization of the peer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rxd_ep.c
prov/rxd/src/rxd_cq.c
Outdated
|
||
if (rxd_send_cts(ep, pkt, rxd_addr)) { | ||
if (rxd_send_cts(ep, pkt, peer_entry->peer_addr)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the rxd_addr is our local rxd_addr here, not the peer_addr (buf pool index, not peer_addr)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed this
prov/rxd/src/rxd_cq.c
Outdated
if (!rx_entry) { | ||
FI_WARN(&rxd_prov, FI_LOG_EP_CTRL, "could not get tx entry\n"); | ||
FI_WARN(&rxd_prov, FI_LOG_EP_CTRL, "could not get rx entry\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate this into a separate typo patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted for now will create a patch for this.
prov/rxd/src/rxd_av.c
Outdated
rxd_ep = container_of(util_ep, struct rxd_ep, util_ep); | ||
|
||
// function to get peer_entry from ep->rxd_peer_pool | ||
peer_entry = ofi_bufpool_get_ibuf(rxd_ep->peer_pool.pool, rxd_addr); | ||
assert(peer_entry->peer_addr == rxd_addr); | ||
|
||
peer_entry->fi_addr = FI_ADDR_UNSPEC; | ||
|
||
HASH_DELETE(hh, rxd_ep->fi_rxdaddr_hash, entry); | ||
free(entry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a second look i feel that the setting and unsetting of fi_addr in the peer entry is redundant I don't see peer_entry->fi_addr being used anywhere in the code. since we always use the hash( fi_addr-->rxd_addr) to fetch the peer entry from the pool we cannot use the fi_addr stored inside the peer entry. Also it seems that the removal of the fi_addr-->rxd_addr mapping in the rxd_av_remove as it may cause issues in the underlying protocol for fetching the peer if some communication is still underway?
Effectively anything which is still a part of the ep structure still probably needs to be retained and so the loop over the endpoint list is probably not required in this function.
@@ -221,6 +227,8 @@ static ssize_t rxd_atomic_inject(struct fid_ep *ep_fid, const void *buf, | |||
struct fi_rma_iov rma_iov; | |||
fi_addr_t rxd_addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two function arguments fi_addr_t dest_addr and uint64_t addr, the dest_addr argument is not being used. the man pages give addr as destination for remote memory while dest_addr as dest address for connectionless operations. what is the difference between the two. are they being used correctly in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dest_addr refers to the peer. It's the same as the fi_addr.
The addr refers to the memory address where the data is going. This can either be an offset in bytes (starting at 0) or the virtual address if the app/provider is using FI_MR_VIRT_ADDR.
You're right - this is incorrect. The lookup should be using the dest_addr, not the addr. The reason this is probably working is because most apps will be using offsets and typically start at 0 which will always exist.
Please fix this and all other errors in a separate patch. Good catch!
67203dc
to
c7ed89d
Compare
prov/rxd/src/rxd.h
Outdated
uint8_t addr[RXD_NAME_LENGTH]; //key | ||
fi_addr_t dg_addr; //value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually get rid of these comments for the actual PR.
Let's also change the order here to
UT_hash_handle hh;
fi_addr_t dg_addr;
uint8_t addr[RXD_NAME_LENGTH];
I like to have arrays at the end because if we change the size of the array for some reason, then it doesn't affect the placement of the other fields within the struct. Putting the handle first with this and the other entry helps with consistency (handle is always first).
prov/rxd/src/rxd.h
Outdated
struct rxd_fiaddr_entry { | ||
|
||
fi_addr_t fi_addr; //key | ||
fi_addr_t rxd_addr; //value | ||
UT_hash_handle hh; | ||
}; | ||
struct rxd_epname_entry { | ||
|
||
uint8_t addr[RXD_NAME_LENGTH]; //key | ||
fi_addr_t rxd_addr; //value | ||
UT_hash_handle hh; | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous comment about field order and comments
prov/rxd/src/rxd_ep.c
Outdated
@@ -758,10 +903,16 @@ static int rxd_ep_bind(struct fid *ep_fid, struct fid *bfid, uint64_t flags) | |||
ret = ofi_ep_bind_av(&ep->util_ep, &av->util_av); | |||
if (ret) | |||
return ret; | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra spaces
prov/rxd/src/rxd.h
Outdated
int rxd_update_av_peers(struct util_av *util_av, void *dg_addr, | ||
fi_addr_t fi_addr, void *arg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is only called in rxd_ep.c so it should be able to be static
prov/rxd/src/rxd.h
Outdated
struct rxd_fiaddr_entry *entry = NULL; | ||
HASH_FIND(hh, ep->fi_rxdaddr_hash, (void*)&fi_addr, sizeof(fi_addr), entry); | ||
return entry ? entry->rxd_addr : FI_ADDR_UNSPEC; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra space
prov/rxd/src/rxd_ep.c
Outdated
static void rxd_peer_init_fn(struct ofi_bufpool_region *region, void *buf) | ||
{ | ||
struct rxd_peer *entry = (struct rxd_peer*) buf; | ||
rxd_init_peer(entry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just move the entire init_peer code here since this is only called here.
prov/rxd/src/rxd_ep.c
Outdated
//initialize hash as NULL | ||
rxd_ep->fi_rxdaddr_hash = NULL; | ||
rxd_ep->ep_rxdaddr_hash = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't need these since you calloc'ed the rxd_ep
prov/rxd/src/rxd_msg.c
Outdated
struct rxd_peer *peer_entry; | ||
|
||
peer_entry = ofi_bufpool_get_ibuf(ep->peer_pool.pool, unexp_msg->base_hdr->peer); | ||
uint16_t curr_id = peer_entry->curr_rx_id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move declaration above and add an initialization line here
prov/rxd/src/rxd_rma.c
Outdated
|
||
rxd_addr = rxd_ep_av(rxd_ep)->fi_addr_table[addr]; | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra spaces
prov/rxd/src/rxd_ep.c
Outdated
@@ -659,7 +699,7 @@ static int rxd_ep_close(struct fid *fid) | |||
struct slist_entry *entry; | |||
struct rxd_peer *peer; | |||
|
|||
ep = container_of(fid, struct rxd_ep, util_ep.ep_fid.fid); | |||
ep = container_of(fid, struct rxd_ep, util_ep.ep_fid.fid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove spaces
d6cfcfb
to
9012c37
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think this is pretty close to being done!
Main things:
- Either separate out or remove all the indenting and spacing changes. Most of them are unnecessary. Some of them are helpful so if you want to keep those, you can separate them into a separate patch and include them in this PR. If you include them in this patch, it makes the code really difficult to review and also Sean is going to scold you ;)
- Renaming/refactoring of peer lookup, update, and initialization
- Still think you could merge the fiaddr has lookup with the get peer
- Please fix that bug you found regarding the dest_addr!
- Lastly, include more in your commit message to explain the flow of the new addressing. Explain the need and purpose of the new structs and hashes that you've added. It's easier for me to follow because we've discussed this so much, but it won't be that clear for someone like Sean who will be looking at this for the first time and have a hard time understanding why this has to be so complicated!
Once you've done all that, I say go for it and open up a PR upstream!
prov/rxd/src/rxd.h
Outdated
#define RXD_PROTOCOL_VERSION (2) | ||
|
||
#define RXD_MAX_MTU_SIZE 4096 | ||
|
||
#define RXD_MAX_TX_BITS 10 | ||
#define RXD_MAX_RX_BITS 10 | ||
#define RXD_MAX_TX_BITS 10 | ||
#define RXD_MAX_RX_BITS 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these (and any other unrelated formatting issues throughout the patch) need to be fixed for some reason, separate all of them into a separate formatting patch when you open a PR upstream. If they don't need to be fixed, just drop the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as an example, On vim these lines look exactly the same 1 tab distance. how to make such a change. also what do you mean by drop that which don't need to be fixed? can you point out what you are referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While fixing indentation. I also removed spaces in many parts of the code and replaced them with tabs, since there should be no spaces only tabs. I had some confusion when a single line of code spans multiple lines. The lines after the first in some parts of the code were formatted differently, in different parts of the code, so I tried to make them uniform (for ex all second lines which are not function headers in a file to have the same number of tabs in a file from the left margin if it is not a RHS of an expression). Is this what you are referring to as changes which are not need to be fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment below. The spacing here is also ok. This one is here so that all the define values line up since RXD_MAX_MTU_SIZE is an extra character and requires an extra tab for the values below but it shows up as a space instead of a tab.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry just read your first comment here - if they look the same and it was just a space before and now a tab, this is an ok fix. Thanks!
prov/rxd/src/rxd.h
Outdated
int rxd_epaddr_hash_insert(struct rxd_ep *ep, void *addr, | ||
fi_addr_t *rxd_addr); | ||
int rxd_update_av_peers(struct util_av *util_av, void *dg_addr, | ||
fi_addr_t fi_addr, void *arg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two functions I think are only called once each in rxd_ep. Remove them here and make them static in rxd_ep
@@ -145,7 +149,8 @@ static ssize_t rxd_generic_atomic(struct rxd_ep *rxd_ep, | |||
if (!tx_entry) | |||
goto out; | |||
|
|||
if (rxd_ep->peers[rxd_addr].peer_addr != FI_ADDR_UNSPEC) | |||
peer_entry = ofi_bufpool_get_ibuf(rxd_ep->peer_pool.pool, rxd_addr); | |||
if (peer_entry->peer_addr != FI_ADDR_UNSPEC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rxd_fiaddr_hash_lookup should only need to be called in calls where the app should have already inserted the address into the AV and therefore initialized the peer so we should be able to merge these calls. If the lookup fails and we can't return a peer, we will never get to rxd_send_rts_if_needed or rxd_tx_entry_init_rma.
Same thing with the rx path - if the address is not ADDR_UNSPEC, then it should have been inserted into the AV so it should be able to return the peer.
@@ -221,6 +227,8 @@ static ssize_t rxd_atomic_inject(struct fid_ep *ep_fid, const void *buf, | |||
struct fi_rma_iov rma_iov; | |||
fi_addr_t rxd_addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dest_addr refers to the peer. It's the same as the fi_addr.
The addr refers to the memory address where the data is going. This can either be an offset in bytes (starting at 0) or the virtual address if the app/provider is using FI_MR_VIRT_ADDR.
You're right - this is incorrect. The lookup should be using the dest_addr, not the addr. The reason this is probably working is because most apps will be using offsets and typically start at 0 which will always exist.
Please fix this and all other errors in a separate patch. Good catch!
prov/rxd/src/rxd_av.c
Outdated
flags, context); | ||
if (ret) | ||
break; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: extra line here
prov/rxd/src/rxd_ep.c
Outdated
FI_WARN(&rxd_prov, FI_LOG_AV, "Unable to find inserted dg address\n"); | ||
goto err; | ||
} | ||
rxd_fi_update_peer_entry(ep, addr, fi_addr, (fi_addr_t*)dg_addr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of the word "update" here is a bit confusing and misleading. The peer can be initialized but the word update suggests that you are changing information from a peer that was already initialized. Consider renaming and reorganizing the functions rxd_fi_update_peer_entry, rxd_update_av_peers, and rxd_get_peer_entry to clarify which ones are doing initialization, updating, or getting. Otherwise, it's unclear where the peer is actually getting created/updated/looked up.
dlist_init(&(peer_entry->rx_list)); | ||
dlist_init(&(peer_entry->rma_rx_list)); | ||
dlist_init(&(peer_entry->buf_pkts)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: extra line here
prov/rxd/src/rxd.h
Outdated
struct fi_info *core_info); | ||
int rxd_info_to_rxd(uint32_t version, const struct fi_info *core_info, | ||
struct fi_info *info); | ||
struct fi_info *info); | ||
|
||
int rxd_fabric(struct fi_fabric_attr *attr, | ||
struct fid_fabric **fabric, void *context); | ||
struct fid_fabric **fabric, void *context); | ||
int rxd_domain_open(struct fid_fabric *fabric, struct fi_info *info, | ||
struct fid_domain **dom, void *context); | ||
struct fid_domain **dom, void *context); | ||
int rxd_av_create(struct fid_domain *domain_fid, struct fi_av_attr *attr, | ||
struct fid_av **av, void *context); | ||
struct fid_av **av, void *context); | ||
int rxd_endpoint(struct fid_domain *domain, struct fi_info *info, | ||
struct fid_ep **ep, void *context); | ||
struct fid_ep **ep, void *context); | ||
int rxd_cq_open(struct fid_domain *domain, struct fi_cq_attr *attr, | ||
struct fid_cq **cq_fid, void *context); | ||
struct fid_cq **cq_fid, void *context); | ||
int rxd_cntr_open(struct fid_domain *domain, struct fi_cntr_attr *attr, | ||
struct fid_cntr **cntr_fid, void *context); | ||
struct fid_cntr **cntr_fid, void *context); | ||
int rxd_query_atomic(struct fid_domain *domain, enum fi_datatype datatype, | ||
enum fi_op op, struct fi_atomic_attr *attr, uint64_t flags); | ||
enum fi_op op, struct fi_atomic_attr *attr, uint64_t flags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for example these indentation fixes --> should they be dropped? or are they alright?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These fixes can be dropped. The rule is not never spaces, only tabs. It's tabs where there could be a tab/for every 8 spaces. So these spaces (and a lot of the others you've replaced) are here to lineup the arguments. Those are fine.
f75d3c1
to
af24451
Compare
3d17dbb
to
d2da72f
Compare
90c0d47
to
13adbf2
Compare
If a posted receive matches with a saved receive, we may need to increment the rx counter. Set the rx counter increment callback to match that of the posted receive. This fixes an assert in xnet_cntr_inc() accessing a NULL cntr_inc function pointer. Program received signal SIGABRT, Aborted. 0x0000155552d4d37f in raise () from /lib64/libc.so.6 #0 0x0000155552d4d37f in raise () from /lib64/libc.so.6 #1 0x0000155552d37db5 in abort () from /lib64/libc.so.6 #2 0x0000155552d37c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6 #3 0x0000155552d45a76 in __assert_fail () from /lib64/libc.so.6 #4 0x00001555522967f9 in xnet_cntr_inc (ep=0x6e4c70, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:347 #5 0x0000155552296836 in xnet_report_cntr_success (ep=0x6e4c70, cq=0x6ca930, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:354 #6 0x000015555229970d in xnet_complete_saved (saved_entry=0x6f7a30) at prov/tcp/src/xnet_progress.c:153 #7 0x0000155552299961 in xnet_recv_saved (saved_entry=0x6f7a30, rx_entry=0x6f7840) at prov/tcp/src/xnet_progress.c:188 #8 0x00001555522946f8 in xnet_srx_tag (srx=0x6dd1c0, recv_entry=0x6f7840) at prov/tcp/src/xnet_srx.c:445 #9 0x0000155552294bb1 in xnet_srx_trecv (ep_fid=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_srx.c:558 #10 0x000015555228f60e in fi_trecv (ep=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at ./include/rdma/fi_tagged.h:91 #11 0x00001555522900a7 in xnet_rdm_trecv (ep_fid=0x6d9fe0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_rdm.c:212 Signed-off-by: Sean Hefty <sean.hefty@intel.com>
If a posted receive matches with a saved receive, we may need to increment the rx counter. Set the rx counter increment callback to match that of the posted receive. This fixes an assert in xnet_cntr_inc() accessing a NULL cntr_inc function pointer. Program received signal SIGABRT, Aborted. 0x0000155552d4d37f in raise () from /lib64/libc.so.6 #0 0x0000155552d4d37f in raise () from /lib64/libc.so.6 #1 0x0000155552d37db5 in abort () from /lib64/libc.so.6 #2 0x0000155552d37c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6 #3 0x0000155552d45a76 in __assert_fail () from /lib64/libc.so.6 #4 0x00001555522967f9 in xnet_cntr_inc (ep=0x6e4c70, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:347 #5 0x0000155552296836 in xnet_report_cntr_success (ep=0x6e4c70, cq=0x6ca930, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:354 #6 0x000015555229970d in xnet_complete_saved (saved_entry=0x6f7a30) at prov/tcp/src/xnet_progress.c:153 #7 0x0000155552299961 in xnet_recv_saved (saved_entry=0x6f7a30, rx_entry=0x6f7840) at prov/tcp/src/xnet_progress.c:188 #8 0x00001555522946f8 in xnet_srx_tag (srx=0x6dd1c0, recv_entry=0x6f7840) at prov/tcp/src/xnet_srx.c:445 #9 0x0000155552294bb1 in xnet_srx_trecv (ep_fid=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_srx.c:558 #10 0x000015555228f60e in fi_trecv (ep=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at ./include/rdma/fi_tagged.h:91 #11 0x00001555522900a7 in xnet_rdm_trecv (ep_fid=0x6d9fe0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_rdm.c:212 Signed-off-by: Sean Hefty <sean.hefty@intel.com>
0ac3eb2
to
924f47b
Compare
37e7d7f
to
8c3817b
Compare
7893cad
to
170d2a4
Compare
review for dynamic av handling.