Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat rxd av #2

Closed
wants to merge 1 commit into from
Closed

Feat rxd av #2

wants to merge 1 commit into from

Conversation

nikhilnanal
Copy link

review for dynamic av handling.

@nikhilnanal nikhilnanal marked this pull request as draft April 20, 2020 22:26
Copy link
Owner

@aingerson aingerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of comments. There are a lot of indenting/spacing things to fix. I commented on a few but not all.
Main thing that I think is missing is where a peer gets initialized/looked up in the ep_rxd hash. These also need to be stored in an rxd_ep list of all initialized peers for cleanup when the endpoint is closed.
Overall, good first draft!

fi_addr_t fi_addr;
fi_addr_t dg_addr;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indenting here

@@ -172,6 +171,7 @@ struct rxd_cq {
enum rxd_pool_type {
RXD_BUF_POOL_RX,
RXD_BUF_POOL_TX,
RXD_BUF_POOL_PEERS
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indenting

Comment on lines 138 to 140
HASH_FIND(hh, rxd_ep->fi_rxdaddr_hash, (void*)&addr, sizeof(fi_addr_t), entry);
if (entry == NULL)
goto out;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create an inline function for this lookup. Maybe just return the rxd_addr and check for FI_ADDR_UNSPEC if we don't need the entry

Comment on lines 260 to 256
peer_entry = ofi_bufpool_get_ibuf(rxd_ep->peer_pool.pool, rxd_addr);
if (peer_entry->peer_addr == FI_ADDR_UNSPEC)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here - let's create an inline function to check this and return the peer_addr so we don't need to save the peer_entry and can just check the peer_addr.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some cases when we get the peer entry, we also use or manipulate other fields of the peer_entry struct. (for ex : rxd_cq.c rxd_ep_recv_data we use rx_seq_no and rx_window). should we then write inline getters and setters for every field that is being used or written to of the peer_entry? or should we do it only for peer_addr, because there are a few functions which only use the peer_addr.

Comment on lines 108 to 125
int ret;
size_t len;
len = RXD_NAME_LENGTH;
uint8_t tmp_addr[RXD_NAME_LENGTH];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things here -
You can just directly set the len instead of setting it on a separate line.
And the second is just a nit for me (I don't think this is actually a thing haha) but I like to organize longest to shortest variables so it looks cleaner in the declarations ie

    uint8_t tmp_addr[RXD_NAME_LENGTH];
    size_t len = RXD_NAME_LENGTH;
    int ret;

@@ -206,51 +209,59 @@ void rxd_ep_recv_data(struct rxd_ep *ep, struct rxd_x_entry *x_entry,
static void rxd_verify_active(struct rxd_ep *ep, fi_addr_t addr, fi_addr_t peer_addr)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you didn't write this code haha but could you rename addr to be rxd_addr for clarity's sake?

Copy link
Author

@nikhilnanal nikhilnanal Apr 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is changed to pass the peer entry instead since the function calling this was also using the same peer_entry.
rxd_verify_active(struct rxd_ep *ep, struct rxd_peer *peer_entry,
fi_addr_t peer_addr)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rxd_update_peer(struct rxd_ep *ep, fi_addr_t peer, fi_addr_t peer_addr) calls this function. so making this change in rxd_update_peer to use rxd_addr as argument name.

@@ -329,8 +343,12 @@ void rxd_progress_tx_list(struct rxd_ep *ep, struct rxd_peer *peer)

static void rxd_update_peer(struct rxd_ep *ep, fi_addr_t peer, fi_addr_t peer_addr)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above - I know you didn't write this but let's update to rxd_peer instead of peer for consistency

if (ret)
return;
}
peer_entry = rxd_get_peer_entry(ep, pkt->source);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again - not sure where this is defined. I'm guessing this is part of the stuff you're working on. I understand now the addr/name being passed in but let's rename it to make it clear that we're looking up by name especially because at this point we're not guaranteed that the peer entry exists so it's more of a lookup than a get. If it's not there then we need to do some sort of initialization of the peer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rxd_ep.c


if (rxd_send_cts(ep, pkt, rxd_addr)) {
if (rxd_send_cts(ep, pkt, peer_entry->peer_addr)) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the rxd_addr is our local rxd_addr here, not the peer_addr (buf pool index, not peer_addr)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed this

if (!rx_entry) {
FI_WARN(&rxd_prov, FI_LOG_EP_CTRL, "could not get tx entry\n");
FI_WARN(&rxd_prov, FI_LOG_EP_CTRL, "could not get rx entry\n");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate this into a separate typo patch

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted for now will create a patch for this.

Comment on lines 266 to 275
rxd_ep = container_of(util_ep, struct rxd_ep, util_ep);

// function to get peer_entry from ep->rxd_peer_pool
peer_entry = ofi_bufpool_get_ibuf(rxd_ep->peer_pool.pool, rxd_addr);
assert(peer_entry->peer_addr == rxd_addr);

peer_entry->fi_addr = FI_ADDR_UNSPEC;

HASH_DELETE(hh, rxd_ep->fi_rxdaddr_hash, entry);
free(entry);
Copy link
Author

@nikhilnanal nikhilnanal Apr 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second look i feel that the setting and unsetting of fi_addr in the peer entry is redundant I don't see peer_entry->fi_addr being used anywhere in the code. since we always use the hash( fi_addr-->rxd_addr) to fetch the peer entry from the pool we cannot use the fi_addr stored inside the peer entry. Also it seems that the removal of the fi_addr-->rxd_addr mapping in the rxd_av_remove as it may cause issues in the underlying protocol for fetching the peer if some communication is still underway?
Effectively anything which is still a part of the ep structure still probably needs to be retained and so the loop over the endpoint list is probably not required in this function.

@@ -221,6 +227,8 @@ static ssize_t rxd_atomic_inject(struct fid_ep *ep_fid, const void *buf,
struct fi_rma_iov rma_iov;
fi_addr_t rxd_addr;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two function arguments fi_addr_t dest_addr and uint64_t addr, the dest_addr argument is not being used. the man pages give addr as destination for remote memory while dest_addr as dest address for connectionless operations. what is the difference between the two. are they being used correctly in this function.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dest_addr refers to the peer. It's the same as the fi_addr.
The addr refers to the memory address where the data is going. This can either be an offset in bytes (starting at 0) or the virtual address if the app/provider is using FI_MR_VIRT_ADDR.
You're right - this is incorrect. The lookup should be using the dest_addr, not the addr. The reason this is probably working is because most apps will be using offsets and typically start at 0 which will always exist.
Please fix this and all other errors in a separate patch. Good catch!

@nikhilnanal nikhilnanal requested a review from aingerson April 24, 2020 22:45
@nikhilnanal nikhilnanal force-pushed the feat_rxdAV branch 2 times, most recently from 67203dc to c7ed89d Compare April 26, 2020 07:23
Comment on lines 150 to 151
uint8_t addr[RXD_NAME_LENGTH]; //key
fi_addr_t dg_addr; //value
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually get rid of these comments for the actual PR.
Let's also change the order here to

UT_hash_handle hh;
fi_addr_t dg_addr;
uint8_t addr[RXD_NAME_LENGTH];

I like to have arrays at the end because if we change the size of the array for some reason, then it doesn't affect the placement of the other fields within the struct. Putting the handle first with this and the other entry helps with consistency (handle is always first).

Comment on lines 184 to 192
struct rxd_fiaddr_entry {

fi_addr_t fi_addr; //key
fi_addr_t rxd_addr; //value
UT_hash_handle hh;
};
struct rxd_epname_entry {

uint8_t addr[RXD_NAME_LENGTH]; //key
fi_addr_t rxd_addr; //value
UT_hash_handle hh;
};

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment about field order and comments

@@ -758,10 +903,16 @@ static int rxd_ep_bind(struct fid *ep_fid, struct fid *bfid, uint64_t flags)
ret = ofi_ep_bind_av(&ep->util_ep, &av->util_av);
if (ret)
return ret;

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra spaces

Comment on lines 455 to 456
int rxd_update_av_peers(struct util_av *util_av, void *dg_addr,
fi_addr_t fi_addr, void *arg);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is only called in rxd_ep.c so it should be able to be static

struct rxd_fiaddr_entry *entry = NULL;
HASH_FIND(hh, ep->fi_rxdaddr_hash, (void*)&fi_addr, sizeof(fi_addr), entry);
return entry ? entry->rxd_addr : FI_ADDR_UNSPEC;

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space

static void rxd_peer_init_fn(struct ofi_bufpool_region *region, void *buf)
{
struct rxd_peer *entry = (struct rxd_peer*) buf;
rxd_init_peer(entry);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just move the entire init_peer code here since this is only called here.

Comment on lines 1387 to 1389
//initialize hash as NULL
rxd_ep->fi_rxdaddr_hash = NULL;
rxd_ep->ep_rxdaddr_hash = NULL;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need these since you calloc'ed the rxd_ep

struct rxd_peer *peer_entry;

peer_entry = ofi_bufpool_get_ibuf(ep->peer_pool.pool, unexp_msg->base_hdr->peer);
uint16_t curr_id = peer_entry->curr_rx_id;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move declaration above and add an initialization line here


rxd_addr = rxd_ep_av(rxd_ep)->fi_addr_table[addr];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra spaces

@@ -659,7 +699,7 @@ static int rxd_ep_close(struct fid *fid)
struct slist_entry *entry;
struct rxd_peer *peer;

ep = container_of(fid, struct rxd_ep, util_ep.ep_fid.fid);
ep = container_of(fid, struct rxd_ep, util_ep.ep_fid.fid);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove spaces

@nikhilnanal nikhilnanal force-pushed the feat_rxdAV branch 3 times, most recently from d6cfcfb to 9012c37 Compare May 4, 2020 22:51
Copy link
Owner

@aingerson aingerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this is pretty close to being done!
Main things:

  • Either separate out or remove all the indenting and spacing changes. Most of them are unnecessary. Some of them are helpful so if you want to keep those, you can separate them into a separate patch and include them in this PR. If you include them in this patch, it makes the code really difficult to review and also Sean is going to scold you ;)
  • Renaming/refactoring of peer lookup, update, and initialization
  • Still think you could merge the fiaddr has lookup with the get peer
  • Please fix that bug you found regarding the dest_addr!
  • Lastly, include more in your commit message to explain the flow of the new addressing. Explain the need and purpose of the new structs and hashes that you've added. It's easier for me to follow because we've discussed this so much, but it won't be that clear for someone like Sean who will be looking at this for the first time and have a hard time understanding why this has to be so complicated!

Once you've done all that, I say go for it and open up a PR upstream!

Comment on lines 62 to 67
#define RXD_PROTOCOL_VERSION (2)

#define RXD_MAX_MTU_SIZE 4096

#define RXD_MAX_TX_BITS 10
#define RXD_MAX_RX_BITS 10
#define RXD_MAX_TX_BITS 10
#define RXD_MAX_RX_BITS 10
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these (and any other unrelated formatting issues throughout the patch) need to be fixed for some reason, separate all of them into a separate formatting patch when you open a PR upstream. If they don't need to be fixed, just drop the changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as an example, On vim these lines look exactly the same 1 tab distance. how to make such a change. also what do you mean by drop that which don't need to be fixed? can you point out what you are referring to?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While fixing indentation. I also removed spaces in many parts of the code and replaced them with tabs, since there should be no spaces only tabs. I had some confusion when a single line of code spans multiple lines. The lines after the first in some parts of the code were formatted differently, in different parts of the code, so I tried to make them uniform (for ex all second lines which are not function headers in a file to have the same number of tabs in a file from the left margin if it is not a RHS of an expression). Is this what you are referring to as changes which are not need to be fixed?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment below. The spacing here is also ok. This one is here so that all the define values line up since RXD_MAX_MTU_SIZE is an extra character and requires an extra tab for the values below but it shows up as a space instead of a tab.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry just read your first comment here - if they look the same and it was just a space before and now a tab, this is an ok fix. Thanks!

Comment on lines 449 to 452
int rxd_epaddr_hash_insert(struct rxd_ep *ep, void *addr,
fi_addr_t *rxd_addr);
int rxd_update_av_peers(struct util_av *util_av, void *dg_addr,
fi_addr_t fi_addr, void *arg);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two functions I think are only called once each in rxd_ep. Remove them here and make them static in rxd_ep

@@ -145,7 +149,8 @@ static ssize_t rxd_generic_atomic(struct rxd_ep *rxd_ep,
if (!tx_entry)
goto out;

if (rxd_ep->peers[rxd_addr].peer_addr != FI_ADDR_UNSPEC)
peer_entry = ofi_bufpool_get_ibuf(rxd_ep->peer_pool.pool, rxd_addr);
if (peer_entry->peer_addr != FI_ADDR_UNSPEC)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rxd_fiaddr_hash_lookup should only need to be called in calls where the app should have already inserted the address into the AV and therefore initialized the peer so we should be able to merge these calls. If the lookup fails and we can't return a peer, we will never get to rxd_send_rts_if_needed or rxd_tx_entry_init_rma.
Same thing with the rx path - if the address is not ADDR_UNSPEC, then it should have been inserted into the AV so it should be able to return the peer.

@@ -221,6 +227,8 @@ static ssize_t rxd_atomic_inject(struct fid_ep *ep_fid, const void *buf,
struct fi_rma_iov rma_iov;
fi_addr_t rxd_addr;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dest_addr refers to the peer. It's the same as the fi_addr.
The addr refers to the memory address where the data is going. This can either be an offset in bytes (starting at 0) or the virtual address if the app/provider is using FI_MR_VIRT_ADDR.
You're right - this is incorrect. The lookup should be using the dest_addr, not the addr. The reason this is probably working is because most apps will be using offsets and typically start at 0 which will always exist.
Please fix this and all other errors in a separate patch. Good catch!

flags, context);
if (ret)
break;

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: extra line here

FI_WARN(&rxd_prov, FI_LOG_AV, "Unable to find inserted dg address\n");
goto err;
}
rxd_fi_update_peer_entry(ep, addr, fi_addr, (fi_addr_t*)dg_addr,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of the word "update" here is a bit confusing and misleading. The peer can be initialized but the word update suggests that you are changing information from a peer that was already initialized. Consider renaming and reorganizing the functions rxd_fi_update_peer_entry, rxd_update_av_peers, and rxd_get_peer_entry to clarify which ones are doing initialization, updating, or getting. Otherwise, it's unclear where the peer is actually getting created/updated/looked up.

dlist_init(&(peer_entry->rx_list));
dlist_init(&(peer_entry->rma_rx_list));
dlist_init(&(peer_entry->buf_pkts));

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: extra line here

Comment on lines 418 to 435
struct fi_info *core_info);
int rxd_info_to_rxd(uint32_t version, const struct fi_info *core_info,
struct fi_info *info);
struct fi_info *info);

int rxd_fabric(struct fi_fabric_attr *attr,
struct fid_fabric **fabric, void *context);
struct fid_fabric **fabric, void *context);
int rxd_domain_open(struct fid_fabric *fabric, struct fi_info *info,
struct fid_domain **dom, void *context);
struct fid_domain **dom, void *context);
int rxd_av_create(struct fid_domain *domain_fid, struct fi_av_attr *attr,
struct fid_av **av, void *context);
struct fid_av **av, void *context);
int rxd_endpoint(struct fid_domain *domain, struct fi_info *info,
struct fid_ep **ep, void *context);
struct fid_ep **ep, void *context);
int rxd_cq_open(struct fid_domain *domain, struct fi_cq_attr *attr,
struct fid_cq **cq_fid, void *context);
struct fid_cq **cq_fid, void *context);
int rxd_cntr_open(struct fid_domain *domain, struct fi_cntr_attr *attr,
struct fid_cntr **cntr_fid, void *context);
struct fid_cntr **cntr_fid, void *context);
int rxd_query_atomic(struct fid_domain *domain, enum fi_datatype datatype,
enum fi_op op, struct fi_atomic_attr *attr, uint64_t flags);
enum fi_op op, struct fi_atomic_attr *attr, uint64_t flags);
Copy link
Author

@nikhilnanal nikhilnanal May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example these indentation fixes --> should they be dropped? or are they alright?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fixes can be dropped. The rule is not never spaces, only tabs. It's tabs where there could be a tab/for every 8 spaces. So these spaces (and a lot of the others you've replaced) are here to lineup the arguments. Those are fine.

@nikhilnanal nikhilnanal force-pushed the feat_rxdAV branch 16 times, most recently from f75d3c1 to af24451 Compare May 12, 2020 21:14
@aingerson aingerson force-pushed the main branch 6 times, most recently from 3d17dbb to d2da72f Compare November 3, 2022 15:35
@aingerson aingerson force-pushed the main branch 6 times, most recently from 90c0d47 to 13adbf2 Compare December 12, 2022 18:56
aingerson pushed a commit that referenced this pull request Feb 9, 2023
If a posted receive matches with a saved receive, we may need to
increment the rx counter.  Set the rx counter increment callback
to match that of the posted receive.  This fixes an assert in
xnet_cntr_inc() accessing a NULL cntr_inc function pointer.

Program received signal SIGABRT, Aborted.
0x0000155552d4d37f in raise () from /lib64/libc.so.6
#0  0x0000155552d4d37f in raise () from /lib64/libc.so.6
#1  0x0000155552d37db5 in abort () from /lib64/libc.so.6
#2  0x0000155552d37c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x0000155552d45a76 in __assert_fail () from /lib64/libc.so.6
#4  0x00001555522967f9 in xnet_cntr_inc (ep=0x6e4c70, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:347
#5  0x0000155552296836 in xnet_report_cntr_success (ep=0x6e4c70, cq=0x6ca930, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:354
#6  0x000015555229970d in xnet_complete_saved (saved_entry=0x6f7a30) at prov/tcp/src/xnet_progress.c:153
#7  0x0000155552299961 in xnet_recv_saved (saved_entry=0x6f7a30, rx_entry=0x6f7840) at prov/tcp/src/xnet_progress.c:188
#8  0x00001555522946f8 in xnet_srx_tag (srx=0x6dd1c0, recv_entry=0x6f7840) at prov/tcp/src/xnet_srx.c:445
#9  0x0000155552294bb1 in xnet_srx_trecv (ep_fid=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_srx.c:558
#10 0x000015555228f60e in fi_trecv (ep=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at ./include/rdma/fi_tagged.h:91
#11 0x00001555522900a7 in xnet_rdm_trecv (ep_fid=0x6d9fe0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_rdm.c:212

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
aingerson pushed a commit that referenced this pull request Feb 21, 2023
If a posted receive matches with a saved receive, we may need to
increment the rx counter.  Set the rx counter increment callback
to match that of the posted receive.  This fixes an assert in
xnet_cntr_inc() accessing a NULL cntr_inc function pointer.

Program received signal SIGABRT, Aborted.
0x0000155552d4d37f in raise () from /lib64/libc.so.6
#0  0x0000155552d4d37f in raise () from /lib64/libc.so.6
#1  0x0000155552d37db5 in abort () from /lib64/libc.so.6
#2  0x0000155552d37c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x0000155552d45a76 in __assert_fail () from /lib64/libc.so.6
#4  0x00001555522967f9 in xnet_cntr_inc (ep=0x6e4c70, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:347
#5  0x0000155552296836 in xnet_report_cntr_success (ep=0x6e4c70, cq=0x6ca930, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:354
#6  0x000015555229970d in xnet_complete_saved (saved_entry=0x6f7a30) at prov/tcp/src/xnet_progress.c:153
#7  0x0000155552299961 in xnet_recv_saved (saved_entry=0x6f7a30, rx_entry=0x6f7840) at prov/tcp/src/xnet_progress.c:188
#8  0x00001555522946f8 in xnet_srx_tag (srx=0x6dd1c0, recv_entry=0x6f7840) at prov/tcp/src/xnet_srx.c:445
#9  0x0000155552294bb1 in xnet_srx_trecv (ep_fid=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_srx.c:558
#10 0x000015555228f60e in fi_trecv (ep=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at ./include/rdma/fi_tagged.h:91
#11 0x00001555522900a7 in xnet_rdm_trecv (ep_fid=0x6d9fe0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_rdm.c:212

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
@aingerson aingerson force-pushed the main branch 2 times, most recently from 0ac3eb2 to 924f47b Compare February 21, 2023 20:28
@aingerson aingerson force-pushed the main branch 4 times, most recently from 37e7d7f to 8c3817b Compare April 6, 2023 23:57
@aingerson aingerson force-pushed the main branch 5 times, most recently from 7893cad to 170d2a4 Compare July 7, 2023 22:44
@aingerson aingerson closed this Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants