Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Separated sys crate out and applied changes from nevenoomo #8

Closed
wants to merge 38 commits into from

Conversation

rdelfin
Copy link
Contributor

@rdelfin rdelfin commented Apr 9, 2021

I've been using ibverbs for the last few months for an application I went shopping around for Rust wrappers. I found this one, but I found the lack of examples made it difficult to use and fully understand. From there I found @nevenoomo's fork together with his netCAT application, which helped show a lot more clear examples of how to make use of this. I then forked off of that, separated out the direct bindings into a separate crate (which made it a whole lot easier to import this into a Bazel project), switched the build system to the cmake crate (which as an added bonus makes this work when cross-compiling) and have been working off of that branch since.

I also had to add a few minor changes to rdma-core since it has... Compilation errors if the path you build from is too nested (which seems to be an issue with the cmake crate).

This PR is by no means ready to merge, but I wanted to get some feedback on whether this change would even be palatable. If so, I can work on cleaning up further. Let me know what you think!

Comment on lines +403 to 408
let ok = if !self.ctx.is_null() {
unsafe { ffi::ibv_close_device(self.ctx) }
} else {
0
};
assert_eq!(ok, 0);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let ok = if !self.ctx.is_null() {
unsafe { ffi::ibv_close_device(self.ctx) }
} else {
0
};
assert_eq!(ok, 0);
if !self.ctx.is_null() {
let ok = unsafe { ffi::ibv_close_device(self.ctx) };
assert_eq!(ok, 0);
}

}
}

impl Drop for Context {
fn drop(&mut self) {
let ok = unsafe { ffi::ibv_close_device(self.ctx) };
let ok = if !self.ctx.is_null() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can ctx even be NULL here? Wouldn't that mean the context failed to be constructed in the first place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I'm not 100% sure on. Let me see if I can find the original commit and whether that explains it

pub struct CompletionQueue<'ctx> {
_phantom: PhantomData<&'ctx ()>,
pub struct CompletionQueue {
// _phantom: PhantomData<&'ctx ()>,
Copy link
Owner

@jonhoo jonhoo Apr 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that a CQ must outlive the context, which is what the lifetime bound used to enforce — is that not the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me double check this one too. The changes on this file are mostly from nevenmoo, so this might just have been them getting around lifetime requirements for ease of use.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we tag them in to this PR? They may be able to provide some additional input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! I did, it's @nevenoomo. With another tag they might see it :P

Comment on lines +473 to +477
let errno = if !self.cq.is_null() {
unsafe { ffi::ibv_destroy_cq(self.cq) }
} else {
0
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let errno = if !self.cq.is_null() {
unsafe { ffi::ibv_destroy_cq(self.cq) }
} else {
0
};
if self.cq.is_null() {
return;
}
let errno = unsafe { ffi::ibv_destroy_cq(self.cq) };

fn drop(&mut self) {
let errno = unsafe { ffi::ibv_destroy_cq(self.cq) };
let errno = if !self.cq.is_null() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, too, how can self.cq be NULL — that should make the constructor fail.

@@ -769,14 +804,57 @@ pub struct PreparedQueuePair<'res> {
/// An identifier for the network endpoint of a `QueuePair`.
///
/// Internally, this contains the `QueuePair`'s `qp_num`, as well as the context's `lid` and `gid`.
#[derive(Copy, Clone)]
#[derive(Clone)]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove Copy from this type?

/// A wrapper around `QueuePairEndpoint` with addintional `rkey` and `raddr` field. Intended to be sent to the peer.
// TODO: write more thoughout documentation for fields.
#[derive(Serialize, Deserialize, Clone)]
pub struct EndpointMsg {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably make this nicer with the changes that landed in #7.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! That's some good timing

Comment on lines +1061 to +1065
let errno = if !self.mr.is_null() {
unsafe { ffi::ibv_dereg_mr(self.mr) }
} else {
0
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let errno = if !self.mr.is_null() {
unsafe { ffi::ibv_dereg_mr(self.mr) }
} else {
0
};
if self.mr.is_null() {
return;
}
let errno = unsafe { ffi::ibv_dereg_mr(self.mr) };


impl<T> Drop for MemoryRegion<T> {
fn drop(&mut self) {
let errno = unsafe { ffi::ibv_dereg_mr(self.mr) };
let errno = if !self.mr.is_null() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here as well, it feels like NULL should be handled by the constructor.

@@ -1019,7 +1160,7 @@ impl<'ctx> ProtectionDomain<'ctx> {
ffi::ibv_reg_mr(
self.pd,
data.as_mut_ptr() as *mut _,
n * mem::size_of::<T>(),
(n * mem::size_of::<T>()) as u64,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary on master.

Comment on lines +1186 to +1190
let errno = if !self.pd.is_null() {
unsafe { ffi::ibv_dealloc_pd(self.pd) }
} else {
0
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let errno = if !self.pd.is_null() {
unsafe { ffi::ibv_dealloc_pd(self.pd) }
} else {
0
};
if self.pd.is_null() {
return;
}
let errno = unsafe { ffi::ibv_dealloc_pd(self.pd) };

fn drop(&mut self) {
let errno = unsafe { ffi::ibv_dealloc_pd(self.pd) };
let errno = if !self.pd.is_null() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And same re NULL

qp: *mut ffi::ibv_qp,
}

unsafe impl<'a> Send for QueuePair<'a> {}
unsafe impl<'a> Sync for QueuePair<'a> {}
impl Default for QueuePair {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for implementing Default for QueuePair?

@@ -1226,15 +1384,173 @@ impl<'res> QueuePair<'res> {
Ok(())
}
}

/// Write N values from the specified MR to the remote memory.
Copy link
Owner

@jonhoo jonhoo Apr 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use some more documentation. Especially with regards to why it is unsafe.

}
}

/// Write a single value from the top of the memory region
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use some more documentation. Especially with regards to why it is unsafe.

}
}

/// Write a single value from the top of the memory region
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use some more documentation. Especially with regards to why it is unsafe.

self.post_write_buf(mr, 1, addr, key, wr_id, sig)
}

/// Write a single value from the top of the memory region
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use some more documentation. Especially with regards to why it is unsafe.


/// Write a single value from the top of the memory region
#[inline]
pub unsafe fn post_write_single<T>(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for having this helper method, given that it's longer to type out than just calling the non-single variant with an extra argument of 1?

wr_id: u64,
sig: bool,
) -> io::Result<()> {
self.post_read_buf(mr, 1, addr, key, wr_id, sig)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for having this helper method, given that it's longer to type out than just calling the non-single variant with an extra argument of 1?

Comment on lines +1542 to +1546
let errno = if !self.qp.is_null() {
unsafe { ffi::ibv_destroy_qp(self.qp) }
} else {
0
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let errno = if !self.qp.is_null() {
unsafe { ffi::ibv_destroy_qp(self.qp) }
} else {
0
};
if self.qp.is_null() {
return;
}
let errno = unsafe { ffi::ibv_destroy_qp(self.qp) };

fn drop(&mut self) {
// TODO: ibv_destroy_qp() fails if the QP is attached to a multicast group.
let errno = unsafe { ffi::ibv_destroy_qp(self.qp) };
let errno = if !self.qp.is_null() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I think the value can be NULL if we want QueuePair to implement Default, but I'd still like to know what the use-case for that is. If there isn't one, we should instead enforce in the constructor that QP is never NULL.

@@ -0,0 +1,67 @@
# ibverbs
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably still want a top-level README as well. Not sure how other projects go about this? Maybe symlink one to the other?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me look around how projects with workspaces handle this. IIRC they usually just have identical readme's on the root and main crate, but a symlink might be nicer.

@@ -0,0 +1,21 @@
[package]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ibverbs-sys should also get a short README that points to ibverbs.

@rdelfin
Copy link
Contributor Author

rdelfin commented Apr 10, 2021

By the looks of it, the biggest single thing to send out in a separate PR is the sys vs regular crate change. I'll send out a separate PR with just that first and leave this one open as a reference mostly

@jonhoo
Copy link
Owner

jonhoo commented Apr 10, 2021

That sounds like a great plan — I think that is indeed a very neatly separable change.

@rdelfin
Copy link
Contributor Author

rdelfin commented Nov 4, 2021

I'm really sorry for taking so long on this. I've been a bit swamped with work (ironically using this library!) and forgot to get back to this.
@jonhoo am I reading correctly that a lot of the changes I put in here have already made it to master, but with correct lifetimes instead of arcs everywhere?

@jonhoo
Copy link
Owner

jonhoo commented Nov 6, 2021

No worries at all — open-source is basically the act of trying to organize lots of people's volunteer time 😅

Am I reading correctly that a lot of the changes I put in here have already made it to master, but with correct lifetimes instead of arcs everywhere?

Hard for me to say. Some definitely have, but I don't know if all of them have. You'd have to walk through the diff and see unfortunately.

@rdelfin
Copy link
Contributor Author

rdelfin commented Nov 7, 2021

No worries, let me go through it and see what is missing. I did separate out the split creates too!

@jonhoo jonhoo closed this in 6332a3a Jan 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants