Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make IpcSender::send take a reference #138

Open
nox opened this issue Feb 8, 2017 · 31 comments
Open

Make IpcSender::send take a reference #138

nox opened this issue Feb 8, 2017 · 31 comments

Comments

@nox
Copy link
Contributor

nox commented Feb 8, 2017

For IPC, a sender never needs to take ownership of the sent value.

@antrik
Copy link
Contributor

antrik commented Feb 8, 2017

@nox well, yes and no. It's true that the IPC transfer (as well as the serde dance) implicitly clones the data; so taking it by value is inefficient in cases where the caller wants to keep the original data, currently needing to perform an extra copy that is not strictly necessary.

However, ipc-channel can also deal with pure move data -- notably IpcReceiver itself. Admittedly, the implementation for this relies on inner mutability to explicitly "consume" the original value, since the serde API doesn't provide a way to take the input by value AIUI; and this would still work if send() took the data by reference too -- except that currently the consumed sender is dropped after serialisation, before returning to the caller; so it's just an implementation detail the caller doesn't need to care about. If we decided to pass the data by reference to send() on the other hand, that would mean the consumed sender could live on, possibly causing confusing bugs.

On a more fundamental level, ipc-channel was conceived as a drop-in replacement for mpsc channels. There is a similar consideration here as with #115 : it's also a property that does not strictly make sense for IPC, but is important to enable switching between IPC channels and MPSC channels seamlessly.

Admittedly, I'm having second thoughts about that change as well... But if we start going down that road, there is actually a ton of things we could do differently to make IPC more efficient, if we disregard mpsc compatibility. Most notably, we would probably toss serde (which always adds a bunch of extra data copies), using something like Cap'n Proto instead.

In fact ipc-channel already added BytesChannel as a special copy-free fast path for raw data, with no semblance of mpsc compatibility. (This one indeed takes the input data by reference.) I guess we could enhance that into a more generic alternative interface if we deem that worthwhile. (Using Cap`n Proto etc.) Or maybe it would be better to create a separate crate for that -- though that might pose interoperability problems I fear...

@nox
Copy link
Contributor Author

nox commented Feb 8, 2017

I would gladly abandon all mpsc compatibility.

I'm not sure what you are thinking about when you say that serde always adds a bunch of extra data copies though, especially when compared with capnproto, could you develop?

@antrik
Copy link
Contributor

antrik commented Feb 8, 2017

@nox well, I'm all in favour of adding another API; I'm just not convinced it would be a good idea to change/abandon the existing drop-in API. That would require huge changes throughout Servo -- I suppose that's why @pcwalton conceived ipc-channel as a drop-in replacement for mpsc in the first place...

As for serde, it always creates a new buffer for the serialised data AIUI (correct me if I'm wrong); and also during de-serialisation. Cap'n Proto on the other hand is based on the idea of keeping data in its serialised representation in the first place -- so sending/receiving doesn't actually need to change the representation at all.

The worst case here is that a data structure sees a lot of access before or after transfer, in which case working on the serialised representation would be inefficient; so you still need to translate between serialised and "native" representations, thus not buying (or loosing) anything. For data structures that see only little access however (or are in fact assembled right before sending, which is a very common case I believe), this is strictly more efficient.

Note that Cap'n Proto was designed for network RPC AIUI, and isn't necessarily optimal for local IPC. (It deals with endianness for example, which is irrelevant for ipc-channel.) The fundamental approach though makes sense for all IPC I believe.

@nox
Copy link
Contributor Author

nox commented Feb 8, 2017

How does cap'n'proto cope with opaque types like Vec<T>?

That may interest you, btw: serde-rs/serde#492

@antrik
Copy link
Contributor

antrik commented Feb 9, 2017

@nox not sure in what sense Vec<T> is an opaque type?

If the actual question is how dynamically sized types are handled, I have to admit that I don't know (yet)... I haven't actually worked with Cap'n Proto so far -- I just know (and like) the fundamental concept. If we are serious about adding an alternative interface to ipc-channel though, I shall learn the details of course :-)

@nox
Copy link
Contributor Author

nox commented Feb 9, 2017

@antrik We don't know what the memory representation of Vec<T> is, and capnproto is basically about leaking the memory representation. I don't see how the fundamental concept helps at all for us. I would rather improve on serde.

@antrik
Copy link
Contributor

antrik commented Feb 9, 2017

@nox one fundamental thing to understand about the Cap'n Proto approach is that it doesn't just take the memory representation of any old native type and calls it serialised. That wouldn't work for Vec for example, because memory addresses have no meaning across process boundaries -- so pointers in Cap'n Proto use relative offsets within a compact memory region instead.

In other words, the serialised representations of data are often actually distinct from native ones -- the idea is just that in many cases, you can do all work on the serialised ones directly, rather then converting between native and serialised.

(To make this ergonomic, the Rust implementation of Cap'n Proto -- or some custom format based on the same ideas -- would have to implement many of Rust's standard library functions on the serialised types I guess...)

Regarding serde, there is only so much you can do when always having to convert. The idea for zero-copy de-serialisation in serde-rs/serde#492 is only useful for large data blocks; and it can only yield borrowed pointers anyway. (No Vec or Box etc.) For serialisation, limited zero-copy would be possible, if the result of serialisation is presented as a vector of slice references -- but that would only be useful if both the input consists largely of large consecutive blocks of trivially serialiseable data, and the I/O mechanisms used on the serialised data provide efficient scatter/gather interfaces. (Which we could have with the Unix back-end of ipc-channel; but not the upcoming Windows one AIUI, nor the MacOS one IIRC.)

Of course we could also try to implement Cap'n Proto concepts in serde: using a tailored binary format, and providing accessor interfaces for the serialised representation... But with serde being all about converting, and Cap'n Proto being all about working on the serialised representation, I'm not sure how much potential for code or idea sharing there really is.

@nox
Copy link
Contributor Author

nox commented Feb 9, 2017

What I meant about capnproto is that I find quite unfortunate that you wouldn't be able to do IPC anymore with plain old Rust types such as Vec<T>, which is quite useful. We have a lot of IPC across Servo, and having to use codegen for all the types involved sounds quite like a huge task.

@antrik
Copy link
Contributor

antrik commented Feb 9, 2017

Correction: the lack of scatter/gather costs an extra copy regardless of the serialisation approach I guess; so zero-copy serialisation would still be useful there... The other conditions stand though.

@antrik
Copy link
Contributor

antrik commented Feb 9, 2017

@nox depends on where the Vec comes from, and where it goes. Worst case is that it's used in many places before and after the transfer; so you'd want to convert between the Rust Vec for use, and Cap'n Proto's serialised equivalent for transfer... But that's not any different than using serde -- nothing gained, nothing lost in this case.

However, if the Vec is only constructed for the transfer, it would usually be more efficient to construct the equivalent serialised type directly, rather than constructing a Rust Vec and then translating it.

Same for any other type.

@nox
Copy link
Contributor Author

nox commented Feb 9, 2017

Yes, I'm just saying it's more codegen to have, this time with external files (AFAIK) in a specific language to feed it to capnproto, right?

@antrik
Copy link
Contributor

antrik commented Feb 9, 2017

@nox I don't think so? I haven't checked how the existing Cap'n Proto implementation for Rust works -- but I don't see any fundamental reason why we couldn't just derive the serialised equivalents from native Rust types...

(Just like serde, it will need to handle container types such as Vec explicitly of course.)

@asajeffrey
Copy link
Member

I hit this problem too, having to clone data just in order to hand it to an ipc sender and then discard it. If we had send_ref(&data) as well as send(data) I'd be done. If we want to keep the interface to be a subset of the MSPC interface, we could add send_ref(&data) there too, implemented as send(data.clone()).

@antrik
Copy link
Contributor

antrik commented Mar 8, 2017

@asajeffrey I'd actually call it send_clone() I think; but other than that, I think I'm fine with this. (With the caveat of any embedded IpcSender being "consumed", in spite of having only a shared reference, as I pointed out in my first reply... Perhaps we should take &mut data instead?)

It's probably not a huge win, but should be noticeable (in the order of 10% I'd guess?) -- and it's very low hanging fruit :-)

@antrik
Copy link
Contributor

antrik commented Mar 8, 2017

Actually, calling it send_clone() made me realise that the right thing to do here is probably simply not to drop embedded senders in this case. I'd have to check whether this poses any implementation problems...

@asajeffrey
Copy link
Member

I realized after I'd written this that this deals with the case where you want to send an &T, but not with the case I actually have, which is sending a T which implements ToOwned, and deserializing it at the other end as T::Owned. We can get very close using Cow, but then we send a Cow<'a,T> and receive a Cow<'static,T> so the lifetimes don't quite line up.

We could work round this by defining a ToStatic trait, something like:

unsafe trait ToStatic {
  type Static: 'static;
  fn to_static(&self) -> Self::Static;
}

with the requirement that serializing x and x.to_static() is the same. Then we could have sender.send_static(&data) which serializes data: &T but deserializes it at type T::Static.

In particular, we'd have:

  unsafe impl<'a,T> ToStatic for Cow<'a,T> where T: ToStatic {
    type Static = Cow<'static,T::Static>;
    fn to_static(&self) -> Self::Static { Cow::Owned(self.deref().to_static()) }
  }

In order to make this usable, though, we'd have to support deriving ToStatic, and this is using quite a bit of the complexity budget.

@asajeffrey
Copy link
Member

On thinking about it, perhaps the cleanest interface would be something like a DeepClone trait:

trait DeepClone {
  type DeepCloned;
  fn deep_clone(&self) -> Self::DeepCloned;
}

for example:

  impl DeepClone for str {
    type DeepCloned = String;
    fn deep_clone(&self) -> String { self.to_owned() }
  }

This would be different from Clone because it would clone through Rc rather than copying the reference:

  impl<T> DeepClone for Rc<T> where T: DeepClone {
    type DeepCloned = Rc<T::DeepCloned>;
    fn deep_clone(&self) -> Rc<T::DeepCloned> { Rc::new(self.deref().deep_clone()) }
  }

The expectation is that a serialization then deserialization is the same as a deep clone, in psrticular the serialization format of T should be compatible with the deserializer of T::DeepCloned.

The change to IpcSender<T> would be to add: send_deep_clone<S>(&self, data: &S) where S: DeepClone<DeepCloned=T>.

@asajeffrey
Copy link
Member

Some mucking around in the rust playground: https://play.rust-lang.org/?gist=c5e14d5a9ab2a4ac7dd2aa7190011125

@asajeffrey
Copy link
Member

And some more mucking around, this time making sure that implementing #[derive(DeepClone)] isn't too bad: https://github.com/asajeffrey/deep-clone.

@nox
Copy link
Contributor Author

nox commented Mar 10, 2017 via email

@asajeffrey
Copy link
Member

We can then implement send_deep_clone as well as send_clone. The problem is that send_clone doesn't allow sending data whose borrowing involves a change of type, for example sending a &[T] and receiving it as a Vec.

If you'd rather have a separate issue for this, I can open a new one.

@antrik
Copy link
Contributor

antrik commented Mar 10, 2017

@asajeffrey I'm not sure I follow. Why do you need deep_clone()? I thought the idea is that you do not actually want to convert? Aren't you really just after a way to define equivalence relationships of types that can be substituted for the declared sender type, since they serialise to the same representation?

(And wouldn't you say this should be handled within the Serde framework?...)

@nox
Copy link
Contributor Author

nox commented Mar 10, 2017

Let's make a new issue.

@nox
Copy link
Contributor Author

nox commented Mar 10, 2017

Cc @dtolnay for the concept of serialisation equivalence in serde.

@asajeffrey
Copy link
Member

IRC conversation with @nox: http://logs.glob.uno/?c=mozilla%23servo&s=10+Mar+2017&e=10+Mar+2017#c628689

TL;DR: we agree with send_clone, the discussions are a) do we need send_deep_clone, and b) should we introduce a breaking change and rename send_clone to send?

@asajeffrey
Copy link
Member

@antrik yes, the idea is to have a way to express that you can serialize a T, and deserialize a U. One way to express that is by something like T: DeepClone<DeepCloned=U>. Not sure where we'd put this in serde, is there a round-tripping API in there somewhere?

@asajeffrey
Copy link
Member

Created a new issue for reducing copying in the case that this requires changing type: #156.

@antrik
Copy link
Contributor

antrik commented Mar 10, 2017

I just remembered that the problem is not with embedded IpcSender, but rather with embedded IpcReceiver: since receivers cannot be cloned, send_clone() on a type with embedded receivers poses semantic problems.

(It's not a technical problem, since IpcReceiver implements inner mutability for that purpose; but it's still weird -- and potentially error-prone -- when a method that takes a shared reference, and otherwise clones everything, does actually move out and invalidate any embedded receivers...)

@asajeffrey
Copy link
Member

Hmm, and even putting a T:Clone requirement doesn't fix this, because there might be an embedded Rc<IpcReceiver>.

@antrik
Copy link
Contributor

antrik commented May 27, 2017

@asajeffrey I have been thinking about this on and off; and this is my current idea: how about adding both a send_copy() method that takes &T where T: Copy; and also a send_mut() method that takes &mut T for any T?...

send_copy() is the most straightforward one, since technically, that's what the sending actually does -- but it doesn't work with embedded IpcReceiver (or Rc<>). send_mut() on the other hand would reflect the fact that the sending process for non-copy objects modifies the originals in some way...

While each of them is limited in certain ways, my current impression is that together they should actually cover all use cases, while avoiding the semantic problems.

The plethora of variants might seem confusing at first: but together with the original send() -- which is a send_move() really -- I think they would indeed quite naturally reflect the typical &T/&mut T/T trifecta...

What do you think?

@antrik
Copy link
Contributor

antrik commented May 27, 2017

Correction: send_ref() is supposed to be send_mut() -- sorry for the confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants