Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Arc<Vec<u8>> to replace Bytes in PeerId #1865

Closed
wants to merge 1 commit into from

Conversation

jolestar
Copy link
Contributor

@jolestar jolestar commented Nov 28, 2020

For avoiding cargo clippy raise mutable_key_type error when using PeerId as HashSet/HashMap's key.

@romanb
Copy link
Contributor

romanb commented Nov 30, 2020

From https://rust-lang.github.io/rust-clippy/master/index.html#mutable_key_type (emphasis mine):

Known problems

It’s correct to use a struct, that contains interior mutability as a key, when its Hash implementation doesn’t access any of the interior mutable types. However, this lint is unable to recognize this, so it causes a false positive in theses cases. The bytes crate is a great example of this.

See also rust-lang/rust-clippy#5812.

@mxinden
Copy link
Member

mxinden commented Nov 30, 2020

For the reason mentioned by Roman above, we silence the warning in our CI:

args: -- -A clippy::mutable_key_type -A clippy::type_complexity

@dvc94ch
Copy link
Contributor

dvc94ch commented Nov 30, 2020

Well this affects all downstream users too, and it's not clear why using Bytes is more efficient or necessary in this case. Perhaps you mean to use an Arc<[u8]>?

@jolestar
Copy link
Contributor Author

jolestar commented Dec 1, 2020

Well this affects all downstream users too, and it's not clear why using Bytes is more efficient or necessary in this case. Perhaps you mean to use an Arc<[u8]>?

The Bytes in PeerId is converted from Vec, I think using Vec at here is more efficient, for avoiding call into().

@romanb
Copy link
Contributor

romanb commented Dec 3, 2020

Well this affects all downstream users too, and it's not clear why using Bytes is more efficient or necessary in this case. Perhaps you mean to use an Arc<[u8]>?

The situation is simply that we have a Vec<u8> that we get from multihash and want cheap cloning for PeerIds. Bytes is just Arc<Vec<u8>> on steroids. If we don't need any of the additional features of Bytes for the implementation of PeerId, we can certainly go with an Arc<Vec<u8>>. I don't see what the conversion to Arc<[u8]> would buy us, given that it does not even look like it can reuse the vector allocation and needs to copy the data again. In any case, just using the Vec<u8> is I think not a good idea, as one has to assume that every clone allocates a new small vector.

The Bytes in PeerId is converted from Vec, I think using Vec at here is more efficient, for avoiding call into().

The important part is cheap cloning.

@dvc94ch
Copy link
Contributor

dvc94ch commented Dec 3, 2020

I just looked at how bytes works, as I assumed it would require an additional allocation (as Arc<Vec> would). It delays constructing the Arc<Vec<u8>> until the first clone. I guess an Arc<[u8]> may improve data cache utilization, since it's only a single allocation, but that's probably a stretch. I guess wrapping the Vec<u8> in an Arc could work?

For avoid cargo clippy raise `mutable_key_type error when use PeerId as HashSet/HashMap 's key.
@jolestar jolestar changed the title Use Vec<u8> to replace Bytes in PeerId Use Arc<Vec<u8>> to replace Bytes in PeerId Dec 4, 2020
@jolestar
Copy link
Contributor Author

jolestar commented Dec 4, 2020

I just looked at how bytes works, as I assumed it would require an additional allocation (as Arc would). It delays constructing the Arc<Vec<u8>> until the first clone. I guess an Arc<[u8]> may improve data cache utilization, since it's only a single allocation, but that's probably a stretch. I guess wrapping the Vec<u8> in an Arc could work?

I try to using Arc<Vec> to replace Bytes.

@romanb

@@ -123,7 +123,7 @@ impl PeerId {
/// equality of peer IDs. That is, two peer IDs may be considered equal
/// while having a different byte representation as per `into_bytes`.
pub fn into_bytes(self) -> Vec<u8> {
self.multihash.to_vec()
self.multihash.as_ref().clone()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this should be replaced with an AsRef<[u8]> or fn as_bytes(&self) -> &[u8] and defer cloning to the call site if necessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and an even better solution would be to use Multihash internally, since it is stack allocated and use an Arc<PeerId> if efficient cloning is required. For cid we just implement Copy. This also has the advantage that Vec<PeerId> only requires a single allocation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the reason for this is that the kademlia buckets require something that implements Borrow<[u8]> so I guess this is the best that can be done for now

@vorot93
Copy link
Contributor

vorot93 commented Dec 5, 2020

What is the problem with continuing to use Bytes and just silence the clippy warning instead? bytes is an established crate in the ecosystem, used all the way from tokio to hyper, so there's a very high chance that libp2p downstream will depend on bytes anyway - and libp2p might as well take advantage of it.

@dvc94ch
Copy link
Contributor

dvc94ch commented Dec 5, 2020

It's an annoyance for all projects that use clippy and the only advantage you get is you defer allocating the arc to the first clone. Something like peeid should be stack allocated and implement the copy trait so that it can be used in collections or boxes or arcs, and a lot of work was done on multihash to make it stack allocated so that it composes well with other rust types.

Many projects don't use Tokio or hyper, but as I said, no one would have cared it if clippy weren't forcing buggy checks on people. I already disabled it in the codebase I'm working on.

@vorot93
Copy link
Contributor

vorot93 commented Dec 5, 2020

I agree that if it makes semantic sense, types should be stack allocated. It's just that replacing Bytes with Arc is not the step in this direction (and the right one at all).

@dvc94ch
Copy link
Contributor

dvc94ch commented Dec 5, 2020

It's just that replacing Bytes with Arc is not the step in this direction (and the right one at all).

I don't see it being a worse solution. As I explained I think it should be stack allocated, but since Bytes is already creating an Arc<Vec> internally it barely makes a difference. Do you have a reason to believe it will lead to a measurable performance degradation in your codebase?

@dvc94ch dvc94ch mentioned this pull request Dec 6, 2020
@dvc94ch
Copy link
Contributor

dvc94ch commented Dec 6, 2020

Opened a PR for stack allocating PeerId which also makes clippy happy in #1874

@vorot93
Copy link
Contributor

vorot93 commented Dec 6, 2020

  1. Bytes does not enforce a certain storage representation (like Arc<Vec<u8>>). For example, you can actually create a Bytes based on a 'static slice - and it will not cause allocation (see Bytes::from_static).
  2. Bytes directly stores the data point in it, and even updates it when you slice it. With Arc<Vec<u8>>, the pointer to the start of the data is stored inside Vec - which means already one indirection away. And then you have to add the offset to it. Which means Bytes should be a little bit more efficient to access.

@dvc94ch
Copy link
Contributor

dvc94ch commented Dec 6, 2020

@vorot93 good to know, I wasn't aware of the second point. However I think #1874 is something everyone could get behind. Wdyt?

@jolestar
Copy link
Contributor Author

jolestar commented Dec 7, 2020

  1. Bytes does not enforce a certain storage representation (like Arc<Vec<u8>>). For example, you can actually create a Bytes based on a 'static slice - and it will not cause allocation (see Bytes::from_static).
  2. Bytes directly stores the data point in it, and even updates it when you slice it. With Arc<Vec<u8>>, the pointer to the start of the data is stored inside Vec - which means already one indirection away. And then you have to add the offset to it. Which means Bytes should be a little bit more efficient to access.

But in PeerId, we do not support convert `static slice to PeerId, and also does not support update, so the two points are not a good reason to keep Bytes. I think #1874 is a good solution.

@jolestar
Copy link
Contributor Author

jolestar commented Dec 8, 2020

#1874 is approved, so I close this pr.

@jolestar jolestar closed this Dec 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants