Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu load by prepare_uninitialized_buffer in async_read of tokio-io #269

Closed
gin66 opened this issue Mar 30, 2018 · 17 comments
Closed

cpu load by prepare_uninitialized_buffer in async_read of tokio-io #269

gin66 opened this issue Mar 30, 2018 · 17 comments

Comments

@gin66
Copy link

gin66 commented Mar 30, 2018

The current implementation zeros out a buffer, which consumes a lot of time. In my application the CPU actually spends 86% in this routine, which only listens to a tls web socket stream. The traffic is not high (10-100 kB/s), but this buffer zeroing has a huge hit on the application performance.

The application makes use of the websocket crate, which imports tokio-io. It uses the framed interface, so the application has no direct access. Possibly there could be some (complex) solution be implemented to solve this problem. Best would be to add a feature-flag, which decides about zero out the buffer or not.

Any other solution ?

@carllerche
Copy link
Member

Do you know which type’s AsyncRead implementation is doing this?

@gin66
Copy link
Author

gin66 commented Mar 30, 2018

Actually I am not sure, which of those is used via the websocket crate. I am using websocket with TlsStream aka a wss:// address. Independent from this, manually zeroing should be avoided in general. Perhaps using bytes::BytesMut in tokio-io would avoid this problem totally. As I understand, zeroing is exclusively used to avoid lack of sensitive data (which I doubt makes sense in this context).

In the meantime I have solved it by avoiding tokio and the websocket-future. Instead I am using the sync-version of the websocket. Now the CPU load of the application is OK.

@carllerche
Copy link
Member

The issue is most likely that some type that implements AsyncRead is not correctly forwarding the prepare_uninitialized_buffer implementation.

@olix0r
Copy link
Member

olix0r commented Mar 30, 2018

i just did a quick search through the websocket crate and found:

https://github.com/cyderize/rust-websocket/blob/2253a0aee965e95290c0426b56d621f964e8dd04/src/stream.rs#L80-L84

perhaps this needs to be updated to proxy calls onto the reader?

@gin66
Copy link
Author

gin66 commented Mar 30, 2018

If this would be sufficient, then IMHO overriding prepare_uninitialized() like below should help

impl<R, W> AsyncRead for ReadWritePair<R, W>
		where R: AsyncRead,
		      W: Write
{
    unsafe fn prepare_uninitialized_buffer(&self, _buf: &mut [u8]) -> bool {
      false
   }
}

But it doesn't. I have tried without effect:

      - 91,19% <tokio_io::framed::Framed<T, U> as futures::stream::Stream>::poll                                                      ▒
         - 91,16% <tokio_io::framed_read::FramedRead2<T> as futures::stream::Stream>::poll                                            ▒
            - 87,97% tokio_io::async_read::AsyncRead::read_buf                                                                        ▒
               - 87,88% <&'a mut T as tokio_io::async_read::AsyncRead>::prepare_uninitialized_buffer                                  ▒
                    <tokio_io::framed_write::FramedWrite2<T> as tokio_io::async_read::AsyncRead>::prepare_uninitialized_buffer        ▒
                  - <tokio_io::framed::Fuse<T, U> as tokio_io::async_read::AsyncRead>::prepare_uninitialized_buffer                   ▒
                     - 87,20% tokio_io::async_read::AsyncRead::prepare_uninitialized_buffer                                           ▒
                        - 67,59% core::iter::range::<impl core::iter::iterator::Iterator for core::ops::range::Range<A>>::next        ▒
                           + 32,56% <usize as core::iter::range::Step>::add_usize                                                     ▒
                           + 25,53% core::mem::swap                                                                                   ▒
                             1,23% core::cmp::impls::<impl core::cmp::PartialOrd for usize>::lt                                       ▒
                             0,81% core::num::<impl usize>::checked_add                                                               ▒
                          0,75% core::cmp::impls::<impl core::cmp::PartialOrd for usize>::lt                                          ▒
                       0,68% core::iter::range::<impl core::iter::iterator::Iterator for core::ops::range::Range<A>>::next            ▒
            + 2,12% <tokio_io::framed_write::FramedWrite2<T> as tokio_io::codec::decoder::Decoder>::decode                            ▒
            + 0,77% <fern::log_impl::Dispatch as log::Log>::log                                                                       ▒

If the CPU hogging can be fixed in websocket, then I can raise an issue there. But I think, the zeroing is not good in tokio-io in the first place.

@carllerche
Copy link
Member

Zeroing out is required for safety. It would be unsafe to pass in uninitialized memory. This is why implementations of AsyncRead must opt-in to the unsafety.

If all your AsyncRead implementations opt-into uninitialized buffers, then there will be no zeroing out.

@kpp
Copy link
Contributor

kpp commented Mar 31, 2018

Zeroing out is required for safety. It would be unsafe to pass in uninitialized memory.

Why? AsyncRead actually makes a write op into the given region of memory inside the buffer. So there is no need to set it to 0 before writeing actually a "good" data inside it.

It is forbidden to read uninitialized data, but nobody reads it before the AsyncRead operation completes (if your BufMut data struct correctly handles this situation).

@carllerche
Copy link
Member

@kpp std::io::Ready has the same problem. You can see here: rust-lang/rust#42002

The problem is that it is technically possible to read the data in a read fn, which is silly, but still possible. Because of this, you enter unsafe territory.

@carllerche
Copy link
Member

It is forbidden to read uninitialized data

It is not forbidden by the compiler. This is the problem.

@gin66
Copy link
Author

gin66 commented Mar 31, 2018

Why not use one of these alternatives:

  • Provide an interface to hand in a buffer factory like the pool crate, which even helps to avoid allocations (comes close to tokio's bare metal performance claim). Leave the decision making to the tokio user
  • Give documentation hints, how to opt out the performance hog of buffer zeroing out
  • Mandatory opt-in for tokio-io user to use slow safe or fast unsafe code
  • Rely on BufMut, which may be used anyway at other places in the application unsafely

Anyway I do not understand the problem in this case. The malicious user is the application tokio-io is linked in. If it cannot be trusted, then what else to trust ?

BTW: If the tokio-io user (aka websocket crate) has a means to avoid the zeroing, then I miss totally the point of safety. If the application cannot be trusted, then for sure the application will not use tokio-io's zeroing. If it is about safety for the application, then application should not use unsafe ways to read out data outside of the filled in data in the BufMut

@kpp
Copy link
Contributor

kpp commented Mar 31, 2018

then application should not use unsafe ways to read out data outside of the filled in data in the BufMut

👍

@carllerche
Copy link
Member

@gin66 The malicious user is not the application using tokio-io

When safety is discussed here, it is in terms of Rust's memory model. Informally, when writing Rust code, it should be impossible to be able to read uninitialized memory without writing unsafe. So, Tokio will never pass uninitialized memory into a read fn w/o that implementation opting in to it w/ an unsafe block.

However, this discussion isn't limited to Tokio, and if you wish to continue it, I'd recommend rust-lang/rust#42788. Tokio will end up following what the Rust team goes with.

@gin66
Copy link
Author

gin66 commented Apr 3, 2018

@carllerche Thanks for clarification. Before I have thought, that BufMut prohibits access to uninitialized area except unsafe is used. Need to read the code more detailed and learn rust better.

My current solution is to use the synchronous version of websocket and enjoy single digit CPU load. For me is OK to wait for that rust-discussion and close this issue here.

@gin66 gin66 closed this as completed Apr 3, 2018
@carllerche
Copy link
Member

@gin66 To be clear, if large amounts of CPU is being used to zero out memory, it is a bug, but it would be in the library that has a type that incorrectly implements AsyncRead.

@hweom
Copy link

hweom commented Jul 5, 2018

Sorry for posting in a closed issue, but I have the same problem:

1,944,830,127  ???:tokio_io::async_read::AsyncRead::prepare_uninitialized_buffer
  180,366,539  ???:<std::thread::local::LocalKey<T>>::with'2
  111,590,434  ???:<std::thread::local::LocalKey<T>>::with
  103,859,700  ???:futures::task_impl::current
   81,984,780  ???:tokio_timer::timer::level::Level::next_expiration
   74,222,670  ???:tokio_reactor::registration::Registration::poll_read_ready

(Yes, I use websocket crate too.)

I'd like to continue to use the async version of websocket crate though.

To be clear, if large amounts of CPU is being used to zero out memory, it is a bug, but it would be in the library that has a type that incorrectly implements AsyncRead.

As stated before in

https://github.com/cyderize/rust-websocket/blob/2253a0aee965e95290c0426b56d621f964e8dd04/src/stream.rs#L80-L84

the websocket implementation of AsyncRead doesn't override any trait methods, so it's not clear why it's implemented incorrectly.

@carllerche
Copy link
Member

@hweom it needs to delegate prepare_uninitialized_buffer to the inner AsyncRead type. This is the source of trouble.

@athre0z
Copy link

athre0z commented Apr 9, 2019

I ran into the same problem with the websockets lib earlier today and after a good amount of profiling and debugging, I figured out that:

  • The AsyncRead impl referred to by the previous posters is not even used in my current code path
    • When adding a prepare_uninitialized_buffer func in the trait and making it unreachable!(), no panic is raised
  • The actual issue appears to be in the tokio-tls crate -- the AsyncRead impl there is empty

When adding a prepare_uninitialized_buffer returning false in this impl, I notice a 5x performance improvement of my websocket app. However, this unsafe impl makes the assumption that native_tls (where the reads are delegated to) does in fact not read the buffer. Would this be acceptable? Should I open a PR for further discussion of this matter?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants