Working toward a fully zero copy receive path #320
Could you give me an example? I think the PUBLISH packet https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901100 is better for the example.
I expect it to work as follows:
|
In your example:
Not needed, since we only need that data on the stack.
Right, this is how it's currently done.
Also unnecessary, since the length is only used on the stack, like we currently do things.
Right, this is how it's currently done.
For a PUBLISH message, there are 3 categories of things that would be given a std::shared_ptr<char[]>.
But not all properties would involve allocating a std::shared_ptr<char[]>; only the properties that have variable length contents, such as ContentType, UserProperty, CorrelationID, and so on, would have a std::shared_ptr<char[]> created. For Connect messages we also have
What motivates me to discuss this is
Copying: We mitigate copying being an expensive operation by ensuring that data enters the program (from boost::asio) directly into the one and only buffer it will ever need to live in. Storing the buffer as a std::shared_ptr<char[]> allows access to it to be shared among multiple data structures cheaply. For example, we store topics in test_broker.hpp in multiple places, and those topics are stored for potentially a long time. Messages and properties are stored in test_broker.hpp if they are retained, or as part of a will. All of the data that's passed to a user program (server, client, whatever) can potentially be used for a long time as well, and allocating storage this way allows a program using mqtt_cpp to never copy the data, only pass the shared_ptr<> around, which is significantly less expensive than copying large multi-megabyte strings. (The max size of a topic is 65,535 bytes (64KB) and the max message size is 268,435,456 bytes (256MB).) Copying data that approaches those sizes is very CPU intensive.

Allocations: By having endpoint accept an allocator as one of its class template parameters and constructor parameters (defaulting to std::allocator, of course), and then routing all memory allocations through the provided allocator, end user code can achieve better control over the allocation behavior of the program. As I mentioned in my initial post: a good allocator design in this situation is to always return regions of memory in powers of two (probably no smaller than 32 bytes). You just round up to the nearest power of two any time user code requests an allocation. Further, any time you need to allocate from the OS, you allocate many multiples of the requested size. E.g., if the requested allocation is X bytes, you round X up to the next power of two, and then allocate 10x the result, which is stored in your memory pool as 9 chunks saved for later and 1 chunk returned to the code requesting the allocation.
Finally, when your std::shared_ptr<char[]> is destroyed, the custom deleter you provide returns the freed memory to your allocator, instead of to the global allocator. You do this by calling std::allocate_shared instead of std::make_shared. Memory returned like that becomes available for use in the next message processed. But mqtt_cpp doesn't need to know anything about how the provided allocator behaves; all it needs to do is use it to allocate the storage for the buffers that hold the various chunks of messages received from the network. The way I would control the allocation behavior in my broker is like this:
A single allocator would be provided to all mqtt_cpp::endpoints and the broker. When the allocator reaches a pre-specified low-water mark, a signal would be sent to the management thread indicating that more buffers are needed. The management thread would then allocate more memory and feed the new buffers to the allocator. The reason this would be done on the management thread is to avoid waiting on the system allocator on the main processing thread. If an mqtt_cpp::endpoint requests memory from the allocator and the allocator doesn't have enough available, the allocator does an allocation from the system immediately, since continuing processing is more important than waiting on another thread to provide memory. Eventually the system would reach a steady state, where no allocations come from the system allocator and all are instead served from the pooled buffers in the custom allocator. What we need to do in order to support this is to
|
I think this approach is basically good. Let me confirm that I understand the concept correctly. I also have some comments.
|
I didn't know that shared_ptr to an array type was C++17. It looks like there's a way to do this without using boost::shared_ptr, but I think boost::shared_ptr supports allocate_shared better, and that's probably important. So the mqtt::shared_ptr approach is reasonable. https://stackoverflow.com/questions/13061979/shared-ptr-to-an-array-should-it-be-used
How about something like this? (Not a complete implementation...)
|
```cpp
struct size_sp {
    mqtt::string_view as_string_view() const {
        return mqtt::string_view(data.get(), size);
    }
    std::size_t size;
    mqtt::shared_ptr<char[]> data;
};
```

is good enough. What do you think?
BTW, it's important that the data is const after it's read out of boost::asio. If it isn't, users might edit the data in a way that isn't thread safe.
I tried yet another implementation of 2. See What do you think? |
Yeah, that looks like a good approach. Let's use that!
Though, I think for the constructor, it would be wise to do something like:
We don't want the buffer class to ever have a writable buffer. It's basically a read-only "view" class. Passing in a shared_ptr<const char[]> (instead of having the buffer class allocate its own) guarantees that immutability property.

We could actually extend this a bit to have the buffer class contain not only a size, but also an offset. E.g. maybe someone only wants a portion of the buffer after we've read it from boost::asio. If we let them provide an offset / start position, then this buffer class can represent any arbitrary range inside of the buffer owned by ptr_. Imagine a message that is comma separated values, for example. User code may want to grab the 3rd value out of the comma separated values, but that value is 10MB into a 200MB string. If we support an offset indicator, the user can make a read-only substring that's attached to the same shared_ptr<const char[]> as the buffer that mqtt_cpp::endpoint provided to the callback, but which only represents the specific part of the string that they want.
Ah, actually, now that I think about it, maybe we shouldn't inherit from mqtt::string_view. There is a lot of code out there that accepts std::string_view by value rather than by reference. Inheriting from mqtt::string_view like this will cause slicing of the object: the std::shared_ptr<const char[]> won't be passed to the function, only the mqtt::string_view being inherited from. If we use private inheritance to hide the fact that mqtt::buffer (or whatever we name it) is an mqtt::string_view, then none of those functions will be called automatically. But then again, that's kind of the whole point of mqtt::string_view: it doesn't carry the ownership information. So it's probably totally fine.
Could you show me actual working code? It is easier for me to understand. You can edit https://wandbox.org/permlink/duR5ze8iicTJms9X, click the run and share buttons, and then post the updated permalink.
https://wandbox.org/permlink/8ohhfDwKHernxJhT
|
Thank you! I think that this approach is very good.
Keeping the original length and modifying the copy with std::string_view(ptr.get()+offset, length-offset) is so elegant!
I think there is a bug in my constructor.
Should be Because |
I updated your example. See https://wandbox.org/permlink/v2ktJq5GsVeGdlbl I think that your original constructor is right. But the argument order means that

```cpp
buffer substr(std::size_t offset, std::size_t length)
{
    // if offset and length result in going out of bounds, throw an exception.
    return buffer(ptr_, offset_ + offset, length);
}
```

should be

```cpp
buffer substr(std::size_t offset, std::size_t length)
{
    // if offset and length result in going out of bounds, throw an exception.
    return buffer(ptr_, length_, offset);
}
```
Still something is wrong...
Finally, I understand your constructor fix is right. In addition, the substr implementation should be fixed. First, I wrote a test for buffer: https://wandbox.org/permlink/YgLC8uTB6lKbq625 Then I updated the code: https://wandbox.org/permlink/2n19bFiohTsBymbY It works as I expected.

```cpp
class buffer : public std::string_view {
public:
    buffer(std::shared_ptr<const char[]> ptr, std::size_t const length, std::size_t const offset = 0)
        : std::string_view(ptr.get() + offset, length)
        , ptr_(std::move(ptr))
        , length_(length)
        , offset_(offset)
    {
        // If offset > size, throw exception.
    }
    buffer substr(std::size_t offset, std::size_t length)
    {
        // if offset and length result in going out of bounds, throw an exception.
        return buffer(ptr_, length, offset_ + offset);
    }
public:
    std::shared_ptr<const char[]> ptr_;
    std::size_t length_;
    std::size_t offset_;
};
```
In order to correct out of range checking, I updated the buffer as follows:
But I'd like to confirm the purpose of offset. If offset is only a convenience for the user reading the buffer, then If offset is for What do you think?
The basic concept of my approach is that the member variables always keep the initial values. And
Offset would not be used by endpoint.hpp. Actually, we don't even need to store the length (aside from informing string_view), because shared_ptr knows the size of the allocation internally; there's no need to carry that around on our own.
How about this?
We could make There's no substantial reason that mqtt::buffer's shared_ptr needs to be what the provided string_view points to. It would be really strange if it wasn't, but any situation where mqtt::buffer converts to mqtt::string_view has no ownership semantics anyway, and then it doesn't matter what buffer the mqtt::string_view points to. So we don't need to force users of the code to construct the mqtt::buffer as a string_view of the entire buffer if they really don't want to. I'd say we can go even further and remove
I suppose that we could also technically do this:
Not saying that we should, just that I think it would work. Doing this gives user-level code direct access to the member functions of std::shared_ptr: things like .get(), .reset(), and so on.
I'm still thinking about the comment. This is a response to the 2 comments below:
I don't understand the above yet. Consider the publish_handler. The current code is

```cpp
using v5_publish_handler = std::function<
    bool(std::uint8_t fixed_header,
         mqtt::optional<packet_id_t> packet_id,
         mqtt::string_view topic_name,
         mqtt::string_view contents,
         std::vector<v5::property_variant> props)
>;
```

and our idea before the comment was

```cpp
using v5_publish_handler = std::function<
    bool(std::uint8_t fixed_header,
         mqtt::optional<packet_id_t> packet_id,
         mqtt::buffer topic_name,
         mqtt::buffer contents,
         std::vector<v5::property_variant> props) // props contains mqtt::buffer
>;
```

So far, so good. But if we removed shared_ptr from mqtt::buffer, then we would need to provide the shared_ptr to users as follows:

```cpp
using v5_publish_handler = std::function<
    bool(std::uint8_t fixed_header,
         mqtt::optional<packet_id_t> packet_id,
         mqtt::buffer topic_name,
         mqtt::shared_ptr_const_array sp_topic_name,
         mqtt::buffer contents,
         mqtt::shared_ptr_const_array sp_contents,
         std::vector<v5::property_variant> props, // props contains mqtt::buffer
         std::vector<mqtt::shared_ptr_const_array> sp_props)
>;
```

because when we implement a broker, a publish message is copied to all subscribers, and in order to avoid copying, we need mqtt::shared_ptr_const_array. Am I missing something?
Ah, I overlooked this multiple inheritance. Never mind my comment above.

```cpp
class buffer : public mqtt::shared_ptr_const_array, public mqtt::string_view {
```
I'm not sure it is worth having, but I don't strongly disagree. It might help some unpredictable (at least for me, currently) use case. However, if you add the following operator,

```cpp
template <class E, class T>
std::basic_ostream<E, T>& operator<<(std::basic_ostream<E, T>& os, buffer const& p)
{
    os << static_cast<mqtt::string_view const&>(p);
    return os;
}
```

some other operators should be provided too. Consider

```cpp
auto b = mqtt::buffer(sp, 8);
b == b;
```

If we write the above comparison, a compile error occurs. But an advanced user might want to compare the original pointer. It is the same as comparing ownership as long as we don't use shared_ptr's aliasing constructor, https://en.cppreference.com/w/cpp/memory/shared_ptr/shared_ptr overload (8). So I think that
|
See I added back length check and ownership comparison. The name |
BTW, if you don't mind, could you tell me where you live? I'd like to know your timezone so we can communicate more smoothly. I live in Tokyo, Japan. My timezone is JST.
I'm in Chicago, United States (CST).
I think we can implement this using a finite state machine.
|
Thank you for the advice. I think it is a good approach too. In addition, it could help avoid code duplication. If the message is small enough to receive all at once, we can call something like this. I will restart the implementation this way.
How would we detect that the message is small enough to receive all at once? Is there some function that can be called in boost::asio to ask how much data is available to read immediately? The way I envisioned this working is that each packet type has a known structure, so for packets like the CONNECT packet, we would read the packet in this way:
|
It's unfortunate that the MQTT protocol doesn't list all of the length values at the beginning of the packet, since that would allow us to call boost::asio::async_read with multiple different buffers all at once. But there's no way to do that with the MQTT protocol :(
How about the remaining length? We need at most 5 one-byte reads in the first phase.
If
If
In chunk read mode, some cases could be inefficient. However, the same thing could happen with properties, which have more varied lengths. Fortunately, before each property we can read the property length field https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901028. It is the same format as the remaining length. If it is small enough, we can allocate all properties in one shared_ptr. That is the recursive pattern of MQTT messages. It is still just an idea. I will write more PoC code and share it with you.
My answer is
It is the remaining length. In addition, we can use the property length.
I don't understand why we need to know that. The version of |
Perfect! I understand now. If the total message size is below $limit, as specified by end-user code, then read the entire message as a single allocation. If the total message size is above $limit, then allocate each variable length object separately.
I didn't understand before. It is now clear that there is no need to do this. Using the "remaining_length" field of the packet fixed header will work the way you explained. |
I noticed that my PoC code's "chunk read" and "bulk read" prints are swapped, sorry. Anyway, I want to share a subtle design choice. See

```cpp
void handle_receive(as::io_context& ioc, std::size_t size, std::size_t lim, cb_t cb) {
    if (size < lim) {
        std::cout << "chunk read" << std::endl; // misprint, actually bulk read
        auto buf = std::make_shared<std::string>(size, '\0');
        ioc.post(
            [buf, cb = std::move(cb)] {
                // Let's say buf has been filled
                std::string_view v1(*buf);
                auto r1 = step1(v1);
                auto v2 = v1.substr(r1.consumed);
                auto r2 = step2(v2);
                auto v3 = v2.substr(r2.consumed);
                auto r3 = step3(v3);
                assert(r3.consumed == v3.size());
                cb(r1.result, r2.result1, r2.result2, r3.result);
            }
        );
    }
    else {
        std::cout << "bulk read" << std::endl;
        handle_receive_impl(ioc, size, std::move(cb));
    }
}
```

I considered three choices here. At first, I wanted to use a callback based approach for both bulk read and chunk read, because that minimizes code duplication. However, there is a problem: from where should the callback be called on bulk read? In the previous handler? That is not good, because the callbacks keep stacking; if there are many properties or topic_filters on subscribe, then a stack overflow could happen. The next idea is to call Finally, I chose the simple return-value based approach. Sharing code is limited as
I think 2. is the best. In the chunk read model we're always using callbacks, so no matter what, we need the infrastructure to support the callbacks. At this link, the documentation of async_read says: https://www.boost.org/doc/libs/1_65_0/doc/html/boost_asio/reference/async_read/overload1.html
This implies that the function call does no reading of data from the stream until after it returns, before it calls the provided callback. This should mean that the operation for asynchronously reading the data is added to the end of the io_context's queue, just like any other operation that uses
I don't believe that this is possible. boost::asio TCP sockets are "streams" of data, so if any new data arrives before we've finished reading from the stream, the newly arrived data will be enqueued behind it; the data that we are currently reading will stay in place. As long as we don't have any of the following:
Then I believe we are guaranteed to have the data read from the stream in order, with no possibility of messages overlapping.
Nice comment and nice timing! I can save a lot of time :) I choose approach 2. |
Added a buffer class that supports `mqtt::string_view` compatible behavior and a life keeping mechanism (optional). Callback functions for users hold the receive buffer directly via `buffer`. Removed `*_ref` properties. Ref or not ref is hidden by `buffer`.
Added a buffer class that supports `mqtt::string_view` compatible behavior and a life keeping mechanism (optional). Callback functions for users hold the receive buffer directly via `buffer`. Removed `*_ref` properties. Ref or not ref is hidden by `buffer`. Added `""_mb` user defined literals for `buffer`. Added boost type_erasure. `mqtt::socket` is a type erased socket. It covers tcp, tls, ws, and wss. Compile times become faster. Replaced static_cast with boost::numeric_cast where overflow could happen. Removed redundant static_cast. Implemented efficient shared_ptr_array allocation. If boost is used (the default), then `boost::make_shared<char[]>(size)` is used. If the user defines MQTT_STD_SHARED_PTR_ARRAY: if __cplusplus is greater than 201703L (meaning C++20 or later), then `std::make_shared<char[]>(size)` is used; otherwise `std::shared_ptr<char[]>(new char[size])` is called.
This is related to #248, #191, and #289
I was working more with the v5 properties, and realized that when we receive the properties from the network we allocate a new buffer for the std::vector<> that stores the mqtt::v5::property_variants, even if the property_variant only holds ref properties.
I'd like to see mqtt_cpp support zero allocations for handling the data from incoming packets.
We can do that with the following changes:
A more complicated, but "better" way to handle this is to:
Further, for the above handling of properties we have two ways to avoid allocating that std::vector.
We can either have std::vector<> pull its storage from the same memory pool of char[]s, or we can create a new custom data type "property_cursor" that has a pointer to the entire message, and iterates over the message on an as-needed basis to construct the property objects on the fly.
If we implement this by having each chunk of the message given its own buffer (more complicated, but better overall in my opinion), then we should have the std::vector<> use the memory pool as an allocator.
If we have each message stored in a single buffer, we should implement this using the "property_cursor" concept.
@redboltz I'd like to hear your thoughts on the matter.