Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for raw gather buffer formatting #1271

Closed
vitaut opened this issue Aug 23, 2019 · 12 comments
Closed

Add support for raw gather buffer formatting #1271

vitaut opened this issue Aug 23, 2019 · 12 comments

Comments

@vitaut
Copy link
Contributor

vitaut commented Aug 23, 2019

This will make {fmt} work with llfio.

From ned14/llfio#32:

While I have you as the author of {fmt}, can I persuade you to add support for raw gather buffer formatting to {fmt}? Then one can do cool stuff like:

llfio::file_handle &fh;
fh.write(offset, fmt::out("System error code = {}\n", errno));

Here out(...) simply emits an iterable which iterates something smelling like span.

I appreciate your print() i/o function et al, but we can go much lower level without programmer inconvenience. Plus out() works immediately with Networking, LLFIO, anything which consumes gather buffers etc, no extra work.

@ned14
Copy link
Contributor

ned14 commented Aug 23, 2019

This will make {fmt} work with llfio.

And Networking TS. And Beast. And anything which takes raw gather buffers.

If you need to be convinced of the power of this design pattern, consider {fmt} as the backend for a binary logger:

binary_logger &log;
log.write("System error code = {}\n", errno);

This appends the logged items as binary to the logger. Such a logger would be a ring buffer stored in a llfio::mapped_file_handle.

To later convert binary logged items into a text representation:

llfio::mapped_file_handle &fh;  // opened append-only
for(auto &item : log)
{
  fh.write(0, fmt::out(item));
}

I vaguely remember that you intended to split the constexpr string format parser into a standalone library. Such a layer could parse the binary log statements for {fmt} correctness, and emit the gather buffers list as a list of variants to the binary logger, or perhaps something more compact. That gets appended to the logger mapped file.

I guess what I'm actually saying here is that the true power of this design pattern is polymorphic gather buffer formatting, rather than gather buffers of byte arrays. Then you can "squirrel away" a {fmt} into an efficient intermediate representation in deterministic time, and later complete the {fmt} at a time when indeterministic execution time is okay.

I hope I have explained myself okay. It is Friday, and I have a chest infection, so rather looking forward to end of work day. It's been a long day.

@vitaut
Copy link
Contributor Author

vitaut commented Aug 25, 2019

@ned14, what will be the difference between fmt::out and fmt::format? The latter returns std::string which is already an iterable.

I vaguely remember that you intended to split the constexpr string format parser into a standalone library.

Not a separate library - I am planning to make it part of the public API. Currently it's internal to {fmt}.

@ned14
Copy link
Contributor

ned14 commented Aug 26, 2019

std::string involves a dynamic memory allocation, which is unacceptable to a lot of folk, as worst case execution time is seconds.

fmt::out would return a type sufficiently large to store all the gather buffers, and thus one avoids the dynamic memory allocation. If formatting to hexadecimal or octal, formatting numbers ought to have good worst case execution times due to avoiding integer division.

You may think that i/o would be a lot more expensive than malloc, but while this may be true for the average case, it may not be for the worst case. i/o can be configured to have worst case execution times of about 400-500 CPU cycles, which is the TLB shootdown overhead used to detect page dirtying.

Avoiding dynamic memory allocation may seem a bit niche, but I would regularly work on codebases where a link time check checks for any use of dynamic memory allocation, and it fails the build. Thus if {fmt} relies on dynamic memory allocation, people in those codebases can never use it.

Edit: Copy and pasting notes from Reddit discussion thread here:

How can we always know the size (and therefore avoid a dynamic allocation and use a fixed size type) for the above fmt::out return value if for example errno was replaced by a std::string that we only knew the value of at runtime?

For input of type int formatted as decimal, we know for a fact that its individual formatted buffer will never exceed 11 bytes (INT_MIN).

We thus can store that individual buffer internally in the type returned by fmt::out().

I was assuming the iterable-like object returned by fmt::out doesn't for example have a pointer into the string (we know for this fmt string there is only 1 so we know size at compile time)?

fmt::out("System error code = {}\n", errno) would output a gather buffer list similar to:

{
  { "System error code =", 19 },
  { errno_buffer, errno_buffer_len },
  { "\n", 1}
}

Does this make more sense?

Or as you mention later in the github comments is this only meant to be avoiding dynamic allocation using a variant of known bounded types where we know the max size of a formatted int, float, bool or some such?

Partially. But remember gather buffers themselves can contain dynamic length. So, if you format a string with a std::string insert, the gather buffers output simply refer to {std::string::c_str(), std::string::size()}.

Me personally, I'd restrict fmt::out() to inputs with formatting not requiring dynamic memory allocation, and if people want to write out to i/o more complex types, they'd write {fmt::format(...).c_str(), fmt::format(...).size()} i.e. format to std::string, then write out that buffer.

@Sarcasm
Copy link

Sarcasm commented Aug 27, 2019

Out of curiosity, isn't possible to plug in LLFIO and {fmt} with an output iterator and format_to?

Very likely incorrect code:

llfio::file_handle &fh;
fmt::format_to(back_inserter_at_offset(fh, offset),
               "System error code = {}\n", errno));

I guess while it may be possible, it would inefficient, right?

@ned14
Copy link
Contributor

ned14 commented Aug 27, 2019

I guess while it may be possible, it would inefficient, right?

llfio::file_handle is deliberately not iterable in order to prevent people iterating on it :)

llfio::file_handle is like basic_socket in the Networking TS, it provides scatter-gather buffer i/o functions, and nothing more. This is because the OS kernel provides scatter-gather buffer i/o functions, and at the low level i/o layer, we add nothing to what the OS kernel provides.

Any buffering is best implemented by higher level code, for example forthcoming Ranges i/o, you see. There Ranges i/o might keep two 4Kb buffers per i/o stream, and double buffer each with O_DIRECT in combination with coroutines for maximum efficiency. But Ranges i/o is probably C++ 26, and in any case in my opinion a lot of codebase will refuse to use Ranges until their compile-time impact becomes acceptable. So, as a stopgap, if {fmt} could "speak gather buffer", especially non-dynamically-allocating, that would be quite useful indeed.

@vitaut
Copy link
Contributor Author

vitaut commented Sep 2, 2019

fmt::out would return a type sufficiently large to store all the gather buffers

Could you point me to the documentation of the gather buffer API to help me understand what exactly fmt::out should return?

and thus one avoids the dynamic memory allocation.

It is possible to avoid dynamic memory allocation with format_to and format_to_n. For example

fmt::memory_buffer buf;
fmt::format_to(buf, "System error code = {}\n", errno);
fh.write(buf); // possibly need to adapt buf to whatever write expects

doesn't do any memory allocations because the output fits into the inline storage of memory_buffer.

avoiding integer division

Decent compilers replace integer division by a constant with an integer multiplication (although that is not very fast either) and {fmt} further minimizes the number of such costly operations.

Avoiding dynamic memory allocation may seem a bit niche, but I would regularly work on codebases where a link time check checks for any use of dynamic memory allocation, and it fails the build. Thus if {fmt} relies on dynamic memory allocation, people in those codebases can never use it.

With {fmt} you can avoid memory allocations. However, the output can be arbitrarily large and not known at compile time, so in general to avoid allocation you'd need to truncate or severely limit the inputs. There are parts of the library that use memory allocation and I am not interested in making such linkage work on those niche systems, but would accept a PR if it's not too intrusive.

For input of type int formatted as decimal, we know for a fact that its individual formatted buffer will never exceed 11 bytes (INT_MIN).

This is only true in trivial cases. If you specify width or precision (for FP) then the output can be arbitrarily large.

I'd restrict fmt::out() to inputs with formatting not requiring dynamic memory allocation

I think this is a non-starter. We shouldn't be butchering the formatting API for some esoteric use case. At most we should provide an infra to make it possible to do it themselves.

@ned14
Copy link
Contributor

ned14 commented Sep 2, 2019

I quickly threw together a mockup at https://wandbox.org/permlink/l3bSQTYBZDOvd92D.

This is only true in trivial cases. If you specify width or precision (for FP) then the output can be arbitrarily large.

Can you give an example of where formatting would have an unbounded length temporary?

Actually, lest wandbox vanish, better to copy & paste it here:

#include <variant>
#include <string>
#include <tuple>
#include <memory>
#include <array>
#include <iostream>

#include <sys/uio.h>
#include <string.h>

struct const_buffer_type { const char *data; size_t len; };

template<class T> struct out_holder_item;
template<> struct out_holder_item<int>
{
  const int &value;
  char _buffer[11];
  size_t _length{0};
  out_holder_item(const int &v) : value(v) {}
  const_buffer_type render() {
    _length = sprintf(_buffer, "%d", value);
    return get();
  }
  const_buffer_type get() const { return {_buffer, _length};}
};
template<> struct out_holder_item<std::string>
{
  const std::string &value;
  out_holder_item(const std::string &v) : value(v) {}
  const_buffer_type render() { return get(); }
  const_buffer_type get() const { return { value.c_str(), value.size()};}
};
template<> struct out_holder_item<const char *>
{
  const char * value;
  size_t _length{0};
  out_holder_item(const char * v) : value(v) {}
  const_buffer_type render() { _length = strlen(value); return get(); }
  const_buffer_type get() const { return { value, _length};}
};

template<class... Args> using snapshot_holder = std::tuple<out_holder_item<Args>...>;

template<class Array, class... Args, std::size_t... Is> void render(Array &buffers, snapshot_holder<Args...> &snapshot, std::index_sequence<Is...>)
{
  ((buffers[Is] = std::get<Is>(snapshot).render()), ...);
}


template<class Arg> snapshot_holder<std::decay_t<Arg>> snapshot(Arg&& arg)
{
  return snapshot_holder<std::decay_t<Arg>>(out_holder_item<std::decay_t<Arg>>(std::forward<Arg>(arg)));
}
template<class Arg, class... Args> snapshot_holder<std::decay_t<Arg>, std::decay_t<Args>...> snapshot(Arg&&arg, Args&&... args)
{
  return std::tuple_cat(snapshot(std::forward<Arg>(arg)), snapshot(std::forward<Args>(args)...));
}

template<class...Args> struct out_holder
{
  using snapshot_type = snapshot_holder<std::decay_t<Args>...>;
  snapshot_type snapshot;
  // Gather buffers MUST be contiguous in memory
  std::array<const_buffer_type, sizeof...(Args)> buffers;

  out_holder(snapshot_type &&s) : snapshot(std::move(s)) {}
  const_buffer_type *data() const { return buffers.data();}
  size_t size() const { return buffers.size();}
  auto begin() const { return buffers.begin(); }
  auto end() const { return buffers.end(); }
};

template<class...Args> out_holder<Args...> out(Args&&... args)
{
  out_holder<Args...> ret (snapshot(std::forward<Args>(args)...));
  render(ret.buffers, ret.snapshot, std::index_sequence_for<Args...>{});
  return ret;
}

int main()
{
  auto buffers = out("System error code = ", 5, ", and my string is: ", std::string("hello"));
  std::cout << buffers.size() << std::endl;
  for(auto &buffer : buffers)
  {
    std::cout << std::string_view(buffer.data, buffer.len);
  }
  return 0;
}

@Sarcasm
Copy link

Sarcasm commented Sep 2, 2019

Could you point me to the documentation of the gather buffer API to help me understand what exactly fmt::out should return?

Some documentation for the Posix API:

  • a struct iovec to hold a pointer + size
  • a writev() system call, to write an array of iovec structures to a file descriptor

@vitaut
Copy link
Contributor Author

vitaut commented Sep 3, 2019

@ned14, thanks a lot for the example, it is super helpful. I think we can do something along those lines in {fmt} and avoid memory allocations in almost all cases.

Can you give an example of where formatting would have an unbounded length temporary?

auto s = fmt::format("{:1000}", 42);

This will print 42 padded to the total width of 1000 with spaces, i.e. s.size() == 1000. The width can be arbitrarily large and either come from the format string (which can be dynamic) or another argument. Same for precision although large precision doesn't make much sense.

Thanks for the pointers, @Sarcasm.

@ned14
Copy link
Contributor

ned14 commented Sep 3, 2019

@ned14, thanks a lot for the example, it is super helpful. I think we can do something along those lines in {fmt} and avoid memory allocations in almost all cases.

In my example, I poorly tried to split the implementation into three layers. There is a "value recording snapshot layer", which simply dumps what would be formatted. The idea is that a binary logger can deterministically dump what would be formatted very quickly. The next layer .render() does the expensive step, which is the formatting of values into text. It could be done in an async thread, or whatever. The final layer is the i/o layer, which hands the array of struct iovec of gather buffers to whatever will do the i/o write. I apologise for the poor quality of the example, it was rushed, but you get the idea.

This will print 42 padded to the total width of 1000 with spaces, i.e. s.size() == 1000. The width can be arbitrarily large and either come from the format string (which can be dynamic) or another argument. Same for precision although large precision doesn't make much sense.

Ah, you've convinced me. If the formatted value exceeds some reasonable bound (256 bytes?), dynamic memory allocation looks very reasonable.

@gabime
Copy link
Contributor

gabime commented Dec 20, 2019

Not sure if this worth the extra complexity in fmt.
For example In spdlog I am perfectly fine with current api to avoid allocations , by using “format_to” into existing buffer.

@vitaut
Copy link
Contributor Author

vitaut commented Jan 12, 2021

So to summarize: the suggestion here is to use multiple buffers instead of a single one to avoid copy in some cases (or rather move copy further down the stack). Closing as it will likely be a regression for the common case of usual message formatting and will add significant complexity. The async formatting can be done without any of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants