[libc] Template the writing mode for the writer class #111559

jhuber6 · 2024-10-08T17:06:45Z

Summary:
Currently we dispatch the writing mode off of a runtime enum passed in
by the constructor. This causes very unfortunate codegen for the GPU
targets where we get worst-case codegen because of the unused function
pointer for sprintf. Instead, this patch moves all of this to a
template so it can be masked out. This results in no dynamic stack and
uses 60 VGPRs instead of 117. It also compiles about 5x as fast.

llvmbot · 2024-10-08T17:07:24Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-libc

Author: Joseph Huber (jhuber6)

Changes

Summary:
Currently we dispatch the writing mode off of a runtime enum passed in
by the constructor. This causes very unfortunate codegen for the GPU
targets where we get worst-case codegen because of the unused function
pointer for sprintf. Instead, this patch moves all of this to a
template so it can be masked out. This results in no dynamic stack and
uses 60 VGPRs instead of 117. It also compiles about 5x as fast.

Patch is 51.80 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/111559.diff

31 Files Affected:

(modified) libc/src/stdio/printf_core/CMakeLists.txt (+3-9)
(modified) libc/src/stdio/printf_core/char_converter.h (+3-1)
(removed) libc/src/stdio/printf_core/converter.cpp (-105)
(modified) libc/src/stdio/printf_core/converter.h (+84-1)
(modified) libc/src/stdio/printf_core/fixed_converter.h (+3-1)
(modified) libc/src/stdio/printf_core/float_dec_converter.h (+41-19)
(modified) libc/src/stdio/printf_core/float_hex_converter.h (+2-1)
(modified) libc/src/stdio/printf_core/float_inf_nan_converter.h (+3-1)
(modified) libc/src/stdio/printf_core/int_converter.h (+3-1)
(removed) libc/src/stdio/printf_core/printf_main.cpp (-43)
(modified) libc/src/stdio/printf_core/printf_main.h (+22-2)
(modified) libc/src/stdio/printf_core/ptr_converter.h (+3-1)
(modified) libc/src/stdio/printf_core/strerror_converter.h (+3-1)
(modified) libc/src/stdio/printf_core/string_converter.h (+3-1)
(modified) libc/src/stdio/printf_core/vasprintf_internal.h (+8-7)
(modified) libc/src/stdio/printf_core/vfprintf_internal.h (+3-3)
(modified) libc/src/stdio/printf_core/write_int_converter.h (+2-1)
(removed) libc/src/stdio/printf_core/writer.cpp (-46)
(modified) libc/src/stdio/printf_core/writer.h (+67-40)
(modified) libc/src/stdio/snprintf.cpp (+3-2)
(modified) libc/src/stdio/sprintf.cpp (+3-2)
(modified) libc/src/stdio/vsnprintf.cpp (+3-2)
(modified) libc/src/stdio/vsprintf.cpp (+3-2)
(modified) libc/src/stdlib/str_from_util.h (+2-2)
(modified) libc/src/stdlib/strfromd.cpp (+3-2)
(modified) libc/src/stdlib/strfromf.cpp (+3-2)
(modified) libc/src/stdlib/strfroml.cpp (+3-2)
(modified) libc/test/src/stdio/printf_core/converter_test.cpp (+8-4)
(modified) libc/test/src/stdio/printf_core/writer_test.cpp (+43-37)
(modified) libc/utils/gpu/server/CMakeLists.txt (+1-5)
(modified) libc/utils/gpu/server/rpc_server.cpp (+7-6)

diff --git a/libc/src/stdio/printf_core/CMakeLists.txt b/libc/src/stdio/printf_core/CMakeLists.txt
index 8172fda1e866ed..4c9a9f5ceddb2f 100644
--- a/libc/src/stdio/printf_core/CMakeLists.txt
+++ b/libc/src/stdio/printf_core/CMakeLists.txt
@@ -59,10 +59,8 @@ add_header_library(
     libc.src.__support.common
 )
 
-add_object_library(
+add_header_library(
   writer
-  SRCS
-    writer.cpp
   HDRS
     writer.h
   DEPENDS
@@ -73,10 +71,8 @@ add_object_library(
     libc.src.string.memory_utils.inline_memset
 )
 
-add_object_library(
+add_header_library(
   converter
-  SRCS
-    converter.cpp
   HDRS
     converter.h
     converter_atlas.h
@@ -110,10 +106,8 @@ add_object_library(
     libc.src.__support.StringUtil.error_to_string
 )
 
-add_object_library(
+add_header_library(
   printf_main
-  SRCS
-    printf_main.cpp
   HDRS
     printf_main.h
   DEPENDS
diff --git a/libc/src/stdio/printf_core/char_converter.h b/libc/src/stdio/printf_core/char_converter.h
index 2596cba813c2e9..fd2eb2553887a8 100644
--- a/libc/src/stdio/printf_core/char_converter.h
+++ b/libc/src/stdio/printf_core/char_converter.h
@@ -17,7 +17,9 @@
 namespace LIBC_NAMESPACE_DECL {
 namespace printf_core {
 
-LIBC_INLINE int convert_char(Writer *writer, const FormatSection &to_conv) {
+template <WriteMode write_mode>
+LIBC_INLINE int convert_char(Writer<write_mode> *writer,
+                             const FormatSection &to_conv) {
   char c = static_cast<char>(to_conv.conv_val_raw);
 
   constexpr int STRING_LEN = 1;
diff --git a/libc/src/stdio/printf_core/converter.cpp b/libc/src/stdio/printf_core/converter.cpp
deleted file mode 100644
index b1c66451f53f0f..00000000000000
--- a/libc/src/stdio/printf_core/converter.cpp
+++ /dev/null
@@ -1,105 +0,0 @@
-//===-- Format specifier converter implmentation for printf -----*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-#include "src/stdio/printf_core/converter.h"
-
-#include "src/__support/macros/config.h"
-#include "src/stdio/printf_core/core_structs.h"
-#include "src/stdio/printf_core/printf_config.h"
-#include "src/stdio/printf_core/strerror_converter.h"
-#include "src/stdio/printf_core/writer.h"
-
-// This option allows for replacing all of the conversion functions with custom
-// replacements. This allows conversions to be replaced at compile time.
-#ifndef LIBC_COPT_PRINTF_CONV_ATLAS
-#include "src/stdio/printf_core/converter_atlas.h"
-#else
-#include LIBC_COPT_PRINTF_CONV_ATLAS
-#endif
-
-#include <stddef.h>
-
-namespace LIBC_NAMESPACE_DECL {
-namespace printf_core {
-
-int convert(Writer *writer, const FormatSection &to_conv) {
-  if (!to_conv.has_conv)
-    return writer->write(to_conv.raw_string);
-
-#if !defined(LIBC_COPT_PRINTF_DISABLE_FLOAT) &&                                \
-    defined(LIBC_COPT_PRINTF_HEX_LONG_DOUBLE)
-  if (to_conv.length_modifier == LengthModifier::L) {
-    switch (to_conv.conv_name) {
-    case 'f':
-    case 'F':
-    case 'e':
-    case 'E':
-    case 'g':
-    case 'G':
-      return convert_float_hex_exp(writer, to_conv);
-    default:
-      break;
-    }
-  }
-#endif // LIBC_COPT_PRINTF_DISABLE_FLOAT
-
-  switch (to_conv.conv_name) {
-  case '%':
-    return writer->write("%");
-  case 'c':
-    return convert_char(writer, to_conv);
-  case 's':
-    return convert_string(writer, to_conv);
-  case 'd':
-  case 'i':
-  case 'u':
-  case 'o':
-  case 'x':
-  case 'X':
-  case 'b':
-  case 'B':
-    return convert_int(writer, to_conv);
-#ifndef LIBC_COPT_PRINTF_DISABLE_FLOAT
-  case 'f':
-  case 'F':
-    return convert_float_decimal(writer, to_conv);
-  case 'e':
-  case 'E':
-    return convert_float_dec_exp(writer, to_conv);
-  case 'a':
-  case 'A':
-    return convert_float_hex_exp(writer, to_conv);
-  case 'g':
-  case 'G':
-    return convert_float_dec_auto(writer, to_conv);
-#endif // LIBC_COPT_PRINTF_DISABLE_FLOAT
-#ifdef LIBC_INTERNAL_PRINTF_HAS_FIXED_POINT
-  case 'r':
-  case 'R':
-  case 'k':
-  case 'K':
-    return convert_fixed(writer, to_conv);
-#endif // LIBC_INTERNAL_PRINTF_HAS_FIXED_POINT
-#ifndef LIBC_COPT_PRINTF_DISABLE_STRERROR
-  case 'm':
-    return convert_strerror(writer, to_conv);
-#endif // LIBC_COPT_PRINTF_DISABLE_STRERROR
-#ifndef LIBC_COPT_PRINTF_DISABLE_WRITE_INT
-  case 'n':
-    return convert_write_int(writer, to_conv);
-#endif // LIBC_COPT_PRINTF_DISABLE_WRITE_INT
-  case 'p':
-    return convert_pointer(writer, to_conv);
-  default:
-    return writer->write(to_conv.raw_string);
-  }
-  return -1;
-}
-
-} // namespace printf_core
-} // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/stdio/printf_core/converter.h b/libc/src/stdio/printf_core/converter.h
index 2b3f06d0aa7a36..f26ed727f05f40 100644
--- a/libc/src/stdio/printf_core/converter.h
+++ b/libc/src/stdio/printf_core/converter.h
@@ -11,8 +11,18 @@
 
 #include "src/__support/macros/config.h"
 #include "src/stdio/printf_core/core_structs.h"
+#include "src/stdio/printf_core/printf_config.h"
+#include "src/stdio/printf_core/strerror_converter.h"
 #include "src/stdio/printf_core/writer.h"
 
+// This option allows for replacing all of the conversion functions with custom
+// replacements. This allows conversions to be replaced at compile time.
+#ifndef LIBC_COPT_PRINTF_CONV_ATLAS
+#include "src/stdio/printf_core/converter_atlas.h"
+#else
+#include LIBC_COPT_PRINTF_CONV_ATLAS
+#endif
+
 #include <stddef.h>
 
 namespace LIBC_NAMESPACE_DECL {
@@ -21,7 +31,80 @@ namespace printf_core {
 // convert will call a conversion function to convert the FormatSection into
 // its string representation, and then that will write the result to the
 // writer.
-int convert(Writer *writer, const FormatSection &to_conv);
+template <WriteMode write_mode>
+int convert(Writer<write_mode> *writer, const FormatSection &to_conv) {
+  if (!to_conv.has_conv)
+    return writer->write(to_conv.raw_string);
+
+#if !defined(LIBC_COPT_PRINTF_DISABLE_FLOAT) &&                                \
+    defined(LIBC_COPT_PRINTF_HEX_LONG_DOUBLE)
+  if (to_conv.length_modifier == LengthModifier::L) {
+    switch (to_conv.conv_name) {
+    case 'f':
+    case 'F':
+    case 'e':
+    case 'E':
+    case 'g':
+    case 'G':
+      return convert_float_hex_exp(writer, to_conv);
+    default:
+      break;
+    }
+  }
+#endif // LIBC_COPT_PRINTF_DISABLE_FLOAT
+
+  switch (to_conv.conv_name) {
+  case '%':
+    return writer->write("%");
+  case 'c':
+    return convert_char(writer, to_conv);
+  case 's':
+    return convert_string(writer, to_conv);
+  case 'd':
+  case 'i':
+  case 'u':
+  case 'o':
+  case 'x':
+  case 'X':
+  case 'b':
+  case 'B':
+    return convert_int(writer, to_conv);
+#ifndef LIBC_COPT_PRINTF_DISABLE_FLOAT
+  case 'f':
+  case 'F':
+    return convert_float_decimal(writer, to_conv);
+  case 'e':
+  case 'E':
+    return convert_float_dec_exp(writer, to_conv);
+  case 'a':
+  case 'A':
+    return convert_float_hex_exp(writer, to_conv);
+  case 'g':
+  case 'G':
+    return convert_float_dec_auto(writer, to_conv);
+#endif // LIBC_COPT_PRINTF_DISABLE_FLOAT
+#ifdef LIBC_INTERNAL_PRINTF_HAS_FIXED_POINT
+  case 'r':
+  case 'R':
+  case 'k':
+  case 'K':
+    return convert_fixed(writer, to_conv);
+#endif // LIBC_INTERNAL_PRINTF_HAS_FIXED_POINT
+#ifndef LIBC_COPT_PRINTF_DISABLE_STRERROR
+  case 'm':
+    return convert_strerror(writer, to_conv);
+#endif // LIBC_COPT_PRINTF_DISABLE_STRERROR
+#ifndef LIBC_COPT_PRINTF_DISABLE_WRITE_INT
+  case 'n':
+    return convert_write_int(writer, to_conv);
+#endif // LIBC_COPT_PRINTF_DISABLE_WRITE_INT
+  case 'p':
+    return convert_pointer(writer, to_conv);
+  default:
+    return writer->write(to_conv.raw_string);
+  }
+  return -1;
+}
 
 } // namespace printf_core
 } // namespace LIBC_NAMESPACE_DECL
diff --git a/libc/src/stdio/printf_core/fixed_converter.h b/libc/src/stdio/printf_core/fixed_converter.h
index c8812d77b62e34..6013c36b4cf3e9 100644
--- a/libc/src/stdio/printf_core/fixed_converter.h
+++ b/libc/src/stdio/printf_core/fixed_converter.h
@@ -62,7 +62,9 @@ LIBC_INLINE constexpr uint32_t const_ten_exp(uint32_t exponent) {
     }                                                                          \
   } while (false)
 
-LIBC_INLINE int convert_fixed(Writer *writer, const FormatSection &to_conv) {
+template <WriteMode write_mode>
+LIBC_INLINE int convert_fixed(Writer<write_mode> *writer,
+                              const FormatSection &to_conv) {
   // Long accum should be the largest type, so we can store all the smaller
   // numbers in things sized for it.
   using LARep = fixed_point::FXRep<unsigned long accum>;
diff --git a/libc/src/stdio/printf_core/float_dec_converter.h b/libc/src/stdio/printf_core/float_dec_converter.h
index e39ba6ecea8d48..fd729f67d4dda5 100644
--- a/libc/src/stdio/printf_core/float_dec_converter.h
+++ b/libc/src/stdio/printf_core/float_dec_converter.h
@@ -92,7 +92,7 @@ zero_after_digits(int32_t base_2_exp, int32_t digits_after_point, T mantissa,
   return has_trailing_zeros;
 }
 
-class PaddingWriter {
+template <WriteMode write_mode> class PaddingWriter {
   bool left_justified = false;
   bool leading_zeroes = false;
   char sign_char = 0;
@@ -106,7 +106,8 @@ class PaddingWriter {
         sign_char(init_sign_char),
         min_width(to_conv.min_width > 0 ? to_conv.min_width : 0) {}
 
-  LIBC_INLINE int write_left_padding(Writer *writer, size_t total_digits) {
+  LIBC_INLINE int write_left_padding(Writer<write_mode> *writer,
+                                     size_t total_digits) {
     // The pattern is (spaces) (sign) (zeroes), but only one of spaces and
     // zeroes can be written, and only if the padding amount is positive.
     int padding_amount =
@@ -129,7 +130,8 @@ class PaddingWriter {
     return 0;
   }
 
-  LIBC_INLINE int write_right_padding(Writer *writer, size_t total_digits) {
+  LIBC_INLINE int write_right_padding(Writer<write_mode> *writer,
+                                      size_t total_digits) {
     // If and only if the conversion is left justified, there may be trailing
     // spaces.
     int padding_amount =
@@ -154,7 +156,7 @@ class PaddingWriter {
   This FloatWriter class does the buffering and counting, and writes to the
   output when necessary.
 */
-class FloatWriter {
+template <WriteMode write_mode> class FloatWriter {
   char block_buffer[BLOCK_SIZE]; // The buffer that holds a block.
   size_t buffered_digits = 0;    // The number of digits held in the buffer.
   bool has_written = false;      // True once any digits have been output.
@@ -163,8 +165,9 @@ class FloatWriter {
   size_t digits_before_decimal = 0; // The # of digits to write before the '.'
   size_t total_digits_written = 0;  // The # of digits that have been output.
   bool has_decimal_point;           // True if the number has a decimal point.
-  Writer *writer;                   // Writes to the final output.
-  PaddingWriter padding_writer; // Handles prefixes/padding, uses total_digits.
+  Writer<write_mode> *writer;       // Writes to the final output.
+  PaddingWriter<write_mode>
+      padding_writer; // Handles prefixes/padding, uses total_digits.
 
   LIBC_INLINE int flush_buffer(bool round_up_max_blocks = false) {
     const char MAX_BLOCK_DIGIT = (round_up_max_blocks ? '0' : '9');
@@ -244,8 +247,9 @@ class FloatWriter {
   static_assert(fputil::FPBits<long double>::EXP_LEN < (sizeof(int) * 8));
 
 public:
-  LIBC_INLINE FloatWriter(Writer *init_writer, bool init_has_decimal_point,
-                          const PaddingWriter &init_padding_writer)
+  LIBC_INLINE FloatWriter(Writer<write_mode> *init_writer,
+                          bool init_has_decimal_point,
+                          const PaddingWriter<write_mode> &init_padding_writer)
       : has_decimal_point(init_has_decimal_point), writer(init_writer),
         padding_writer(init_padding_writer) {}
 
@@ -465,12 +469,24 @@ class FloatWriter {
   }
 };
 
+// Class-template auto deduction helpers.
+FloatWriter(Writer<WriteMode::FILL_BUFF_AND_DROP_OVERFLOW>, bool,
+            const PaddingWriter<WriteMode::FILL_BUFF_AND_DROP_OVERFLOW>)
+    -> FloatWriter<WriteMode::FILL_BUFF_AND_DROP_OVERFLOW>;
+FloatWriter(Writer<WriteMode::RESIZE_AND_FILL_BUFF>, bool,
+            const PaddingWriter<WriteMode::RESIZE_AND_FILL_BUFF>)
+    -> FloatWriter<WriteMode::RESIZE_AND_FILL_BUFF>;
+FloatWriter(Writer<WriteMode::FLUSH_TO_STREAM>, bool,
+            const PaddingWriter<WriteMode::FLUSH_TO_STREAM>)
+    -> FloatWriter<WriteMode::FLUSH_TO_STREAM>;
+
 // This implementation is based on the Ryu Printf algorithm by Ulf Adams:
 // Ulf Adams. 2019. Ryū revisited: printf floating point conversion.
 // Proc. ACM Program. Lang. 3, OOPSLA, Article 169 (October 2019), 23 pages.
 // https://doi.org/10.1145/3360595
-template <typename T, cpp::enable_if_t<cpp::is_floating_point_v<T>, int> = 0>
-LIBC_INLINE int convert_float_decimal_typed(Writer *writer,
+template <typename T, WriteMode write_mode,
+          cpp::enable_if_t<cpp::is_floating_point_v<T>, int> = 0>
+LIBC_INLINE int convert_float_decimal_typed(Writer<write_mode> *writer,
                                             const FormatSection &to_conv,
                                             fputil::FPBits<T> float_bits) {
   // signed because later we use -FRACTION_LEN
@@ -497,7 +513,7 @@ LIBC_INLINE int convert_float_decimal_typed(Writer *writer,
   // ignored.
   bool nonzero = false;
 
-  PaddingWriter padding_writer(to_conv, sign_char);
+  PaddingWriter<write_mode> padding_writer(to_conv, sign_char);
   FloatWriter float_writer(writer, has_decimal_point, padding_writer);
   FloatToString<T> float_converter(float_bits.get_val());
 
@@ -578,8 +594,9 @@ LIBC_INLINE int convert_float_decimal_typed(Writer *writer,
   return WRITE_OK;
 }
 
-template <typename T, cpp::enable_if_t<cpp::is_floating_point_v<T>, int> = 0>
-LIBC_INLINE int convert_float_dec_exp_typed(Writer *writer,
+template <typename T, WriteMode write_mode,
+          cpp::enable_if_t<cpp::is_floating_point_v<T>, int> = 0>
+LIBC_INLINE int convert_float_dec_exp_typed(Writer<write_mode> *writer,
                                             const FormatSection &to_conv,
                                             fputil::FPBits<T> float_bits) {
   // signed because later we use -FRACTION_LEN
@@ -604,7 +621,7 @@ LIBC_INLINE int convert_float_dec_exp_typed(Writer *writer,
   bool has_decimal_point =
       (precision > 0) || ((to_conv.flags & FormatFlags::ALTERNATE_FORM) != 0);
 
-  PaddingWriter padding_writer(to_conv, sign_char);
+  PaddingWriter<write_mode> padding_writer(to_conv, sign_char);
   FloatWriter float_writer(writer, has_decimal_point, padding_writer);
   FloatToString<T> float_converter(float_bits.get_val());
 
@@ -740,8 +757,9 @@ LIBC_INLINE int convert_float_dec_exp_typed(Writer *writer,
   return WRITE_OK;
 }
 
-template <typename T, cpp::enable_if_t<cpp::is_floating_point_v<T>, int> = 0>
-LIBC_INLINE int convert_float_dec_auto_typed(Writer *writer,
+template <typename T, WriteMode write_mode,
+          cpp::enable_if_t<cpp::is_floating_point_v<T>, int> = 0>
+LIBC_INLINE int convert_float_dec_auto_typed(Writer<write_mode> *writer,
                                              const FormatSection &to_conv,
                                              fputil::FPBits<T> float_bits) {
   // signed because later we use -FRACTION_LEN
@@ -1107,7 +1125,9 @@ LIBC_INLINE int convert_float_dec_auto_typed(Writer *writer,
 }
 
 // TODO: unify the float converters to remove the duplicated checks for inf/nan.
-LIBC_INLINE int convert_float_decimal(Writer *writer,
+
+template <WriteMode write_mode>
+LIBC_INLINE int convert_float_decimal(Writer<write_mode> *writer,
                                       const FormatSection &to_conv) {
   if (to_conv.length_modifier == LengthModifier::L) {
     fputil::FPBits<long double>::StorageType float_raw = to_conv.conv_val_raw;
@@ -1128,7 +1148,8 @@ LIBC_INLINE int convert_float_decimal(Writer *writer,
   return convert_inf_nan(writer, to_conv);
 }
 
-LIBC_INLINE int convert_float_dec_exp(Writer *writer,
+template <WriteMode write_mode>
+LIBC_INLINE int convert_float_dec_exp(Writer<write_mode> *writer,
                                       const FormatSection &to_conv) {
   if (to_conv.length_modifier == LengthModifier::L) {
     fputil::FPBits<long double>::StorageType float_raw = to_conv.conv_val_raw;
@@ -1149,7 +1170,8 @@ LIBC_INLINE int convert_float_dec_exp(Writer *writer,
   return convert_inf_nan(writer, to_conv);
 }
 
-LIBC_INLINE int convert_float_dec_auto(Writer *writer,
+template <WriteMode write_mode>
+LIBC_INLINE int convert_float_dec_auto(Writer<write_mode> *writer,
                                        const FormatSection &to_conv) {
   if (to_conv.length_modifier == LengthModifier::L) {
     fputil::FPBits<long double>::StorageType float_raw = to_conv.conv_val_raw;
diff --git a/libc/src/stdio/printf_core/float_hex_converter.h b/libc/src/stdio/printf_core/float_hex_converter.h
index 0b3ff3dd1cbfdc..3d41f8a45b0bb3 100644
--- a/libc/src/stdio/printf_core/float_hex_converter.h
+++ b/libc/src/stdio/printf_core/float_hex_converter.h
@@ -24,7 +24,8 @@
 namespace LIBC_NAMESPACE_DECL {
 namespace printf_core {
 
-LIBC_INLINE int convert_float_hex_exp(Writer *writer,
+template <WriteMode write_mode>
+LIBC_INLINE int convert_float_hex_exp(Writer<write_mode> *writer,
                                       const FormatSection &to_conv) {
   using LDBits = fputil::FPBits<long double>;
   using StorageType = LDBits::StorageType;
diff --git a/libc/src/stdio/printf_core/float_inf_nan_converter.h b/libc/src/stdio/printf_core/float_inf_nan_converter.h
index a7da682b835bee..8988d6474bf409 100644
--- a/libc/src/stdio/printf_core/float_inf_nan_converter.h
+++ b/libc/src/stdio/printf_core/float_inf_nan_converter.h
@@ -23,7 +23,9 @@ namespace printf_core {
 
 using StorageType = fputil::FPBits<long double>::StorageType;
 
-LIBC_INLINE int convert_inf_nan(Writer *writer, const FormatSection &to_conv) {
+template <WriteMode write_mode>
+LIBC_INLINE int convert_inf_nan(Writer<write_mode> *writer,
+                                const FormatSection &to_conv) {
   // All of the letters will be defined relative to variable a, which will be
   // the appropriate case based on the case of the conversion.
   const char a = (to_conv.conv_name & 32) | 'A';
diff --git a/libc/src/stdio/printf_core/int_converter.h b/libc/src/stdio/printf_core/int_converter.h
index f345e86b97a691..22f4dd627a104b 100644
--- a/libc/src/stdio/printf_core/int_converter.h
+++ b/libc/src/stdio/printf_core/int_converter.h
@@ -65,7 +65,9 @@ num_to_strview(uintmax_t num, cpp::span<char> bufref, char conv_name) {
 
 } // namespace details
 
-LIBC_INLINE int convert_int(Writer *writer, const FormatSection &to_conv) {
+template <WriteMode write_mode>
+LIBC_INLINE int convert_int(Writer<write_mode> *writer,
+                            const FormatSection &to_conv) {
   static constexpr size_t BITS_IN_BYTE = 8;
   static constexpr size_t BITS_IN_NUM = sizeof(uintmax_t) * BITS_IN_BYTE;
 
diff --git a/libc/src/stdio/printf_core/printf_main.cpp b/libc/src/stdio/printf_core/printf_main.cpp
deleted file mode 100644
index bd4a5a168bd23e..00000000000000
--- a/libc/src/stdio/printf_core/printf_main.cpp
+++ /dev/null
@@ -1,43 +0,0 @@
-//===-- Starting point for printf -------------------------------*- C++ -*-===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-#include "src/stdio/printf_core/printf_main.h"
-
-#include "src/__support/arg_list.h"
-#include "src/__support/macros/config.h"
-#include "src/stdio/printf_core/converter.h"
-#include "src/stdio/printf_core/core_structs.h"
-#include "src/stdio/printf_core/parser.h"
-#include "src/stdio/printf_core/writer.h"
-
-#include <stddef.h>
-
-namespace LIBC_NAMESPACE_DECL {
-namespace printf_core {
-
-int printf_main(Writer *writer, const char *__restrict str,
-                internal::ArgList &args) {
-  Parser<internal::ArgList> parser(str, args);
-  int result = 0;
-  fo...
[truncated]

nickdesaulniers

patch LGTM, but I'll defer to @michaelrj-google to check that the correct modes are used explicitly in each case.

I'm guessing this PR comes with a tradeoff to code size?

nickdesaulniers · 2024-10-08T17:19:57Z

libc/utils/gpu/server/rpc_server.cpp

@@ -167,8 +167,9 @@ static void handle_printf(rpc::Server::Port &port, TempStorage &temp_storage) {
      continue;

    char *buffer = temp_storage.alloc(buffer_size[lane]);
-    WriteBuffer wb(buffer, buffer_size[lane]);
-    Writer writer(&wb);
+    WriteBuffer<WriteMode::FILL_BUFF_AND_DROP_OVERFLOW> wb(buffer,


If this enum is the most common, should we make it the default? Or is it better to be explicit every time?

Figured it's best to keep it explicit as they do widly different things depending on the use-case.

jhuber6 · 2024-10-08T17:31:58Z

patch LGTM, but I'll defer to @michaelrj-google to check that the correct modes are used explicitly in each case.

I'm guessing this PR comes with a tradeoff to code size?

Yes, since we put everything in the headers it'll now be copied between the unique users. however, most of these already go through a common case (i.e. vfprintf_internal) so I'm not too concerned. This makes it actually usable from a GPU context however, and allows for more optimizations.

michaelrj-google · 2024-10-08T21:53:55Z

I'm not a fan of templating the writer for basically the reasons described: It means that the converter code is duplicated for each printf variant, of which we currently have three but there may be more (see syslog and dprintf). Also, templating the converter just feels wrong to me since the converter isn't actually changing based on the template parameter.

That being said, I recognize that this is an important optimization for GPUs, so I have an alternate suggestion. Instead of templating the whole writer, only template overflow_write, then call it with a switch statement. This may seem like it's the same as before, but it has an important distinction: If we remove a case from the switch then that entire code path gets removed. Next, we add compile options to remove the cases that call the function pointer, and turn it on for GPUs, which makes it just a direct call.

Would that solution work for you?

jhuber6 · 2024-10-09T00:18:01Z

I'm not a fan of templating the writer for basically the reasons described: It means that the converter code is duplicated for each printf variant, of which we currently have three but there may be more (see syslog and dprintf). Also, templating the converter just feels wrong to me since the converter isn't actually changing based on the template parameter.

That being said, I recognize that this is an important optimization for GPUs, so I have an alternate suggestion. Instead of templating the whole writer, only template overflow_write, then call it with a switch statement. This may seem like it's the same as before, but it has an important distinction: If we remove a case from the switch then that entire code path gets removed. Next, we add compile options to remove the cases that call the function pointer, and turn it on for GPUs, which makes it just a direct call.

Would that solution work for you?

It's not just the function pointer, but letting the compiler do more optimizations results in much better results. Without this it takes several seconds to compile a use of sprintf because there's so much unnecessary code in the .o that's never used. With this it's just a few seconds. If I template the overflow write, then I'd still need to propagate it somehow from the use, I tried simply guarding everything and hoping that the optimizer would figure it out, but no luck.

michaelrj-google · 2024-10-09T00:28:39Z

I'm confused about the unnecessary code. The writer's pretty small, even including the parts that are only used for the function pointer side. Could it be that the linking is slow? The proposed solution makes everything header-only, making it one large .o file whereas currently the writer and converter are separate objects.

jhuber6 · 2024-10-09T01:20:59Z

I'm confused about the unnecessary code. The writer's pretty small, even including the parts that are only used for the function pointer side. Could it be that the linking is slow? The proposed solution makes everything header-only, making it one large .o file whereas currently the writer and converter are separate objects.

Yes, the link step is very slow on AMDGPU and NVPTX as they're very sensitive to large SCC's in the optimizer (people love to put quadratic optimizations in there). Checking the IR before and after this patch, it's 7500 lines of IR versus 4500 respectively with a 2.5s link time versus a 0.5s link time.

That being said, I definitely see your concern because the CPU libllvmlibc.a went from 2.8 MB to 4.6 MB before and after. I'm going to guess like 95% of that is the long double table.

michaelrj-google · 2024-10-09T20:09:18Z

The long double table isn't even included, it's going to be from multiple copies of the regular double table. I did some size analysis yesterday and the big thing that needs to happen for size improvement is moving the ryu tables into a .cpp file so that the same .o with the table is linked into everything that needs it, instead of it getting inlined repeatedly.

Even with that optimization I'm seeing about a 29 KiB increase in the size of the final linked binary when I have both snprintf and printf in my file (with -O3, only ~3 KiB with -O2). I've put instruction on replicating this at the bottom.

My main concern is this: This patch would make printf only support compile-time writer selection, but for CPU use cases it's better to do runtime writer selection. I think the best way to handle this is to allow GPUs to disable the writer modes that it doesn't need, which should hopefully solve your long compile times, while also allowing the CPU targets to have the behavior they want.

Here's how to replicate my size comparison:
Build libc:
cmake ../runtimes -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_ENABLE_RUNTIMES="libc" -DCMAKE_BUILD_TYPE=Release -DLIBC_COMPILE_OPTIONS_DEFAULT="-O2"
ninja libc

building size_test.c:

#include <stdio.h>

int main() {
  char buff[100];
  snprintf(buff, 99, "%f", 123.456);
  printf("%s, %f\n", buff, 789.012);
}

(replace the -L path with the correct one for you)

clang size_test.c -static -Lllvm-project/build-overlay/libc/lib -lllvmlibc -O2 -o size_test_templated.out

(rebuild libc on main branch)

clang size_test.c -static -Lllvm-project/build-overlay/libc/lib -lllvmlibc -O2 -o size_test_main.out

Now for size comparison:

bloaty size_test_templated.out -- size_test_main.out 
    FILE SIZE        VM SIZE    
 --------------  -------------- 
   +87% +3.31Ki  [ = ]       0    [Unmapped]
  +0.1%    +704  +0.1%    +704    .text
  +0.3%     +88  [ = ]       0    .strtab
  +0.0%     +24  [ = ]       0    .symtab
  +0.5% +4.11Ki  +0.1%    +704    TOTAL

jhuber6 · 2024-10-09T20:13:03Z

I think the best way to handle this is to allow GPUs to disable the writer modes that it doesn't need, which should hopefully solve your long compile times, while also allowing the CPU targets to have the behavior they want.

The GPU target uses vasprintf. Maybe I could just make vasprintf just use snprintf and disable the others? Possible that could work since I think the stream case is unused due to printf being via RPC directly.

The compile time improvements, I'm pretty sure it's because it optimizes out unused formats better but I'd need to look into it more.

jhuber6 · 2024-10-09T20:45:52Z

Just for a simple call to sprintf, this patch has
VGPRs: 63
ScratchSize [bytes/lane]: 160
Dynamic Stack: False
Occupancy [waves/SIMD]: 16
Time to link 0.416s total

A version where I just deleted all the code associated with the other modes
VGPRs: 87
ScratchSize [bytes/lane]: 272
Dynamic Stack: False
Occupancy [waves/SIMD]: 10
Time to link 1.720s total

So this patch definitely does something to improve it, likely just running more optimizations on the conversion code. It's definitely unfortunate that it creates so much bloat because the tables are huge. The issue is that we're using templates on all the conversion functions, this is only necessary because it contains a writer which needs to know which callback to use. I wonder if there's some way to avoid the template directly on the conversion part, or to pull out part of it into a common implementation function.

michaelrj-google · 2024-10-10T17:37:34Z

Given what you're seeing, could you try a patch where you just move all of the printf_core code into headers without making any other changes? That might be the main thing making a difference.

jhuber6 · 2024-10-17T00:53:02Z

Accidentally closed this while clearing up some other useless branches.

michaelrj-google

From the most recent libc meeting I've been convinced that this patch is useful and we should move forward with it.

That being said there are some changes that to make before it lands:

Rebase and update the strftime internals, since those use the printf writer.
Create a template of Writer and WriteBuffer that does runtime dispatch based on an enum.
Create a compile flag that switches printf_main to take a specific templated writer and create a runtime dispatch version of it. A rough outline of what I'm thinking below:

#ifdef PRINTF_MAIN_RUNTIME_DISPATCH //name TBD
template <WriteMode write_mode>
int printf_main(Writer<write_mode> *writer, const char *__restrict str,
                internal::ArgList &args) {
  WriteBuffer<RUNTIME_DISPATCH> generic_wb(writer->wb.buff, ..., write_mode); //TODO: handle the fprintf overflow_write at the end
  Writer<RUNTIME_DISPATCH> generic_writer(generic_wb);
  Parser<internal::ArgList> parser(str, args);
  int result = 0;
  for (FormatSection cur_section = parser.get_next_section();
       !cur_section.raw_string.empty();
       cur_section = parser.get_next_section()) {
    if (cur_section.has_conv)
      result = convert(generic_writer, cur_section);
    else
      result = generic_writer->write(cur_section.raw_string);

    if (result < 0)
      return result;
  }

  return generic_writer->get_chars_written();
}

What do you think of this as a plan to move forward?

michaelrj-google · 2025-02-19T18:06:36Z

libc/src/stdio/printf_core/float_dec_converter.h

+// Class-template auto deduction helpers.
+FloatWriter(Writer<WriteMode::FILL_BUFF_AND_DROP_OVERFLOW>, bool,
+            const PaddingWriter<WriteMode::FILL_BUFF_AND_DROP_OVERFLOW>)
+    -> FloatWriter<WriteMode::FILL_BUFF_AND_DROP_OVERFLOW>;
+FloatWriter(Writer<WriteMode::RESIZE_AND_FILL_BUFF>, bool,
+            const PaddingWriter<WriteMode::RESIZE_AND_FILL_BUFF>)
+    -> FloatWriter<WriteMode::RESIZE_AND_FILL_BUFF>;
+FloatWriter(Writer<WriteMode::FLUSH_TO_STREAM>, bool,
+            const PaddingWriter<WriteMode::FLUSH_TO_STREAM>)
+    -> FloatWriter<WriteMode::FLUSH_TO_STREAM>;


Ideally I'd like to avoid having these since they seem likely to get forgotten when adding a new WriteMode. I may need to look into redesigning the FloatWriter.

It'll error without them I think, so a new one would be obvious.

jhuber6 · 2025-02-19T19:16:57Z

From the most recent libc meeting I've been convinced that this patch is useful and we should move forward with it.

That being said there are some changes that to make before it lands:
1. Rebase and update the `strftime` internals, since those use the printf writer.

2. Create a template of `Writer` and `WriteBuffer` that does runtime dispatch based on an enum.

3. Create a compile flag that switches `printf_main` to take a specific templated writer and create a runtime dispatch version of it. 
What do you think of this as a plan to move forward?

Does this address your concerns with binary bloat? This would allow me to move some of these into shared/ under a non-exported interface so I could then remove the weird GPU loader / exporting to offload situation as well. Plus, i think some people would like to use the printf internals in other places. (Is anything in shared/ besides rpc.h intended to be installed? If it's just for cross-project like libc++, clang, and offload then it should be fine to put whatever we want there with the right controls.)

michaelrj-google · 2025-02-19T19:21:35Z

Yes, having a dynamic dispatch option and making it the default for linux targets would address my concerns around binary bloat. Other targets can use the same mechanism if they want the same tradeoff.

The intent for the headers in /shared is that they provide a minimal interface for users within the LLVM project to reuse libc code. Ideally nothing from there should be installed, if it's installed it should go in /include. Given that, yes I'm okay with making some of the printf interface available through /shared.

jhuber6 · 2025-02-19T19:22:28Z

Yes, having a dynamic dispatch option and making it the default for linux targets would address my concerns around binary bloat. Other targets can use the same mechanism if they want the same tradeoff.

The intent for the headers in /shared is that they provide a minimal interface for users within the LLVM project to reuse libc code. Ideally nothing from there should be installed, if it's installed it should go in /include. Given that, yes I'm okay with making some of the printf interface available through /shared.

Alright, I'll try to pick this up again when I have time. Thanks.

jhuber6 · 2025-03-10T20:31:07Z

@michaelrj-google I've updated this as you requested (I believe). The default value for this is now runtime dispatch, with the GPU setting it to compile time in its config. This moves the printf handling so that it is header-only, allowing us to use printf stuff externally. I'll follow-up on that later.

Now, because this gets rid of the converter.cpp files, those files are now copied into every user of the converter interface. This bloats the .a reasonably, about 3.1 MB to 4.9 MB for the non-full build Linux. However, this is only the static library, the actual definitions are all weak so they should be merged by the linker. Had to do a little bit of a hack to make the runtime case use the same template value, but it works and avoids unnecessary duplication.

> llvm-readelf -s fprintf.cpp.o | grep '_ZN22__llvm_libc_21_0_0_git11printf_core15convert_inf_nanILNS0_9WriteModeE0EEEiPNS0_6WriterIXT_EEERKNS0_13FormatSectionE'
   145: 0000000000000000   487 FUNC    WEAK   HIDDEN    122 _ZN22__llvm_libc_21_0_0_git11printf_core15convert_inf_nanILNS0_9WriteModeE0EEEiPNS0_6WriterIXT_EEERKNS0_13FormatSectionE
> llvm-readelf -s asprintf.cpp.o | grep '_ZN22__llvm_libc_21_0_0_git11printf_core15convert_inf_nanILNS0_9WriteModeE0EEEiPNS0_6WriterIXT_EEERKNS0_13FormatSectionE' 
   148: 0000000000000000   487 FUNC    WEAK   HIDDEN    129 _ZN22__llvm_libc_21_0_0_git11printf_core15convert_inf_nanILNS0_9WriteModeE0EEEiPNS0_6WriterIXT_EEERKNS0_13FormatSectionE

github-actions · 2025-03-10T20:31:25Z

✅ With the latest revision this PR passed the C/C++ code formatter.

michaelrj-google

A brief explanation for anyone curious: I talked with Joseph about a new idea I had for templating the scanf reader which I think will work better than templating on an enum and having if constexprs for each option. My new design focuses on templating the reader/writer on a class which provides static methods for handling a filled/emptied buffer. This allows for straightforward static dispatch without needing to pull in the dependencies for every option. Dynamic dispatch would be handled by an indirection reader/writer that holds an enum+union or void* for the specific reader/writer.

I'm planning to move to that design in the future, but I'd prefer not to block the GPU runtime more than I have to so this patch may end up landing mostly as is, with a refactoring coming later.

libc/docs/configure.rst

libc/src/stdio/printf_core/writer.h

jhuber6 · 2025-03-11T21:18:31Z

I'm planning to move to that design in the future, but I'd prefer not to block the GPU runtime more than I have to so this patch may end up landing mostly as is, with a refactoring coming later.

There's definitely a lot of places that required the writer template that could be compressed, but mainly we just want the printf handling to be header only as well.

libc/src/stdio/printf_core/writer.h

Summary: Currently we dispatch the writing mode off of a runtime enum passed in by the constructor. This causes very unfortunate codegen for the GPU targets where we get worst-case codegen because of the unused function pointer for `sprintf`. Instead, this patch moves all of this to a template so it can be masked out. This results in no dynamic stack and uses 60 VGPRs instead of 117. It also compiles about 5x as fast. WIP Update

michaelrj-google

LGTM, tests pass locally

This allows specializing the implementation for different targets without including unnecessary logic and is similar to llvm#111559 which did the same for printf Writer interface.

PiJoules · 2025-03-13T18:02:53Z

It looks like this broke our builders at https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8720577119613178561/overview

[706/2837](61) Building CXX object libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj
FAILED: libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj 
/b/s/w/ir/x/w/llvm_build/./bin/clang++ --target=armv6m-none-eabi -DLIBC_NAMESPACE=__llvm_libc_21_0_0_git -I/b/s/w/ir/x/w/llvm-llvm-project/libc -isystem /b/s/w/ir/x/w/llvm_build/include/armv6m-unknown-none-eabi --target=armv6m-none-eabi -Wno-atomic-alignment "-Dvfprintf(stream, format, vlist)=vprintf(format, vlist)" "-Dfprintf(stream, format, ...)=printf(format)" -D_LIBCPP_PRINT=1 -mthumb -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -ffile-prefix-map=/b/s/w/ir/x/w/llvm_build/runtimes/runtimes-armv6m-none-eabi-bins=../../../llvm-llvm-project -ffile-prefix-map=/b/s/w/ir/x/w/llvm-llvm-project/= -no-canonical-prefixes -Os -DNDEBUG -std=gnu++17 --target=armv6m-none-eabi -DLIBC_QSORT_IMPL=LIBC_QSORT_HEAP_SORT -DLIBC_TYPES_TIME_T_IS_32_BIT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES)" -fpie -ffreestanding -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -DLIBC_COPT_PUBLIC_PACKAGING -UNDEBUG -DLIBC_COPT_PRINTF_DISABLE_FLOAT -DLIBC_COPT_PRINTF_DISABLE_INDEX_MODE -DLIBC_COPT_PRINTF_DISABLE_WRITE_INT -DLIBC_COPT_PRINTF_DISABLE_FIXED_POINT -DLIBC_COPT_PRINTF_DISABLE_STRERROR -DLIBC_COPT_PRINTF_RUNTIME_DISPATCH -MD -MT libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj -MF libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj.d -o libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj -c /b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/baremetal/printf.cpp
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/baremetal/printf.cpp:41:28: error: no viable constructor or deduction guide for deduction of template arguments of 'printf_core::WriteBuffer'
   41 |   printf_core::WriteBuffer wb(buffer, BUFF_SIZE, &raw_write_hook, nullptr);
      |                            ^
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/printf_core/writer.h:57:15: note: candidate template ignored: couldn't infer template argument 'write_mode'
   57 |   LIBC_INLINE WriteBuffer(char *buff, size_t buff_len, StreamWriter hook,
      |               ^
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/printf_core/writer.h:57:15: note: implicit deduction guide declared as 'template <WriteMode write_mode> WriteBuffer(char *buff, size_t buff_len, StreamWriter hook, void *target) -> WriteBuffer<write_mode>'
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/printf_core/writer.h:67:15: note: candidate function template not viable: requires 3 arguments, but 4 were provided
   67 |   LIBC_INLINE WriteBuffer(char *buff, size_t buff_len, StreamWriter hook)
      |               ^           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/printf_core/writer.h:67:15: note: implicit deduction guide declared as 'template <WriteMode write_mode> WriteBuffer(char *buff, size_t buff_len, StreamWriter hook) -> WriteBuffer<write_mode>'
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/printf_core/writer.h:62:15: note: candidate function template not viable: requires 2 arguments, but 4 were provided
   62 |   LIBC_INLINE WriteBuffer(char *buff, size_t buff_len)
      |               ^           ~~~~~~~~~~~~~~~~~~~~~~~~~~~
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/printf_core/writer.h:62:15: note: implicit deduction guide declared as 'template <WriteMode write_mode> WriteBuffer(char *buff, size_t buff_len) -> WriteBuffer<write_mode>'
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/printf_core/writer.h:41:40: note: candidate function template not viable: requires 1 argument, but 4 were provided
   41 | template <WriteMode write_mode> struct WriteBuffer {
      |                                        ^~~~~~~~~~~
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/printf_core/writer.h:41:40: note: implicit deduction guide declared as 'template <WriteMode write_mode> WriteBuffer(WriteBuffer<write_mode>) -> WriteBuffer<write_mode>'
1 error generated.

Would you be able to take a look and send out a fix or revert? Thanks.

jhuber6 · 2025-03-13T18:27:57Z

Hm, I'm wondering why the baremetal is seeing this but not any other target. I don't have much experience there, could you tell me what's different? There should be some template autodeduction helpers. But maybe I accidentally changed those after adding the runtime value back in?

michaelrj-google · 2025-03-13T19:26:34Z

You forgot to update the initialization in the baremetal printf.cpp:

llvm-project/libc/src/stdio/baremetal/printf.cpp

Line 41 in 9d7e1d9

printf_core::WriteBuffer wb(buffer, BUFF_SIZE, &raw_write_hook, nullptr);

I'll fix it

jhuber6 · 2025-03-13T19:27:48Z

Should be fixed already, i merged the PR that did it.

michaelrj-google · 2025-03-13T19:29:42Z

Oh huh, the test failures will probably be fixed by cherry-picking 6fea340 then

PiJoules · 2025-03-13T21:26:19Z

I think that fix led to a different failure mode

[709/2837](61) Building CXX object libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj
FAILED: libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj 
/b/s/w/ir/x/w/llvm_build/./bin/clang++ --target=armv6m-none-eabi -DLIBC_NAMESPACE=__llvm_libc_21_0_0_git -I/b/s/w/ir/x/w/llvm-llvm-project/libc -isystem /b/s/w/ir/x/w/llvm_build/include/armv6m-unknown-none-eabi --target=armv6m-none-eabi -Wno-atomic-alignment "-Dvfprintf(stream, format, vlist)=vprintf(format, vlist)" "-Dfprintf(stream, format, ...)=printf(format)" -D_LIBCPP_PRINT=1 -mthumb -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -ffile-prefix-map=/b/s/w/ir/x/w/llvm_build/runtimes/runtimes-armv6m-none-eabi-bins=../../../llvm-llvm-project -ffile-prefix-map=/b/s/w/ir/x/w/llvm-llvm-project/= -no-canonical-prefixes -Os -DNDEBUG -std=gnu++17 --target=armv6m-none-eabi -DLIBC_QSORT_IMPL=LIBC_QSORT_HEAP_SORT -DLIBC_TYPES_TIME_T_IS_32_BIT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES)" -fpie -ffreestanding -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -DLIBC_COPT_PUBLIC_PACKAGING -UNDEBUG -DLIBC_COPT_PRINTF_DISABLE_FLOAT -DLIBC_COPT_PRINTF_DISABLE_INDEX_MODE -DLIBC_COPT_PRINTF_DISABLE_WRITE_INT -DLIBC_COPT_PRINTF_DISABLE_FIXED_POINT -DLIBC_COPT_PRINTF_DISABLE_STRERROR -DLIBC_COPT_PRINTF_RUNTIME_DISPATCH -MD -MT libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj -MF libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj.d -o libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj -c /b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/baremetal/printf.cpp
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/baremetal/printf.cpp:41:33: error: use of undeclared identifier 'WriteMode'; did you mean 'printf_core::WriteMode'?
   41 |   printf_core::WriteBuffer<Mode<WriteMode::FLUSH_TO_STREAM>::value> wb(
      |                                 ^~~~~~~~~
      |                                 printf_core::WriteMode
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/printf_core/writer.h:24:12: note: 'printf_core::WriteMode' declared here
   24 | enum class WriteMode {
      |            ^
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/baremetal/printf.cpp:41:28: error: no template named 'Mode'; did you mean 'printf_core::Mode'?
   41 |   printf_core::WriteBuffer<Mode<WriteMode::FLUSH_TO_STREAM>::value> wb(
      |                            ^~~~
      |                            printf_core::Mode
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/printf_core/writer.h:33:40: note: 'printf_core::Mode' declared here
   33 | template <WriteMode write_mode> struct Mode {
      |                                        ^

Seen on https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8720479269949913185/overview which includes 6fea340.

jhuber6 · 2025-03-13T21:27:30Z

I think that fix led to a different failure mode

[709/2837](61) Building CXX object libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj
FAILED: libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj 
/b/s/w/ir/x/w/llvm_build/./bin/clang++ --target=armv6m-none-eabi -DLIBC_NAMESPACE=__llvm_libc_21_0_0_git -I/b/s/w/ir/x/w/llvm-llvm-project/libc -isystem /b/s/w/ir/x/w/llvm_build/include/armv6m-unknown-none-eabi --target=armv6m-none-eabi -Wno-atomic-alignment "-Dvfprintf(stream, format, vlist)=vprintf(format, vlist)" "-Dfprintf(stream, format, ...)=printf(format)" -D_LIBCPP_PRINT=1 -mthumb -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -ffunction-sections -fdata-sections -ffile-prefix-map=/b/s/w/ir/x/w/llvm_build/runtimes/runtimes-armv6m-none-eabi-bins=../../../llvm-llvm-project -ffile-prefix-map=/b/s/w/ir/x/w/llvm-llvm-project/= -no-canonical-prefixes -Os -DNDEBUG -std=gnu++17 --target=armv6m-none-eabi -DLIBC_QSORT_IMPL=LIBC_QSORT_HEAP_SORT -DLIBC_TYPES_TIME_T_IS_32_BIT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES)" -fpie -ffreestanding -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -DLIBC_COPT_PUBLIC_PACKAGING -UNDEBUG -DLIBC_COPT_PRINTF_DISABLE_FLOAT -DLIBC_COPT_PRINTF_DISABLE_INDEX_MODE -DLIBC_COPT_PRINTF_DISABLE_WRITE_INT -DLIBC_COPT_PRINTF_DISABLE_FIXED_POINT -DLIBC_COPT_PRINTF_DISABLE_STRERROR -DLIBC_COPT_PRINTF_RUNTIME_DISPATCH -MD -MT libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj -MF libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj.d -o libc/src/stdio/baremetal/CMakeFiles/libc.src.stdio.baremetal.printf.dir/printf.cpp.obj -c /b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/baremetal/printf.cpp
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/baremetal/printf.cpp:41:33: error: use of undeclared identifier 'WriteMode'; did you mean 'printf_core::WriteMode'?
   41 |   printf_core::WriteBuffer<Mode<WriteMode::FLUSH_TO_STREAM>::value> wb(
      |                                 ^~~~~~~~~
      |                                 printf_core::WriteMode
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/printf_core/writer.h:24:12: note: 'printf_core::WriteMode' declared here
   24 | enum class WriteMode {
      |            ^
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/baremetal/printf.cpp:41:28: error: no template named 'Mode'; did you mean 'printf_core::Mode'?
   41 |   printf_core::WriteBuffer<Mode<WriteMode::FLUSH_TO_STREAM>::value> wb(
      |                            ^~~~
      |                            printf_core::Mode
/b/s/w/ir/x/w/llvm-llvm-project/libc/src/stdio/printf_core/writer.h:33:40: note: 'printf_core::Mode' declared here
   33 | template <WriteMode write_mode> struct Mode {
      |                                        ^

Seen on https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8720479269949913185/overview which includes 6fea340.

That looks trivial to fix, could you verify it since you have a local build working?

This addresses build errors from llvm#111559.

PiJoules · 2025-03-13T22:26:40Z

I think #131232 catches the rest of them. I can build the toolchain locally with it.

This addresses build errors from #111559.

…1232) This addresses build errors from llvm/llvm-project#111559.

This allows specializing the implementation for different targets without including unnecessary logic and is similar to #111559 which did the same for printf Writer interface.

Summary: Currently we dispatch the writing mode off of a runtime enum passed in by the constructor. This causes very unfortunate codegen for the GPU targets where we get worst-case codegen because of the unused function pointer for `sprintf`. Instead, this patch moves all of this to a template so it can be masked out. This results in no dynamic stack and uses 60 VGPRs instead of 117. It also compiles about 5x as fast.

This addresses build errors from llvm#111559.

jhuber6 requested review from JonChesterfield, lntue, michaelrj-google, nickdesaulniers and SchrodingerZhu October 8, 2024 17:06

llvmbot added the libc label Oct 8, 2024

nickdesaulniers reviewed Oct 8, 2024

View reviewed changes

jhuber6 closed this Oct 14, 2024

jhuber6 deleted the printf_templates branch October 14, 2024 19:20

jhuber6 restored the printf_templates branch October 17, 2024 00:52

jhuber6 reopened this Oct 17, 2024

michaelrj-google reviewed Feb 19, 2025

View reviewed changes

jhuber6 force-pushed the printf_templates branch from c7394d2 to 5286d35 Compare March 10, 2025 20:27

llvmbot added the backend:AMDGPU label Mar 10, 2025

jhuber6 force-pushed the printf_templates branch from 5286d35 to ca4de1b Compare March 10, 2025 20:31

michaelrj-google reviewed Mar 11, 2025

View reviewed changes

libc/docs/configure.rst Outdated Show resolved Hide resolved

libc/src/stdio/printf_core/writer.h Outdated Show resolved Hide resolved

jhuber6 force-pushed the printf_templates branch from ca4de1b to aa832dd Compare March 11, 2025 18:30

michaelrj-google reviewed Mar 12, 2025

View reviewed changes

libc/src/stdio/printf_core/writer.h Outdated Show resolved Hide resolved

jhuber6 added 2 commits March 12, 2025 12:14

comments

0bbcf14

jhuber6 force-pushed the printf_templates branch from aa832dd to 0bbcf14 Compare March 12, 2025 17:20

michaelrj-google approved these changes Mar 12, 2025

View reviewed changes

jhuber6 merged commit 598e882 into llvm:main Mar 12, 2025
15 of 16 checks passed

petrhosek mentioned this pull request Mar 12, 2025

[libc] Templatize the scanf Reader interface #131037

Merged

PiJoules added a commit to PiJoules/llvm-project that referenced this pull request Mar 13, 2025

[stdio][baremetal] Fix templating for print functions

c900173

This addresses build errors from llvm#111559.

PiJoules mentioned this pull request Mar 13, 2025

[stdio][baremetal] Fix templating for print functions #131232

Merged

PiJoules added a commit that referenced this pull request Mar 13, 2025

[stdio][baremetal] Fix templating for print functions (#131232)

662bd4c

This addresses build errors from #111559.

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 13, 2025

Automerge: [stdio][baremetal] Fix templating for print functions (#13…

379d513

…1232) This addresses build errors from llvm/llvm-project#111559.

frederik-h pushed a commit to frederik-h/llvm-project that referenced this pull request Mar 18, 2025

[stdio][baremetal] Fix templating for print functions (llvm#131232)

272db00

This addresses build errors from llvm#111559.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[libc] Template the writing mode for the writer class #111559

[libc] Template the writing mode for the writer class #111559

jhuber6 commented Oct 8, 2024

llvmbot commented Oct 8, 2024 •

edited

Loading

nickdesaulniers left a comment

nickdesaulniers Oct 8, 2024

jhuber6 Oct 8, 2024

jhuber6 commented Oct 8, 2024

michaelrj-google commented Oct 8, 2024

jhuber6 commented Oct 9, 2024

michaelrj-google commented Oct 9, 2024

jhuber6 commented Oct 9, 2024

michaelrj-google commented Oct 9, 2024

jhuber6 commented Oct 9, 2024 •

edited

Loading

jhuber6 commented Oct 9, 2024

michaelrj-google commented Oct 10, 2024

jhuber6 commented Oct 17, 2024

michaelrj-google left a comment

michaelrj-google Feb 19, 2025

jhuber6 Feb 19, 2025 •

edited

Loading

jhuber6 commented Feb 19, 2025

michaelrj-google commented Feb 19, 2025

jhuber6 commented Feb 19, 2025

jhuber6 commented Mar 10, 2025

github-actions bot commented Mar 10, 2025 •

edited

Loading

michaelrj-google left a comment

jhuber6 commented Mar 11, 2025

michaelrj-google left a comment

PiJoules commented Mar 13, 2025

jhuber6 commented Mar 13, 2025

michaelrj-google commented Mar 13, 2025

jhuber6 commented Mar 13, 2025

michaelrj-google commented Mar 13, 2025

PiJoules commented Mar 13, 2025

jhuber6 commented Mar 13, 2025

PiJoules commented Mar 13, 2025

[libc] Template the writing mode for the writer class #111559

[libc] Template the writing mode for the writer class #111559

Conversation

jhuber6 commented Oct 8, 2024

llvmbot commented Oct 8, 2024 • edited Loading

nickdesaulniers left a comment

Choose a reason for hiding this comment

nickdesaulniers Oct 8, 2024

Choose a reason for hiding this comment

jhuber6 Oct 8, 2024

Choose a reason for hiding this comment

jhuber6 commented Oct 8, 2024

michaelrj-google commented Oct 8, 2024

jhuber6 commented Oct 9, 2024

michaelrj-google commented Oct 9, 2024

jhuber6 commented Oct 9, 2024

michaelrj-google commented Oct 9, 2024

jhuber6 commented Oct 9, 2024 • edited Loading

jhuber6 commented Oct 9, 2024

michaelrj-google commented Oct 10, 2024

jhuber6 commented Oct 17, 2024

michaelrj-google left a comment

Choose a reason for hiding this comment

michaelrj-google Feb 19, 2025

Choose a reason for hiding this comment

jhuber6 Feb 19, 2025 • edited Loading

Choose a reason for hiding this comment

jhuber6 commented Feb 19, 2025

michaelrj-google commented Feb 19, 2025

jhuber6 commented Feb 19, 2025

jhuber6 commented Mar 10, 2025

github-actions bot commented Mar 10, 2025 • edited Loading

michaelrj-google left a comment

Choose a reason for hiding this comment

jhuber6 commented Mar 11, 2025

michaelrj-google left a comment

Choose a reason for hiding this comment

PiJoules commented Mar 13, 2025

jhuber6 commented Mar 13, 2025

michaelrj-google commented Mar 13, 2025

jhuber6 commented Mar 13, 2025

michaelrj-google commented Mar 13, 2025

PiJoules commented Mar 13, 2025

jhuber6 commented Mar 13, 2025

PiJoules commented Mar 13, 2025

llvmbot commented Oct 8, 2024 •

edited

Loading

jhuber6 commented Oct 9, 2024 •

edited

Loading

jhuber6 Feb 19, 2025 •

edited

Loading

github-actions bot commented Mar 10, 2025 •

edited

Loading