Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc++] Refactor the Windows and MinGW implementation of the locale base API #115752

Merged
merged 1 commit into from
Dec 16, 2024

Conversation

ldionne
Copy link
Member

@ldionne ldionne commented Nov 11, 2024

This patch reimplements the locale base support for Windows flavors in a way that is more modules-friendly and without defining non-internal names.

Since this changes the name of some types and entry points in the built library, this is effectively an ABI break on Windows (which is acceptable since we don't promise ABI stability on that platform).

@ldionne ldionne requested a review from a team as a code owner November 11, 2024 18:23
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Nov 11, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 11, 2024

@llvm/pr-subscribers-github-workflow

@llvm/pr-subscribers-libcxx

Author: Louis Dionne (ldionne)

Changes

This patch reimplements the locale base support for Windows flavors in a way that is more modules-friendly and without defining non-internal names.

Since this changes the name of some types and entry points in the built library, this is effectively an ABI break on Windows (which is acceptable since we don't promise ABI stability on that platform).


Patch is 35.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/115752.diff

8 Files Affected:

  • (modified) libcxx/docs/ReleaseNotes/20.rst (+3)
  • (modified) libcxx/include/CMakeLists.txt (+1-1)
  • (modified) libcxx/include/__locale_dir/locale_base_api.h (+3-3)
  • (removed) libcxx/include/__locale_dir/locale_base_api/win32.h (-235)
  • (added) libcxx/include/__locale_dir/support/windows.h (+279)
  • (modified) libcxx/include/module.modulemap (+1-1)
  • (modified) libcxx/src/support/win32/locale_win32.cpp (+115-68)
  • (modified) libcxx/src/support/win32/support.cpp (-33)
diff --git a/libcxx/docs/ReleaseNotes/20.rst b/libcxx/docs/ReleaseNotes/20.rst
index 9039c6f046445b..62dbebd6801bca 100644
--- a/libcxx/docs/ReleaseNotes/20.rst
+++ b/libcxx/docs/ReleaseNotes/20.rst
@@ -141,6 +141,9 @@ ABI Affecting Changes
   - When using the MSVC ABI, this change results in some classes having a completely different memory layout, so this is
     a genuine ABI break. However, the library does not currently guarantee ABI stability on MSVC platforms.
 
+- The localization support base API has been reimplemented, leading to different functions being exported from the
+  libc++ built library on Windows and Windows-like platforms.
+
 Build System Changes
 --------------------
 
diff --git a/libcxx/include/CMakeLists.txt b/libcxx/include/CMakeLists.txt
index 6dd392685c18ee..6364f2960a90c4 100644
--- a/libcxx/include/CMakeLists.txt
+++ b/libcxx/include/CMakeLists.txt
@@ -501,11 +501,11 @@ set(files
   __locale_dir/locale_base_api/ibm.h
   __locale_dir/locale_base_api/musl.h
   __locale_dir/locale_base_api/openbsd.h
-  __locale_dir/locale_base_api/win32.h
   __locale_dir/locale_guard.h
   __locale_dir/support/apple.h
   __locale_dir/support/bsd_like.h
   __locale_dir/support/freebsd.h
+  __locale_dir/support/windows.h
   __math/abs.h
   __math/copysign.h
   __math/error_functions.h
diff --git a/libcxx/include/__locale_dir/locale_base_api.h b/libcxx/include/__locale_dir/locale_base_api.h
index 5cbe91207ca74e..6042eb1416667e 100644
--- a/libcxx/include/__locale_dir/locale_base_api.h
+++ b/libcxx/include/__locale_dir/locale_base_api.h
@@ -98,15 +98,15 @@
 #  include <__locale_dir/support/apple.h>
 #elif defined(__FreeBSD__)
 #  include <__locale_dir/support/freebsd.h>
+#elif defined(_LIBCPP_MSVCRT_LIKE)
+#  include <__locale_dir/support/windows.h>
 #else
 
 // TODO: This is a temporary definition to bridge between the old way we defined the locale base API
 //       (by providing global non-reserved names) and the new API. As we move individual platforms
 //       towards the new way of defining the locale base API, this should disappear since each platform
 //       will define those directly.
-#  if defined(_LIBCPP_MSVCRT_LIKE)
-#    include <__locale_dir/locale_base_api/win32.h>
-#  elif defined(_AIX) || defined(__MVS__)
+#  if defined(_AIX) || defined(__MVS__)
 #    include <__locale_dir/locale_base_api/ibm.h>
 #  elif defined(__ANDROID__)
 #    include <__locale_dir/locale_base_api/android.h>
diff --git a/libcxx/include/__locale_dir/locale_base_api/win32.h b/libcxx/include/__locale_dir/locale_base_api/win32.h
deleted file mode 100644
index f488a0dc0d69b3..00000000000000
--- a/libcxx/include/__locale_dir/locale_base_api/win32.h
+++ /dev/null
@@ -1,235 +0,0 @@
-// -*- C++ -*-
-//===-----------------------------------------------------------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef _LIBCPP___LOCALE_DIR_LOCALE_BASE_API_WIN32_H
-#define _LIBCPP___LOCALE_DIR_LOCALE_BASE_API_WIN32_H
-
-#include <__config>
-#include <cstddef>
-#include <locale.h> // _locale_t
-#include <stdio.h>
-#include <string>
-
-#define _X_ALL LC_ALL
-#define _X_COLLATE LC_COLLATE
-#define _X_CTYPE LC_CTYPE
-#define _X_MONETARY LC_MONETARY
-#define _X_NUMERIC LC_NUMERIC
-#define _X_TIME LC_TIME
-#define _X_MAX LC_MAX
-#define _X_MESSAGES 6
-#define _NCAT (_X_MESSAGES + 1)
-
-#define _CATMASK(n) ((1 << (n)) >> 1)
-#define _M_COLLATE _CATMASK(_X_COLLATE)
-#define _M_CTYPE _CATMASK(_X_CTYPE)
-#define _M_MONETARY _CATMASK(_X_MONETARY)
-#define _M_NUMERIC _CATMASK(_X_NUMERIC)
-#define _M_TIME _CATMASK(_X_TIME)
-#define _M_MESSAGES _CATMASK(_X_MESSAGES)
-#define _M_ALL (_CATMASK(_NCAT) - 1)
-
-#define LC_COLLATE_MASK _M_COLLATE
-#define LC_CTYPE_MASK _M_CTYPE
-#define LC_MONETARY_MASK _M_MONETARY
-#define LC_NUMERIC_MASK _M_NUMERIC
-#define LC_TIME_MASK _M_TIME
-#define LC_MESSAGES_MASK _M_MESSAGES
-#define LC_ALL_MASK                                                                                                    \
-  (LC_COLLATE_MASK | LC_CTYPE_MASK | LC_MESSAGES_MASK | LC_MONETARY_MASK | LC_NUMERIC_MASK | LC_TIME_MASK)
-
-class __lconv_storage {
-public:
-  __lconv_storage(const lconv* __lc_input) {
-    __lc_ = *__lc_input;
-
-    __decimal_point_     = __lc_input->decimal_point;
-    __thousands_sep_     = __lc_input->thousands_sep;
-    __grouping_          = __lc_input->grouping;
-    __int_curr_symbol_   = __lc_input->int_curr_symbol;
-    __currency_symbol_   = __lc_input->currency_symbol;
-    __mon_decimal_point_ = __lc_input->mon_decimal_point;
-    __mon_thousands_sep_ = __lc_input->mon_thousands_sep;
-    __mon_grouping_      = __lc_input->mon_grouping;
-    __positive_sign_     = __lc_input->positive_sign;
-    __negative_sign_     = __lc_input->negative_sign;
-
-    __lc_.decimal_point     = const_cast<char*>(__decimal_point_.c_str());
-    __lc_.thousands_sep     = const_cast<char*>(__thousands_sep_.c_str());
-    __lc_.grouping          = const_cast<char*>(__grouping_.c_str());
-    __lc_.int_curr_symbol   = const_cast<char*>(__int_curr_symbol_.c_str());
-    __lc_.currency_symbol   = const_cast<char*>(__currency_symbol_.c_str());
-    __lc_.mon_decimal_point = const_cast<char*>(__mon_decimal_point_.c_str());
-    __lc_.mon_thousands_sep = const_cast<char*>(__mon_thousands_sep_.c_str());
-    __lc_.mon_grouping      = const_cast<char*>(__mon_grouping_.c_str());
-    __lc_.positive_sign     = const_cast<char*>(__positive_sign_.c_str());
-    __lc_.negative_sign     = const_cast<char*>(__negative_sign_.c_str());
-  }
-
-  lconv* __get() { return &__lc_; }
-
-private:
-  lconv __lc_;
-  std::string __decimal_point_;
-  std::string __thousands_sep_;
-  std::string __grouping_;
-  std::string __int_curr_symbol_;
-  std::string __currency_symbol_;
-  std::string __mon_decimal_point_;
-  std::string __mon_thousands_sep_;
-  std::string __mon_grouping_;
-  std::string __positive_sign_;
-  std::string __negative_sign_;
-};
-
-class locale_t {
-public:
-  locale_t() : __locale_(nullptr), __locale_str_(nullptr), __lc_(nullptr) {}
-  locale_t(std::nullptr_t) : __locale_(nullptr), __locale_str_(nullptr), __lc_(nullptr) {}
-  locale_t(_locale_t __xlocale, const char* __xlocale_str)
-      : __locale_(__xlocale), __locale_str_(__xlocale_str), __lc_(nullptr) {}
-  locale_t(const locale_t& __l) : __locale_(__l.__locale_), __locale_str_(__l.__locale_str_), __lc_(nullptr) {}
-
-  ~locale_t() { delete __lc_; }
-
-  locale_t& operator=(const locale_t& __l) {
-    __locale_     = __l.__locale_;
-    __locale_str_ = __l.__locale_str_;
-    // __lc_ not copied
-    return *this;
-  }
-
-  friend bool operator==(const locale_t& __left, const locale_t& __right) {
-    return __left.__locale_ == __right.__locale_;
-  }
-
-  friend bool operator==(const locale_t& __left, int __right) { return __left.__locale_ == nullptr && __right == 0; }
-
-  friend bool operator==(const locale_t& __left, long long __right) {
-    return __left.__locale_ == nullptr && __right == 0;
-  }
-
-  friend bool operator==(const locale_t& __left, std::nullptr_t) { return __left.__locale_ == nullptr; }
-
-  friend bool operator==(int __left, const locale_t& __right) { return __left == 0 && nullptr == __right.__locale_; }
-
-  friend bool operator==(std::nullptr_t, const locale_t& __right) { return nullptr == __right.__locale_; }
-
-  friend bool operator!=(const locale_t& __left, const locale_t& __right) { return !(__left == __right); }
-
-  friend bool operator!=(const locale_t& __left, int __right) { return !(__left == __right); }
-
-  friend bool operator!=(const locale_t& __left, long long __right) { return !(__left == __right); }
-
-  friend bool operator!=(const locale_t& __left, std::nullptr_t __right) { return !(__left == __right); }
-
-  friend bool operator!=(int __left, const locale_t& __right) { return !(__left == __right); }
-
-  friend bool operator!=(std::nullptr_t __left, const locale_t& __right) { return !(__left == __right); }
-
-  operator bool() const { return __locale_ != nullptr; }
-
-  const char* __get_locale() const { return __locale_str_; }
-
-  operator _locale_t() const { return __locale_; }
-
-  lconv* __store_lconv(const lconv* __input_lc) {
-    delete __lc_;
-    __lc_ = new __lconv_storage(__input_lc);
-    return __lc_->__get();
-  }
-
-private:
-  _locale_t __locale_;
-  const char* __locale_str_;
-  __lconv_storage* __lc_ = nullptr;
-};
-
-// Locale management functions
-#define freelocale _free_locale
-// FIXME: base currently unused. Needs manual work to construct the new locale
-locale_t newlocale(int __mask, const char* __locale, locale_t __base);
-// uselocale can't be implemented on Windows because Windows allows partial modification
-// of thread-local locale and so _get_current_locale() returns a copy while uselocale does
-// not create any copies.
-// We can still implement raii even without uselocale though.
-
-lconv* localeconv_l(locale_t& __loc);
-size_t mbrlen_l(const char* __restrict __s, size_t __n, mbstate_t* __restrict __ps, locale_t __loc);
-size_t mbsrtowcs_l(
-    wchar_t* __restrict __dst, const char** __restrict __src, size_t __len, mbstate_t* __restrict __ps, locale_t __loc);
-size_t wcrtomb_l(char* __restrict __s, wchar_t __wc, mbstate_t* __restrict __ps, locale_t __loc);
-size_t mbrtowc_l(
-    wchar_t* __restrict __pwc, const char* __restrict __s, size_t __n, mbstate_t* __restrict __ps, locale_t __loc);
-size_t mbsnrtowcs_l(wchar_t* __restrict __dst,
-                    const char** __restrict __src,
-                    size_t __nms,
-                    size_t __len,
-                    mbstate_t* __restrict __ps,
-                    locale_t __loc);
-size_t wcsnrtombs_l(char* __restrict __dst,
-                    const wchar_t** __restrict __src,
-                    size_t __nwc,
-                    size_t __len,
-                    mbstate_t* __restrict __ps,
-                    locale_t __loc);
-wint_t btowc_l(int __c, locale_t __loc);
-int wctob_l(wint_t __c, locale_t __loc);
-
-decltype(MB_CUR_MAX) MB_CUR_MAX_L(locale_t __l);
-
-// the *_l functions are prefixed on Windows, only available for msvcr80+, VS2005+
-#define mbtowc_l _mbtowc_l
-#define strtoll_l _strtoi64_l
-#define strtoull_l _strtoui64_l
-#define strtod_l _strtod_l
-#if defined(_LIBCPP_MSVCRT)
-#  define strtof_l _strtof_l
-#  define strtold_l _strtold_l
-#else
-_LIBCPP_EXPORTED_FROM_ABI float strtof_l(const char*, char**, locale_t);
-_LIBCPP_EXPORTED_FROM_ABI long double strtold_l(const char*, char**, locale_t);
-#endif
-inline _LIBCPP_HIDE_FROM_ABI int islower_l(int __c, _locale_t __loc) { return _islower_l((int)__c, __loc); }
-
-inline _LIBCPP_HIDE_FROM_ABI int isupper_l(int __c, _locale_t __loc) { return _isupper_l((int)__c, __loc); }
-
-#define isdigit_l _isdigit_l
-#define isxdigit_l _isxdigit_l
-#define strcoll_l _strcoll_l
-#define strxfrm_l _strxfrm_l
-#define wcscoll_l _wcscoll_l
-#define wcsxfrm_l _wcsxfrm_l
-#define toupper_l _toupper_l
-#define tolower_l _tolower_l
-#define iswspace_l _iswspace_l
-#define iswprint_l _iswprint_l
-#define iswcntrl_l _iswcntrl_l
-#define iswupper_l _iswupper_l
-#define iswlower_l _iswlower_l
-#define iswalpha_l _iswalpha_l
-#define iswdigit_l _iswdigit_l
-#define iswpunct_l _iswpunct_l
-#define iswxdigit_l _iswxdigit_l
-#define towupper_l _towupper_l
-#define towlower_l _towlower_l
-#if defined(__MINGW32__) && __MSVCRT_VERSION__ < 0x0800
-_LIBCPP_EXPORTED_FROM_ABI size_t strftime_l(char* ret, size_t n, const char* format, const struct tm* tm, locale_t loc);
-#else
-#  define strftime_l _strftime_l
-#endif
-#define sscanf_l(__s, __l, __f, ...) _sscanf_l(__s, __f, __l, __VA_ARGS__)
-_LIBCPP_EXPORTED_FROM_ABI int snprintf_l(char* __ret, size_t __n, locale_t __loc, const char* __format, ...);
-_LIBCPP_EXPORTED_FROM_ABI int asprintf_l(char** __ret, locale_t __loc, const char* __format, ...);
-_LIBCPP_EXPORTED_FROM_ABI int vasprintf_l(char** __ret, locale_t __loc, const char* __format, va_list __ap);
-
-// not-so-pressing FIXME: use locale to determine blank characters
-inline int iswblank_l(wint_t __c, locale_t /*loc*/) { return (__c == L' ' || __c == L'\t'); }
-
-#endif // _LIBCPP___LOCALE_DIR_LOCALE_BASE_API_WIN32_H
diff --git a/libcxx/include/__locale_dir/support/windows.h b/libcxx/include/__locale_dir/support/windows.h
new file mode 100644
index 00000000000000..aace64c317b2b4
--- /dev/null
+++ b/libcxx/include/__locale_dir/support/windows.h
@@ -0,0 +1,279 @@
+//===-----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___LOCALE_DIR_SUPPORT_WINDOWS_H
+#define _LIBCPP___LOCALE_DIR_SUPPORT_WINDOWS_H
+
+#include <__config>
+#include <__cstddef/nullptr_t.h>
+#include <__utility/forward.h>
+#include <clocale> // std::lconv & friends
+#include <cstddef>
+#include <ctype.h>  // ::_isupper_l & friends
+#include <locale.h> // ::_locale_t
+#include <stdio.h>  // ::_sscanf_l
+#include <stdlib.h> // ::_strtod_l & friends
+#include <string.h> // ::_strcoll_l
+#include <string>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+#  pragma GCC system_header
+#endif
+
+#define _CATMASK(n) ((1 << (n)) >> 1)
+#define LC_COLLATE_MASK _CATMASK(LC_COLLATE)
+#define LC_CTYPE_MASK _CATMASK(LC_CTYPE)
+#define LC_MONETARY_MASK _CATMASK(LC_MONETARY)
+#define LC_NUMERIC_MASK _CATMASK(LC_NUMERIC)
+#define LC_TIME_MASK _CATMASK(LC_TIME)
+#define LC_MESSAGES_MASK _CATMASK(6)
+#define LC_ALL_MASK                                                                                                    \
+  (LC_COLLATE_MASK | LC_CTYPE_MASK | LC_MESSAGES_MASK | LC_MONETARY_MASK | LC_NUMERIC_MASK | LC_TIME_MASK)
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+namespace __locale {
+
+class __lconv_storage {
+public:
+  __lconv_storage(const lconv* __lc_input) {
+    __lc_ = *__lc_input;
+
+    __decimal_point_     = __lc_input->decimal_point;
+    __thousands_sep_     = __lc_input->thousands_sep;
+    __grouping_          = __lc_input->grouping;
+    __int_curr_symbol_   = __lc_input->int_curr_symbol;
+    __currency_symbol_   = __lc_input->currency_symbol;
+    __mon_decimal_point_ = __lc_input->mon_decimal_point;
+    __mon_thousands_sep_ = __lc_input->mon_thousands_sep;
+    __mon_grouping_      = __lc_input->mon_grouping;
+    __positive_sign_     = __lc_input->positive_sign;
+    __negative_sign_     = __lc_input->negative_sign;
+
+    __lc_.decimal_point     = const_cast<char*>(__decimal_point_.c_str());
+    __lc_.thousands_sep     = const_cast<char*>(__thousands_sep_.c_str());
+    __lc_.grouping          = const_cast<char*>(__grouping_.c_str());
+    __lc_.int_curr_symbol   = const_cast<char*>(__int_curr_symbol_.c_str());
+    __lc_.currency_symbol   = const_cast<char*>(__currency_symbol_.c_str());
+    __lc_.mon_decimal_point = const_cast<char*>(__mon_decimal_point_.c_str());
+    __lc_.mon_thousands_sep = const_cast<char*>(__mon_thousands_sep_.c_str());
+    __lc_.mon_grouping      = const_cast<char*>(__mon_grouping_.c_str());
+    __lc_.positive_sign     = const_cast<char*>(__positive_sign_.c_str());
+    __lc_.negative_sign     = const_cast<char*>(__negative_sign_.c_str());
+  }
+
+  std::lconv* __get() { return &__lc_; }
+
+private:
+  std::lconv __lc_;
+  std::string __decimal_point_;
+  std::string __thousands_sep_;
+  std::string __grouping_;
+  std::string __int_curr_symbol_;
+  std::string __currency_symbol_;
+  std::string __mon_decimal_point_;
+  std::string __mon_thousands_sep_;
+  std::string __mon_grouping_;
+  std::string __positive_sign_;
+  std::string __negative_sign_;
+};
+
+//
+// Locale management
+//
+class __locale_t {
+public:
+  __locale_t() : __locale_(nullptr), __locale_str_(nullptr), __lc_(nullptr) {}
+  __locale_t(std::nullptr_t) : __locale_(nullptr), __locale_str_(nullptr), __lc_(nullptr) {}
+  __locale_t(::_locale_t __loc, const char* __loc_str) : __locale_(__loc), __locale_str_(__loc_str), __lc_(nullptr) {}
+  __locale_t(const __locale_t& __loc)
+      : __locale_(__loc.__locale_), __locale_str_(__loc.__locale_str_), __lc_(nullptr) {}
+
+  ~__locale_t() { delete __lc_; }
+
+  __locale_t& operator=(const __locale_t& __loc) {
+    __locale_     = __loc.__locale_;
+    __locale_str_ = __loc.__locale_str_;
+    // __lc_ not copied
+    return *this;
+  }
+
+  friend bool operator==(const __locale_t& __left, const __locale_t& __right) {
+    return __left.__locale_ == __right.__locale_;
+  }
+
+  friend bool operator==(const __locale_t& __left, int __right) { return __left.__locale_ == nullptr && __right == 0; }
+
+  friend bool operator==(const __locale_t& __left, long long __right) {
+    return __left.__locale_ == nullptr && __right == 0;
+  }
+
+  friend bool operator==(const __locale_t& __left, std::nullptr_t) { return __left.__locale_ == nullptr; }
+
+  friend bool operator==(int __left, const __locale_t& __right) { return __left == 0 && nullptr == __right.__locale_; }
+
+  friend bool operator==(std::nullptr_t, const __locale_t& __right) { return nullptr == __right.__locale_; }
+
+  friend bool operator!=(const __locale_t& __left, const __locale_t& __right) { return !(__left == __right); }
+
+  friend bool operator!=(const __locale_t& __left, int __right) { return !(__left == __right); }
+
+  friend bool operator!=(const __locale_t& __left, long long __right) { return !(__left == __right); }
+
+  friend bool operator!=(const __locale_t& __left, std::nullptr_t __right) { return !(__left == __right); }
+
+  friend bool operator!=(int __left, const __locale_t& __right) { return !(__left == __right); }
+
+  friend bool operator!=(std::nullptr_t __left, const __locale_t& __right) { return !(__left == __right); }
+
+  operator bool() const { return __locale_ != nullptr; }
+
+  const char* __get_locale() const { return __locale_str_; }
+
+  operator ::_locale_t() const { return __locale_; }
+
+  std::lconv* __store_lconv(const std::lconv* __input_lc) {
+    delete __lc_;
+    __lc_ = new __lconv_storage(__input_lc);
+    return __lc_->__get();
+  }
+
+private:
+  ::_locale_t __locale_;
+  const char* __locale_str_;
+  __lconv_storage* __lc_ = nullptr;
+};
+
+// __uselocale can't be implemented on Windows because Windows allows partial modification
+// of thread-local locale and so _get_current_locale() returns a copy while __uselocale does
+// not create any copies. We can still implement RAII even without __uselocale though.
+__locale_t __uselocale(__locale_t) = delete;
+_LIBCPP_EXPORTED_FROM_ABI __locale_t __newlocale(int __mask, const char* __locale, __locale_t __base);
+inline _LIBCPP_HIDE_FROM_ABI void __freelocale(__locale_t __loc) { ::_free_locale(__loc); }
+_LIBCPP_EXPORTED_FROM_ABI lconv* __localeconv(__locale_t& __loc);
+
+//
+// Strtonum functions
+//
+
+// the *_l functions are prefixed on Windows, only available for msvcr80+, VS2005+
+#if defined(_LIBCPP_MSVCRT)
+inline _LIBCPP_HIDE_FROM_ABI float __strtof(const char* __nptr, char** __endptr, __locale_t __loc) {
+  return ::_strtof_l(__nptr, __endptr, __loc);
+}
+inline _LIBCPP_HIDE_FROM_ABI long double __strtold(const char* __nptr, char** __endptr, __locale_t __loc) {
+  return ::_strtold_l(__nptr, __endptr, __loc);
+}
+#else
+_LIBCPP_EXPORTED_FROM_ABI float __strtof(const char*, char**, __locale_t);
+_LIBCPP_EXPORTED_FROM_ABI long double __strtold(const char*, char**, __locale_t);
+#endif
+
+inline _LIBCPP_HIDE_FROM_ABI double __strtod(const char* __nptr, char** __endptr, __locale_t __loc) {
+  return ::_strtod_l(__nptr, __endptr, __loc);
+}
+
+inline _LIBCPP_HIDE_FROM_ABI long long __strtoll(const char* __nptr, char** __endptr, int __base, __locale_t __loc) {
+  return ::_strtoi64_l(__nptr, __endptr, __base, __loc);
+}
+inline _LIBCPP_HIDE_FROM_ABI unsigned long long
+__strtoull(const char* __nptr, char** __endptr, int __base, __locale_t __loc) {
+  return ::_strtoui64_l(__nptr, __endptr, __base, __loc);
+}
+
+//
+// Character manipulation functions
+//
+inline _LIBCPP_HIDE_FROM_ABI int __islower(int __c, _locale_t __...
[truncated]

@ldionne ldionne force-pushed the review/locale/8-extract-windows branch from dabed09 to cc9368a Compare November 11, 2024 18:25
@mstorsjo
Copy link
Member

Since this changes the name of some types and entry points in the built library, this is effectively an ABI break on Windows (which is acceptable since we don't promise ABI stability on that platform).

I'd like to clarify/reiterate some nuances about ABI stability on Windows. While I can't claim that the libcxx project promises ABI stability there, there are two quite different cases:

In MSVC environments, we've flagged this a number of times before, making it clear that we do break the ABI occasionally, and this is accepted.

In MinGW environments (using the Itanium C++ ABI), there are a number of setups that do use older libc++ DLLs with newer builds, where ABI compatibility generally is desired - both around llvm-mingw, and within msys2. So far, we've had a number of minor ABI breaks, where such a break (if they happen to break a small minority of packages) has been accepted and tolerated. But I would request that this isn't treated as blanket permission to break the whole ABI just because there's no formal guarantees either; breaking the ABI in this environment can cause grief to our users - proportional to the number of changed symbols and how frequently they are used of course.

CC @lazka @mati865


That said, I'm not sure how often these particular symbols really are referenced in practice; are they implicitly embedded in many cases, or are they only referenced if the user code explicitly uses locale things? In the latter case this is probably fine, in the former case, it will probably cause a lot of problems for msys2.


Separately; I see that this touches a MinGW configuration that we don't have CI coverage for - using MinGW with msvcrt.dll instead of UCRT. This configuration doesn't pass all tests (as msvcrt.dll is quite lacking in many ways), but I ship builds with this configuration in each release.

I can easily add CI coverage for this build configuration; 128 tests are failing in it, so it'll take a little while to get the tests running there, but we can easily add plain build testing of it at least.

@ldionne
Copy link
Member Author

ldionne commented Nov 11, 2024

Thanks for chiming in.

But I would request that this isn't treated as blanket permission to break the whole ABI

Ack. I am putting up the "cleanest" version of this patch which removes the old symbols that are not needed anymore. However, if backwards compatibility is required, we could embed these historically significant symbols into the shared library. There's a number of ways we can do that. We could use aliases if that's a thing on Windows, or we could simply define the old name in a separate .cpp file similar to what we're doing in libcxx/src/legacy_pointer_safety.cpp (but with a real implementation). I wanted to have a discussion before I started making this patch more complicated for ABI guarantees I wasn't sure mattered.

are they implicitly embedded in many cases, or are they only referenced if the user code explicitly uses locale things

I do think that older binaries are going to have references to those symbols, so that means running the binary against a new version of the library would fail with undefined references.

@mstorsjo
Copy link
Member

But I would request that this isn't treated as blanket permission to break the whole ABI

Ack. I am putting up the "cleanest" version of this patch which removes the old symbols that are not needed anymore. However, if backwards compatibility is required, we could embed these historically significant symbols into the shared library. There's a number of ways we can do that. We could use aliases if that's a thing on Windows, or we could simply define the old name in a separate .cpp file similar to what we're doing in libcxx/src/legacy_pointer_safety.cpp (but with a real implementation). I wanted to have a discussion before I started making this patch more complicated for ABI guarantees I wasn't sure mattered.

Ok, thanks! Sure, going about it this way seems reasonable.

When grepping around a bit, I realized that we have the frozen C++03 headers which also reference a number of symbols - so we probably need to keep providing what those need as well.

are they implicitly embedded in many cases, or are they only referenced if the user code explicitly uses locale things

I do think that older binaries are going to have references to those symbols, so that means running the binary against a new version of the library would fail with undefined references.

Right... I guess we'd need to test and see how many of current msys2 binaries would be broken by the change as is (once it's in a otherwise working shape) to figure out how much we'd practically want to keep providing...

@mstorsjo
Copy link
Member

See #115783 for additional CI coverage which can be valuable, as you're touching a couple of conditional lines that currently lack build test coverage upstream.

@lazka
Copy link

lazka commented Nov 12, 2024

CC @lazka @mati865

I don't mind rebuilds in general (for libc++ it seems to be ~ 620 packages), but I'm not sure what a good llvm rebootstrap strategy is - I don't think we had something like this before.

@mstorsjo
Copy link
Member

CC @lazka @mati865

I don't mind rebuilds in general (for libc++ it seems to be ~ 620 packages), but I'm not sure what a good llvm rebootstrap strategy is - I don't think we had something like this before.

Yeah, and even then, such a change would be quite disruptive for users; they would need to hard upgrade every package that uses libc++ in one step, otherwise they'd have broken executables all around.

So I think keeping backwards compat symbols around for at least a major version or two, would be valuable in any case.

(I haven't yet studied what the impact on the referenced symbols is from this change, for something like clang/llvm built on top of libc++ on mingw.)

@lazka
Copy link

lazka commented Nov 12, 2024

Yeah, and even then, such a change would be quite disruptive for users; they would need to hard upgrade every package that uses libc++ in one step, otherwise they'd have broken executables all around.

Last week we rebuilt 940 packages for Python 3.12, so that's not too special at least (assuming we know how to rebootstrap), though Python packages tend to be smaller.

So I think keeping backwards compat symbols around for at least a major version or two, would be valuable in any case.

That would certainly be appreciated, if that's possible.

@mati865
Copy link
Contributor

mati865 commented Nov 13, 2024

I'm not sure what a good llvm rebootstrap strategy

Ideally not having to do that 😉
Still, it'd be manageable if necessary by reverting the change and creating a temporary release that is linked statically to libc++, that would be the easy way. In the worst case we could fix Cygwin host (or maybe it already works these days?) and start over again.

@ldionne ldionne force-pushed the review/locale/8-extract-windows branch from 52187a8 to 767e60c Compare November 28, 2024 21:45
@ldionne ldionne force-pushed the review/locale/8-extract-windows branch from f22e3fb to 8f9461d Compare November 29, 2024 21:47
@ldionne
Copy link
Member Author

ldionne commented Dec 2, 2024

@mstorsjo This patch seems to be in a functional shape now. What would be the level of ABI compatibility you think we need to go for?

The minimum I can see would be providing the old symbols in a .cpp file to avoid changing the set of exported symbols. I assume back-deployment is not required, so we don't need to also do complicated stuff based on the deployment target to allow back-deploying?

@mstorsjo
Copy link
Member

mstorsjo commented Dec 2, 2024

@mstorsjo This patch seems to be in a functional shape now. What would be the level of ABI compatibility you think we need to go for?

Thanks! Let me take it for a spin now that it works as intended, I'll get back to you with a feel of how big a break this would seem to be in practice.

The minimum I can see would be providing the old symbols in a .cpp file to avoid changing the set of exported symbols. I assume back-deployment is not required, so we don't need to also do complicated stuff based on the deployment target to allow back-deploying?

That sounds like what I think would be enough. Backdeployment is indeed not required in this setup. Thanks!

But before you spend more time on that, let me first try this out. (It might be a few days, I'm a little backlogged at the moment.)

@ldionne
Copy link
Member Author

ldionne commented Dec 9, 2024

@mstorsjo Did you have any luck trying out this patch?

@mstorsjo
Copy link
Member

mstorsjo commented Dec 9, 2024

@mstorsjo Did you have any luck trying out this patch?

It does seem to compile nicely, but I haven't yet had time to study the effects of the ABI breakage. Let's hope I get to it this week!

@mstorsjo
Copy link
Member

Ok, now I've had a look at this.

This PR produces the following diff in the list of exported symbols from the libc++ DLL:

@@ -1,9 +1,3 @@
-  Name: _Z10asprintf_lPPc8locale_tPKcz
-  Name: _Z10snprintf_lPcy8locale_tPKcz
-  Name: _Z10strftime_lPcyPKcPK2tm8locale_t
-  Name: _Z11vasprintf_lPPc8locale_tPKcS_
-  Name: _Z8strtof_lPKcPPc8locale_t
-  Name: _Z9strtold_lPKcPPc8locale_t
   Name: _ZN10__cxxabiv116__enum_type_infoD0Ev
   Name: _ZN10__cxxabiv116__enum_type_infoD1Ev
   Name: _ZN10__cxxabiv116__enum_type_infoD2Ev
@@ -1831,6 +1825,22 @@
   Name: _ZNSt3__17promiseIvEC2Ev
   Name: _ZNSt3__17promiseIvED1Ev
   Name: _ZNSt3__17promiseIvED2Ev
+  Name: _ZNSt3__18__locale10__asprintfEPPcNS0_10__locale_tEPKcz
+  Name: _ZNSt3__18__locale10__snprintfEPcyNS0_10__locale_tEPKcz
+  Name: _ZNSt3__18__locale10__strftimeEPcyPKcPK2tmNS0_10__locale_tE
+  Name: _ZNSt3__18__locale11__mbsrtowcsEPwPPKcyPiNS0_10__locale_tE
+  Name: _ZNSt3__18__locale11__newlocaleEiPKcNS0_10__locale_tE
+  Name: _ZNSt3__18__locale12__localeconvERNS0_10__locale_tE
+  Name: _ZNSt3__18__locale12__mb_len_maxENS0_10__locale_tE
+  Name: _ZNSt3__18__locale12__mbsnrtowcsEPwPPKcyyPiNS0_10__locale_tE
+  Name: _ZNSt3__18__locale12__wcsnrtombsEPcPPKwyyPiNS0_10__locale_tE
+  Name: _ZNSt3__18__locale7__btowcEiNS0_10__locale_tE
+  Name: _ZNSt3__18__locale7__wctobEtNS0_10__locale_tE
+  Name: _ZNSt3__18__locale8__mbrlenEPKcyPiNS0_10__locale_tE
+  Name: _ZNSt3__18__locale8__strtofEPKcPPcNS0_10__locale_tE
+  Name: _ZNSt3__18__locale9__mbrtowcEPwPKcyPiNS0_10__locale_tE
+  Name: _ZNSt3__18__locale9__strtoldEPKcPPcNS0_10__locale_tE
+  Name: _ZNSt3__18__locale9__wcrtombEPcwPiNS0_10__locale_tE
   Name: _ZNSt3__18__rs_getEv
   Name: _ZNSt3__18__sp_mut4lockEv
   Name: _ZNSt3__18__sp_mut6unlockEv

So a handful of locale_t specific functions are removed and others appear.

A regular build of LLVM/Clang on top of this libc++ doesn't seem to reference any of those symbols, so we're not breaking the base toolchain if doing this change. That's good. :-)

I had a look in an install of msys2/clang64 to see how many packages/executables would be affected. I don't have a whole lot of packages installed so it's not a very thorough investigation, but if @lazka or @mati865 happen to have more packages installed locally maybe they can try too.

In /clang64/bin I ran ls *.exe *.dll | wc -l and noted that I happen to have 540 executables/DLLs lying around. Out of those, 287 seem to actually be referencing libc++.dll: grep libc++.dll *.exe *.dll | wc -l

How many of them would be bothered by this change?

for i in $(grep libc++.dll *.exe *.dll | grep "Binary file" | awk '{print $3}'); do if llvm-readobj --coff-imports $i | grep -q locale_t; then echo $i; fi; done

In my case, I only get one single hit, libboost_locale-mt.dll.

So if this is the scale of the breakage, perhaps it's not worth the effort to keep providing the old symbols for backwards compatibility. But let's wait for more numbers from the msys2 maintainers before concluding this.

@mati865
Copy link
Contributor

mati865 commented Dec 11, 2024

Apparently no matches for me:

$ pwd
/clang64/bin

$ ls *.exe *.dll | wc -l
1542

$ grep libc++.dll *.exe *.dll | wc -l
736

$ for i in $(grep libc++.dll *.exe *.dll | grep "Binary file" | awk '{print $3}'); do if llvm-readobj --coff-imports $i | grep -q locale_t; then echo $i; fi; done

@ldionne
Copy link
Member Author

ldionne commented Dec 16, 2024

Thanks to you both for running those experiments! Are you fine with me merging this PR then?

@ldionne
Copy link
Member Author

ldionne commented Dec 16, 2024

After thinking about this a bit more, I think your results are not very surprising. I think these symbols are mostly used from within the libc++ dylib, and so perhaps it's not even necessary to export (most of) them in the first place. We could explore that in future patches once we at least have a clear "theoretical" ABI boundary within the libc++ code itself.

…base API

This patch reimplements the locale base support for Windows flavors
in a way that is more modules-friendly and without defining non-internal
names.

Since this changes the name of some types and entry points in the
built library, this is effectively an ABI break on Windows (which
is acceptable since we don't promise ABI stability on that platform).
@ldionne ldionne force-pushed the review/locale/8-extract-windows branch from 8f9461d to fec5f34 Compare December 16, 2024 17:17
@mstorsjo
Copy link
Member

After thinking about this a bit more, I think your results are not very surprising. I think these symbols are mostly used from within the libc++ dylib, and so perhaps it's not even necessary to export (most of) them in the first place. We could explore that in future patches once we at least have a clear "theoretical" ABI boundary within the libc++ code itself.

Yes, I kinda am getting that feeling as well. (Plus very few things explicitly use locales at all - at most it's implicitly passed somewhere.)

Thanks to you both for running those experiments! Are you fine with me merging this PR then?

Yes, I'm ok with that - let's not create unnecessary complexity here.

@ldionne ldionne merged commit 084309a into llvm:main Dec 16, 2024
60 of 62 checks passed
@ldionne ldionne deleted the review/locale/8-extract-windows branch December 16, 2024 22:46
@mstorsjo
Copy link
Member

Btw, one aspect that did come to mind; we have the frozen C++03 header copies - which will expect a different ABI towards the precompiled bits of libcxx. We don't currently use the frozen C++03 headers quite yet (AFAIK?), but once we do, I guess this would crop up whenever trying to run tests in C++03 mode on Windows. (We don't test all standards modes on all OSes currently though.) So I guess we would need to do this kind of maintainance on those otherwise supposedly frozen headers?

@philnik777
Copy link
Contributor

Btw, one aspect that did come to mind; we have the frozen C++03 header copies - which will expect a different ABI towards the precompiled bits of libcxx. We don't currently use the frozen C++03 headers quite yet (AFAIK?), but once we do, I guess this would crop up whenever trying to run tests in C++03 mode on Windows. (We don't test all standards modes on all OSes currently though.) So I guess we would need to do this kind of maintainance on those otherwise supposedly frozen headers?

Yes, this is one place where we'd probably have to change the C++03 headers. They should be frozen as much as that's maintainable, so I don't think that goes against the idea.

@mstorsjo
Copy link
Member

mstorsjo commented Jan 2, 2025

Yes, this is one place where we'd probably have to change the C++03 headers. They should be frozen as much as that's maintainable, so I don't think that goes against the idea.

Ok, thanks, that's reassuring to hear that we'd be able to still do this kind of maintainance as needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
github:workflow libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants