Skip to content

Latest commit

 

History

History
107 lines (86 loc) · 5.9 KB

README.md

File metadata and controls

107 lines (86 loc) · 5.9 KB

Code snippets

System Code Pages

There are two relevant code pages:

  • the OEM code page for use by legacy console applications,
  • the ANSI code page for use by legacy GUI applications

There are several ways to get the active code pages:

  • chcp: Displays or sets the active code page number.
    • 437 IBM437 OEM United States
    • 850 ibm850 OEM Multilingual Latin 1; Western European (DOS)
    • 1250 windows-1250 ANSI Central European; Central European (Windows)
    • 65001 utf-8 Unicode (UTF-8)
    • 1200 utf-16 Unicode UTF-16, little endian byte order (BMP of ISO 10646)
  • PowerShell: Get-WinSystemLocale
  • PowerShell via registry: Get-ItemProperty HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage | Select-Object OEMCP, ACP
  • Windows C API: GetACP(), GetOEMCP(), GetConsoleOutputCP()

As of Windows Version 1903, an application can easily set its own code page as described here.

Change the case sensitivity of files and directories

If you can't believe that Greek letters are case insensitive as well, you can enable case-sensitivity for a specific folder and see for yourself.

fsutil.exe file setCaseSensitiveInfo <path> enable

Python

"🐱".encode("utf-8")

returns the encoding b'\xf0\x9f\x90\xb1'

hex(ord("🐱"))

returns the code point as a hexadecimal value '0x1f431'

Conversion of Unicode String to C++23 escaped characters:

print("".join([f"\\u{{{ord(x):x}}}" for x in "A äöüß μ ‰ ఠ 😆🐱"]))

Compiler Options

msvc's /utf-8 is shorthand for /source-charset:utf-8 and /execution-charset:utf-8, where source-charset can be omitted if a consistent BOM is present at each source file. execution-charset affects string literals in the binary. Runtime file IO is not affected.

We use an .editorconfig file to tell Visual Studio to save the file without the BOM and use the compiler option to set the encoding.

Resources

Characters