added parser for modern argument parsing rules
* Added a split function that supports the modern (post VS 2005)
  argument parsing rules.

* Fixed a bug where mslex failed to raise "Unquoted CMD metacharacters".

* Added tests.

* Improved quoted strings to be somewhat easier to read.
smoofra committed Oct 15, 2024
1 parent ba267b8 commit 7735738
Showing 7 changed files with 580 additions and 103 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
tests/examples.csv filter=lfs diff=lfs merge=lfs -text
39 changes: 36 additions & 3 deletions README.rst
@@ -29,9 +29,34 @@ functions -- split, quote, and join -- just like shlex.
Windows Quoting
---------------

These are excellent articles to read if you really want to face the
sanity-melting reality buried under the surface of how windows passes command
line arguments to your programs. I recommend you read something else.
Since time immemorial, Windows quoting behavior has been strange. Prior to
(I think) Visual Studio 2005, the C runtime exhibited the extremely strange
modulo 3 periodic behavior which is emulated here in ``split_msvcrt()``.
Programs compiled with the C runtime from Visual Studio 2005 and later exhibit
the somewhat less strange behavior emulated in ``split_ucrt()``.

Microsoft still ships a DLL called ``msvcrt.dll`` as part of Windows,
for compatibility reasons. Even though their documentation is very clear
that nobody should ever link against this DLL, people still do, either
for compatibility reasons of their own or because it is available on
every version of Windows you might care about without needing to run
an installer. And ``msvcrt.dll`` preserves the extremely strange argument
parsing behavior from prior to VS 2005.

You can download the latest version of `msys2`_ today and build an
executable linking ``msvcrt.dll`` on Windows 11, and it will parse
arguments like Windows 95.

``mslex`` produces quoted strings that are parsed correctly by
both modern C runtimes and ``msvcrt.dll``. When splitting, ``mslex``
parses the input under both sets of rules and raises an error if the
results disagree. To skip that check and pick one set of rules, pass
``ucrt=True`` or ``ucrt=False`` to ``split``.
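
The "parse it both ways and complain on disagreement" idea can be
sketched roughly as follows. This is only an illustration of the pattern,
not the code in ``mslex``: the names ``split_checked`` and
``AmbiguousParse`` are hypothetical, and the two parsers are passed in as
plain callables standing in for the legacy and modern rule sets.

```python
from typing import Callable, List, Optional

Parser = Callable[[str], List[str]]


class AmbiguousParse(ValueError):
    """Hypothetical error: the two rule sets split the input differently."""


def split_checked(
    s: str,
    parse_legacy: Parser,
    parse_modern: Parser,
    ucrt: Optional[bool] = None,
) -> List[str]:
    """Split s, insisting that both rule sets agree unless ucrt is given.

    ucrt=True  -> use only the modern parser
    ucrt=False -> use only the legacy parser
    ucrt=None  -> run both and raise if the results differ
    """
    if ucrt is True:
        return parse_modern(s)
    if ucrt is False:
        return parse_legacy(s)
    legacy = parse_legacy(s)
    modern = parse_modern(s)
    if legacy != modern:
        raise AmbiguousParse(
            f"parsers disagree on {s!r}: {legacy!r} vs {modern!r}"
        )
    return modern
```

Defaulting to the strict check means an ambiguous command line fails loudly
instead of being split according to whichever runtime the caller happened
to assume.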

See also:

* `Parsing C Command Line Arguments`_

* `Windows is not a Microsoft Visual C/C++ Run-Time delivery channel`_

* `How a Windows Program Splits Its Command Line Into Individual Arguments`_

@@ -43,10 +68,18 @@ line arguments to your programs. I recommend you read something else.
.. _`Everyone quotes command line arguments the wrong way`:
https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/

.. _`Windows is not a Microsoft Visual C/C++ Run-Time delivery channel`: https://devblogs.microsoft.com/oldnewthing/20140411-00/?p=1273

.. _`msys2`: https://www.msys2.org/docs/environments/

.. _`Parsing C Command Line Arguments`: https://learn.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments?view=msvc-170


Automatic selection between mslex and shlex
-------------------------------------------

If you want to automatically use mslex on Windows, and shlex otherwise, check out the `oslex`_ package.
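
For a rough idea of what such a selection involves, here is a minimal
sketch. It is not taken from ``oslex`` and its implementation may well
differ; the helper name ``pick_lexer`` is hypothetical. It relies only on
the fact that ``mslex`` offers ``split``, ``quote``, and ``join`` like
``shlex`` does.

```python
import shlex
import sys


def pick_lexer(platform: str = sys.platform):
    """Return a module providing split/quote/join for the given platform.

    Hypothetical helper: on Windows it returns mslex (imported lazily so
    other platforms do not need it installed), elsewhere the stdlib shlex.
    """
    if platform == "win32":
        import mslex  # third-party; only needed on Windows

        return mslex
    return shlex


lexer = pick_lexer()
```

Callers can then write ``lexer.split(...)`` or ``lexer.quote(...)`` without
checking the platform themselves.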

.. _`oslex`: https://pypi.org/project/oslex/
.. _`msvcrt`: https://devblogs.microsoft.com/oldnewthing/20140411-00/?p=1273
.. _`UCRT`: https://learn.microsoft.com/en-us/cpp/porting/upgrade-your-code-to-the-universal-crt?view=msvc-170