Merge pull request #140 from DanCardin/dc/parse

fix: Refactor parser combinators into dedicated module, and document the behavior more thoroughly.
DanCardin · Aug 13, 2024 · ccc5766
2 parents 7188941 + 5aa5b7b
commit ccc5766
Showing 9 changed files with 146 additions and 53 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,10 @@
 
 ## 0.22
 
+### 0.22.5
+
+- fix: Refactor parser combinators into dedicated module, and document the behavior more thoroughly.
+
 ### 0.22.4
 
 - fix: Avoid applying annotated type parsing to default value.

diff --git a/docs/source/annotation.md b/docs/source/annotation.md
@@ -184,7 +184,7 @@ As such you can again, opt out of the "Mapping inference" entirely, by supplying
 your own `parse` function.
 
 ```{note}
-Mapping inference is built up out of component functions defined in `cappa.annotation`,
+Mapping inference is built up out of component functions defined in `cappa.parse`,
 such as `parse_list`, which know how to translate `list[int]` and a source list of raw
 parser strings into a list of ints.
 

diff --git a/docs/source/arg.md b/docs/source/arg.md
@@ -61,7 +61,7 @@ default inferred behavior.
 ```{note}
 This feature is currently experimental, in particular because the parser state
 available to either backend's callable is radically different. However, for an
-action callable which accepts no arguments, behaviors is unlikely to change.
+action callable which accepts no arguments, the behavior is unlikely to change.
 ```
 
 In addition to one of the literal `ArgAction` variants, the provided action can
@@ -73,7 +73,7 @@ argument.
 ```{note}
 Custom actions may interfere with general inference features, depending on what you're
 doing (given that you're taking over the parser's duty of determining how the
-code ought to handle the argument).
+code ought to handle the argument in question).
 
 As such, you may need to specify options like `num_args`, where you wouldn't have otherwise
 needed to.
@@ -97,7 +97,8 @@ The set of available objects to inject include:
  original input.
 
 The above set of objects is of potentially limited value. More parser state will
-likely be exposed through this interface in the future.
+likely be exposed through this interface in the future. If you think some specific
+bit of parser state is missing and could be useful to you, please raise an issue!
 
 For example:
 
@@ -289,3 +290,60 @@ is not allowed with argument '-v'`
 An explicit `group=` can still be used in concert with the above syntax to control
 the `order` and name of the resultant group.
 ```
+
+## Parse
+
+`Arg.parse` can be used to provide **specific** instruction to cappa as to how to
+handle the raw value given from the CLI parser backend.
+
+In _general_, this argument shouldn't need to be specified, because the annotated
+type will usually _imply_ how that particular value ought to be parsed, especially
+for built-in Python types.
+
+However, there will inevitably be cases where the type itself is not enough to infer
+the specific parsing required. Take for example:
+
+```python
+from datetime import date, datetime
+
+@dataclass
+class Example:
+    iso_date: date
+    american_date: Annotated[date, cappa.Arg(parse=lambda date_str: datetime.strptime(date_str, '%m/%d/%y').date())]
+```
+
+Cappa's default date parsing assumes an ISO-format input string. If you want a different
+format instead, `parse=` is how that is achieved.
+
+Further, `parse=` is likely even more useful for custom classes which don't have a simple,
+single-string-input constructor.
+
+```{note}
+Cappa itself contains a number of component `parse_*` functions inside the `cappa.parse`
+module, which can be used in combination with your own custom `parse` functions.
+```
+
+### Parsing JSON
+
+Another potentially useful parsing concept is parsing JSON string input.
+For example:
+
+```python
+import json
+from dataclasses import dataclass
+from typing import Annotated
+
+import cappa
+
+@dataclass
+class Example:
+    todo: Annotated[dict[str, str], cappa.Arg(parse=json.loads)]
+
+example = cappa.parse(Example)
+print(example)
+```
+
+At present, cappa doesn't natively have any specific `dict` type inference because it's
+ambiguous what CLI input shape that ought to map to. However, by combining the `dict`
+annotation with a dedicated `parse=json.loads`, `example.py '{"foo": "bar"}'` now yields
+`Example(todo={'foo': 'bar'})`.
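
The two `parse=` examples documented in the hunk above can be sketched as standalone functions. This is a minimal, stdlib-only illustration: the names `parse_american_date` and `parse_int_mapping` are hypothetical and not part of cappa, and the assumption is that either one would be handed to cappa as `cappa.Arg(parse=...)` in the way the docs describe.

```python
import json
from datetime import date, datetime


def parse_american_date(value: str) -> date:
    """Parse a month/day/two-digit-year string such as '08/13/24'."""
    # `date` has no `strptime`; parse with `datetime` and drop the time part.
    return datetime.strptime(value, "%m/%d/%y").date()


def parse_int_mapping(value: str) -> dict:
    """Parse a JSON object and check that every value is an int."""
    data = json.loads(value)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, item in data.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(item, bool) or not isinstance(item, int):
            raise ValueError(f"value for {key!r} is not an int")
    return data
```

Wrapping `json.loads` like this is one way to make the parsed value actually match a `dict[str, int]` annotation, since `json.loads` on its own performs no such check.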
diff --git a/docs/source/backends.md b/docs/source/backends.md
@@ -1,5 +1,12 @@
 # Parser Backends
 
+```{note}
+If you're looking for custom parsing of individual arguments, you probably want
+[Arg.parse](./arg.md#parse) or [Arg.action](./arg.md#action).
+
+This document is concerned with the actual end-to-end CLI parsing process.
+```
+
 Cappa is designed in two parts:
 
 - The "frontend", which is the vast majority of the public API. All the
@@ -68,3 +75,31 @@ Some potential reasons you want want to use the argparse backend:
  guarantee an arbitrary argparse extension will function correctly with cappa,
  but to the extent possible, it's a goal that they should be supported if
  possible.
+
+## Custom Backends
+
+`cappa.invoke`/`cappa.parse` both accept a `backend=` argument which is used to
+select between the existing two backends shipped with Cappa.
+
+Technically, you could use this `backend` argument to author an entirely new
+backend to a different argument parser, like `click` for example (although
+a click backend was attempted at some point and later abandoned due to unforeseen
+complexities). This would allow you to retain all of cappa's pre-parsing and inference
+capabilities, as well as the post-processing, mapping, and invoke/dependency injection
+infrastructure.
+
+With that said, it's much more likely to be useful to **wrap** one of the existing
+backends with the `backend` argument, and mutate the resultant output structure of the
+backend before it's passed further downstream. This is a somewhat unusual use case, so
+again, if you find yourself doing this in order to work around potential upstream
+deficiencies in cappa, please bring it up in an
+issue/discussion!
+
+```{note}
+The backend **interface** is currently not set in stone. Before relying on the specific
+details of the input/output shape of a backend, please bring it up in an issue/discussion,
+in the hope that customizing the backend can be made unnecessary!
+
+Further, it's likely that the backend interface will be formalized at some point
+in the future, at which point code relying on those details may break.
+```
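
The "wrap an existing backend" idea described in the hunk above can be sketched generically. Because the docs state that the backend interface is not set in stone, this sketch deliberately does not assume cappa's real backend signature or output shape: `fake_backend`, `uppercase_name`, and `wrap_backend` are all hypothetical stand-ins used only to show the wrapping pattern.

```python
from typing import Any, Callable


def wrap_backend(
    backend: Callable[..., Any], transform: Callable[[Any], Any]
) -> Callable[..., Any]:
    """Return a callable that runs `backend`, then post-processes its result."""

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        # Forward every argument untouched so the wrapper does not depend
        # on the inner backend's exact signature.
        result = backend(*args, **kwargs)
        return transform(result)

    return wrapped


# Hypothetical stand-in backend: NOT cappa's real signature or output shape.
def fake_backend(argv: list) -> dict:
    return {"name": argv[0]}


def uppercase_name(parsed: dict) -> dict:
    """Example mutation applied to the backend output before it moves on."""
    return {**parsed, "name": parsed["name"].upper()}


backend = wrap_backend(fake_backend, uppercase_name)
```

Forwarding `*args, **kwargs` keeps the wrapper insulated from signature changes, although the `transform` step still depends on the output shape, which is the part the note above warns about.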
diff --git a/docs/source/manual_construction.md b/docs/source/manual_construction.md
@@ -12,7 +12,7 @@ from the class in question, if much more manually.
 from dataclasses import dataclass
 
 import cappa
-from cappa.annotation import parse_list
+from cappa.parse import parse_list
 
 
 @dataclass
@@ -37,14 +37,14 @@ result = cappa.parse(command, argv=["one", "2", "3"])
 ```
 
 There are a number of built-in parser functions used to build up the existing
-inference system. [parse_value](cappa.annotation.parse_value) the the main
+inference system. [parse_value](cappa.parse.parse_value) is the main
 entrypoint used by cappa internally, but each of the parser factory functions
 below make up the component built-in parsers for each type.
 
 For inherent types, like `int`, `float`, etc. Their constructor may serve as
 their own parser.
 
 ```{eval-rst}
-.. autoapimodule:: cappa.annotation
+.. autoapimodule:: cappa.parse
    :members: parse_value, parse_list, parse_tuple, parse_literal, parse_none, parse_set, parse_union
 ```
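
The "parser factory" idea mentioned in the manual_construction.md hunk above can be illustrated with a simplified, stdlib-only sketch. `make_list_parser` is a hypothetical name and is not cappa's `parse_list` implementation; it only shows the combinator shape, where a per-item parser is lifted into a parser for a list of raw strings.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def make_list_parser(parse_item: Callable[[str], T]) -> Callable[[List[str]], List[T]]:
    """Build a parser for a list of raw strings out of a single-item parser."""

    def parse(values: List[str]) -> List[T]:
        # Apply the item parser to each raw string, preserving order.
        return [parse_item(value) for value in values]

    return parse


# For inherent types such as `int`, the constructor serves as the item parser.
parse_ints = make_list_parser(int)
```

Calling `parse_ints(["2", "3"])` produces a list of ints, which mirrors the translation from raw parser strings to `list[int]` described in the docs.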
diff --git a/src/cappa/arg.py b/src/cappa/arg.py
@@ -9,23 +9,20 @@
 
 from typing_inspect import is_optional_type
 
-from cappa.annotation import (
-    detect_choices,
-    is_sequence_type,
-    parse_optional,
-    parse_value,
-)
 from cappa.class_inspect import Field, extract_dataclass_metadata
 from cappa.completion.completers import complete_choices
 from cappa.completion.types import Completion
 from cappa.env import Env
+from cappa.parse import parse_optional, parse_value
 from cappa.typing import (
     MISSING,
     NoneType,
     T,
+    detect_choices,
     find_type_annotation,
     get_optional_type,
     is_of_type,
+    is_sequence_type,
     is_subclass,
     is_union_type,
     missing,

diff --git a/src/cappa/annotation.py → src/cappa/parse.py b/src/cappa/annotation.py → src/cappa/parse.py
@@ -1,14 +1,13 @@
 from __future__ import annotations
 
-import enum
 import types
 import typing
 from datetime import date, datetime, time
 
-from typing_inspect import get_origin, is_literal_type, is_optional_type
+from typing_inspect import is_literal_type
 
 from cappa.file_io import FileMode
-from cappa.typing import T, get_optional_type, is_none_type, is_subclass, is_union_type
+from cappa.typing import T, is_none_type, is_subclass, is_union_type, repr_type
 
 __all__ = [
  "parse_value",
@@ -18,7 +17,6 @@
  "parse_union",
  "parse_tuple",
  "parse_none",
- "detect_choices",
 ]
 
 
@@ -217,37 +215,3 @@ def file_io_mapper(value: str):
  return file_mode(value)
 
  return file_io_mapper
-
-
-def detect_choices(annotation: type) -> list[str] | None:
-    if is_optional_type(annotation):
-        annotation = get_optional_type(annotation)
-
-    origin = typing.get_origin(annotation) or annotation
-    type_args = typing.get_args(annotation)
-    if is_subclass(origin, enum.Enum):
-        return [v.value for v in origin]  # type: ignore
-
-    if is_subclass(origin, (tuple, list, set)):
-        origin = typing.cast(type, type_args[0])
-        type_args = typing.get_args(type_args[0])
-
-    if is_union_type(origin):
-        if all(is_literal_type(t) for t in type_args):
-            return [str(typing.get_args(t)[0]) for t in type_args]
-
-    if is_literal_type(origin):
-        return [str(t) for t in type_args]
-
-    return None
-
-
-def is_sequence_type(typ):
-    return is_subclass(get_origin(typ) or typ, (typing.List, typing.Tuple, typing.Set))
-
-
-def repr_type(t):
-    if isinstance(t, type) and not typing.get_origin(t):
-        return str(t.__name__)
-
-    return str(t).replace("typing.", "")
diff --git a/src/cappa/typing.py b/src/cappa/typing.py
@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+import enum
 import sys
 import types
 import typing
@@ -9,7 +10,7 @@
 import typing_extensions
 import typing_inspect
 from typing_extensions import Annotated, get_args, get_origin
-from typing_inspect import is_literal_type
+from typing_inspect import is_literal_type, is_optional_type
 
 try:
  from typing_extensions import Doc
@@ -159,6 +160,40 @@ def is_of_type(annotation, types):
  return False
 
 
+def detect_choices(annotation: type) -> list[str] | None:
+    if is_optional_type(annotation):
+        annotation = get_optional_type(annotation)
+
+    origin = typing.get_origin(annotation) or annotation
+    type_args = typing.get_args(annotation)
+    if is_subclass(origin, enum.Enum):
+        return [v.value for v in origin]  # type: ignore
+
+    if is_subclass(origin, (tuple, list, set)):
+        origin = typing.cast(type, type_args[0])
+        type_args = typing.get_args(type_args[0])
+
+    if is_union_type(origin):
+        if all(is_literal_type(t) for t in type_args):
+            return [str(typing.get_args(t)[0]) for t in type_args]
+
+    if is_literal_type(origin):
+        return [str(t) for t in type_args]
+
+    return None
+
+
+def is_sequence_type(typ):
+    return is_subclass(get_origin(typ) or typ, (typing.List, typing.Tuple, typing.Set))
+
+
+def repr_type(t):
+    if isinstance(t, type) and not typing.get_origin(t):
+        return str(t.__name__)
+
+    return str(t).replace("typing.", "")
+
+
 if sys.version_info >= (3, 10):
  _get_type_hints = typing.get_type_hints
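
To make the behavior of the helpers moved into `cappa/typing.py` easier to follow, here is a small stdlib-only sketch. `repr_type` repeats the body shown in the hunk above, while `simple_choices` is a hypothetical, reduced stand-in for `detect_choices` that covers only the `Enum` and `Literal` cases and skips the optional, sequence, and union handling (which rely on `typing_inspect` and other cappa helpers).

```python
import enum
import typing


def repr_type(t):
    # Same body as the function moved in the hunk above.
    if isinstance(t, type) and not typing.get_origin(t):
        return str(t.__name__)

    return str(t).replace("typing.", "")


def simple_choices(annotation):
    """Hypothetical subset of detect_choices: Enum and Literal only."""
    origin = typing.get_origin(annotation) or annotation
    if isinstance(origin, type) and issubclass(origin, enum.Enum):
        # Enum choices are the member values, in definition order.
        return [member.value for member in origin]

    if origin is typing.Literal:
        # Literal choices are the stringified literal arguments.
        return [str(arg) for arg in typing.get_args(annotation)]

    return None


class Color(enum.Enum):
    red = "red"
    blue = "blue"
```

Under these assumptions, plain classes are rendered by name, `typing` generics lose their `typing.` prefix, and annotations with no enumerable values yield `None`.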
 

diff --git a/tests/test_manually_built.py b/tests/test_manually_built.py
@@ -4,8 +4,8 @@
 
 import cappa
 import pytest
-from cappa.annotation import parse_list, parse_value
 from cappa.output import Exit
+from cappa.parse import parse_list, parse_value
 
 from tests.utils import backends, parse