New CLI Options (#364)
* Added utility copy and info commands.

* Improve formatting

* Updates to pass pre-commit.

* Add access pattern to info.

* Add tests.

* Move utility commands to root level.

* Update linting and tests.

* Resolve PR comments.

* Fix mdio copy.

* Remove duplicate tmp in .gitignore.

* Remove unnecessary try/except block in segy.py

* Make 'info' command work with new CLI and add rich printing

* make copy work with new CLI

* Update description in info.py module

* Change input mdio option to argument in info command

* Update variable name and table title in info.py

* Replace copy command filename options with arguments

* make tests work for option -> argument conversion

* Refactor imports and command options in copy.py

* revert back to click types, better error handling

* Refactor segy.py and update test_main.py

Updated the segy.py file to import specific functions from click, rather than the entire module. The command decorator's function signatures and calls are also updated. This is to improve specificity and reduce unnecessary overhead. Additionally, modified the way command line arguments are passed in test_main.py as per the refactored changes in the main function.

* Add future annotations to copy command

* Refactor import location in copy.py

The import statement for 'MDIOReader' in copy.py has been moved into the body of the copy function, next to where it is used, so it is only executed when the command runs.

* Refactor MDIO info command for better code organization

The MDIO info command is refactored for readability and maintainability. Separate functions, 'cast_stats', 'parse_grid' and 'pretty_print', now each handle one task, which keeps responsibilities distinct and makes future changes easier.

* Move pytest-dependency to test suite installs

* Add future annotations import to info.py

* Update usage documentation for mdio commands

The documentation for the mdio commands has been updated to reflect changes in the command syntax. Parameters for input and output files are now required positional arguments, rather than options, enhancing the clarity and readability of the commands.

* directly import click_params objects

* Add "fastentrypoints" to build requirements

* change overwrite to flag

---------

Co-authored-by: Mark Roberts <mark.roberts@tgs.com>
tasansal and markspec authored Mar 7, 2024
1 parent d14a170 commit 28a8e9f
Showing 11 changed files with 363 additions and 120 deletions.
6 changes: 4 additions & 2 deletions .gitignore
@@ -25,7 +25,8 @@ share/python-wheels/
 .installed.cfg
 *.egg
 MANIFEST
-
+pip-*
+tmp*
 # PyInstaller
 # Usually these files are written by a python script from a template
 # before PyInstaller builds the exe, so as to inject date/other infos into it.
@@ -150,4 +151,5 @@ cython_debug/
 mdio1/*
 */mdio1/*
 pytest-of-*
-tmp/
+tmp
+debugging/*
38 changes: 19 additions & 19 deletions docs/usage.md
@@ -9,8 +9,8 @@ There are many more options, please see the [CLI Reference](#cli-reference).

 ```shell
 $ mdio segy import \
-  -i path_to_segy_file.segy \
-  -o path_to_mdio_file.mdio \
+  path_to_segy_file.segy \
+  path_to_mdio_file.mdio \
   -loc 181,185 \
   -names inline,crossline
 ```
@@ -20,8 +20,8 @@ should be executed.

 ```shell
 $ mdio segy export \
-  -i path_to_mdio_file.mdio \
-  -o path_to_segy_file.segy
+  path_to_mdio_file.mdio \
+  path_to_segy_file.segy
 ```

 ## Cloud Connection Strings
@@ -79,19 +79,19 @@ Using UNIX:

 ```shell
 mdio segy import \
-  --input-segy-path path/to/my.segy
-  --output-mdio-file s3://bucket/prefix/my.mdio
-  --header-locations 189,193
+  path/to/my.segy \
+  s3://bucket/prefix/my.mdio \
+  --header-locations 189,193 \
   --storage-options '{"key": "my_super_private_key", "secret": "my_super_private_secret"}'
 ```

 Using Windows (note the extra escape characters `\`):

 ```console
 mdio segy import \
-  --input-segy-path path/to/my.segy
-  --output-mdio-file s3://bucket/prefix/my.mdio
-  --header-locations 189,193
+  path/to/my.segy \
+  s3://bucket/prefix/my.mdio \
+  --header-locations 189,193 \
   --storage-options "{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"
 ```

@@ -114,19 +114,19 @@ Using a service account:

 ```shell
 mdio segy import \
-  --input-segy-path path/to/my.segy
-  --output-mdio-file gs://bucket/prefix/my.mdio
-  --header-locations 189,193
+  path/to/my.segy \
+  gs://bucket/prefix/my.mdio \
+  --header-locations 189,193 \
   --storage-options '{"token": "~/.config/gcloud/application_default_credentials.json"}'
 ```

 Using browser to populate authentication:

 ```shell
 mdio segy import \
-  --input-segy-path path/to/my.segy
-  --output-mdio-file gs://bucket/prefix/my.mdio
-  --header-locations 189,193
+  path/to/my.segy \
+  gs://bucket/prefix/my.mdio \
+  --header-locations 189,193 \
   --storage-options '{"token": "browser"}'
 ```

@@ -145,9 +145,9 @@ If ADL is not pre-authenticated, you need to pass `--storage-options`.

 ```shell
 mdio segy import \
-  --input-segy-path path/to/my.segy
-  --output-mdio-file az://bucket/prefix/my.mdio
-  --header-locations 189,193
+  path/to/my.segy \
+  az://bucket/prefix/my.mdio \
+  --header-locations 189,193 \
   --storage-options '{"account_name": "myaccount", "account_key": "my_super_private_key"}'
 ```

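The `--storage-options` value in the commands above is a JSON string, which is why the quoting differs between shells. As a rough sketch (not part of this commit), the same argument list could be assembled in Python so that `json.dumps` and `shlex.join` take care of the quoting. The helper name `build_import_command` is hypothetical; only the positional paths and flags that appear in the commands above are used.

```python
import json
import shlex


def build_import_command(segy_path, mdio_path, header_locations, storage_options=None):
    """Build the argv list for `mdio segy import` (hypothetical helper)."""
    argv = [
        "mdio", "segy", "import",
        segy_path,  # positional input path
        mdio_path,  # positional output path
        "--header-locations", ",".join(str(loc) for loc in header_locations),
    ]
    if storage_options is not None:
        # The option takes a JSON string, so serialize the dict.
        argv += ["--storage-options", json.dumps(storage_options)]
    return argv


argv = build_import_command(
    "path/to/my.segy",
    "s3://bucket/prefix/my.mdio",
    [189, 193],
    {"key": "my_super_private_key", "secret": "my_super_private_secret"},
)

# shlex.join quotes every argument for a POSIX shell.
print(shlex.join(argv))
```

Passing the `argv` list straight to `subprocess.run` would sidestep shell quoting altogether, which may be the simpler route on Windows.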
4 changes: 3 additions & 1 deletion noxfile.py
@@ -1,4 +1,6 @@
 """Nox sessions."""
+
+
 import os
 import shlex
 import shutil
@@ -161,7 +163,7 @@ def mypy(session: Session) -> None:
 def tests(session: Session) -> None:
     """Run the test suite."""
     session.install(".")
-    session.install("coverage[toml]", "pytest", "pygments")
+    session.install("coverage[toml]", "pytest", "pygments", "pytest-dependency")
     try:
         session.run("coverage", "run", "--parallel", "-m", "pytest", *session.posargs)
     finally:
2 changes: 1 addition & 1 deletion poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion pyproject.toml
@@ -33,6 +33,7 @@ segyio = "^1.9.3"
 numba = "^0.59.0"
 psutil = "^5.9.5"
 fsspec = ">=2023.9.1"
+rich = "^13.7.1"
 urllib3 = "^1.26.18" # Workaround for poetry-plugin-export/issues/183

 # Extras
@@ -109,5 +110,5 @@ ignore_missing_imports = true


 [build-system]
-requires = ["poetry-core"]
+requires = ["poetry-core", "fastentrypoints"]
 build-backend = "poetry.core.masonry.api"
12 changes: 9 additions & 3 deletions src/mdio/__main__.py
@@ -11,15 +11,21 @@
 import click


-KNOWN_MODULES = ["segy.py"]
+KNOWN_MODULES = [
+    "segy.py",
+    "copy.py",
+    "info.py",
+]


 class MyCLI(click.MultiCommand):
     """CLI generator via plugin design pattern.

     This class dynamically loads command modules from the specified
-    `plugin_folder`. Each command module should define a `cli` function
-    that implements the command logic.
+    `plugin_folder`. If the command is another CLI group, the command
+    module must define a `cli = click.Group(...)` and subsequent
+    commands must be added to this CLI. If it is a single utility it
+    must have a variable named `cli` for the command to be exposed.

     Args:
         - plugin_folder: Path to the directory containing command modules.
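The rest of `MyCLI` is not shown in this diff, so the following is only a sketch of the plugin idea the docstring describes: import each known module file from a folder and expose whatever it binds to `cli`. The function `load_cli_objects` and the throwaway modules are hypothetical, and plain functions stand in for click commands so the sketch needs only the standard library.

```python
import importlib.util
import tempfile
from pathlib import Path

KNOWN_MODULES = ["copy.py", "info.py"]


def load_cli_objects(plugin_folder, known_modules):
    """Import each known module file and collect its `cli` attribute."""
    commands = {}
    for filename in known_modules:
        path = Path(plugin_folder) / filename
        name = path.stem
        # Use a private module name so stdlib modules (e.g. `copy`) are not shadowed.
        spec = importlib.util.spec_from_file_location(f"sketch_plugin_{name}", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # A module without a `cli` variable is not exposed as a command.
        if hasattr(module, "cli"):
            commands[name] = module.cli
    return commands


# Demo with throwaway modules; a real command module would bind a click
# command or group to `cli` rather than a plain function.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "copy.py").write_text("def cli():\n    return 'copy called'\n")
    Path(folder, "info.py").write_text("helper = 1  # no `cli` defined here\n")
    commands = load_cli_objects(folder, KNOWN_MODULES)

print(sorted(commands))
print(commands["copy"]())
```

Keying the result by file stem mirrors the way the new `copy.py` and `info.py` modules become root-level `mdio copy` and `mdio info` commands, though the actual lookup logic may differ.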
89 changes: 89 additions & 0 deletions src/mdio/commands/copy.py
@@ -0,0 +1,89 @@
"""MDIO Dataset copy command."""


from __future__ import annotations

from click import STRING
from click import argument
from click import command
from click import option
from click_params import JSON


@command(name="copy")
@argument("source-mdio-path", type=str)
@argument("target-mdio-path", type=str)
@option(
    "-access",
    "--access-pattern",
    required=False,
    default="012",
    help="Access pattern of the file",
    type=STRING,
    show_default=True,
)
@option(
    "-exc",
    "--excludes",
    required=False,
    default="",
    help="Data to exclude during copy, like `chunked_012`. The data values won’t be "
    "copied but an empty array will be created. If blank, it copies everything.",
    type=STRING,
)
@option(
    "-inc",
    "--includes",
    required=False,
    default="",
    help="Data to include during copy, like `trace_headers`. If not specified, and "
    "certain data is excluded, it will not copy headers. To preserve headers, "
    "specify trace_headers. If left blank, it will copy everything except what is "
    "specified in the 'excludes' parameter.",
    type=STRING,
)
@option(
    "-storage",
    "--storage-options",
    required=False,
    help="Custom storage options for cloud backends",
    type=JSON,
)
@option(
    "-overwrite",
    "--overwrite",
    is_flag=True,
    help="Flag to overwrite the MDIO file if it exists",
    show_default=True,
)
def copy(
    source_mdio_path: str,
    target_mdio_path: str,
    access_pattern: str = "012",
    includes: str = "",
    excludes: str = "",
    storage_options: dict | None = None,
    overwrite: bool = False,
) -> None:
    """Copy a MDIO dataset to another MDIO dataset.

    Can also copy with empty data to be filled later. See `excludes`
    and `includes` parameters.

    More documentation about `excludes` and `includes` can be found
    in Zarr's documentation in `zarr.convenience.copy_store`.
    """
    from mdio import MDIOReader

    reader = MDIOReader(source_mdio_path, access_pattern=access_pattern)

    reader.copy(
        dest_path_or_buffer=target_mdio_path,
        excludes=excludes,
        includes=includes,
        storage_options=storage_options,
        overwrite=overwrite,
    )


cli = copy
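The `copy` command above imports `MDIOReader` inside the function body rather than at module level. A minimal sketch of that deferred-import pattern follows, using the standard library's `statistics` module as a stand-in for the heavier `mdio` package; the function name `report` is hypothetical. Presumably the aim is that building the CLI, for example to show help, does not pay the import cost.

```python
import sys


def report(values):
    """Return the mean of `values`, importing `statistics` only when called."""
    # Deferred import: executed on the first call, not when this file is loaded.
    import statistics

    return statistics.mean(values)


result = report([1, 2, 3])

# After the first call the module is cached in sys.modules, so later
# calls reuse it and the import statement costs almost nothing.
print(result, "statistics" in sys.modules)
```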
134 changes: 134 additions & 0 deletions src/mdio/commands/info.py
@@ -0,0 +1,134 @@
"""MDIO Dataset information command."""


from __future__ import annotations

from typing import TYPE_CHECKING
from typing import Any

from click import STRING
from click import Choice
from click import argument
from click import command
from click import option


if TYPE_CHECKING:
    from mdio.core import Grid


@command(name="info")
@argument("mdio-path", type=STRING)
@option(
    "-access",
    "--access-pattern",
    required=False,
    default="012",
    help="Access pattern of the file",
    type=STRING,
    show_default=True,
)
@option(
    "-format",
    "--output-format",
    required=False,
    default="pretty",
    help="Output format. Pretty console or JSON.",
    type=Choice(["pretty", "json"]),
    show_default=True,
    show_choices=True,
)
def info(
    mdio_path: str,
    output_format: str,
    access_pattern: str,
) -> None:
    """Provide information on a MDIO dataset.

    By default, this returns human-readable information about the grid and stats for
    the dataset. If output-format is set to json then a json is returned to
    facilitate parsing.
    """
    from mdio import MDIOReader

    reader = MDIOReader(
        mdio_path,
        access_pattern=access_pattern,
        return_metadata=True,
    )

    grid_dict = parse_grid(reader.grid)
    stats_dict = cast_stats(reader.stats)

    mdio_info = {
        "path": mdio_path,
        "stats": stats_dict,
        "grid": grid_dict,
    }

    if output_format == "pretty":
        pretty_print(mdio_info)

    if output_format == "json":
        json_print(mdio_info)


def cast_stats(stats_dict: dict[str, Any]) -> dict[str, float]:
    """Normalize all floats to JSON serializable floats."""
    return {k: float(v) for k, v in stats_dict.items()}


def parse_grid(grid: Grid) -> dict[str, dict[str, int | str]]:
    """Extract grid information per dimension."""
    grid_dict = {}
    for dim_name in grid.dim_names:
        dim = grid.select_dim(dim_name)
        min_ = str(dim.coords[0])
        max_ = str(dim.coords[-1])
        size = str(dim.coords.shape[0])
        grid_dict[dim_name] = {"name": dim_name, "min": min_, "max": max_, "size": size}
    return grid_dict


def json_print(mdio_info: dict[str, Any]) -> None:
    """Convert MDIO Info to JSON and pretty print."""
    from json import dumps as json_dumps

    from rich import print

    print(json_dumps(mdio_info, indent=2))


def pretty_print(mdio_info: dict[str, Any]) -> None:
    """Print pretty MDIO Info table to console."""
    from rich.console import Console
    from rich.table import Table

    console = Console()

    grid_table = Table(show_edge=False)
    grid_table.add_column("Dimension", justify="right", style="cyan", no_wrap=True)
    grid_table.add_column("Min", justify="left", style="magenta")
    grid_table.add_column("Max", justify="left", style="magenta")
    grid_table.add_column("Size", justify="left", style="green")

    for _, axis_dict in mdio_info["grid"].items():
        name, min_, max_, size = axis_dict.values()
        grid_table.add_row(name, min_, max_, size)

    stat_table = Table(show_edge=False)
    stat_table.add_column("Stat", justify="right", style="cyan", no_wrap=True)
    stat_table.add_column("Value", justify="left", style="magenta")

    for stat, value in mdio_info["stats"].items():
        stat_table.add_row(stat, f"{value:.4f}")

    master_table = Table(title=f"File Information for {mdio_info['path']}")
    master_table.add_column("MDIO Grid", justify="center")
    master_table.add_column("MDIO Statistics", justify="center")
    master_table.add_row(grid_table, stat_table)

    console.print(master_table)


cli = info
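Since the `json` output format exists to facilitate parsing, here is a small sketch of consuming it. The top-level keys (`path`, `stats`, `grid`) and the per-dimension fields follow the code above; the file name, the stat names and every value in `sample_output` are made up for illustration, not real output.

```python
import json

# Hypothetical payload shaped like the dictionary built in `info`.
sample_output = """
{
  "path": "my.mdio",
  "stats": {"mean": 0.0012, "std": 1.5},
  "grid": {
    "inline": {"name": "inline", "min": "1", "max": "100", "size": "100"},
    "crossline": {"name": "crossline", "min": "10", "max": "50", "size": "41"},
    "sample": {"name": "sample", "min": "0", "max": "3000", "size": "751"}
  }
}
"""

info = json.loads(sample_output)

# parse_grid stores min, max and size as strings, so convert before doing math.
shape = tuple(int(dim["size"]) for dim in info["grid"].values())

num_samples = 1
for size in shape:
    num_samples *= size

print(shape, num_samples)
```

The string-to-int conversion is the detail worth noting: `parse_grid` casts `min`, `max` and `size` with `str`, so a consumer cannot treat them as numbers directly.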