New CLI Options #364

Merged
31 commits merged on Mar 7, 2024
c9d571d
Added utility copy and info commands.
markspec Jun 20, 2023
92e15e6
Improve formating
markspec Jun 21, 2023
2a6f039
Updates to pass pre-commit.
markspec Jun 21, 2023
2f18427
Add access pattern to info.
markspec Jun 29, 2023
45a764d
Add tests.
markspec Jul 6, 2023
71c1ec9
Move utility commands to root level.
markspec Jul 19, 2023
0565708
Update linting and tests.
markspec Jul 19, 2023
f79ab7f
Resolve PR comments.
markspec Nov 7, 2023
07bc1e3
Fix mdio copy.
markspec Nov 7, 2023
16232b1
Remove duplicate tmp in .gitignore.
markspec Nov 7, 2023
0f9f63b
Remove unnecessary try/except block in segy.py
tasansal Mar 7, 2024
2cc8880
Make 'info' command work with new CLI and add rich printing
tasansal Mar 7, 2024
579baac
make copy work with new CLI
tasansal Mar 7, 2024
3f9b616
Update description in info.py module
tasansal Mar 7, 2024
c897a40
Change input mdio option to argument in info command
tasansal Mar 7, 2024
8cbe3a2
Update variable name and table title in info.py
tasansal Mar 7, 2024
afb2c79
Replace copy command filename options with arguments
tasansal Mar 7, 2024
3c6048f
make tests work for option -> argument conversion
tasansal Mar 7, 2024
fb6af77
Refactor imports and command options in copy.py
tasansal Mar 7, 2024
2bb2fbe
revert back to click types, better error handling
tasansal Mar 7, 2024
3dae50c
Refactor segy.py and update test_main.py
tasansal Mar 7, 2024
4f45c98
Add future annotations to copy command
tasansal Mar 7, 2024
0925cf4
Refactor import location in copy.py
tasansal Mar 7, 2024
ecb1b57
Refactor MDIO info command for better code organization
tasansal Mar 7, 2024
7c3f496
Move pytest-dependency to test suite installs
tasansal Mar 7, 2024
727140f
Add future annotations import to info.py
tasansal Mar 7, 2024
51f3052
Update usage documentation for mdio commands
tasansal Mar 7, 2024
fa1fa82
directly import click_params objects
tasansal Mar 7, 2024
316e6bd
directly import click_params objects
tasansal Mar 7, 2024
d4a0237
Add "fastentrypoints" to build requirements
tasansal Mar 7, 2024
fdf51fb
change overwrite to flag
tasansal Mar 7, 2024
6 changes: 4 additions & 2 deletions .gitignore
@@ -25,7 +25,8 @@ share/python-wheels/
.installed.cfg
*.egg
MANIFEST

pip-*
tmp*
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
@@ -150,4 +151,5 @@ cython_debug/
mdio1/*
*/mdio1/*
pytest-of-*
tmp/
tmp
debugging/*
38 changes: 19 additions & 19 deletions docs/usage.md
@@ -9,8 +9,8 @@ There are many more options, please see the [CLI Reference](#cli-reference).

```shell
$ mdio segy import \
-i path_to_segy_file.segy \
-o path_to_mdio_file.mdio \
path_to_segy_file.segy \
path_to_mdio_file.mdio \
-loc 181,185 \
-names inline,crossline
```
@@ -20,8 +20,8 @@ should be executed.

```shell
$ mdio segy export \
-i path_to_mdio_file.mdio \
-o path_to_segy_file.segy
path_to_mdio_file.mdio \
path_to_segy_file.segy
```
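With the input and output now passed as positional arguments instead of `-i`/`-o`, their order on the command line is what tells them apart. MDIO builds its CLI with click; the sketch below uses the standard library's `argparse` purely as a stand-in so it runs without dependencies, and `build_export_parser` is a hypothetical name, not part of MDIO.

```python
import argparse


def build_export_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse stand-in for `mdio segy export` after this change.

    MDIO itself is built on click; argparse is used here only so the sketch
    runs with the standard library.
    """
    parser = argparse.ArgumentParser(prog="mdio segy export")
    # Positional: the first value is always the input, the second the output.
    parser.add_argument("mdio_path", help="input MDIO dataset")
    parser.add_argument("segy_path", help="output SEG-Y file")
    return parser


parser = build_export_parser()
args = parser.parse_args(["path_to_mdio_file.mdio", "path_to_segy_file.segy"])
print(args.mdio_path, "->", args.segy_path)

# The old -i/-o spelling is not defined on this parser, so argparse rejects
# it with a usage error (printed to stderr) and exits with status 2.
old_syntax_exit_code = None
try:
    parser.parse_args(["-i", "path_to_mdio_file.mdio", "-o", "path_to_segy_file.segy"])
except SystemExit as exc:
    old_syntax_exit_code = exc.code
print("old syntax exit code:", old_syntax_exit_code)
```

Positional arguments keep the common invocation short; the trade-off, at least in this sketch, is that the old option spelling stops working, so scripts written against it presumably need updating.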

## Cloud Connection Strings
@@ -79,19 +79,19 @@ Using UNIX:

```shell
mdio segy import \
--input-segy-path path/to/my.segy
--output-mdio-file s3://bucket/prefix/my.mdio
--header-locations 189,193
path/to/my.segy \
s3://bucket/prefix/my.mdio \
--header-locations 189,193 \
--storage-options '{"key": "my_super_private_key", "secret": "my_super_private_secret"}'
```

Using Windows (note the extra escape characters `\`):

```console
mdio segy import \
--input-segy-path path/to/my.segy
--output-mdio-file s3://bucket/prefix/my.mdio
--header-locations 189,193
path/to/my.segy \
s3://bucket/prefix/my.mdio \
--header-locations 189,193 \
--storage-options "{\"key\": \"my_super_private_key\", \"secret\": \"my_super_private_secret\"}"
```

@@ -114,19 +114,19 @@ Using a service account:

```shell
mdio segy import \
--input-segy-path path/to/my.segy
--output-mdio-file gs://bucket/prefix/my.mdio
--header-locations 189,193
path/to/my.segy \
gs://bucket/prefix/my.mdio \
--header-locations 189,193 \
--storage-options '{"token": "~/.config/gcloud/application_default_credentials.json"}'
```

Using browser to populate authentication:

```shell
mdio segy import \
--input-segy-path path/to/my.segy
--output-mdio-file gs://bucket/prefix/my.mdio
--header-locations 189,193
path/to/my.segy \
gs://bucket/prefix/my.mdio \
--header-locations 189,193 \
--storage-options '{"token": "browser"}'
```

@@ -145,9 +145,9 @@ If ADL is not pre-authenticated, you need to pass `--storage-options`.

```shell
mdio segy import \
--input-segy-path path/to/my.segy
--output-mdio-file az://bucket/prefix/my.mdio
--header-locations 189,193
path/to/my.segy \
az://bucket/prefix/my.mdio \
--header-locations 189,193 \
--storage-options '{"account_name": "myaccount", "account_key": "my_super_private_key"}'
```
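The UNIX examples rely on single quotes to hand the `--storage-options` JSON to `mdio` as one argument. Python's `shlex.split` follows POSIX shell quoting rules (not Windows `cmd` or PowerShell rules), so it can illustrate what the program receives. This sketch models shell behavior only and does not call MDIO; the credential strings are the placeholders from the example above.

```python
import json
import shlex

# The UNIX example, written on one line: shlex does not treat a trailing
# backslash-newline as a line continuation the way an interactive shell does.
command = (
    "mdio segy import path/to/my.segy s3://bucket/prefix/my.mdio "
    "--header-locations 189,193 "
    "--storage-options '{\"key\": \"my_super_private_key\", "
    "\"secret\": \"my_super_private_secret\"}'"
)

# Single quotes keep the JSON together as ONE argument with its double
# quotes intact, so it parses cleanly.
argv = shlex.split(command)
storage_json = argv[argv.index("--storage-options") + 1]
storage_options = json.loads(storage_json)
print(argv[3:5])         # the two positional paths
print(storage_options)

# Without the outer single quotes, the shell strips the inner double quotes
# and splits on spaces, so the value is no longer valid JSON.
unquoted = shlex.split('--storage-options {"key": "x"}')
print(unquoted)
```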

4 changes: 3 additions & 1 deletion noxfile.py
@@ -1,4 +1,6 @@
"""Nox sessions."""


import os
import shlex
import shutil
@@ -161,7 +163,7 @@ def mypy(session: Session) -> None:
def tests(session: Session) -> None:
"""Run the test suite."""
session.install(".")
session.install("coverage[toml]", "pytest", "pygments")
session.install("coverage[toml]", "pytest", "pygments", "pytest-dependency")
try:
session.run("coverage", "run", "--parallel", "-m", "pytest", *session.posargs)
finally:
2 changes: 1 addition & 1 deletion poetry.lock

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion pyproject.toml
@@ -33,6 +33,7 @@ segyio = "^1.9.3"
numba = "^0.59.0"
psutil = "^5.9.5"
fsspec = ">=2023.9.1"
rich = "^13.7.1"
urllib3 = "^1.26.18" # Workaround for poetry-plugin-export/issues/183

# Extras
@@ -109,5 +110,5 @@ ignore_missing_imports = true


[build-system]
requires = ["poetry-core"]
requires = ["poetry-core", "fastentrypoints"]
build-backend = "poetry.core.masonry.api"
12 changes: 9 additions & 3 deletions src/mdio/__main__.py
@@ -11,15 +11,21 @@
import click


KNOWN_MODULES = ["segy.py"]
KNOWN_MODULES = [
"segy.py",
"copy.py",
"info.py",
]


class MyCLI(click.MultiCommand):
"""CLI generator via plugin design pattern.

This class dynamically loads command modules from the specified
`plugin_folder`. Each command module should define a `cli` function
that implements the command logic.
`plugin_folder`. If the command is another CLI group, the command
module must define `cli = click.Group(...)`, and its subcommands
must be added to that group. If it is a single utility command, the
module must define a variable named `cli` for the command to be exposed.

Args:
- plugin_folder: Path to the directory containing command modules.
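The plugin contract described in the `MyCLI` docstring (an allow-listed module that exposes a variable named `cli`) can be sketched with the standard library alone. This is not `MyCLI`'s actual implementation: `load_module` and `discover_commands` are hypothetical names, and the stand-in plugin files define plain functions where the real modules define click commands or groups.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType

# Allow-list playing the same role as KNOWN_MODULES in the diff.
KNOWN_MODULES = ["segy.py", "copy.py", "info.py"]


def load_module(path: Path) -> ModuleType:
    """Execute one Python file as a module without touching sys.path."""
    spec = importlib.util.spec_from_file_location(f"plugin_{path.stem}", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def discover_commands(plugin_folder: Path) -> dict:
    """Map command name -> the module's `cli` object for allow-listed files."""
    commands = {}
    for filename in sorted(KNOWN_MODULES):
        path = plugin_folder / filename
        if not path.exists():
            continue  # allow-listed but absent: skip it
        module = load_module(path)
        # The only contract is a module-level `cli`; a missing one raises
        # AttributeError here.
        commands[path.stem] = getattr(module, "cli")
    return commands


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Stand-in plugins: plain functions instead of click commands.
    (folder / "info.py").write_text("def info():\n    return 'info ran'\n\ncli = info\n")
    (folder / "copy.py").write_text("def copy():\n    return 'copy ran'\n\ncli = copy\n")
    (folder / "notes.py").write_text("cli = None\n")  # ignored: not allow-listed
    commands = discover_commands(folder)

print(sorted(commands))
print(commands["info"]())
```

Keeping an explicit allow-list means stray files in the command folder are never executed, at the cost of editing the list whenever a command module is added.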
89 changes: 89 additions & 0 deletions src/mdio/commands/copy.py
@@ -0,0 +1,89 @@
"""MDIO Dataset copy command."""


from __future__ import annotations

from click import STRING
from click import argument
from click import command
from click import option
from click_params import JSON


@command(name="copy")
@argument("source-mdio-path", type=str)
@argument("target-mdio-path", type=str)
@option(
"-access",
"--access-pattern",
required=False,
default="012",
help="Access pattern of the file",
type=STRING,
show_default=True,
)
@option(
"-exc",
"--excludes",
required=False,
default="",
help="Data to exclude during copy, like `chunked_012`. The data values won’t be "
"copied but an empty array will be created. If blank, it copies everything.",
type=STRING,
)
@option(
"-inc",
"--includes",
required=False,
default="",
help="Data to include during copy, like `trace_headers`. If not specified, and "
"certain data is excluded, it will not copy headers. To preserve headers, "
"specify trace_headers. If left blank, it will copy everything except what is "
"specified in the 'excludes' parameter.",
type=STRING,
)
@option(
"-storage",
"--storage-options",
required=False,
help="Custom storage options for cloud backends",
type=JSON,
)
@option(
"-overwrite",
"--overwrite",
is_flag=True,
help="Flag to overwrite the MDIO file if it exists",
show_default=True,
)
def copy(
source_mdio_path: str,
target_mdio_path: str,
access_pattern: str = "012",
includes: str = "",
excludes: str = "",
storage_options: dict | None = None,
overwrite: bool = False,
) -> None:
"""Copy an MDIO dataset to another MDIO dataset.

Can also copy with empty data to be filled later. See `excludes`
and `includes` parameters.

More documentation about `excludes` and `includes` can be found
in Zarr's documentation in `zarr.convenience.copy_store`.
"""
from mdio import MDIOReader

reader = MDIOReader(source_mdio_path, access_pattern=access_pattern)

reader.copy(
dest_path_or_buffer=target_mdio_path,
excludes=excludes,
includes=includes,
storage_options=storage_options,
overwrite=overwrite,
)


cli = copy
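`copy` passes `excludes` and `includes` through to the copy, and its docstring points to `zarr.convenience.copy_store` for their meaning. The sketch below reproduces only the key-filtering rule, assuming zarr v2 semantics: each pattern is a regular expression searched against the store keys, and a key matching an include pattern is copied even if it also matches an exclude pattern. `select_keys` and the store keys are hypothetical, and the sketch does not model the empty arrays that MDIO creates for excluded data.

```python
import re


def select_keys(keys: list[str], excludes: str = "", includes: str = "") -> list[str]:
    """Return the store keys a zarr-v2-style copy would keep (hedged sketch).

    A blank pattern means "no pattern", as the CLI help text describes.
    """
    exclude_re = re.compile(excludes) if excludes else None
    include_re = re.compile(includes) if includes else None
    selected = []
    for key in sorted(keys):
        # search(), not fullmatch(): the pattern may appear anywhere in the key.
        excluded = bool(exclude_re and exclude_re.search(key))
        if excluded and include_re and include_re.search(key):
            excluded = False  # includes override excludes
        if not excluded:
            selected.append(key)
    return selected


# Hypothetical store keys, loosely modeled on an MDIO dataset layout.
keys = [
    "data/chunked_012/0.0.0",
    "metadata/chunked_012_trace_headers/0.0",
    "metadata/live_mask/0.0",
]

# Excluding `chunked_012` also catches the header array...
print(select_keys(keys, excludes="chunked_012"))
# ...so `trace_headers` must be included explicitly to keep the headers.
print(select_keys(keys, excludes="chunked_012", includes="trace_headers"))
```

Under this assumption, the `--includes` help text's advice to specify `trace_headers` follows directly: a substring-style pattern for the data array also matches any header array whose name contains it.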
134 changes: 134 additions & 0 deletions src/mdio/commands/info.py
@@ -0,0 +1,134 @@
"""MDIO Dataset information command."""


from __future__ import annotations

from typing import TYPE_CHECKING
from typing import Any

from click import STRING
from click import Choice
from click import argument
from click import command
from click import option


if TYPE_CHECKING:
from mdio.core import Grid


@command(name="info")
@argument("mdio-path", type=STRING)
@option(
"-access",
"--access-pattern",
required=False,
default="012",
help="Access pattern of the file",
type=STRING,
show_default=True,
)
@option(
"-format",
"--output-format",
required=False,
default="pretty",
help="Output format. Pretty console or JSON.",
type=Choice(["pretty", "json"]),
show_default=True,
show_choices=True,
)
def info(
mdio_path: str,
output_format: str,
access_pattern: str,
) -> None:
"""Provide information on an MDIO dataset.

By default, this prints human-readable information about the grid and
statistics of the dataset. If `--output-format` is set to `json`, the same
information is printed as JSON to facilitate parsing.
"""
from mdio import MDIOReader

reader = MDIOReader(
mdio_path,
access_pattern=access_pattern,
return_metadata=True,
)

grid_dict = parse_grid(reader.grid)
stats_dict = cast_stats(reader.stats)

mdio_info = {
"path": mdio_path,
"stats": stats_dict,
"grid": grid_dict,
}

if output_format == "pretty":
pretty_print(mdio_info)

if output_format == "json":
json_print(mdio_info)


def cast_stats(stats_dict: dict[str, Any]) -> dict[str, float]:
"""Normalize all floats to JSON serializable floats."""
return {k: float(v) for k, v in stats_dict.items()}


def parse_grid(grid: Grid) -> dict[str, dict[str, str]]:
"""Extract grid information per dimension."""
grid_dict = {}
for dim_name in grid.dim_names:
dim = grid.select_dim(dim_name)
min_ = str(dim.coords[0])
max_ = str(dim.coords[-1])
size = str(dim.coords.shape[0])
grid_dict[dim_name] = {"name": dim_name, "min": min_, "max": max_, "size": size}
return grid_dict


def json_print(mdio_info: dict[str, Any]) -> None:
"""Convert MDIO Info to JSON and pretty print."""
from json import dumps as json_dumps

from rich import print

print(json_dumps(mdio_info, indent=2))


def pretty_print(mdio_info: dict[str, Any]) -> None:
"""Print pretty MDIO Info table to console."""
from rich.console import Console
from rich.table import Table

console = Console()

grid_table = Table(show_edge=False)
grid_table.add_column("Dimension", justify="right", style="cyan", no_wrap=True)
grid_table.add_column("Min", justify="left", style="magenta")
grid_table.add_column("Max", justify="left", style="magenta")
grid_table.add_column("Size", justify="left", style="green")

for _, axis_dict in mdio_info["grid"].items():
name, min_, max_, size = axis_dict.values()
grid_table.add_row(name, min_, max_, size)

stat_table = Table(show_edge=False)
stat_table.add_column("Stat", justify="right", style="cyan", no_wrap=True)
stat_table.add_column("Value", justify="left", style="magenta")

for stat, value in mdio_info["stats"].items():
stat_table.add_row(stat, f"{value:.4f}")

master_table = Table(title=f"File Information for {mdio_info['path']}")
master_table.add_column("MDIO Grid", justify="center")
master_table.add_column("MDIO Statistics", justify="center")
master_table.add_row(grid_table, stat_table)

console.print(master_table)


cli = info
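With `--output-format json`, `info` prints a dictionary with `path`, `stats`, and `grid` keys, where each grid entry holds the dimension's name, first and last coordinate, and size, all as strings. A minimal sketch of that shape follows; `Dim` is a hypothetical stand-in for MDIO's grid dimensions, and the path, coordinates, and statistic values are made up for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Dim:
    """Hypothetical stand-in for an MDIO grid dimension."""

    name: str
    coords: list


def parse_grid(dims: list[Dim]) -> dict[str, dict[str, str]]:
    """Same output shape as `parse_grid` in the diff: every field is a string."""
    return {
        dim.name: {
            "name": dim.name,
            "min": str(dim.coords[0]),   # first coordinate, assumed sorted
            "max": str(dim.coords[-1]),  # last coordinate, assumed sorted
            "size": str(len(dim.coords)),
        }
        for dim in dims
    }


dims = [
    Dim("inline", list(range(100, 111))),
    Dim("crossline", list(range(200, 221, 2))),
    Dim("sample", list(range(0, 4000, 4))),
]
raw_stats = {"mean": 0, "std": 1.5, "rms": 1.5, "min": -3, "max": 3}  # hypothetical
stats = {k: float(v) for k, v in raw_stats.items()}  # same idea as cast_stats

mdio_info = {"path": "example.mdio", "stats": stats, "grid": parse_grid(dims)}
print(json.dumps(mdio_info, indent=2))
```

Like the original, the sketch reports the first and last coordinates as min and max, which is only correct if each dimension's coordinates are sorted.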