Can I use nbstripout in Github action as a test ? #158

12rambau · 2021-10-27T22:37:29Z

I'm using the pre-commit hook of nbstripout and black to improve code quality of my python applications. As some of the maintainer are not installing pre-commit they are still pushing code with outputs.

Would it be possible to use nbstripout in Github action to check the presence of output ?

joaonc · 2021-10-30T00:07:14Z

This question is probably better suited for StackOverflow.
That said, check out the --dry-run option. However, this currently always return 0, which may make it not usable as a GH action (not super familiar w/ these though, so may be wrong). #159

kynan · 2022-01-02T16:54:23Z

Interesting use case. Have you had a chance to explore this further?

12rambau · 2022-01-05T07:45:25Z

not yet, but that's definitely on my to-do list.

kynan · 2022-09-24T12:45:20Z

@12rambau Are you still interested in this? Feel like working on it?

12rambau · 2022-09-24T15:17:48Z

I'm still interested but I was swamped with many things all year, I'll do some open source work starting this week and this repository is on my todo list

kynan · 2022-09-24T16:56:36Z

Perfect! Keep me posted and let me know if you have questions, get stuck etc. :)

arobrien · 2022-11-13T02:58:47Z

I created a script to do this as a stop-gap measure until we can get this into nbstripout. I'm putting in a few Pull requests now to try and make that happen, starting with #176 to get tests working on windows, and then aiming to do a re-organisation to reduce code duplication and move functionality out of main() after that.

This allows me to call python nbstripout_wrapper.py --check notebooks in my CI, and users can run python nbstripout_wrapper.py notebooks to strip all of their notebooks manually if they don't want to implement a pre-commit hook for whatever reason.

It's a bit quick-and-dirty, and it ends up re-implementing (copy-paste) a lot of nbstripout code, but key improvements were to:

implement recursive walking of a directory, so you can pass directories instead of just paths
keep track of which files would have been modified, and if running with --check, return 1 if any files would have been modified.

""""
Re-packages nbstripout functionality to be more useful for CI/CD pipelines.

Supports most of the same options as nbstripout, but does not support installation as a git hook.

Based on nbstripout version 0.6.1
This is currently very fragile to changes in nbstripout internal structure, so it is recommended to fix the version in
requirements.txt
"""

from argparse import ArgumentParser
from copy import deepcopy
import io
from pathlib import Path
from typing import Any, List
import warnings

from nbformat import NotebookNode, read, write, NO_CONVERT
from nbstripout import strip_output
from nbstripout._nbstripout import _parse_size


def process_args():
    parser = ArgumentParser()
    parser.add_argument(
        "--check",
        action="store_true",
        help="Run check without modifying any files. Returns 1 if files would have been modified.",
    )
    parser.add_argument("--keep-count", action="store_true", help="Do not strip the execution count/prompt number")
    parser.add_argument("--keep-output", action="store_true", help="Do not strip output", default=None)
    parser.add_argument(
        "--extra-keys",
        default="",
        help="Space separated list of extra keys to strip " "from metadata, e.g. metadata.foo cell.metadata.bar",
    )
    parser.add_argument(
        "--drop-empty-cells",
        action="store_true",
        help="Remove cells where `source` is empty or contains only whitepace",
    )
    parser.add_argument(
        "--drop-tagged-cells", default="", help="Space separated list of cell-tags that remove an entire cell"
    )
    parser.add_argument(
        "--strip-init-cells", action="store_true", help="Remove cells with `init_cell: true` metadata (default: False)"
    )
    parser.add_argument("--max-size", metavar="SIZE", help="Keep outputs smaller than SIZE", default="0")
    parser.add_argument("paths", nargs="*", help="Files or folders to check with nbstripout")
    return parser.parse_args()


ALLOWED_EXTENSIONS = [".ipynb"]

extra_keys = [
    "metadata.signature",
    "metadata.widgets",
    "cell.metadata.collapsed",
    "cell.metadata.ExecuteTime",
    "cell.metadata.execution",
    "cell.metadata.heading_collapsed",
    "cell.metadata.hidden",
    "cell.metadata.scrolled",
]


def read_notebook_file(p: Path):
    with io.open(p, "r", encoding="utf8") as f:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", category=UserWarning)
            return read(f, as_version=NO_CONVERT)


def write_notebook_file(nb: NotebookNode, p: Path):
    with io.open(p, "w", encoding="utf8", newline="") as f:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", category=UserWarning)
            write(nb, f)


def process_file(p: Path, is_check: bool, stripout_args: List[Any]):
    if p.suffix not in ALLOWED_EXTENSIONS:
        return []

    nb_input = read_notebook_file(p)

    nb_input_copy = deepcopy(nb_input)  # stip_output modifies input, so save a copy for comparison later
    nb_output = strip_output(nb_input, *stripout_args)

    if not is_check:
        write_notebook_file(nb_output, p)

    if not (nb_input_copy == nb_output):
        print(f"File: {p} was stripped by nbstripout.")
        return [p]
    return []


def process_folder(p: Path, is_check: bool, stripout_args: List[Any]):
    if p.is_dir():
        return [f for e in p.iterdir() for f in process_folder(e, is_check, stripout_args)]
    return process_file(p, is_check, stripout_args)


def main():
    args = process_args()
    is_check: bool = args.check
    if is_check:
        print("Checking Notebooks for outputs using nbstripout:")
    else:
        print("Cleaning Notebooks outputs using nbstripout:")

    stripout_args = [
        args.keep_output,
        args.keep_count,
        extra_keys,
        args.drop_empty_cells,
        args.drop_tagged_cells.split(),
        args.strip_init_cells,
        _parse_size(args.max_size),
    ]

    modified_files = []

    for p_str in args.paths:
        p = Path(p_str)
        if not p.exists():
            raise FileNotFoundError(p)
        modified_files.append(process_folder(p, is_check, stripout_args))

    modified_files = [f for arr in modified_files for f in arr]

    if is_check:
        if modified_files:
            print(f"{len(modified_files)} files would have been modified by nbstripout.")
            exit(1)
        else:
            print("Success: all Notebooks are already clean.")
            exit(0)

    print(f"{len(modified_files)} files were modified by nbstripout.")
    exit(0)


if __name__ == "__main__":
    main()

kynan · 2022-11-13T16:36:05Z

@arobrien Thanks for that contribution! I'm certainly open to extending / refactoring nbstripout to support that use case. I'll have to rely on contributors to do this work though as it's not a use case I have so I'm not really in a position to provide the requirements. It seems to me that you're motivated enough to get this implemented, which is great :)

More than happy to discuss how you'd like to approach this before you start sending PRs.

kynan · 2022-11-13T16:39:38Z

FWIW, I was reluctant to implement any functionality to recursively walk a directory tree or anything like that as you can simply use e.g. find to achieve the same (see #127 for additional thoughts), but I might be convinced otherwise to e.g. support Windows users.

arobrien · 2022-11-14T01:49:28Z

@kynan yes, I've got a little bit of capacity at the moment to help out.

As for approach, I think maintining current functionality is most important, but if we can make nbstripout work the same as black, at least as far as the Black The Basics help page then that would be great. Your licence seems to be pretty much exactly the same as the MIT licence used by black, so copying code shouldn't be a problem (I'm curious why you don't just use the standard MIT licence). Key features of black mentioned in that document:

Black accepts source_file_or_directory on the command line, as well as - to indicate stdin, with options for include/exclude rules when walking a directory
Black returns exit codes if run with --check
Black can output diffs
Black supports configuration via pyproject.toml

I can see that in the strictest sense none of these features are necessary if you combine them with tools like find, or more complex command line scripts, but I would suggest that Black serves as a model of how this sort of destructive linting tool can be easily worked into both IDEs and Continuous Integration pipelines. Black also serves as a model to how to decide design questions such as option lookup hierarchy.

I see a potential roadmap for achieving this as being:

Improve testing for current use cases
Re-organise the code to separate out main areas of concern:
- Entry point/command line parsing
- File list iteration/processing
- Modification of the file/JSON structure
- Git integration/installation/status
Implement features one at a time, in small steps:
1. Complete configuration file processing started in Support config files #169
2. Implement directory walk the same
  way as in black (process arguments into a file list, and then iterate over that list) https://github.com/psf/black/blob/d97b7898b34b67eb3c6839998920e17ac8c77908/src/black/__init__.py#L614
3. Implement --check
4. etc...

kynan · 2022-11-14T18:09:03Z

I've got a little bit of capacity at the moment to help out.

Great to hear!

I think maintining current functionality is most important

Fully agree!

If we can make nbstripout work the same as black, [...] then that would be great

Also agree :)

copying code shouldn't be a problem

I recommend against this, regardless of the license.

I'm curious why you don't just use the standard MIT licence

It is exactly the MIT License :)

kynan · 2022-11-14T20:17:46Z

@arobrien your roadmap looks reasonable, thanks!

12rambau · 2023-01-15T15:50:25Z

I see that folks have clearly taken over this matter and thanks for all progresses you made so far. I just want to make a small update on my initial use case.

The problem was the following: How to perform checks equivalent to the one done in pre-commits in the CI so that users that forgot to install it get a failed test. The solution was already existing in GitHub actions:
https://github.com/pre-commit/action

I now use it in all my repositories and I get the test I was expecting for nbstripout: https://github.com/12rambau/sepal_ui/blob/452bc1df1277308c789bc324251464928e7e49fa/.github/workflows/unit.yml#L13

I leave it up to you to continue working on it or close it.

kynan · 2023-01-15T22:31:31Z

Thanks for the update @12rambau! It's not entirely clear to me where nbstripout is invoked in the example you linked to?

12rambau · 2023-01-15T22:34:59Z

the github action process is calling the .pre-commit-config.yaml file of the repository. I just wanted to show how easy it is to reproduce the same behavior as the local pre-commit checks.

nbstripout is called in my config file here: https://github.com/12rambau/sepal_ui/blob/main/.pre-commit-config.yaml

kynan · 2023-01-16T20:49:33Z

Gotcha, that's the connection I had missed! Do I understand correctly that you're happy with this workflow and don't require any (further) changes in nbstripout to support your use case?

12rambau · 2023-01-16T21:27:08Z

Yes, I'm more than happy with this workflow and yes from my side everything I need is already covered by the current implementation of nbstripout.

kynan · 2024-11-16T20:00:46Z

Another update: as of release 0.8.0, there is a --verify flag (thanks to @jspaezp, see #195), an enhanced dry run mode which exits with return code 1 if nbstripout would have stripped anything and 0 otherwise.

Also, as of 6d181dc, --dry-run only prints output if the notebook would have actually been modified.

@12rambau @arobrien @joaonc Does this satisfy your use cases?

12rambau · 2024-11-24T08:53:10Z

it absolutely does ! thanks for working on it.

kynan · 2025-01-18T16:57:05Z

I'll close this for now. Please reopen if there's any outstanding issues or anything that needs fixing / can be improved.

kynan added help wanted state:needs follow up This issue has interesting suggestions / ideas worth following up on state:waiting Waiting for response for reporter type:enhancement labels Jan 2, 2022

kynan added this to the Backlog milestone Nov 13, 2022

ntjohnson1 mentioned this issue Sep 30, 2023

Add nbstripout to dev dependencies and commit checks sandialabs/pyttb#208

Closed

kynan added resolution:fixed and removed help wanted state:waiting Waiting for response for reporter state:needs follow up This issue has interesting suggestions / ideas worth following up on labels Jan 18, 2025

kynan modified the milestones: Backlog, 0.8.0 Jan 18, 2025

kynan self-assigned this Jan 18, 2025

kynan closed this as completed Jan 18, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can I use nbstripout in Github action as a test ? #158

Can I use nbstripout in Github action as a test ? #158

12rambau commented Oct 27, 2021

joaonc commented Oct 30, 2021

kynan commented Jan 2, 2022

12rambau commented Jan 5, 2022

kynan commented Sep 24, 2022

12rambau commented Sep 24, 2022

kynan commented Sep 24, 2022

arobrien commented Nov 13, 2022

kynan commented Nov 13, 2022

kynan commented Nov 13, 2022

arobrien commented Nov 14, 2022

kynan commented Nov 14, 2022

kynan commented Nov 14, 2022

12rambau commented Jan 15, 2023

kynan commented Jan 15, 2023

12rambau commented Jan 15, 2023

kynan commented Jan 16, 2023

12rambau commented Jan 16, 2023

kynan commented Nov 16, 2024

12rambau commented Nov 24, 2024

kynan commented Jan 18, 2025

Can I use nbstripout in Github action as a test ? #158

Can I use nbstripout in Github action as a test ? #158

Comments

12rambau commented Oct 27, 2021

joaonc commented Oct 30, 2021

kynan commented Jan 2, 2022

12rambau commented Jan 5, 2022

kynan commented Sep 24, 2022

12rambau commented Sep 24, 2022

kynan commented Sep 24, 2022

arobrien commented Nov 13, 2022

kynan commented Nov 13, 2022

kynan commented Nov 13, 2022

arobrien commented Nov 14, 2022

kynan commented Nov 14, 2022

kynan commented Nov 14, 2022

12rambau commented Jan 15, 2023

kynan commented Jan 15, 2023

12rambau commented Jan 15, 2023

kynan commented Jan 16, 2023

12rambau commented Jan 16, 2023

kynan commented Nov 16, 2024

12rambau commented Nov 24, 2024

kynan commented Jan 18, 2025