-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can I use nbstripout in Github action as a test ? #158
Comments
This question is probably better suited for StackOverflow. |
Interesting use case. Have you had a chance to explore this further? |
not yet, but that's definitely on my to-do list. |
@12rambau Are you still interested in this? Feel like working on it? |
I'm still interested but I was swamped with many things all year, I'll do some open source work starting this week and this repository is on my todo list |
Perfect! Keep me posted and let me know if you have questions, get stuck etc. :) |
I created a script to do this as a stop-gap measure until we can get this into nbstripout. I'm putting in a few Pull requests now to try and make that happen, starting with #176 to get tests working on windows, and then aiming to do a re-organisation to reduce code duplication and move functionality out of This allows me to call It's a bit quick-and-dirty, and it ends up re-implementing (copy-paste) a lot of nbstripout code, but key improvements were to:
""""
Re-packages nbstripout functionality to be more useful for CI/CD pipelines.
Supports most of the same options as nbstripout, but does not support installation as a git hook.
Based on nbstripout version 0.6.1
This is currently very fragile to changes in nbstripout internal structure, so it is recommended to fix the version in
requirements.txt
"""
from argparse import ArgumentParser
from copy import deepcopy
import io
from pathlib import Path
from typing import Any, List
import warnings
from nbformat import NotebookNode, read, write, NO_CONVERT
from nbstripout import strip_output
from nbstripout._nbstripout import _parse_size
def process_args():
parser = ArgumentParser()
parser.add_argument(
"--check",
action="store_true",
help="Run check without modifying any files. Returns 1 if files would have been modified.",
)
parser.add_argument("--keep-count", action="store_true", help="Do not strip the execution count/prompt number")
parser.add_argument("--keep-output", action="store_true", help="Do not strip output", default=None)
parser.add_argument(
"--extra-keys",
default="",
help="Space separated list of extra keys to strip " "from metadata, e.g. metadata.foo cell.metadata.bar",
)
parser.add_argument(
"--drop-empty-cells",
action="store_true",
help="Remove cells where `source` is empty or contains only whitepace",
)
parser.add_argument(
"--drop-tagged-cells", default="", help="Space separated list of cell-tags that remove an entire cell"
)
parser.add_argument(
"--strip-init-cells", action="store_true", help="Remove cells with `init_cell: true` metadata (default: False)"
)
parser.add_argument("--max-size", metavar="SIZE", help="Keep outputs smaller than SIZE", default="0")
parser.add_argument("paths", nargs="*", help="Files or folders to check with nbstripout")
return parser.parse_args()
ALLOWED_EXTENSIONS = [".ipynb"]
extra_keys = [
"metadata.signature",
"metadata.widgets",
"cell.metadata.collapsed",
"cell.metadata.ExecuteTime",
"cell.metadata.execution",
"cell.metadata.heading_collapsed",
"cell.metadata.hidden",
"cell.metadata.scrolled",
]
def read_notebook_file(p: Path):
with io.open(p, "r", encoding="utf8") as f:
with warnings.catch_warnings():
warnings.simplefilter("ignore", category=UserWarning)
return read(f, as_version=NO_CONVERT)
def write_notebook_file(nb: NotebookNode, p: Path):
with io.open(p, "w", encoding="utf8", newline="") as f:
with warnings.catch_warnings():
warnings.simplefilter("ignore", category=UserWarning)
write(nb, f)
def process_file(p: Path, is_check: bool, stripout_args: List[Any]):
if p.suffix not in ALLOWED_EXTENSIONS:
return []
nb_input = read_notebook_file(p)
nb_input_copy = deepcopy(nb_input) # stip_output modifies input, so save a copy for comparison later
nb_output = strip_output(nb_input, *stripout_args)
if not is_check:
write_notebook_file(nb_output, p)
if not (nb_input_copy == nb_output):
print(f"File: {p} was stripped by nbstripout.")
return [p]
return []
def process_folder(p: Path, is_check: bool, stripout_args: List[Any]):
if p.is_dir():
return [f for e in p.iterdir() for f in process_folder(e, is_check, stripout_args)]
return process_file(p, is_check, stripout_args)
def main():
args = process_args()
is_check: bool = args.check
if is_check:
print("Checking Notebooks for outputs using nbstripout:")
else:
print("Cleaning Notebooks outputs using nbstripout:")
stripout_args = [
args.keep_output,
args.keep_count,
extra_keys,
args.drop_empty_cells,
args.drop_tagged_cells.split(),
args.strip_init_cells,
_parse_size(args.max_size),
]
modified_files = []
for p_str in args.paths:
p = Path(p_str)
if not p.exists():
raise FileNotFoundError(p)
modified_files.append(process_folder(p, is_check, stripout_args))
modified_files = [f for arr in modified_files for f in arr]
if is_check:
if modified_files:
print(f"{len(modified_files)} files would have been modified by nbstripout.")
exit(1)
else:
print("Success: all Notebooks are already clean.")
exit(0)
print(f"{len(modified_files)} files were modified by nbstripout.")
exit(0)
if __name__ == "__main__":
main() |
@arobrien Thanks for that contribution! I'm certainly open to extending / refactoring More than happy to discuss how you'd like to approach this before you start sending PRs. |
FWIW, I was reluctant to implement any functionality to recursively walk a directory tree or anything like that as you can simply use e.g. |
@kynan yes, I've got a little bit of capacity at the moment to help out. As for approach, I think maintining current functionality is most important, but if we can make
I can see that in the strictest sense none of these features are necessary if you combine them with tools like I see a potential roadmap for achieving this as being:
|
Great to hear!
Fully agree!
Also agree :)
I recommend against this, regardless of the license.
It is exactly the MIT License :) |
@arobrien your roadmap looks reasonable, thanks! |
I see that folks have clearly taken over this matter and thanks for all progresses you made so far. I just want to make a small update on my initial use case. The problem was the following: How to perform checks equivalent to the one done in pre-commits in the CI so that users that forgot to install it get a failed test. The solution was already existing in GitHub actions: I now use it in all my repositories and I get the test I was expecting for nbstripout: https://github.com/12rambau/sepal_ui/blob/452bc1df1277308c789bc324251464928e7e49fa/.github/workflows/unit.yml#L13 I leave it up to you to continue working on it or close it. |
Thanks for the update @12rambau! It's not entirely clear to me where |
the github action process is calling the nbstripout is called in my config file here: https://github.com/12rambau/sepal_ui/blob/main/.pre-commit-config.yaml |
Gotcha, that's the connection I had missed! Do I understand correctly that you're happy with this workflow and don't require any (further) changes in nbstripout to support your use case? |
Yes, I'm more than happy with this workflow and yes from my side everything I need is already covered by the current implementation of nbstripout. |
Another update: as of release 0.8.0, there is a Also, as of 6d181dc, @12rambau @arobrien @joaonc Does this satisfy your use cases? |
I'm using the pre-commit hook of nbstripout and black to improve code quality of my python applications. As some of the maintainer are not installing pre-commit they are still pushing code with outputs.
Would it be possible to use nbstripout in Github action to check the presence of output ?
The text was updated successfully, but these errors were encountered: