Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support config files #169

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Support config files #169

wants to merge 17 commits into from

Conversation

janosh
Copy link
Contributor

@janosh janosh commented Oct 1, 2022

Closes #166.

The code in this PR was adapted from PyCQA/autoflake#79. Big thank you and credit to @akeeman.

Copy link
Owner

@kynan kynan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! I don't like multiple changes conflated in the same PR, so please submit all your unrelated changes as separate PRs as requested.

nbstripout/_nbstripout.py Outdated Show resolved Hide resolved
pre-commit Outdated Show resolved Hide resolved
.github/workflows/tests.yml Outdated Show resolved Hide resolved
CHANGELOG.rst Outdated Show resolved Hide resolved
nbstripout/_nbstripout.py Outdated Show resolved Hide resolved

# Traverse the file tree common to all files given as argument looking for
# a configuration file
config_path = os.path.commonpath([os.path.abspath(file) for file in args.files]) if args.files else os.getcwd()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use pathlib wherever possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kynan Thanks for the quick review. I believe I addressed all your comments except this one. I'm not too familiar with the pathlib module. Just did a quick search and couldn't find sth equivalent to os.path.commonpath(). Could you be more specific here?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unfortunately there is no equivalent to commonpath in pathlib. I won't insist, so you can leave as-is :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a whole interesting thread on S/O on commonpath -
https://stackoverflow.com/a/43613742

Though not nessacarily helpful - they do end up still using commonpath, and just wrapping it with Path.

nbstripout/_nbstripout.py Outdated Show resolved Hide resolved
nbstripout/_utils.py Outdated Show resolved Hide resolved
CHANGELOG.rst Outdated Show resolved Hide resolved
@kynan
Copy link
Owner

kynan commented Oct 2, 2022

Please ensure your changes are compatible with Python 3.6 onwards.

@janosh janosh force-pushed the support-config-files branch from 09716a9 to e5cd304 Compare October 2, 2022 14:17
Copy link
Owner

@kynan kynan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach is actually much better than what is implemented now. The approach is:

  1. Read config files and turn their defined keys and values into "pseudo" command line arguments and parse those with argparse
  2. Only then parse the actual command line arguments.

This way the order of precedence is correct and I think the implementation is actually much more elegant and readable!


# Traverse the file tree common to all files given as argument looking for
# a configuration file
config_path = os.path.commonpath([os.path.abspath(file) for file in args.files]) if args.files else os.getcwd()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unfortunately there is no equivalent to commonpath in pathlib. I won't insist, so you can leave as-is :)

if not tail:
break

if config:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't picked up on this in my first round review: The way you implemented this, config file settings take precedence over command line arguments, however it should be the other way around.

I realise this will be a bit harder to implement. You could turn the Namespace returned by argparse into a dict and merge that into the dict you get from reading the config files, where the argparse dict overwrites.

What makes this even more annoying is that you don't know if a value has been set by the caller or is simply the default :( I may need to look around for an idiomatic way to implement this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay. Have to prioritize something else atm. Hope to come back to this at some point but if anyone sees fit, feel free to take over.

@kynan
Copy link
Owner

kynan commented Nov 14, 2022

@janosh are you still planning to pick this back up at some point?

@kynan kynan added type:enhancement state:waiting Waiting for response for reporter state:needs follow up This issue has interesting suggestions / ideas worth following up on labels Nov 14, 2022
@kynan kynan added this to the Backlog milestone Nov 14, 2022
@janosh
Copy link
Contributor Author

janosh commented Nov 14, 2022

@kynan Still pre-occupied. If @arobrien wants to take over, that would be great! 👍

@arobrien
Copy link
Contributor

arobrien commented Dec 8, 2022

@janosh and @kynan I've done some work to address the precedence and make other minor improvements. I wasn't sure how to get those edits into the PR, but I've created a PR to update this PR. Please let me know if there is a better way to address this.
janosh#1

@janosh
Copy link
Contributor Author

janosh commented Dec 8, 2022

@arobrien I think the usual approach is just to make a new PR to upstream, i.e. to https://github.com/kynan/nbstripout, not my fork. If you continued from my branch, the earlier commits will still be associated with me. I'm not worried about credit though. More important to get the feature. 😄

Either way, happy to merge your PR to my fork.

Copy link
Owner

@kynan kynan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is definitely going in the right direction :) Just a few small bits to fix.

@@ -351,7 +351,7 @@ def status(git_config, install_location=INSTALL_LOCATION_LOCAL, verbose=False):
return 1


def main():
def setup_commandline() -> Namespace:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer returning a Namespace but a parser, right?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be called get_parser or sth like that.

import sys
from collections import defaultdict
from functools import partial
from typing import Any, Dict, Optional

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep imports sorted

Comment on lines +177 to +178
for a in parser._actions:
if a.dest in dict_config and isinstance(a, (_StoreTrueAction, _StoreFalseAction)):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big fan of reaching into the internals of ArgumentParser but also don't have a great idea what to do instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought that this would still be more robust than keeping an explicit list of Boolean arguments, although that wouldn't be impossible. I looked at a range of options, including shifting to an alternative arguments library, like Click, but that all got too complicated.
I think that this is a reasonable compromise if we keep on top of adding new Python versions to the test suite whenever they become available.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, agree

Comment on lines +198 to +199
for a in parser._actions:
if a.dest in raw_config and isinstance(a, (_StoreTrueAction, _StoreFalseAction)):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

# when it finds the file, or a .git directory, or a .hg directory, or the root of the file system, whichever
# comes first.
# if no files are given, start from cwd
config_path = os.path.commonpath([os.path.abspath(file) for file in args.files]) if args.files else os.path.abspath(os.getcwd())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line gets a bit long with a ternary

nbstripout/_utils.py Outdated Show resolved Hide resolved
if name not in valid_args:
raise ValueError(f'{name} in the config file is not a valid option')

# separate into default-overrides and special treatment
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a brief comment why extra_keys needs special treatment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure that it does. I left this with the same functionality that @janosh did.
This special treatment lets you specify a base set of extra keys in the config files, and then add to that in an ad-hoc manner, but does not let you remove extra keys....
Alternative treatment would be that whenever you specify extra keys you would have to list the complete set.
The more I think about it, the more I think that perhaps special treatment is NOT justified.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's a fair point and treating it specially may lead to surprising results.

@janosh can you explain your rationale? Do you agree that special treatment of this flag may be surprising?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember for sure at this point, but I think it might be left over from the original autoflake8 config file PR. I agree minimal user surprise is usually the way to go.

tests/test_read_config_files.py Outdated Show resolved Hide resolved
@kynan kynan modified the milestones: Backlog, 0.7.0 Dec 10, 2022
janosh and others added 6 commits December 26, 2022 15:07
Co-authored-by: Florian Rathgeber <florian.rathgeber@gmail.com>
Co-authored-by: Florian Rathgeber <florian.rathgeber@gmail.com>
* Add command line argument keep-id, which maintiains randomly generated cell ids. Otherwise cell ids are assigned incrementally (after the removal of cells), which should keep them consistent across runs in version control

* Modify test_cell and test_exception in test_keep_output_tags.py to use the new strip_output signature

* Fix failed test_end_to_end_nbstripout with test_max_size by passing --keep-id for keeping the existing ids

* Add tests for notebooks with and without the --keep-id flag. A new extension expected_id was added for expected output with ordered ids

* Modify the readme to include the --include-id flag

* Add keyword arguments for None inputs in test_keep_output_tags.py

* Rename expected output files to make desired sequential ids more explicit

Co-authored-by: Florian Rathgeber <florian.rathgeber@gmail.com>
* Clarify that extra keys may _only_ specify notebook and cell metadata to be stripped.
* Update metadata stripped by default.

Addresses kynan#187
@@ -120,7 +120,7 @@
import sys
import warnings

from nbstripout._utils import strip_output, strip_zeppelin_output
from nbstripout._utils import strip_output, strip_zeppelin_output, merge_configuration_file
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from nbstripout._utils import strip_output, strip_zeppelin_output, merge_configuration_file
from nbstripout._utils import strip_output, strip_zeppelin_output, merge_configuration_files


def main():
parser = setup_commandline()
args = merge_configuration_file(parser)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
args = merge_configuration_file(parser)
args = merge_configuration_files(parser)

@kynan
Copy link
Owner

kynan commented Dec 7, 2023

@janosh Are you still planning to address the outstanding review comments? :)

@janosh
Copy link
Contributor Author

janosh commented Dec 7, 2023

Sorry, don't have time. Feel free to close.

@kynan
Copy link
Owner

kynan commented Dec 7, 2023

Thanks for the update! I'll leave this pending for now. Maybe I'll find time to pick this up myself over the holidays.

@l-johnston
Copy link

Will this feature be available in pyproject.toml?

In my use case, I need to filter notebooks that have a specific cell tag while leaving all others unfiltered. Will this feature support this use case?

@kynan
Copy link
Owner

kynan commented Jan 26, 2024

Will this feature be available in pyproject.toml?

I think this PR was supposed to add pyproject.toml support for configuring nbstripout, yes.

I need to filter notebooks that have a specific cell tag while leaving all others unfiltered. Will this feature support this use case?

I don't think your feature request is related to this PR. Please file a new issue.

@l-johnston would you be interested in working on this PR?

@kynan kynan modified the milestones: 0.7.0, Backlog Jan 27, 2024
@kynan kynan added help wanted and removed state:waiting Waiting for response for reporter labels Nov 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted state:needs follow up This issue has interesting suggestions / ideas worth following up on type:enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read config from setup.cfg
6 participants