- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 19.2k
API: Make most arguments for read_html and read_json keyword-ony #27573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
22fd9dc    to
    5318a76      
    Compare
  
    | We need to deprecate the old behavior first. As written, this is an api breaking change. And we shouldn’t do this until 1.0 or later. | 
| 
 and how exactly would you deprecate ? | 
| Should be able to use a decorator to inspect the calm, right? And if any positional or keyword args are passed in args then we warn?…  On Jul 25, 2019, at 07:44, Jeff Reback ***@***.***> wrote:
 We need to deprecate the old behavior first. As written, this is an api breaking change.
 And we shouldn’t do this until 1.0 or later.
 and how exactly would you deprecate ?
 —
 You are receiving this because you commented.
 Reply to this email directly, view it on GitHub, or mute the thread. | 
| Thank you. I also wanted to propose somewhat like this. Or do such a decorator already exist? | 
| Hello @alexitkes! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-03-31 18:09:11 UTC | 
b00ec4a    to
    99e0e3c      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add tests to read_html / read_json to ensure that warnings are raised there?
d6f29f4    to
    cd5d658      
    Compare
  
    | Lifting #27573 (comment) to the top level 
 What do you think about this @alexitkes? | 
| @TomAugspurger Thank you for review. Yes, writing argument list twice will be quite annoying, but I will need some time to find a way to avoid it, and I would like to try. However, I think it would be better to preserve the  | 
| 
 Why do you think we'll need that? That sounds more complicated than just detecting kwargs passed positionally. If we aren't going to need it then I'd rather not implement it. | 
| @TomAugspurger I have just written such a decorator and it does not look too complicated. However it is my first experience with  It will not be difficult to remove  | 
| I'm not sure without looking more closely. My thought is that all keyword
arguments being keyword-only is easier to explain to users.… On Tue, Jul 30, 2019 at 3:10 AM Alex Itkes ***@***.***> wrote:
 @TomAugspurger <https://github.com/TomAugspurger> I have just written
 such a decorator and it does not look too complicated. However it is my
 first experience with inspect module so possibly more tests are needed
 though some tests are written and they all run successfully.
 It will not be difficult to remove allowed_args argument and always set
 it to list of required arguments, if you think eventually all arguments
 having the default values should be keyword-only arguments. Should I do it?
 —
 You are receiving this because you were mentioned.
 Reply to this email directly, view it on GitHub
 <#27573?email_source=notifications&email_token=AAKAOIWDW7YBWLIQNA6SISTQB7ZQNA5CNFSM4IGSQB62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3DFE6A#issuecomment-516313720>,
 or mute the thread
 <https://github.com/notifications/unsubscribe-auth/AAKAOIQDBCCUJD5VIZIZ5VLQB7ZQNANCNFSM4IGSQB6Q>
 .
 | 
| 
 I have almost forgotten one thing - the  | 
| 
 Oh sorry. In that case we'll definitely need to have the ability to have some keyword-or-positional arguments. | 
| Well, it seems to work fine though I am a bit worried because it's my first experience with  | 
| """ | ||
| df = pd.DataFrame({"A": [2, 4, 6], "B": [3, 6, 9]}, index=[0, 1, 2]) | ||
| with tm.assert_produces_warning(FutureWarning): | ||
| assert_frame_equal(df, read_json(df.to_json(orient="split"), "split")) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you minimize the amount of code in the context manager? I think just read_json(buf, "split").
Also, do we need to test all three of these? What differs between them?
        
          
                pandas/util/_decorators.py
              
                Outdated
          
        
      | def wrapper(*args, **kwargs): | ||
| if isinstance(allow_args, int) and len(args) > allow_args: | ||
| msg = ( | ||
| "After version %s all arguments of %s " | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
version -> pandas version.
Use {}-style string templates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done it. I also added a separate function to form a better looking warning message depending on the number of arguments, but so far I only tested it manually. Is there a proper way to test how exactly does the produced warning look?
2982165    to
    f737870      
    Compare
  
    | @WillAyd @jreback @jorisvandenbossche do you have thoughts on this (the overall goal, the implementation seems fine). 
 | 
cd84833    to
    c8284ef      
    Compare
  
            
          
                pandas/tests/io/test_html.py
              
                Outdated
          
        
      | def test_index(self): | ||
| df1 = self.read_html(self.spam_data, ".*Water.*", index_col=0) | ||
| df2 = self.read_html(self.spam_data, "Unit", index_col=0) | ||
| with tm.assert_produces_warning(FutureWarning): | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment for non-duplicated tests; would rather not use the deprecated behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, such amount of duplicated tests looks quite ugly, so I removed all tests with deprecated behavior except for one. Would it be enough?
46e9ceb    to
    70359f6      
    Compare
  
    Co-Authored-By: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-Authored-By: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-Authored-By: Joris Van den Bossche <jorisvandenbossche@gmail.com>
01ec83a    to
    ed63145      
    Compare
  
    | @WillAyd OK, done it. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be OK with moving forward here - @jreback or @TomAugspurger ?
| Do we have a general policy for which arguments are keyword only? | 
| lgtm. on anything with lots of args (pretty much read_*), happy to only have the first arg positional (path_or_buf). | 
| looks good, @TomAugspurger ok here? | 
| Yeah…  On Apr 6, 2020, at 18:25, Jeff Reback ***@***.***> wrote:
 
 looks good, @TomAugspurger ok here?
 —
 You are receiving this because you were mentioned.
 Reply to this email directly, view it on GitHub, or unsubscribe. | 
| thanks! | 
| Thanks @alexitkes ! | 
As mentioned in #27544 it is better to use keyword-only arguments for functions with too many arguments. A deprecation warning will be displayed if
read_htmlget more than 2 positional arguments (ioandmatch) orread_jsonget more than 1 positional argument (path_or_buf).Three groups of tests are actually needed
deprecate_nonkeyword_argumentsdecoratorrread_jsonfunction emits aFutureWarningwhenever necessary and does not emit it whenever not.read_htmlfunction emits aFutureWarningwhenever necessary and does not emit it whenever not.black pandasgit diff upstream/master -u -- "*.py" | flake8 --diff