Skip to content

Fix 25648 improve read csv stacktrace on bad argument #33023

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

roberthdevries
Copy link
Contributor

@roberthdevries roberthdevries commented Mar 25, 2020

@roberthdevries
Copy link
Contributor Author

roberthdevries commented Mar 25, 2020

It works, but it may not be pretty because it uses exec() which is apparently an unwanted pattern

@WillAyd
Copy link
Member

WillAyd commented Mar 26, 2020

Yea this is pretty hacky :-X is this really the only option?

@WillAyd WillAyd added the Error Reporting Incorrect or improved errors from pandas label Mar 26, 2020
Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 on this change unfortunately. exec is extremely hacky for something that is arguably minor.

While I don't have an immediate alternative off the top of my head, I think it's better to let this one percolate a little than to force a fix.

@gfyoung gfyoung added the IO CSV read_csv, to_csv label Mar 26, 2020
Co-Authored-By: gfyoung <gfyoung17+GitHub@gmail.com>
@roberthdevries
Copy link
Contributor Author

The options to fix this are limited by the fact that the exception raising mechanism uses a read-only variable of the code attribute of a function: <func>.code.co_name.
This variable is set by the python compiler/interpreter by taking the name of the function when it is defined: def <func>. Which means that we have to do def read_csv() and def read_table() to fix this issue.

I see 4 options in total:

  1. Use exec() to dynamically create the functions with the appropriate names
  2. Make two duplicates of parser_f with all of its arguments, factor out the body of the function
  3. Make a template function e.g. parser_f.py.in and generate read_csv.py and read_table.py at build time
  4. Don't fix this issue 😃

May I have your suggestions/votes please

@gfyoung
Copy link
Member

gfyoung commented Mar 28, 2020

  1. -1: too hacky as already discussed

  2. -1: maintenance burden > error correctness

  3. -0.5: less maintenance burden, but I still feel creating a template just for this is too much

  4. -0: It's worked us for awhile, but I don't want to give up on this just yet.

Middle ground option:

Rename parser_f to something like read_document ? Then the function name isn't as foreign sounding and at least aligns with what read_csv and read_table do.

Another middle ground option:

Create a wrapper around parser_f than takes in *args and **kwargs and catch TypeError to replace instances of the string parser_f with the function name.

The disadvantage is we lose the signature when we do help on these functions.

@jreback
Copy link
Contributor

jreback commented Mar 28, 2020

agree with comments here - we cannot use string execution like this

you an try setting

parser_f.name = ...

@gfyoung
Copy link
Member

gfyoung commented Mar 28, 2020

agree with comments here - we cannot use string execution like this

you an try setting

parser_f.name = ...

Unfortunately, the stacktrace uses a read-only attribute to provide the function name in that error.

@jbrockmendel
Copy link
Member

@roberthdevries can you rebase

@gfyoung
Copy link
Member

gfyoung commented Apr 4, 2020

@roberthdevries can you rebase

Before rebasing, I think it's only fair that we get some consensus on how to address the underlying problem, given that we largely agree that the proposed fix is not something we would like to merge.

@roberthdevries
Copy link
Contributor Author

I am currently ill, and not able to work on this

@gfyoung
Copy link
Member

gfyoung commented Apr 5, 2020

I am currently ill, and not able to work on this

@roberthdevries : Oh no! Very sorry to hear that. Hope you get better soon.

We can take over this PR once we get some consensus.

@jbrockmendel
Copy link
Member

is it possible compat.set_function_name may work where just setting func.__name__ doesnt?

@gfyoung
Copy link
Member

gfyoung commented Apr 16, 2020

is it possible compat.set_function_name may work where just setting func.__name__ doesnt?

@jbrockmendel : Unfortunately not. As mentioned in the OP, the function name in the error message is populated from a read-only attribute.

@jbrockmendel
Copy link
Member

OK, then im inclined towards the emerging consensus of "learn to live with it"

@gfyoung
Copy link
Member

gfyoung commented Apr 16, 2020

OK, then im inclined towards the emerging consensus of "learn to live with it"

Learn to live with it? We've been living with this stacktrace issue for years 😄

@simonjayhawkins
Copy link
Member

@roberthdevries Thanks for looking into this. closing for now as stale. ping if you want to continue.

@roberthdevries
Copy link
Contributor Author

Can you mark #25648 as wontfix?

@simonjayhawkins
Copy link
Member

added a needs discussion label and removed the contributions welcome for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ERR: read_csv exposes an internal function when bad argument is specified
6 participants