ENH: Reimplement and undeprecate DataFrame.lookup #39171
Comments
this is not a friendly way to ask for things.
@impredicative you are welcome to contribute patches, e.g. doc fixes or other things.
@impredicative out of curiosity, what exactly is difficult about the new proposed method, which is also significantly faster since … I agree that we can fix the documentation in the deprecation message and in the whatsnew though. @jreback Would you be open to un-deprecate …
@erfannariman What's difficult, for example, is how complicated this answer using …
I don't disagree with your point, although I don't think the proposed method is per se difficult, but I might be biased. I would not be against un-deprecating …
@impredicative there is almost no usage of lookup AFAICT: very, very few issues / tests / SO entries. If you have a valid, real-world use case that is very common, then I am happy to show a doc example / recipe. Adding a method must clear a high bar.
@jreback A google.com search for …
@impredicative you are not comparing apples to apples here. Try with indexing operators, we already have … I in fact have an issue to remove …
@jreback I don't use … In general, I would ask the question: "Is the alternative simpler or more difficult to use?" If a method simplifies code, making it less verbose and preventing users from reimplementing common patterns, then it's valuable, even if you personally think it's a duplicate, in proportion to how well it accomplishes this goal.
This is balanced against an already huge API. There are alternatives to this. If you have a use case where this is especially awkward / burdensome, I would encourage you to open an issue with a reproducible example and show why a new method makes sense.
I learned about the deprecation of …
Sure, I would be happy to make a PR with a new proposed, more efficient method; there we can discuss in more detail and the core devs can share their thoughts as well. What would be your suggestion, the answer you gave on SO? Could you maybe share a reproducible example in a new ticket and mention this issue? @quanghm
I ran a quick test and the results said the reverse:
The proposed method always comes last by a big margin in terms of speed. This only gets worse if the column names get longer.
Using lambdas is a non-starter. It shouldn't even be suggested unless it is the last option on earth.
Regarding the lambda function, that was originally an attempt to chain the three lines to detect a possible speed improvement. The answer was no. Using …
Also, can you name some Pandas datatypes that wouldn't work with … All those aside, the message here is that the proposed method is slower than the current …
Thanks for the extensive example and speed tests @quanghm. I did a quick check and it seems like … (lines 3848 to 3861 in b5958ee)
For your approach, I agree that it's more efficient both in speed and memory allocation, but I remember writing multiple methods to replace … So I think it all boils down to @jreback and the argument that …
@erfannariman Thanks. The code for … However, in the case that … On another note, this is certainly new for me …
Can you provide some details?
I confirmed that …
I think it is premature to deprecate … I do simple things. Here is a simple example:
>>> import pandas as pd
>>> index = pd.Series(['2020-01-04', '2020-01-03'], index=['A', 'B'])
>>> df = pd.DataFrame({'A': [1,2,3], 'B': [4,5,6]}, index=['2020-01-03', '2020-01-04', '2020-01-05'])
>>> index
A    2020-01-04
B    2020-01-03
dtype: object
>>> df
            A  B
2020-01-03  1  4
2020-01-04  2  5
2020-01-05  3  6
>>> values = df.lookup(index.values, index.index)
>>> values
array([2, 4], dtype=int64)
>>> pd.Series(values, index=index.index)
A    2
B    4
dtype: int64
What should I do to use … The code example from SO is tricky.
Ask yourself what intention it expresses. This is some kind of esoteric gibberish without reference to the subject area. The code is too low-level and verbose compared to the simple and concise …
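For reference, a label-based lookup like the one in the example above can also be sketched with positional NumPy indexing. This is a minimal sketch, not pandas' own implementation: the helper name `lookup_labels` is hypothetical, and it assumes the frame's row and column labels are unique and that every requested label is present.

```python
import numpy as np
import pandas as pd


def lookup_labels(df, row_labels, col_labels):
    """Hypothetical helper: return the value at each (row, column) label pair."""
    # Translate labels to integer positions; -1 marks a label that was not found.
    rows = df.index.get_indexer(row_labels)
    cols = df.columns.get_indexer(col_labels)
    if (rows == -1).any() or (cols == -1).any():
        raise KeyError("one or more labels were not found")
    # Pairwise fancy indexing on the underlying array.
    return df.to_numpy()[rows, cols]


index = pd.Series(["2020-01-04", "2020-01-03"], index=["A", "B"])
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6]},
    index=["2020-01-03", "2020-01-04", "2020-01-05"],
)

values = lookup_labels(df, index.to_numpy(), index.index)
result = pd.Series(values, index=index.index)
print(result)
```

Note that `df.to_numpy()` converts a mixed-dtype frame to a single common dtype, so this sketch fits best when all looked-up columns share one dtype.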
@jreback I think it's you who is not comparing apples to apples here: none of those APIs you listed offer … The example that @espdev listed in his comment above is a more common case for the use of …
First off, I can confirm I can reproduce the performance difference from #39171 (comment), even without the lambda function (this is all on the v1.1.4 tag):
Second, is there a case when the solution in #39171 (comment) would not work well enough? If not, should the user guide just be updated to use that and we can move on from this issue? Third - @impredicative please consider your tone
My tone is justifiable. I have had enough of Pandas being unprofessionally developed. Pandas works only for simple manipulations of small dataframes, and scales extremely poorly or not at all. I have spent weeks and weeks trying to work around its various issues, and the best thing for me to do right now is to move to a real package like Dask, PySpark, etc. I'm sure that many other users here feel similarly. Fourth - @MarcoGorelli please consider how Pandas is actually developed.
If anyone wants to take this forward with a respectful tone (e.g. constructive comments like #39171 (comment)), then feel free to open a new issue - closing and locking as this is not the way to have a productive discussion and several comments have been off-topic
thanks @MarcoGorelli
Is your feature request related to a problem?
I seriously get the impression that Pandas is actively being sabotaged. As a case in point, DataFrame.lookup was deprecated in v1.2. The problems with this are:
- … DataFrame.lookup says to see itself for an example. Well, there is no example in the doc page. The only example I'm aware of is here instead, which is a different page.
- … lookup. Why break it? If the current implementation of lookup is suboptimal, shouldn't it be optimized instead?
- … melt when lookup works quite simply. For example, compare this simple answer using lookup with this complicated answer using melt.
Does nobody review changes, docs, and release notes anymore prior to the release? It looks this way.
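The melt-based replacement referred to above can be sketched roughly as follows. This is an illustrative sketch, not the text of the pandas documentation: the helper name `lookup_via_melt`, the column name `col`, and the sample data are assumptions, and `ignore_index=False` requires pandas 1.1 or later.

```python
import pandas as pd


def lookup_via_melt(df, key_col):
    """Hypothetical helper: for each row, pick the column named in key_col."""
    # Long format: one row per (original row, candidate column) pair.
    # ignore_index=False keeps the original row labels on the melted rows.
    long = df.melt(id_vars=key_col, ignore_index=False)
    # Keep only the pairs where the candidate column is the requested one.
    mask = (long[key_col] == long["variable"]).to_numpy()
    hits = long.loc[mask, "value"]
    # Restore the original row order (assumes a unique row index).
    return hits.reindex(df.index).to_numpy()


df = pd.DataFrame(
    {"col": ["B", "A", "A", "B"], "A": [1, 2, 3, 4], "B": [10, 20, 30, 40]}
)
picked = lookup_via_melt(df, "col")
print(picked)
```

The intermediate long frame has one row per cell of the value columns, which is one reason a melt-based approach may use more memory than direct positional indexing.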
Describe the solution you'd like
- Reimplement lookup if attainable using melt or otherwise.
- Undeprecate lookup.