
Equivalent of Series.map for DataFrame #12550

Closed
BrenBarn opened this issue Mar 7, 2016 · 5 comments · Fixed by #32068 or #32370
Labels
API Design, Docs, good first issue, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

@BrenBarn

BrenBarn commented Mar 7, 2016

I think there should be a way to take a DataFrame and use the values in its columns as indices into a MultiIndexed Series. Here is an example:

>>> d = pandas.DataFrame({"Let": ["A", "B", "C"], "Num": [1, 2, 3]})
>>> d
  Let  Num
0   A    1
1   B    2
2   C    3
>>> ser = pandas.Series(
...     ['a', 'b', 'c', 'd', 'e', 'f'],
...     index=pandas.MultiIndex.from_arrays([["A", "B", "C"]*2, [1, 2, 3, 4, 5, 6]])
... )
>>> ser
A  1    a
B  2    b
C  3    c
A  4    d
B  5    e
C  6    f
dtype: object

With this data, you should be able to do d.map(ser) (or whatever method name instead of map) and get the same result as this:

>>> ser.loc[d.apply(tuple, axis=1)]
A  1    a
B  2    b
C  3    c
dtype: object

You currently cannot do this without converting the rows to tuples (ser[d] gives ValueError: Cannot index with multidimensional key). Converting rows to tuples is an awkward way to do a very natural task: using a tabular array of data to look up values in a tabular index (i.e., a MultiIndex).
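For reference, the tuple-based workaround described above can be written as a self-contained snippet (the variable names `keys` and `result` are my own):

```python
import pandas as pd

# the example data from this issue
d = pd.DataFrame({"Let": ["A", "B", "C"], "Num": [1, 2, 3]})
ser = pd.Series(
    ["a", "b", "c", "d", "e", "f"],
    index=pd.MultiIndex.from_arrays([["A", "B", "C"] * 2, [1, 2, 3, 4, 5, 6]]),
)

# convert each row of the DataFrame to a tuple, then look the
# tuples up as labels in the MultiIndexed Series
keys = list(d.itertuples(index=False, name=None))
result = ser.loc[keys]
```

This is exactly the boilerplate the proposed `d.map(ser)` would hide.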

I'm creating an issue to resurrect this request from much earlier discussion here and here.

@jreback
Contributor

jreback commented Mar 7, 2016

map is just a fancy way of doing a merge

In [24]: d
Out[24]: 
  Let  Num
0   A    1
1   B    2
2   C    3

In [25]: ser
Out[25]: 
A  1    a
B  2    b
C  3    c
A  4    d
B  5    e
C  6    f
dtype: object

In [26]: ser.index.names=['Let','Num']

In [27]: pd.merge(d, ser.reset_index(), on=['Let','Num'])
Out[27]: 
  Let  Num  0
0   A    1  a
1   B    2  b
2   C    3  c
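The merge recipe above can be condensed into one runnable script (the column name `val` is my own choice; in the original output the unnamed Series column shows up as `0`):

```python
import pandas as pd

d = pd.DataFrame({"Let": ["A", "B", "C"], "Num": [1, 2, 3]})
ser = pd.Series(
    ["a", "b", "c", "d", "e", "f"],
    index=pd.MultiIndex.from_arrays(
        [["A", "B", "C"] * 2, [1, 2, 3, 4, 5, 6]], names=["Let", "Num"]
    ),
)

# reset_index() turns the MultiIndex levels into ordinary columns,
# after which an inner merge performs the row-by-row lookup
result = pd.merge(d, ser.rename("val").reset_index(), on=["Let", "Num"])
```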

@jreback jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and API Design labels Mar 7, 2016
@jreback
Contributor

jreback commented Mar 7, 2016

I suppose this could be shown in a doc example in the merge docs (and maybe cross-ref .map with .merge).

@jreback jreback added this to the 0.18.1 milestone Mar 7, 2016
@jreback
Contributor

jreback commented Apr 1, 2016

@BrenBarn do you want to do a doc-example for this?

@jreback jreback modified the milestones: 0.18.2, 0.18.1 Apr 26, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: Next Major Release, 0.19.0 Aug 13, 2016
@JBuitelaar

JBuitelaar commented Jan 30, 2017

I find myself doing something like this all the time, so I wrote my own function for it.

In terms of functionality, it's like a (vectorized and more generic) VLOOKUP in Excel. The implementation is just a wrapper around pd.DataFrame.merge (or join in some cases):

df.vlookup(other, drop_missing_rows=False, drop_lookup_cols=False)

It is similar to pd.DataFrame.merge and pd.Series.map, but:

  • You may (or may not) want to keep the columns you're joining on (Series.map drops them, df.merge keeps them).
  • It always joins on other.index.names, but you may want to use the index of df (like join), the columns (like merge), or a combination.
  • Just like map, it raises when the rows of other.index are not unique.
  • You may want to drop the rows that have no entry in other (inner join) or insert NAs (left join).

Happy to submit if it helps
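A minimal sketch of such a helper, covering only the column-based case (joining on df's columns, not its index), might look like this. The function and parameter names follow the comment above; the body is my own assumption, not the author's actual implementation:

```python
import pandas as pd

def vlookup(df, other, drop_missing_rows=False, drop_lookup_cols=False):
    """Look rows of df up in other, joining on other's index names.

    Sketch only: other must be a Series or DataFrame whose index
    levels are named and unique.
    """
    if isinstance(other, pd.Series):
        other = other.to_frame(other.name or "value")
    # like Series.map, refuse ambiguous lookups
    if not other.index.is_unique:
        raise ValueError("other.index must be unique")
    keys = list(other.index.names)
    # inner join drops rows with no entry in other; left join inserts NAs
    how = "inner" if drop_missing_rows else "left"
    result = df.merge(other.reset_index(), on=keys, how=how)
    if drop_lookup_cols:
        result = result.drop(columns=keys)
    return result

# the example data from this thread, with named index levels
d = pd.DataFrame({"Let": ["A", "B", "C"], "Num": [1, 2, 3]})
ser = pd.Series(
    ["a", "b", "c", "d", "e", "f"],
    index=pd.MultiIndex.from_arrays(
        [["A", "B", "C"] * 2, [1, 2, 3, 4, 5, 6]], names=["Let", "Num"]
    ),
)
result = vlookup(d, ser.rename("val"))
```

With the example data, `result` has columns Let, Num, val; passing drop_lookup_cols=True leaves only the looked-up val column.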

@pdpark

pdpark commented Jan 8, 2018

I'll write up an example of this in the merging docs.
