API: Add equals method to NDFrames. #5283


Merged
merged 1 commit into pandas-dev:master on Jan 24, 2014

Conversation

unutbu
Contributor

@unutbu unutbu commented Oct 20, 2013

Also adds array_equivalent, which
is similar to np.array_equal except that it handles object arrays and
treats NaNs in corresponding locations as equal.

closes #5183
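The contract being proposed can be sketched for plain float arrays. This is an illustrative sketch of the intended behavior only, not the PR's implementation (`array_equivalent_sketch` is a made-up name):

```python
import numpy as np

a = np.array([1.0, np.nan])

# np.array_equal treats NaN as unequal to itself, so an array never
# "equals" a NaN-containing copy of itself:
print(np.array_equal(a, a.copy()))  # False

# minimal float-only sketch of the array_equivalent contract:
# NaNs in corresponding locations compare equal
def array_equivalent_sketch(left, right):
    left, right = np.asarray(left), np.asarray(right)
    return left.shape == right.shape and bool(
        ((left == right) | (np.isnan(left) & np.isnan(right))).all()
    )

print(array_equivalent_sketch(a, a.copy()))                   # True
print(array_equivalent_sketch([1.0, np.nan], [np.nan, 1.0]))  # False
```

The real function additionally has to cope with object arrays, where `np.isnan` raises, which is what most of the discussion below is about.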

@jreback
Contributor

jreback commented Oct 20, 2013

pls run a perf check on this (test_perf.sh)

these comparisons are used everywhere

do u need the shape check?
the null check might kill perf on this
why are u not doing == and != ?

@jtratner
Contributor

@jreback - seems like it doesn't work for this example, but we could be missing something

left = pd.Float64Index([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, nan], dtype='object')
right = pd.Float64Index([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, nan], dtype='object')

# OR 
left = np.array([1.0, 2.0, nan], dtype=object)
right = np.array([1.0, 2.0, nan], dtype=object)

(fully enumerated here - https://gist.github.com/unutbu/7070565)

@jtratner
Contributor

to be explicit:

left = np.array([1.0, 2.0, nan], dtype=object)
right = np.array([1.0, 2.0, nan], dtype=object)

left != right
Out[16]: array([False, False, False], dtype=bool)

left != left
Out[17]: array([False, False, False], dtype=bool)

right != right
Out[18]: array([False, False, False], dtype=bool)

nan != nan
Out[19]: True

@jtratner
Contributor

Though I guess they compare true with == so not a real issue - we were going back and forth on another PR b/c sometimes nan handling can be confusing :P

@jreback
Copy link
Contributor

jreback commented Oct 20, 2013

You have to astype to float before you can do the comparison (not sure exactly why); it only works if they're all float values (so you need to do it in a try/except)

@unutbu
Contributor Author

unutbu commented Oct 20, 2013

@jreback: I'm working on installing vbench and figuring out how to run test_perf.sh...

@unutbu
Contributor Author

unutbu commented Oct 20, 2013

@jreback: When I run

time ./test_perf.sh -b array-equivalent -t array-equivalent^ 

I get

sqlalchemy.exc.IntegrityError: (IntegrityError) column checksum is not unique u'INSERT INTO benchmarks (checksum, name, description) VALUES (?, ?, ?)' ('ea1993ef61c3cc4e871d2cce3c5d983c', 'eval_frame_chained_cmp_python', None)

I see I can limit test_perf.sh to one test, such as

time ./test_perf.sh -b array-equivalent -t array-equivalent^ -r reindex

which yielded

    Invoked with :
    --ncalls: 3
    --repeats: 3


    -------------------------------------------------------------------------------
    Test name                                    | head[ms] | base[ms] |  ratio   |
    -------------------------------------------------------------------------------
    reindex_frame_level_align                    |   2.6046 |  10.1856 |   0.2557 |
    dataframe_reindex                            |   0.4900 |   0.6377 |   0.7684 |
    frame_reindex_axis0                          | 110.6919 | 126.7160 |   0.8735 |
    frame_reindex_columns                        |   0.4164 |   0.4683 |   0.8890 |
    frame_reindex_both_axes_ix                   |  43.5000 |  46.9437 |   0.9266 |
    reindex_frame_level_reindex                  |   2.3306 |   2.3570 |   0.9888 |
    frame_reindex_upcast                         |  16.1486 |  16.2884 |   0.9914 |
    reindex_fillna_pad_float32                   |   0.5860 |   0.5894 |   0.9942 |
    reindex_fillna_backfill_float32              |   0.5997 |   0.6014 |   0.9972 |
    frame_reindex_both_axes                      |  46.7057 |  46.7397 |   0.9993 |
    reindex_daterange_pad                        |   2.9510 |   2.9523 |   0.9995 |
    reindex_fillna_backfill                      |   1.0234 |   1.0213 |   1.0020 |
    reindex_fillna_pad                           |   0.8663 |   0.8514 |   1.0175 |
    reindex_multiindex                           |   1.5457 |   1.5034 |   1.0281 |
    frame_reindex_axis1                          | 558.3910 | 510.9200 |   1.0929 |
    reindex_daterange_backfill                   |   3.4040 |   2.9933 |   1.1372 |
    -------------------------------------------------------------------------------
    Test name                                    | head[ms] | base[ms] |  ratio   |
    -------------------------------------------------------------------------------

    Ratio < 1.0 means the target commit is faster then the baseline.
    Seed used: 1234

    Target [5c6116c] : Merge pull request #5281 from cancan101/index_meta_data_doc

    DOC: Added versionadded for "Setting index metadata"
    Base   [8c8ef7d] : ENH: Add array_equivalent, to address the handling of NaNs when comparing arrays for equality.

    Added NDFrame.equals

    Index, Float64Index, and MultiIndex's equal method now uses array_equivalent
    instead of np.array_equal.

Clearly I don't know what I'm doing. What is the right test_perf.sh command?
I see there are other choices for -r in pandas/vb_suite. But which is the right/relevant one(s)?

@jreback
Contributor

jreback commented Oct 20, 2013

-b should be the commit before the 1st of yours and -t should be the last commit of yours

generally I rebase to master before this

@unutbu
Contributor Author

unutbu commented Oct 20, 2013

With array-equivalent rebased to master,

time ./test_perf.sh -b master -t array-equivalent 

yields vb_suite.log

@jreback
Contributor

jreback commented Oct 20, 2013

concat_series_axis1                          | 204.8774 |  83.7650 |   2.4459 |
reindex_frame_level_align                    |   8.9770 |   1.2484 |   7.1910 |

so look at these in master and in your PR using %prun...and see if you can figure out what's up...
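Outside IPython, the same kind of investigation can be done with the stdlib profiler. A generic sketch; the `workload` body here is just a placeholder, not the actual vbench benchmark:

```python
import cProfile
import io
import pstats

import numpy as np

def workload():
    # placeholder standing in for a slow benchmark such as
    # concat_series_axis1 or reindex_frame_level_align
    a = np.random.random(100_000)
    return bool((a == a).all())

profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# print the 5 most expensive calls by cumulative time
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```

Running this on the head and base commits and diffing the top entries usually points at the call that regressed.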

        null_right = np.isnan(right)
    except TypeError:
        return np.array_equal(left, right)
    else:
Contributor

I think you can just coerce to float (if that fails, your fallback is fine, though the fallback itself takes some time; it might be better just to check the index type first). You don't need the isnull/isnan checking at all, just do (left != left) & (right != right)

Contributor Author

@jreback I tried

def array_equivalent(left, right):
    left, right = np.asarray(left), np.asarray(right)
    try:
        left = left.astype(float)
        right = right.astype(float)        
    except (ValueError, TypeError):
        return np.array_equal(left, right)
    else:
        return (left.shape == right.shape
                and ((left == right) | (left != left) & (right != right)).all())

time ./test_perf.sh -b master -t coerce-to-float yields (using Python 2.7, NumPy 1.7)

series_align_irregular_string                |  97.3604 |  68.7210 |   1.4167 |
series_align_left_monotonic                  |  32.0517 |  22.5259 |   1.4229 |
concat_series_axis1                          | 430.0770 |  82.5344 |   5.2109 |
reindex_frame_level_align                    |  23.5590 |   1.2616 |  18.6734 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Also, coercing to float drops the imaginary part of complex arrays:

>>> np.array([nan, 1+1j], dtype='complex').astype(float)
array([ nan,   1.])

So np.isnan will (I think) handle more dtypes than (x != x), and has comparable, maybe even favorable speed, when applied to float arrays:

In [6]: x = np.array([1, 2, nan])

In [7]: %timeit x != x
1000000 loops, best of 3: 1.23 µs per loop

In [5]: %timeit np.isnan(x)
1000000 loops, best of 3: 1.1 µs per loop

Contributor

hah....so my suggestion made it worse!

I think you need to detect if you need to do this in the first place (maybe by only checking on Index/Float64Index types, as Int64Index cannot hold nan)....so you avoid the try: except: overhead

@unutbu
Contributor Author

unutbu commented Oct 21, 2013

With the current commit, test_perf.sh yields

groupby_simple_compress_timing               |  54.9030 |  47.4270 |   1.1576 |
frame_iloc_dups                              |   0.3117 |   0.2663 |   1.1704 |
index_int64_intersection                     |  41.4550 |  33.6334 |   1.2326 |
groupby_series_simple_cython                 |   7.6556 |   5.9413 |   1.2885 |
series_align_left_monotonic                  |  30.3703 |  22.4893 |   1.3504 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

I'm going to try adding a check for Int64Index arrays next...

@jreback
Contributor

jreback commented Oct 21, 2013

also try doing test_perf again....these could be 'random'.....(e.g. if they are not similar on subsequent runs, then it's just an artifact of the data)....you can also try with a bigger n (numcalls)

@jreback
Contributor

jreback commented Oct 21, 2013

@unutbu have a look at #5219 I believe your replacement will simply be called by that, yes?

@jtratner
Contributor

question here - why do you need to cast it to float first? I thought it worked with just ==? I'm sure I'm missing something but just wanted to make sure we had an example that fails using ==. (or maybe it's just float dtype that fails)

@jreback
Contributor

jreback commented Oct 21, 2013

I think object dtype that has floats in it (iow Float64Index) fails; not sure why though

@unutbu
Contributor Author

unutbu commented Oct 21, 2013

@jreback Regarding #5219, yes, I am striving to make array_equivalent a drop-in replacement for np.array_equal. It should behave exactly like np.array_equal except that NaNs in corresponding locations should be treated as equal.

The tests in tests/test_common.py in the test_array_equivalent show the behavior I'm currently testing for.

@hayd
Contributor

hayd commented Oct 21, 2013

Perhaps related: weird bug in numpy assert_array_equal I came across in a test a while ago (that I can't repro outside of the test suite) #4458.

(@unutbu: you're doing pandas' pull requests now, awesome!)

@unutbu
Contributor Author

unutbu commented Oct 21, 2013

@jtratner: I did try coercing to float (#5283 (diff)), but found there were problems. (See the link for more details.)

(Fixed incorrect link.)

Currently, array_equivalent uses np.isnan instead of pd.isnull because it is faster, but since it raises TypeError or NotImplementedError (Python2.6 or 3.2) on object arrays (unlike pd.isnull), I'm using np.array_equal as a fallback.

@jtratner
Contributor

Again, can we take a quick step back here: what's an example where it
doesn't work to compare ndarrays with == (let's assume that
array_equivalent always gets ndarrays for now). So you don't have to deal
with Index subclasses - will always get actual ndarrays.

if you pass array of floats with dtype object and some are nan, it compares
incorrectly with ==, right?

@jtratner
Contributor

So if you're thinking of Float64Index - just do '.view(ndarray)' so you're
not dealing with anything on pandas level.

Once we get it to work for ndarray, then can consider what to do for
NDFrame and friends. (trivial to view Index as ndarray for now)

@unutbu
Contributor Author

unutbu commented Oct 21, 2013

@jtratner: I don't quite understand. What is the "it" in the phrase "where it doesn't work..."?

Currently the test

assert array_equivalent(np.array([nan, None], dtype='object'),
                        np.array([nan, None], dtype='object'))

passes.

@jtratner
Contributor

Finally have a computer - just need to look at something for myself. I
mean, where (a == b) | ((a != a) & (b != b)) doesn't work, since that's
what I'd expect to work everywhere with a check for matching dtypes.

@jtratner
Contributor

I just used this:

def array_equiv(n1, n2):
    return (n1.shape == n2.shape and n1.dtype == n2.dtype
            and ((n1 == n2) | ((n1 != n1) & (n2 != n2))).all())

And it worked for all of these - am I missing why this is complicated? Is there a numpy version issue?

import numpy as np
nan = np.nan
for func in [
             lambda : np.array([0.1, 0.2, np.nan, 0.3], dtype=object),
             lambda : np.array([0.1, 0.2, np.nan, 0.3, np.nan], dtype=float),
             lambda : np.array([None, None, np.nan, None], dtype=object),
             lambda : np.array([], dtype=object)]:
    assert array_equiv(func(), func())

Then callers should be responsible for checking anything at pandas-level.

@unutbu
Contributor Author

unutbu commented Oct 21, 2013

How about:

import numpy as np
import pandas as pd
import pandas.core.common as com

def array_equiv(n1, n2):
    return (n1.shape == n2.shape and n1.dtype == n2.dtype
            and ((n1 == n2) | ((n1 != n1) & (n2 != n2))).all())

index = np.random.random(10)
df1 = pd.DataFrame(np.random.random(10,), index=index, columns=['floats'])
df1['dates'] = pd.date_range('2000-1-1', periods=10, freq='T')
df1.ix[::2] = np.nan

print(array_equiv(df1.values, df1.values))
# False

However, my array_equivalent does not handle object arrays correctly either. To work around the above problem, I had to add code to NDFrame.equals to test each column separately.
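The per-column workaround could look something like this. A hypothetical sketch of the idea (`frames_equal_by_column` is an invented name, not the code added to NDFrame.equals): comparing column by column keeps each comparison in the column's native dtype, instead of on the mixed-dtype object array that .values produces.

```python
import numpy as np
import pandas as pd

def frames_equal_by_column(df1, df2):
    """Hypothetical helper: NaN/NaT-aware frame equality, column by column."""
    if df1.shape != df2.shape or not df1.columns.equals(df2.columns):
        return False
    for col in df1.columns:
        s1, s2 = df1[col], df2[col]
        if s1.dtype != s2.dtype:
            return False
        # equal where values match, or where both are missing (NaN/NaT)
        same = (s1 == s2) | (s1.isnull() & s2.isnull())
        if not same.all():
            return False
    return True

df = pd.DataFrame({"floats": [1.0, np.nan],
                   "dates": pd.to_datetime(["2000-01-01", None])})
print(frames_equal_by_column(df, df.copy()))  # True
```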

@jtratner
Contributor

okay, thanks - just wanted to make sure we had something that explicitly didn't work for the simpler version.

@jreback
Contributor

jreback commented Oct 21, 2013

actually...why don't we do both...

use the simpler version...if its True (then we are done as we don't have false positives), however a False can fall back to the slower version
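That two-stage approach could look something like this (a hypothetical sketch, not the code that was merged; the slow path here treats None and NaN as interchangeable missing values, which is an assumption of mine):

```python
import numpy as np

def array_equivalent_two_stage(left, right):
    """Fast vectorized check first; it has no false positives, so True is
    final.  A False (or a dtype that can't be compared) falls back to a
    slower elementwise walk that treats missing-value pairs as equal."""
    left, right = np.asarray(left), np.asarray(right)
    if left.shape != right.shape:
        return False
    try:
        if bool(((left == right) | ((left != left) & (right != right))).all()):
            return True
    except (TypeError, ValueError):
        pass  # e.g. mixed object contents that refuse elementwise comparison

    def missing(x):
        return x is None or (isinstance(x, float) and x != x)

    return all(a == b or (missing(a) and missing(b))
               for a, b in zip(left.ravel(), right.ravel()))
```

The fast path fails on object arrays holding two *distinct* NaN objects (identity-based comparison misses them), and the slow path then picks those up.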

>>> array_equivalent([1, nan, 2], [1, 2, nan])
False
"""
if isinstance(left, pd.Int64Index):
Contributor

you can change this to something like if not issubclass(left.dtype.type, (np.object_, np.floating)): return np.array_equal(left, right), right? Given that only object and floating can hold nan?
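A sketch of that suggestion, combined with the (left != left) formula from earlier in the thread (hypothetical code, not the merged implementation):

```python
import numpy as np

def array_equivalent_dtype_gate(left, right):
    """Hypothetical: skip NaN handling for dtypes that cannot hold NaN."""
    left, right = np.asarray(left), np.asarray(right)
    if left.shape != right.shape:
        return False
    # only object and floating dtypes can contain NaN; for int/bool/etc.
    # a plain np.array_equal is enough (and avoids the extra comparisons)
    if not issubclass(left.dtype.type, (np.object_, np.floating)):
        return np.array_equal(left, right)
    return bool(((left == right) | ((left != left) & (right != right))).all())
```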

@jreback
Contributor

jreback commented Jan 21, 2014

can you put in some tests for datetime/timedeltas (including with NaT)? and bools too

you might need to change the comparisons to something like this:

def equals(self, other):
    if self.dtype != other.dtype or self.shape != other.shape:
        return False
    return np.array_equal(self._try_operate(self.values),
                          self._try_operate(other.values))

_try_operate essentially does .view('i8') as needed (this way you won't have to change anything else)

(it might work w/o this ...not sure exactly what np.array_equal does)

@unutbu
Contributor Author

unutbu commented Jan 21, 2014

While writing a test for timedeltas and bools, I've come upon an interesting problem:

Suppose df1 and df2 are defined this way:

import numpy as np
import pandas as pd
index = np.random.random(10)
df1 = pd.DataFrame(np.random.random(10,), index=index, columns=['floats'])
df1['text'] = 'the sky is so blue. we could use more chocolate.'.split()
df1['start'] = pd.date_range('2000-1-1', periods=10, freq='T')
df1['end'] = pd.date_range('2000-1-1', periods=10, freq='D')
df1['diff'] = df1['end'] - df1['start']
df1['bool'] = (np.arange(10) % 3 == 0)
df1.ix[::2] = np.nan
df2 = df1.copy()

Then the underlying blocks look like this:

In [2]: df1._data.blocks
Out[2]: 
[DatetimeBlock: [start, end], 2 x 10, dtype: datetime64[ns],
 FloatBlock: [floats], 1 x 10, dtype: float64,
 ObjectBlock: [text], 1 x 10, dtype: object,
 TimeDeltaBlock: [diff], 1 x 10, dtype: timedelta64[ns],
 FloatBlock: [bool], 1 x 10, dtype: float64]

In [3]: df2._data.blocks
Out[3]: 
[DatetimeBlock: [start, end], 2 x 10, dtype: datetime64[ns],
 FloatBlock: [floats, bool], 2 x 10, dtype: float64,
 ObjectBlock: [text], 1 x 10, dtype: object,
 TimeDeltaBlock: [diff], 1 x 10, dtype: timedelta64[ns]]

df1 has two FloatBlocks while df2 has one FloatBlock.

Is there a way to massage the BlockManager into a canonical form? (or put more generally, how would you go about comparing these two BlockManagers for equality?)

@jreback
Contributor

jreback commented Jan 21, 2014

before comparing, bm.consolidate_inplace() (will combine the blocks); this is a normal operation and is somewhat 'lazy', e.g. only done when needed. You will see this called a lot; do it inside the BlockManager.equals first thing (or after you compare shapes, but before iterating over the blocks)

blocks are created in various operations (e.g. insertion, changing a block dtype, etc)...the consolidate merges them (if it can)

@jreback
Contributor

jreback commented Jan 21, 2014

another slight complication: block order is not guaranteed, in that you could have [IntBlock, FloatBlock] in one and [FloatBlock, IntBlock] in another and they could be equal

so you should prob sort them in some kind of order before you iterate (actually there are many ways to handle this).
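The "sort into a specific order" idea can be illustrated without touching pandas internals (a generic sketch over plain arrays, since the BlockManager API is internal and version-specific; `blocks_equal` is an invented name):

```python
import numpy as np

def blocks_equal(blocks1, blocks2):
    """Compare two lists of homogeneous arrays regardless of list order,
    by canonicalizing on dtype name first (a sketch of the idea only)."""
    if len(blocks1) != len(blocks2):
        return False
    key = lambda arr: str(arr.dtype)
    for b1, b2 in zip(sorted(blocks1, key=key), sorted(blocks2, key=key)):
        if b1.dtype != b2.dtype or not np.array_equal(b1, b2):
            return False
    return True

ints = np.arange(6).reshape(2, 3)
floats = np.linspace(0, 1, 6).reshape(2, 3)
print(blocks_equal([ints, floats], [floats, ints]))  # True despite the order
```

Note that a dtype key alone is ambiguous when two blocks share a dtype, so a same-dtype tie-breaker (such as the blocks' item locations) would still be needed.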

@jreback jreback closed this Jan 21, 2014
@jreback jreback reopened this Jan 21, 2014
@unutbu
Contributor Author

unutbu commented Jan 22, 2014

In internals.py,

def _consolidate(blocks, items):
    # sort by _can_consolidate, dtype
    gkey = lambda x: x._consolidate_key
    grouper = itertools.groupby(sorted(blocks, key=gkey), gkey)

causes the blocks to be sorted by _consolidate_key (which includes the dtype). I think this will enforce the same dtype order of the blocks when comparing blockmanagers that should be equal.

However, it is also possible that the blockmanagers might have multiple blocks of the
same dtype but in different orders: [IntBlock1, IntBlock2] versus [IntBlock2,
IntBlock1].

Do you know if the call to _consolidate_inplace() will cause the merged blocks
to always appear in the same order?

@jreback
Contributor

jreback commented Jan 22, 2014

Yes the blocks CAN be in different orders; but since there are only a small number of block types, you could either order by the block types in a specific way (prob easiest), or iterate over one and find each in the other

separately, you might be able to guarantee that consolidate_inplace puts them in the same order (e.g. it would insert into a specific order rather than always appending at the end); I think this would be pretty straightforward to do

@unutbu
Contributor Author

unutbu commented Jan 22, 2014

I think I need some help. I've been trying to create a test where the current code fails, but haven't been able to find one.

I'm pushing my test_internals.py to help clarify the case I'm worried about.
But it still passes the test because the index is unique and so in _merge_blocks

        # unique, can reindex
        if items.is_unique:
            return new_block.reindex_items_from(items)

makes the returned value the same for both blockmanagers because items is the same.

I wonder if there might be a problem if items is not unique, but I haven't been able to create such an example.

Can you help me find an example which breaks the current code?

@jreback
Contributor

jreback commented Jan 22, 2014

here's a non-unique example; essentially the placement is a set index to locations (as opposed to
the unique case, where .ref_locs computes the indexer); here it is 'set' (by the calling function). You need this for the non-unique case to map the items in a block to the ref_items, as they both could be non-unique (even across blocks).

This may not answer your question about the unique case, which I am thinking actually DOES guarantee orderings because of the reindex (certainly on the items, and maybe on the blocks; as I said, I cannot prove that it does not work)

from pandas.core.internals import make_block, BlockManager
import numpy as np
from pandas import Index

index = Index(list('aaabbb'))
block1 = make_block(np.arange(12).reshape(3,4), list('aaa'), index, placement=[0,1,2])
block2 = make_block(np.arange(12).reshape(3,4)*10, list('bbb'), index, placement=[3,4,5])
block1.ref_items = block2.ref_items = index
bm1 = BlockManager([block1, block2], [index, np.arange(block1.shape[1])])
bm2 = BlockManager([block2, block1], [index, np.arange(block1.shape[1])])

print "before consolidation"
print bm1
print bm1.blocks[0]._ref_locs
print bm2.blocks[0]._ref_locs
print bm2
print bm1.blocks[0]._ref_locs
print bm2.blocks[0]._ref_locs

bm1._consolidate_inplace()
bm2._consolidate_inplace()

print "\nafter consolidation"
print bm1
print bm1.blocks[0]._ref_locs
print bm2
print bm2.blocks[0]._ref_locs

output

before consolidation
BlockManager
Items: Index([u'a', u'a', u'a', u'b', u'b', u'b'], dtype='object')
Axis 1: Int64Index([0, 1, 2, 3], dtype='int64')
IntBlock: [a, a, a], 3 x 4, dtype: int64
IntBlock: [b, b, b], 3 x 4, dtype: int64
[0 1 2]
[3 4 5]
BlockManager
Items: Index([u'a', u'a', u'a', u'b', u'b', u'b'], dtype='object')
Axis 1: Int64Index([0, 1, 2, 3], dtype='int64')
IntBlock: [b, b, b], 3 x 4, dtype: int64
IntBlock: [a, a, a], 3 x 4, dtype: int64
[0 1 2]
[3 4 5]

after consolidation
BlockManager
Items: Index([u'a', u'a', u'a', u'b', u'b', u'b'], dtype='object')
Axis 1: Int64Index([0, 1, 2, 3], dtype='int64')
IntBlock: [a, a, a, b, b, b], 6 x 4, dtype: int64
[0 1 2 3 4 5]
BlockManager
Items: Index([u'a', u'a', u'a', u'b', u'b', u'b'], dtype='object')
Axis 1: Int64Index([0, 1, 2, 3], dtype='int64')
IntBlock: [b, b, b, a, a, a], 6 x 4, dtype: int64
[3 4 5 0 1 2]

@@ -4004,6 +4024,9 @@ def _merge_blocks(blocks, items, dtype=None, _can_consolidate=True):
raise AssertionError("_merge_blocks are invalid!")
dtype = blocks[0].dtype

if not items.is_unique:
Contributor Author

The example you gave did indeed break the code. I've added your example to test_internals.py and am handling this case by sorting the blocks according to their ref_locs.

@jreback
Contributor

jreback commented Jan 23, 2014

ok looks good. can you do a quick perf test (just comment if it's not ok). I would add a small mention in the main docs (and put in a link from v0.13.1.txt), maybe in a sub-section near any/all/bool (IIRC in basics.rst). Also pls add a one-liner in the release notes.

@y-p @jtratner ??

@unutbu
Contributor Author

unutbu commented Jan 23, 2014

Problem:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
frame_apply_np_mean                          |   3.4624 |   1.9017 |   1.8207 |
frame_apply_lambda_mean                      |   3.3963 |   1.2426 |   2.7331 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

I re-ran these benchmarks and found the ratio is consistently large.

@jreback
Contributor

jreback commented Jan 23, 2014

are you rebased on master? I just added these

@unutbu
Contributor Author

unutbu commented Jan 23, 2014

Oops, thanks for the reminder. Now, much better:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
frame_apply_np_mean                          |   3.1480 |   3.2330 |   0.9737 |
frame_apply_lambda_mean                      |   3.1850 |   3.1913 |   0.9980 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

@jreback
Contributor

jreback commented Jan 23, 2014

yep...that looks fine

@jreback
Contributor

jreback commented Jan 24, 2014

this looks ok to me.....@y-p ?
@jorisvandenbossche

@unutbu rebase maybe just to fix the release notes if you have a chance

@ghost

ghost commented Jan 24, 2014

Can't review, up to you.

@unutbu
Contributor Author

unutbu commented Jan 24, 2014

I think the Travis test failed for a reason unrelated to my commits. Is there a way to restart Travis on the same build, or should I push an innocuous change to try it again?

@jreback
Contributor

jreback commented Jan 24, 2014

there is a little button on the rhs of the screen where you can restart an individual job

or can always

git commit -C HEAD --amend then force push (resets the commit hash and forces a rebuild)

…nt`, which is similar to `np.array_equal` except that it handles object arrays and treats NaNs in corresponding locations as equal.

TST: Add tests for NDFrame.equals and BlockManager.equals

DOC: Mention the equals method in basics, release and v.0.13.1
jreback added a commit that referenced this pull request Jan 24, 2014
API: Add equals method to NDFrames.
@jreback jreback merged commit 929fd1c into pandas-dev:master Jan 24, 2014
@@ -215,6 +215,14 @@ These operations produce a pandas object the same type as the left-hand-side inp
that is of dtype ``bool``. These ``boolean`` objects can be used in indexing operations,
see :ref:`here<indexing.boolean>`

As of v0.13.1, Series, DataFrames and Panels have an equals method to compare if
Contributor

I merged this thanks! maybe as a small followup....can you explain in the docs why one would need to do this, maybe a small example is in order?
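A minimal example of the motivation, using the public API as merged (the kind of snippet the docs follow-up could include):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
df2 = df.copy()

# elementwise comparison cannot confirm identity, because NaN != NaN:
print((df == df2).all().all())   # False, even though the frames are identical

# equals treats NaNs in matching locations as equal:
print(df.equals(df2))            # True
```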

@unutbu
Contributor Author

unutbu commented Jan 24, 2014

@jreback: Sure; I tried pushing to here, but since that did not work, I've opened PR #6072.

@jreback
Contributor

jreback commented Jan 24, 2014

yep already merged

one thing on the doc update

can u put a link from v0.13.1 back to your new section

thanks

Successfully merging this pull request may close these issues.

ENH: NDFrame.equals