Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the hash of matrix spaces and improve its performance #12290

Closed
simon-king-jena opened this issue Jan 10, 2012 · 17 comments
Closed

Fix the hash of matrix spaces and improve its performance #12290

simon-king-jena opened this issue Jan 10, 2012 · 17 comments

Comments

@simon-king-jena
Copy link
Member

The central assumptions for any hash function is: If two objects are equal then they must have the same hash. That assumption is violated for matrix spaces:

sage: M = MatrixSpace(ZZ, 10)
sage: N = MatrixSpace(ZZ, 10,sparse=True)
sage: N
Full MatrixSpace of 10 by 10 sparse matrices over Integer Ring
sage: M
Full MatrixSpace of 10 by 10 dense matrices over Integer Ring
sage: M == N
True
sage: hash(M)==hash(N)
False

That has to be fixed. Moreover, the hash of matrix spaces is rather sluggish and should thus be improved speed-wise:

sage: %timeit hash(M)
625 loops, best of 3: 13.8 µs per loop

The root of both evils is the generic __hash__ method inherited from SageObject:

    def __hash__(self):
        return hash(self.__repr__())

Apply

attachment: trac12290_unique_matrix_space.patch

Depends on #11900

Component: linear algebra

Keywords: hash matrix space unique parent

Author: Simon King

Reviewer: David Loeffler

Merged: sage-5.0.beta9

Issue created by migration from https://trac.sagemath.org/ticket/12290

@simon-king-jena
Copy link
Member Author

Make the hashes of two equal matrix spaces equal. Improve the performance

@simon-king-jena
Copy link
Member Author

Author: Simon King

@simon-king-jena
Copy link
Member Author

comment:1

Attachment: trac12290_matrixspace_hash.patch.gz

With my patch, the __hash__ method returns the hash of the same data that are used for comparison of matrix spaces. Hence, we have the new doc test

sage: M = MatrixSpace(ZZ, 10)
sage: N = MatrixSpace(ZZ, 10,sparse=True)
sage: M == N
True
sage: hash(M)==hash(N)
True

Now about speed. If one computes the hash value over and over again by

    def __hash__(self):
        return hash((self.base(),self.__nrows, self.__ncols))

then the timing is

sage: timeit("hash(M)", number=10^6)
1000000 loops, best of 3: 1.72 µs per loop

If the hash value is stored in a single underscore attribute, such as

    def __hash__(self):
        try:
            return self._hash
        except AttributeError:
            self._hash = h = hash((self.base(),self.__nrows, self.__ncols))
        return h

then one gets

sage: timeit("hash(M)", number=10^6)
1000000 loops, best of 3: 801 ns per loop

With a double-underscore __hash instead of _hash, one has

sage: timeit("hash(M)", number=10^6)
1000000 loops, best of 3: 712 ns per loop

With directly accessing the __dict__ such as

    def __hash__(self):
        try:
            return self.__dict__['_hash']
        except KeyError:
            self.__dict__['_hash'] = h = hash((self.base(),self.__nrows, self.__ncols))
        return h

one has

sage: timeit("hash(M)", number=10^6)
1000000 loops, best of 3: 611 ns per loop

and with the patch, one has

sage: timeit("hash(M)", number=10^6)
1000000 loops, best of 3: 547 ns per loop

How is that possible? Apparently a "try-except" block has some overhead. Hence, simply returning a lazy attribute (which is solution of my patch) is fastest. Note that it is not possible to use @cached_method on the __hash__ method.

Needs review (although I still need to run full doctests)!

@simon-king-jena
Copy link
Member Author

comment:2

make ptest results in one error:

sage -t -force_lib "devel/sage/sage/matrix/matrix2.pyx"     
**********************************************************************
File "/home/simon/SAGE/sage-4.8.alpha3/devel/sage/sage/matrix/matrix2.pyx", line 581:
    sage: B.elementwise_product(C).is_sparse()
Exception raised:
    Traceback (most recent call last):
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1231, in run_one_test
        self.run_one_example(test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/sagedoctest.py", line 38, in run_one_example
        OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
      File "/home/simon/SAGE/sage-4.8.alpha3/local/bin/ncadoctest.py", line 1172, in run_one_example
        compileflags, 1) in test.globs
      File "<doctest __main__.example_9[34]>", line 1, in <module>
        B.elementwise_product(C).is_sparse()###line 581:
    sage: B.elementwise_product(C).is_sparse()
      File "matrix2.pyx", line 626, in sage.matrix.matrix2.Matrix.elementwise_product (sage/matrix/matrix2.c:5345)
      File "element.pyx", line 2994, in sage.structure.element.canonical_coercion (sage/structure/element.c:20329)
      File "element.pyx", line 3011, in sage.structure.element.canonical_coercion (sage/structure/element.c:20245)
      File "coerce.pyx", line 939, in sage.structure.coerce.CoercionModel_cache_maps.canonical_coercion (sage/structure/coerce.c:9043)
    TypeError: no common canonical parent for objects with parents: 'Full MatrixSpace of 5 by 6 sparse matrices over Integer Ring' and 'Full MatrixSpace of 5 by 6 sparse matrices over Rational Field'
**********************************************************************
1 items had failures:
   1 of  50 in __main__.example_9
***Test Failed*** 1 failures.
For whitespace errors, see the file /home/simon/.sage//tmp/matrix2_8852.py
         [19.4 s]

Quite an interesting error, I think!

@simon-king-jena
Copy link
Member Author

comment:3

Interesting indeed.

Without the patch, one has

sage: M1 = MatrixSpace(ZZ, 5,6, sparse=True)
sage: M2 = MatrixSpace(ZZ, 5,6, sparse=False)
sage: M1==M2
True
sage: D = {M1:1,M2:2}
sage: len(D)
2

Obvious reason: M1 and M2 are equal (so then the length of D should be one, not two!), but they have different hash and are thus in different buckets of the dictionary.

With my patch, they have the same hash, and by consequence they yield the same dictionary item - and that is bad for coercion! Hence, non-unique parents strike again...

@simon-king-jena
Copy link
Member Author

comment:4

The problem could be fixed by turning matrix spaces into unique parents (which would be the straight-forward thing to do).

I just asked sage-devel for permission to make matrix spaces unique. Adding a dense and a sparse matrix would not be a problem for the coercion model.

@simon-king-jena
Copy link
Member Author

Attachment: trac12290_unique_matrix_space.patch.gz

Use UniqueRepresentation as a base class for matrix spaces.

@simon-king-jena
Copy link
Member Author

comment:5

I have attached a patch that follows a totally different approach: Use UniqueRepresentation as a base class for matrix spaces!

Advantage: One gets __hash__, __cmp__ and __reduce__ for free, and the hash is even faster than with my previous patch.

sage: M = MatrixSpace(ZZ, 10)
sage: N = MatrixSpace(ZZ, 10,sparse=True)
sage: M == N
False
sage: timeit("hash(M)", number=10^6)
1000000 loops, best of 3: 511 ns per loop

The price to pay (as one can see in the example): The spaces of dense versus sparse matrices are not considered equal anymore. For applications, this shouldn't matter, since the coercion model can easily deal with it. In fact, I like the new behaviour a lot better than the old behaviour!

Old:

sage: M = MatrixSpace(ZZ, 10)
sage: N = MatrixSpace(ZZ, 10,sparse=True)
sage: a = M.random_element()
sage: b = N.random_element()
sage: (a+b).parent()
Full MatrixSpace of 10 by 10 dense matrices over Integer Ring
sage: (b+a).parent()
Full MatrixSpace of 10 by 10 sparse matrices over Integer Ring

The parent of the sum depends on the order of summands!!

But with the new patch, one has

sage: M = MatrixSpace(ZZ, 10)
sage: N = MatrixSpace(ZZ, 10,sparse=True)
sage: a = M.random_element()
sage: b = N.random_element()
sage: (a+b).parent()
Full MatrixSpace of 10 by 10 dense matrices over Integer Ring
sage: (b+a).parent()
Full MatrixSpace of 10 by 10 dense matrices over Integer Ring

independent of the summation order.

I had to change some existing doctests in a trivial way, and then the whole test suite passes. Ready for review!

Apply trac12290_unique_matrix_space.patch

@simon-king-jena
Copy link
Member Author

Changed keywords from hash matrix space to hash matrix space unique parent

@simon-king-jena

This comment has been minimized.

@simon-king-jena
Copy link
Member Author

Dependencies: #11900

@simon-king-jena
Copy link
Member Author

comment:6

The new patch has a dependency: #11900, which has positive review and is already in sage-5.0.prealpha0

@simon-king-jena
Copy link
Member Author

comment:7

Note that inspite of the red blob, the patchbot finds that all tests pass with the the latest prerelease. So, review anyone?

@loefflerd
Copy link
Mannequin

loefflerd mannequin commented Mar 10, 2012

comment:8

Hi Simon,

This looks good, and I have doctests running on it as I write; but can you tell us what's going on with the new doctest at line 334 of matrix_integer_2x2?

David

@loefflerd
Copy link
Mannequin

loefflerd mannequin commented Mar 10, 2012

comment:9

I see: the old copy method returned a copy whose base ring was None (!). Anyway, this looks like good stuff, and patchbot seems happy with it on two different v5.0 builds, so let's get it in.

@loefflerd
Copy link
Mannequin

loefflerd mannequin commented Mar 10, 2012

Reviewer: David Loeffler

@jdemeyer
Copy link

Merged: sage-5.0.beta9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants