Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The category graph should comply with Python's method resolution order #11943

Closed
simon-king-jena opened this issue Oct 21, 2011 · 125 comments
Closed

Comments

@simon-king-jena
Copy link
Member

Let C be a category. C.all_super_categories() starts with C.super_categories() and finds all super categories inductively.

Unfortunately, the algorithm of C.all_super_categories() does not comply with the C3 algorithm, that is used by Python to determine the method resolution order for C.parent_class and C.element_class.

The aim of this ticket is to be more consistent. Eventually, for any category C (perhaps with modifications for hom categories), one should have

sage: C.parent_class.mro() == [X.parent_class for X in C.all_super_categories()] + [object]
True
sage: C.element_class.mro() == [X.element_class for X in C.all_super_categories()] + [object]
True

and that test should become part of the test suite.

At #11900, the implementation of all_super_categories() has been changed, so, work should rely on #11900.

Unfortunately, it seems that the C3 algorithm can not simply be imported from Python, even though Python uses it internally, namely in Objects/typeobject.c. Looking at the implementation, it requires that one gets a list of types, while we want to use it on a list of objects.

Therefore, we need to implement C3 from scratch. Implementing C3 is easy, but if it shall be fast, one needs to be careful.

In particular, one needs to be able to L.pop(0) quickly, where L is a list. Unfortunately, L.pop(0) is slow, even in Cython. I found that, if the lists are not too long, it is best to revert L and do L.pop() instead, which is optimized in Cython. See the discussion at sage-devel.

What my patch does:

  • Provide the C3 algorithm in a new extension module sage.misc.c3
  • Use it to compute all_super_categories()
  • Add a test _test_category_graph, asserting that self.parent_class.mro() and self.all_super_categories() are compatible.

Apply:

attachment: trac11943_mro_for_all_super_categories_combined.patch

Depends on #11900
Depends on #7980

CC: @hivert @koffie

Component: categories

Keywords: category graph, method resolution order

Author: Simon King

Reviewer: Nicolas M. Thiéry

Merged: sage-5.1.beta0

Issue created by migration from https://trac.sagemath.org/ticket/11943

@simon-king-jena
Copy link
Member Author

comment:1

With the new test, I found a bug in algebra_ideals:

sage: C = AlgebraIdeals(FreeAlgebra(QQ,2,'a,b'))
sage: C.parent_class
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (396, 0))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/king/Seminar/<ipython console> in <module>()

/mnt/local/king/SAGE/broken/local/lib/python2.6/site-packages/sage/misc/lazy_attribute.pyc in __get__(self, a, cls)
    507         if a is None: # when doing cls.x for cls a class and x a lazy attribute
    508             return self
--> 509         result = self.f(a)
    510         if result is NotImplemented:
    511             # Workaround: we make sure that cls is the class

/mnt/local/king/SAGE/broken/local/lib/python2.6/site-packages/sage/categories/category.pyc in parent_class(self)
    626         """
    627         return dynamic_class("%s.parent_class"%self.__class__.__name__,
--> 628                              tuple(cat.parent_class for cat in self.super_categories()),
    629                              self.ParentMethods,
    630                              reduction = (getattr, (self, "parent_class")))

/mnt/local/king/SAGE/broken/local/lib/python2.6/site-packages/sage/misc/cachefunc.pyc in __call__(self, *args, **kwds)
    553             return self.cache[k]
    554         except KeyError:
--> 555             w = self._cachedmethod._instance_call(self._instance, *args, **kwds)
    556             self.cache[k] = w
    557             return w

/mnt/local/king/SAGE/broken/local/lib/python2.6/site-packages/sage/misc/cachefunc.pyc in _instance_call(self, inst, *args, **kwds)
    776 
    777         """
--> 778         return self._cachedfunc.f(inst, *args, **kwds)
    779 
    780     def _get_instance_cache(self, inst):

/mnt/local/king/SAGE/broken/local/lib/python2.6/site-packages/sage/categories/algebra_ideals.pyc in super_categories(self)
     67             [Category of algebra modules over Univariate Polynomial Ring in x over Rational Field]
     68         """
     69         from algebra_modules import AlgebraModules
     70         R = self.algebra()
---> 71         return [AlgebraModules(R)]

/mnt/local/king/SAGE/broken/local/lib/python2.6/site-packages/sage/misc/classcall_metaclass.pyc in __call__(cls, *args, **options)
    256             return cls.__classcall_private__(cls, *args, **options)
    257         elif hasattr(cls, "__classcall__"):
--> 258             return cls.__classcall__(cls, *args, **options)
    259         else:
    260             return type.__call__(cls, *args, **options)

/mnt/local/king/SAGE/broken/local/lib/python2.6/site-packages/sage/misc/cachefunc.pyc in __call__(self, *args, **kwds)
    176             return self.cache[k]
    177         except KeyError:
--> 178             w = self.f(*args, **kwds)
    179             self.cache[k] = w
    180             return w

/mnt/local/king/SAGE/broken/local/lib/python2.6/site-packages/sage/structure/unique_representation.pyc in __classcall__(cls, *args, **options)
    447             True
    448         """
--> 449         instance = type.__call__(cls, *args, **options)
    450         assert isinstance( instance, cls )
    451         if instance.__class__.__reduce__ == UniqueRepresentation.__reduce__:

/mnt/local/king/SAGE/broken/local/lib/python2.6/site-packages/sage/categories/algebra_modules.pyc in __init__(self, A)
     53         from sage.categories.commutative_algebras import CommutativeAlgebras
     54         if not hasattr(A, "base_ring") or not A in CommutativeAlgebras(A.base_ring()):
---> 55             raise TypeError, "A (=%s) must be a commutative algebra"%A
     56         Category_module.__init__(self, A)
     57 

TypeError: A (=Free Algebra on 2 generators (a, b) over Rational Field) must be a commutative algebra

Thus, apparently, that category has never been used.

The problem is that the super categories of the category of algebra ideals should be the category of algebra modules. However, while the former accepts non-commutative algebras, the latter wants to see commutative algebras.

The clean way to proceed would be: Make algebra modules work over non-commutative rings. If that is too much of work, C.super_categories() should be [] if C is the category of algebra ideals over a non-commutative rings.

@simon-king-jena
Copy link
Member Author

Dependencies: #11900

@simon-king-jena
Copy link
Member Author

comment:2

Patch's up!

To my surprise, the change in the order of super categories was relatively harmless. In few cases, a test involving all_super_categories had to change.

I added a test _test_category_graph to the Test Suite of categories. By that test, I found one bug in algebra_ideals, which I fixed. Hom categories can not use the new test yet, since it tests the mro of the parent class against the list of parent classes of all super categories - which is fairly inconsistent for hom categories. That is already a "to do", and should be part of the next category overhaule.

Apart from these small changes, all doc tests passed! In particular, I did not need to change super_categories (except for algebra ideals): The current category graph seems to be consistent!!

That was the good news.

The bad news is that we have a regression in the computation of all_super_categories. With #11900, we have

sage: L = [Algebras(GF(p)) for p in prime_range(10000)]
sage: %time X = [C.all_super_categories() for C in L]
CPU times: user 0.74 s, sys: 0.01 s, total: 0.75 s
Wall time: 0.75 s

With the patches from here added, we only have

sage: L = [Algebras(GF(p)) for p in prime_range(10000)]
sage: %time X = [C.all_super_categories() for C in L]
CPU times: user 1.06 s, sys: 0.02 s, total: 1.07 s
Wall time: 1.08 s

I tried hard to make my C3 implementation as fast as possible in the range we need: few lists (say, 4) of moderate length (not more than 60).

I think the performance should be improved. But a review can already be started.

@simon-king-jena
Copy link
Member Author

Author: Simon King

@nthiery
Copy link
Contributor

nthiery commented Oct 21, 2011

comment:3

Replying to @simon-king-jena:

With the new test, I found a bug in algebra_ideals:
The problem is that the super categories of the category of algebra ideals should be the category of algebra modules. However, while the former accepts non-commutative algebras, the latter wants to see commutative algebras.
The clean way to proceed would be: Make algebra modules work over non-commutative rings. If that is too much of work, C.super_categories() should be [] if C is the category of algebra ideals over a non-commutative rings.

+1. Actually my patch on #10963 is adding a TODO about it :-)

@nthiery
Copy link
Contributor

nthiery commented Oct 21, 2011

comment:4

Replying to @simon-king-jena:

To my surprise, the change in the order of super categories was relatively harmless. In few cases, a test involving all_super_categories had to change.

I am not so surprised, since most categories are actually used by some
parent (which detects any incompatibility). But it's very nice to
detect such incompatibilities earlier on and to catch them
systematically in the TestSuite!

Cheers,
Nicolas

@simon-king-jena
Copy link
Member Author

comment:5

I tried to track down the regression of all_super_categories. Recall that my patch from #11900 did improve the implementation of all_super_categories, but did not change the algorithm.

With #11900, the main work is done in _all_super_categories_raw. With the patch from here, the work is done in c3_algorithm - and according to %prun, less time is spent in c3_algorithm than it was in _all_super_categories_raw. However, more time is spent in the expression

C.all_super_categories() for C in self.super_categories()

I guess I have to improve the implementation of the recursion.

@simon-king-jena
Copy link
Member Author

comment:6

PS: It also seems that much time is spent in super_categories(), which is a cached method. Probably it would be better to turn it into a lazy attribute (or have two lazy attributes, namely for the case proper=True and proper=False).

Namely, even though #11115 will almost totally reduce the overhead of calling a cached function, a lazy attribute will always be faster.

@simon-king-jena
Copy link
Member Author

comment:7

Replying to @simon-king-jena:

PS: It also seems that much time is spent in super_categories(), which is a cached method. Probably it would be better to turn it into a lazy attribute (or have two lazy attributes, namely for the case proper=True and proper=False).

Sorry, I meant: One lazy attribute C.super_categories, and two lazy attributes C.all_super_categories and C.all_super_categories_proper (super_categories() has no arguments).

@simon-king-jena

This comment has been minimized.

@simon-king-jena
Copy link
Member Author

comment:8

I have created a more invasive patch: It turns C.super_categories() and C.all_super_categories(proper) into lazy attributes, and it uses the C3 algorithm in a more efficient way.

The two patches are alternative. Personally, I prefer the second patch. If the reviewer disagrees and suggests to use the first patch, I'd like to add one improvement to the first patch. If the reviewer agrees with me, then eventually either this patch or the patch from #11935 needs to be rebased.

I just tested that all doctests pass, with sage-4.7.2.alpha3+#11900+the second patch from here.

And here are some timings:
With just #11900:

king@mpc622:/mnt/local/king/SAGE/rebase/sage-4.7.2.alpha3$ time ./sage elltest1.sage

real    0m18.089s
user    0m17.633s
sys     0m0.344s
king@mpc622:/mnt/local/king/SAGE/rebase/sage-4.7.2.alpha3$ time ./sage ellbsd.sage

real    0m5.578s
user    0m5.084s
sys     0m0.400s

and

sage: L = [Algebras(GF(p)) for p in prime_range(10000)]
sage: %time X = [C.all_super_categories() for C in L]
CPU times: user 0.72 s, sys: 0.01 s, total: 0.72 s
Wall time: 0.73 s

With #11900+the second patch from here:

king@mpc622:/mnt/local/king/SAGE/rebase/sage-4.7.2.alpha3$ time ./sage elltest1.sage

real    0m17.799s
user    0m17.333s
sys     0m0.384s
king@mpc622:/mnt/local/king/SAGE/rebase/sage-4.7.2.alpha3$ time ./sage ellbsd.sage

real    0m5.408s
user    0m4.956s
sys     0m0.376s

and

sage: L = [Algebras(GF(p)) for p in prime_range(10000)]
# all_super_categories is lazy now, we are not calling it
sage: %time X = [C.all_super_categories for C in L]
CPU times: user 0.54 s, sys: 0.00 s, total: 0.54 s
Wall time: 0.54 s

Note an additional doc test, demonstrating that the new version of all_super_categories can detect inconsistencies:

sage: class X(Category):
...    @lazy_attribute
...    def super_categories(self):
...        return [Objects()]
sage: class X(Category):
...    @lazy_attribute
...    def super_categories(self):
...        return [Objects()]
sage: class Y(Category):
...    @lazy_attribute
...    def super_categories(self):
...        return [Objects()]
sage: class A(Category):
...    @lazy_attribute
...    def super_categories(self):
...        return [X(),Y()]
sage: class B(Category):
...    @lazy_attribute
...    def super_categories(self):
...        return [Y(),X()]
sage: class Foo(Category):
...    @lazy_attribute
...    def super_categories(self):
...       return [A(),B()]
sage: F = Foo()

Python is not able to create a consistent mro for the parent class - and all_super_categories detects that inconsistency as well:

sage: F.parent_class
Traceback (most recent call last):
...
TypeError: Cannot create a consistent method resolution
order (MRO) for bases X.parent_class, Y.parent_class
sage: F.all_super_categories
Traceback (most recent call last):
...
ValueError: Can not merge the items Category of x, Category of y.

Of course, there still is the new consistency test for the category graph in the test suite.

So, it can really be reviewed now, and my suggestion is:

Apply trac11943_mro_for_all_super_categories_lazy.patch

@simon-king-jena
Copy link
Member Author

comment:9

There was no reply on sage-combinat-devel and only little reply on sage-devel: The only reply was Jason Grout stating that the use of attributes is often more pythonic than the use of a method.

So, the idea to turn super_categories and all_super_categories into lazy attributes is supported.

@simon-king-jena
Copy link
Member Author

comment:10

Maarten suggested the following:

What I think would be the best solution in this case would be is:

make lazy attributes

_super_categories and
_all_super_categories

Make the functions super_categories and all_super_categories say in the
documentation that developers shouldn't use these but that they should use
the lazy attributes.

I think that this would be a very good solution indeed.

@simon-king-jena
Copy link
Member Author

Work Issues: Preserve super_categories etc. as methods, but introduce a lazy attribute _super_categories

@simon-king-jena
Copy link
Member Author

comment:11

I think the following model is best for preserving backwards compatibility:

  • If one wants to provide the super categories of a category, one should implement a method super_categories() - this is what one would currently do.
  • There is a new lazy attribute _super_categories defined for the base class of categories. That lazy attribute calls the method super_categories(). Of course, calling the method happens only once, so that the speed is acceptable.
  • In all applications, the method call super_categories() shall be replaced by getting the attribute _super_categories. That is explained in a note to the developers in the documentation of super_categories.
  • There is a lazy attribute _all_super_categories and _all_super_categories_proper, and there is still a method all_super_categories(proper=False). The method returns the appropriate lazy attribute, and the documentation of the method states that one should better use the lazy attributes.

@simon-king-jena
Copy link
Member Author

comment:12

I have changed my patch according to what I learnt from the discussion on sage-devel. Namely:

  • I preserved the old methods super_categories and all_super_categories, so that the users can still work with them and can read their documentation.
  • In particular, if one wants to define a new category, one would still define a method (no need for a cached method) called super_categories that returns the immediate super categories. However, the documentation now also states that internally (i.e., for development), the new lazy attributes explained below should be used.
  • The patch introduces three lazy attributes _super_categories, _all_super_categories and _all_super_categories_proper that carry the lists of (immediate or all) super categories. If one asks for the super categories in code, the fastest way is to request these lazy attributes, not calling the old methods.

Anyway, why do I see a need to make things faster? Recall that the purpose of this ticket is to order the list of all super categories according to Python's method resolution order, i.e., the C3 algorithm. I did my very best to implement the C3 algorithm as fast as possible. However:

  • When I keep super_categories() what they are, then even my best attempt to implement all_super_categories based on C3 resulted in a mild regression.
  • When I keep super_categories() what they are (namely cached methods) and use Rewrite cached_method in Cython #11115, then there is neither a regression nor a speed-up.
  • When I replace super_categories() by _super_categories (a lazy attribute), then I am getting a speed-up, even without Rewrite cached_method in Cython #11115.

Here are the data. I've run sage -t devel/sage/sage/schemes/ and found these total running times:

That's not much (3.3% speedup), but certainly better than a regression.

@simon-king-jena
Copy link
Member Author

Changed work issues from Preserve super_categories etc. as methods, but introduce a lazy attribute _super_categories to none

@simon-king-jena
Copy link
Member Author

comment:13

I forgot the patch bot:

Apply trac11943_mro_for_all_super_categories_lazy.patch

@simon-king-jena
Copy link
Member Author

comment:14

There is a new suggestion, also based on the sage-devel thread:

If we want to test whether a category C is a super category of a category D, then we currently try to find C in the list of all super categories. But it is much faster to search it in a frozen set of super categories.

So, I suggest to use a lazy attribute _set_of_super_categories, providing the frozen set. It will be used in is_subcategory. Of course, the list of super categories should still be available, and its order should comply with Python's mro, according to the subject of this ticket.

Since #11115 is merged into sage-4.7.3.alpha1, I wonder whether it is still necessary to have the list of all super categories as a lazy attribute, or whether a cached method is fast enough. I'll need some tests.

However, I think it is a good idea to have a lazy attribute _super_categories. In that way, we have the speed even if the user does not use @cached_method when (s)he implements the method super_categories().

"Needs work", until I have sorted it out.

@nthiery

This comment has been minimized.

@nthiery
Copy link
Contributor

nthiery commented Apr 6, 2012

Changed work issues from rebase to none

@nthiery
Copy link
Contributor

nthiery commented Apr 6, 2012

comment:84

All tests pass; latest versions of the patches uploaded. Please fold the two patches once you have reviewed my changes.

@simon-king-jena
Copy link
Member Author

comment:85

Sorry, I can not do (reviewing or other) work, being in the middle of holidays. I'll return next week.

@nthiery
Copy link
Contributor

nthiery commented Apr 10, 2012

comment:86

Replying to @simon-king-jena:

Sorry, I can not do (reviewing or other) work, being in the middle of holidays. I'll return next week.

Sure: enjoy your vacations!

Just in case: are you ok with, e.g., Florent finishing the review, or do you prefer double checking things yourself? (just to give a chance to the patch to go into 5.0).

Cheers,
Nicolas

@nthiery
Copy link
Contributor

nthiery commented Apr 10, 2012

comment:87

On 5.0.beta13, I was getting a random doctest failure due to the order
of the classes changing in Python's error message about MRO's:

File "/home/nthiery/sage-5.0.beta13/devel/sage-combinat/sage/misc/c3.pyx", line 81:
    sage: F.parent_class
Expected:
    Traceback (most recent call last):
    ...
    TypeError: Cannot create a consistent method resolution
    order (MRO) for bases X.parent_class, Y.parent_class
Got:
    Traceback (most recent call last):
    ...      File "/home/nthiery/sage-5.0.beta13/local/bin/ncadoctest.py", line 1231, in run_one_test
    TypeError: Cannot create a consistent method resolution
    order (MRO) for bases Y.parent_class, X.parent_class

The updated patch uses "..." instead of X and Y.

@simon-king-jena
Copy link
Member Author

comment:88

Replying to @nthiery:

Replying to @simon-king-jena:
Sure: enjoy your vacations!

Thank you! By the way, it is in France.

Just in case: are you ok with, e.g., Florent finishing the review

Of course! Just go ahead.

@simon-king-jena
Copy link
Member Author

comment:89

The reviewer patch mostly looks fine to me (running tests now). But I think one should use the occasion and apply the trac directive in the Sphynx markup. Hence,

... in :trac:`11943`

rather than

... in trac ticket #11943

When I wrote the original patch, I have not been aware of the directive.

@nthiery
Copy link
Contributor

nthiery commented Apr 16, 2012

comment:90

Replying to @simon-king-jena:

The reviewer patch mostly looks fine to me (running tests now). But I think one should use the occasion and apply the trac directive in the Sphynx markup. Hence,

... in :trac:`11943`

rather than

... in trac ticket #11943

When I wrote the original patch, I have not been aware of the directive.

Perfect; I was about to upload an updated patch doing just that after a suggestion by Florent on Friday :-) Give me two minutes.

@nthiery
Copy link
Contributor

nthiery commented Apr 16, 2012

comment:91

Attachment: trac11943_mro_for_all_super_categories_lazy_hook-review-nt.patch.gz

There it is. Here is the diff between the two patches (oops, sorry, I did the diff the wrong way):

 diff --git a/sage/categories/algebras.py b/sage/categories/algebras.py
 --- a/sage/categories/algebras.py
@@ -155,7 +155,7 @@
 +        All the super categories of this category, including this category.
  
 -        TEST::
-+        Since :trac:`11943`, the order of super categories is
++        Since trac ticket #11943, the order of super categories is
 +        determined by Python's method resolution order C3 algorithm.
  
 -            sage: Rngs()._all_super_categories_raw()
@@ -193,7 +193,7 @@
 +        r"""
 +        All the proper super categories of this category.
 +
-+        Since :trac:`11943`, the order of super categories is
++        Since trac ticket #11943, the order of super categories is
 +        determined by Python's method resolution order C3 algorithm.
 +
 +        .. seealso:: :meth:`all_super_categories`
@@ -266,7 +266,7 @@
 +         - ``proper`` -- a boolean (default: ``False``); whether to exclude this category.
  
 -        FIXME:
-+        Since :trac:`11943`, the order of super categories is
++        Since trac ticket #11943, the order of super categories is
 +        determined by Python's method resolution order C3 algorithm.
  
 -        - make sure that this is compatible with the python algorithm
@@ -422,7 +422,7 @@
 -        By trac ticket #11943, the list of categories returned by :meth:`all_super_categories`
 -        is supposed to correspond to the method resolution order of the parent or element
 -        classes. This method verifies it.
-+            By :trac:`11943`, the list of categories returned by
++            By trac ticket #11943, the list of categories returned by
 +            :meth:`all_super_categories` is supposed to match with the
 +            method resolution order of the parent and element
 +            classes. This method checks this.

If you are happy with the changes, please fold the two patches, and double check the patch header.

Cheers,
Nicolas

@simon-king-jena
Copy link
Member Author

Replaces the two other patches: Order the super categories of a category according to Python's mro. Internally replace (all_)super_categories by lazy attributes, to prevent a regression. Use _subcategory_hook_

@simon-king-jena

This comment has been minimized.

@simon-king-jena
Copy link
Member Author

comment:92

Attachment: trac11943_mro_for_all_super_categories_combined.patch.gz

Replying to @nthiery:

If you are happy with the changes, please fold the two patches, and double check the patch header.

Done! I am happy with your reviewer patch (which is folded into the combined patch), and if you think it is appropriate, please give it a positive review.

Apply trac11943_mro_for_all_super_categories_combined.patch

@jdemeyer
Copy link

Reviewer: Nicolas M. Thiéry

@jdemeyer
Copy link

jdemeyer commented May 6, 2012

Merged: sage-5.1.beta0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants