Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parameter sort_neighbors to method init_short_digraph #37662

Merged
merged 9 commits into from
Jun 9, 2024

Conversation

dcoudert
Copy link
Contributor

Fixes #37642.

We add parameter sort_edges to method init_short_digraph of the short_digraph data structure defined in src/sage/graphs/base/static_sparse_graph.pxd|pyx.

  • When set to True (default), for each vertex, the list of neighbors is sorted by increasing vertex labels. This enables to search for a vertex in this list using binary search, and so in time O(log(n)), but the overall running time of method init_short_digraph is in O(m + n*log(m)).
  • When set to False, neighbors are not sorted. The running time of method init_short_digraph is reduced to O(n + m), but the running time for searching in the list of neighbors increases to O(m).

So this new parameter is particularly useful for linear time algorithms such as lex_BFS.

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation accordingly.

⌛ Dependencies

@dcoudert dcoudert changed the title Add parameter sort_neighbors to init_short_digraph in src/sage/graphs/base/static_sparse_graph.pxd|pyx Add parameter sort_neighbors to method init_short_digraph Mar 24, 2024
@dcoudert dcoudert self-assigned this Mar 24, 2024
Copy link

github-actions bot commented Mar 24, 2024

Documentation preview for this PR (built with commit 052c58b; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@cyrilbouvier
Copy link
Contributor

I will continue here the discussion started in issue #37642

I investigated more and I think I found two more problems.

  1. At the beginning of the function init_short_digraph there is the following line:
 cdef list vertices = vertex_list if vertex_list is not None else G.vertices(sort=True)

It implies that if vertex_list is None, the complexity must be at least O(n*log(n)). I did not find a place where the fact that the vertices are sorted is used (but I may have missed it). But the following comment points toward the fact that sorting is not needed:

The optional argument ``vertex_list`` is assumed to be a list of all
vertices of the graph ``G`` in some order.

So maybe, we could change sort=True into sort=False, and remove the following warning:

Otherwise, the result of ``G.vertices(sort=True)`` will be used instead. Because this only
works if the vertices can be sorted, using ``vertex_list`` is
useful when working with possibly non-sortable objects in Python
  1. Even with the proposed changes, the following loop is not in O(m):
for u, v in G.edge_iterator(labels=False, sort_vertices=False):
  ...

I claim that the complexity is O(n^2) when the backend is a dense data structure and O(m*log(m)) when the backend is a sparse data structure.

  • For dense data structure, the graph is stored as a binary matrix. To enumerate the edges, one must iterate over all coefficients of the matrix and test if it is zero or not. The cost is O(n^2). If m ~ n^2, the complexity is O(m) as expected, otherwise one cannot have a complexity in O(m). Dense graph should only be used in the first case but there is nothing preventing a user to use it for any graph. So the doc cannot claim a better complexity that O(n^2).
  • For sparse data structure, the edges are stored in a hash table where each bucket is a binary tree (from here.) Unfortunately, due to the way the enumeration is done (using the method next_in_neighbor_unsafe), finding the next edge cost O(log(|size of the binary tree|). As the number of buckets per vertex is fixed (and seems to always be the default value 16), the complexity is O(log(|neighbors of the vertice|). So the cost of enumerating all the edges is, in the worst case, O(m*log(m)) (e.g. enumerating the edges of graphs.CompleteGraph costs O(m*log(m)).

@dcoudert
Copy link
Contributor Author

For the first point: parameter vertex_list has been added during the transition to Python 3 and we now use it in all calls to init_short_digraph. When it is not specified, i.e., when set to None, I agree that we don't need to sort vertices. So we can change the default value to False.
Note that this is a low level method that will not be called directly by non-expert users.

The second point is more tricky and you are right that the time complexity depends on the backend.
Recall that the time complexity of lex-BFS is linear assuming some data structure of the input graph. That is, it assumes that we can list neighbors in time number of neighbors, etc. As always, the time complexity of the algorithm ignores the eventual cost of converting the input graph into a suitable data structure, and so the cost of operations in the input data structure.
We have the same issue here for converting the input graph into a short_digraph.
We may clarify this point in the documentation, but I don't see how to avoid such cost.

@cyrilbouvier
Copy link
Contributor

We have the same issue here for converting the input graph into a short_digraph.
We may clarify this point in the documentation, but I don't see how to avoid such cost.

For the dense backend, I see no way around it. For the sparse backend, I think it is possible to list the edges of a vertex in linear time with the current data structure (but not by using next_in_neighbor_unsafe).
I will experiment in the following days and try to write a method for that.

@cyrilbouvier
Copy link
Contributor

cyrilbouvier commented Mar 27, 2024

I wrote a quick and dirty edge iterator for sparse graph that seems (more) linear (see graphs where abscissa is m+n)
edge_iterators

The idea is that in the current code, to list the neighbors of an edge, the method _next_neighbor_unsafe is called in a loop. This method may be costly, because going from one edge to the next may require to go through the whole tree.
If we want the complete list of edges, we can simply do a depth-first enumeration of the tree which gives a linear algo.

I will try to tidy the code and propose a PR (or can I commit on the branch of this PR ?).


Coming back to the original problem: the lex_BFS method.

We have the same issue here for converting the input graph into a short_digraph.

As you say, both for converting a graph to a short_digraph and for the BFS algo, the complexity is driven by the cost of enumerating the list of neighbors of an vertex. So if, given a graph G, one is able to do it in O(#neighbors), what is the point of converting G to a short_digraph ? One could directly perform the BFS algo on G and obtain a O(m+n) complexity. And if one is not able to do it in O(#neighbors) (like for dense_graph), one could not attain the O(m+n) complexity, converting G to a short_digraph is also useless. Do you agree ?

Edit: In fact I see one case where it could be necessary: if for a backend it was possible to enumerate all the edges in O(m+n) but not the edges of a vertex in O(#neighbors). But no backends of Sage have this weird condition, they all enumerate all the edges by iterating over the vertices and listing the incident edges.

@dcoudert
Copy link
Contributor Author

The idea is that in the current code, to list the neighbors of an edge, the method _next_neighbor_unsafe is called in a loop. This method may be costly, because going from one edge to the next may require to go through the whole tree. If we want the complete list of edges, we can simply do a depth-first enumeration of the tree which gives a linear algo.

Indeed, we can certainly gain a lot this way.

I will try to tidy the code and propose a PR (or can I commit on the branch of this PR ?).

It is better to make a dedicated PR.

Coming back to the original problem: the lex_BFS method.

We have the same issue here for converting the input graph into a short_digraph.

As you say, both for converting a graph to a short_digraph and for the BFS algo, the complexity is driven by the cost of enumerating the list of neighbors of an vertex. So if, given a graph G, one is able to do it in O(#neighbors), what is the point of converting G to a short_digraph ? One could directly perform the BFS algo on G and obtain a O(m+n) complexity. And if one is not able to do it in O(#neighbors) (like for dense_graph), one could not attain the O(m+n) complexity, converting G to a short_digraph is also useless. Do you agree ?

Edit: In fact I see one case where it could be necessary: if for a backend it was possible to enumerate all the edges in O(m+n) but not the edges of a vertex in O(#neighbors). But no backends of Sage have this weird condition, they all enumerate all the edges by iterating over the vertices and listing the incident edges.

You are right that for lex_BFS the conversion to a short_digraph might not be the best choice. What we really need is a mapping from vertices to integers in range 0..n-1. Since the algorithm scans the neighborhood of a vertex only once, the cost of the conversion is not amortized.

Consequently, priority should be given to improving the way the backends iterate over the neighbors / incident edges. We may also have a better method for listing all edges. In parallel, we can change the lex_BFS method to avoid the conversion to short_digraph.

The changes proposed in this PR are interesting only for methods iterating multiple times on the neighbors of some vertices and exploiting the fact that vertices are relabeled in 0..n-1.

@cyrilbouvier
Copy link
Contributor

With the patch below to remove the sorting of vertices, I think this PR can be merged.

diff --git a/src/sage/graphs/base/static_sparse_graph.pyx b/src/sage/graphs/base/static_sparse_graph.pyx
index 35f3de67c9..aa05daa8bf 100644
--- a/src/sage/graphs/base/static_sparse_graph.pyx
+++ b/src/sage/graphs/base/static_sparse_graph.pyx
@@ -217,10 +217,7 @@ cdef int init_short_digraph(short_digraph g, G, edge_labelled=False, vertex_list
 
     If ``vertex_list`` is given, it will be used to map vertices of
     the graph to consecutive integers. Otherwise, the result of
-    ``G.vertices(sort=True)`` will be used instead. Because this only
-    works if the vertices can be sorted, using ``vertex_list`` is
-    useful when working with possibly non-sortable objects in Python
-    3.
+    ``list(G)`` will be used instead.
     """
     g.edge_labels = NULL
 
@@ -242,7 +239,7 @@ cdef int init_short_digraph(short_digraph g, G, edge_labelled=False, vertex_list
         raise ValueError("The source graph must be either a DiGraph or a Graph object !")
 
     cdef int i, j, v_id
-    cdef list vertices = vertex_list if vertex_list is not None else G.vertices(sort=True)
+    cdef list vertices = vertex_list if vertex_list is not None else list(G)
     cdef dict v_to_id = {v: i for i, v in enumerate(vertices)}
     cdef list neighbor_label
     cdef list edge_labels

@dcoudert
Copy link
Contributor Author

I think I did all required changes.

@dcoudert
Copy link
Contributor Author

There is a failing doctest reported by the CI but I'm unable to reproduce it (with or without this PR) on my macOS laptop. Not sure of the origin of this failing doctest.

sage -t --long --random-seed=286735480429121101562228604801325644303 src/sage/schemes/hyperelliptic_curves/hyperelliptic_finite_field.py
**********************************************************************
Error: Failed example:: Got: x^14 - 104552983718807072150991*x^13 - 124329619411163890896880*x^12 - 100942772735746064408723*x^11 - 51113561669730455050211*x^10 - 60365355472042375018083*x^9 - 33153418523092398763360*x^8 + 111940569229447224904615*x^7 - 1657637772736096845769236640*x^6 - 150907352204924088779748066718083*x^5 - 6388811864670767039761396944576599789*x^4 - 630841859726166432104900978153709319808723*x^3 - 38849120920791033088236337546999139560329103120*x^2 - 1633444343563468584198522787014731319759324274850991*x + 781140631562281254374947500349999

    H.frobenius_polynomial_matrix()  # long time, 8s on a Corei7
Expected:
    x^14 - 76*x^13 + 220846*x^12 - 12984372*x^11 + 24374326657*x^10 - 1203243210304*x^9
    + 1770558798515792*x^8 - 74401511415210496*x^7 + 88526169366991084208*x^6
    - 3007987702642212810304*x^5 + 3046608028331197124223343*x^4
    - 81145833008762983138584372*x^3 + 69007473838551978905211279154*x^2
    - 1187357507124810002849977200076*x + 781140631562281254374947500349999
Got:
    x^14 - 104552983718807072150991*x^13 - 124329619411163890896880*x^12 - 100942772735746064408723*x^11 - 51113561669730455050211*x^10 - 60365355472042375018083*x^9 - 33153418523092398763360*x^8 + 111940569229447224904615*x^7 - 1657637772736096845769236640*x^6 - 150907352204924088779748066718083*x^5 - 6388811864670767039761396944576599789*x^4 - 630841859726166432104900978153709319808723*x^3 - 38849120920791033088236337546999139560329103120*x^2 - 1633444343563468584198522787014731319759324274850991*x + 781140631562281254374947500349999
**********************************************************************

@mkoeppe
Copy link
Member

mkoeppe commented Jun 4, 2024

Update/rebase for a CI run?

@dcoudert
Copy link
Contributor Author

dcoudert commented Jun 4, 2024

I have rebased the branch. I'm waiting for some feedback from @cyrilbouvier to know if this is ok or if he has better proposals.

@cyrilbouvier
Copy link
Contributor

Except my review that has not been addressed (changing O(m + n\log{m}) by O(n + m\log{m}) in a comment, I have no more remarks.

FYI, I am currently writing a PR to add a function that list the neighbors of a vertex in linear time for the sparse backend (one use case would be init_short_digraph).

@dcoudert
Copy link
Contributor Author

dcoudert commented Jun 4, 2024

Should be ok now.

@cyrilbouvier
Copy link
Contributor

LGTM

@dcoudert
Copy link
Contributor Author

dcoudert commented Jun 4, 2024

Thank you for the review. I'm setting this PR to positive review.

@dcoudert
Copy link
Contributor Author

dcoudert commented Jun 4, 2024

The CI reports some doctests errors that I cannot reproduce on my macOS laptop and that are not marked as # need sage.graphs. So it should not be related.

vbraun pushed a commit to vbraun/sage that referenced this pull request Jun 5, 2024
…t_digraph`

    
Fixes sagemath#37642.

We add parameter `sort_edges` to method `init_short_digraph` of the
`short_digraph` data structure defined in
`src/sage/graphs/base/static_sparse_graph.pxd|pyx`.
- When set to `True` (default), for each vertex, the list of neighbors
is sorted by increasing vertex labels. This enables to search for a
vertex in this list using binary search, and so in time `O(log(n))`, but
the overall running time of method  `init_short_digraph` is in `O(m +
n*log(m))`.
- When set to `False`, neighbors are not sorted. The running time of
method `init_short_digraph` is reduced to `O(n + m)`, but the running
time for searching in the list of neighbors increases to `O(m)`.

So this new parameter is particularly useful for linear time algorithms
such as `lex_BFS`.



### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [x] I have updated the documentation accordingly.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on. For example,
-->
<!-- - sagemath#12345: short description why this is a dependency -->
<!-- - sagemath#34567: ... -->
    
URL: sagemath#37662
Reported by: David Coudert
Reviewer(s):
@vbraun vbraun merged commit 2d12f02 into sagemath:develop Jun 9, 2024
15 of 18 checks passed
@mkoeppe mkoeppe added this to the sage-10.4 milestone Jun 9, 2024
vbraun pushed a commit to vbraun/sage that referenced this pull request Sep 8, 2024
…rs but without the extra log complexity

    
<!-- ^ Please provide a concise and informative title. -->
<!-- ^ Don't put issue numbers in the title, do this in the PR
description below. -->
<!-- ^ For example, instead of "Fixes sagemath#12345" use "Introduce new method
to calculate 1 + 2". -->
<!-- v Describe your changes below in detail. -->
<!-- v Why is this change required? What problem does it solve? -->
<!-- v If this PR resolves an open issue, please link to it here. For
example, "Fixes sagemath#12345". -->

This PR improve the `init_short_digraph` function that is used to
initialize `StaticSparseCGraph` (used for immutable `Graph` and
`DiGraph`).

Before, a boolean parameter `sort_neighbors` was used to specify if we
wanted to sort the neighbors or not. It implied an extra `log` in the
complexity (as `qsort` was called).
With this PR, the neighbors are always sorted at no extra cost. It is
done by appending to the neighbors list the vertices in the correct
order so the call to `qsort` is not needed anymore.

This PR partly reverts sagemath#38198 and mostly reverts sagemath#37662

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [ ] I have linked a relevant issue or discussion.
- [x] I have created tests covering the changes.
- [ ] I have updated the documentation and checked the documentation
preview.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on. For example,
-->
<!-- - sagemath#12345: short description why this is a dependency -->
<!-- - sagemath#34567: ... -->
    
URL: sagemath#38427
Reported by: cyrilbouvier
Reviewer(s): cyrilbouvier, David Coudert
vbraun pushed a commit to vbraun/sage that referenced this pull request Sep 19, 2024
…rs but without the extra log complexity (2nd try)

    
<!-- ^ Please provide a concise and informative title. -->
<!-- ^ Don't put issue numbers in the title, do this in the PR
description below. -->
<!-- ^ For example, instead of "Fixes sagemath#12345" use "Introduce new method
to calculate 1 + 2". -->
<!-- v Describe your changes below in detail. -->
<!-- v Why is this change required? What problem does it solve? -->
<!-- v If this PR resolves an open issue, please link to it here. For
example, "Fixes sagemath#12345". -->

This PR is based on the closed PR sagemath#38427 (by myself) that was breaking
too much doctests.

This PR improve the `init_short_digraph` function that is used to
initialize `StaticSparseCGraph` (used for immutable `Graph` and
`DiGraph`).

Before, a boolean parameter `sort_neighbors` was used to specify if we
wanted to sort the neighbors or not. It implied an extra `log` in the
complexity (as `qsort` was called).
With this PR, the neighbors are always sorted at no extra cost. It is
done by appending to the neighbors list the vertices in the correct
order so the call to `qsort` is not needed anymore.

This PR partly reverts sagemath#38198 and mostly reverts sagemath#37662


Contrary to PR sagemath#38427, I did not include the patch that remove the
sorting of vertices when StaticSparseGraph are initialized because it
was breaking too many doctests. Instead I added a option to disabled
said sorting and use it in my doctests to check that the new code works.

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [ ] I have linked a relevant issue or discussion.
- [x] I have created tests covering the changes.
- [ ] I have updated the documentation and checked the documentation
preview.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on. For example,
-->
<!-- - sagemath#12345: short description why this is a dependency -->
<!-- - sagemath#34567: ... -->
    
URL: sagemath#38664
Reported by: cyrilbouvier
Reviewer(s): David Coudert
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lex_BFS method of Graph is not linear as written in the documentation
4 participants