-
-
Notifications
You must be signed in to change notification settings - Fork 489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lex_BFS method of Graph is not linear as written in the documentation #37642
Comments
You are right, the time complexity of method I agree with the idea of adding a parameter Then we will have to check methods using |
I did proposed changes in #37662. |
Note that the same problem is also present for
|
…t_digraph` Fixes sagemath#37642. We add parameter `sort_edges` to method `init_short_digraph` of the `short_digraph` data structure defined in `src/sage/graphs/base/static_sparse_graph.pxd|pyx`. - When set to `True` (default), for each vertex, the list of neighbors is sorted by increasing vertex labels. This enables to search for a vertex in this list using binary search, and so in time `O(log(n))`, but the overall running time of method `init_short_digraph` is in `O(m + n*log(m))`. - When set to `False`, neighbors are not sorted. The running time of method `init_short_digraph` is reduced to `O(n + m)`, but the running time for searching in the list of neighbors increases to `O(m)`. So this new parameter is particularly useful for linear time algorithms such as `lex_BFS`. ### 📝 Checklist <!-- Put an `x` in all the boxes that apply. --> - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [ ] I have created tests covering the changes. - [x] I have updated the documentation accordingly. ### ⌛ Dependencies <!-- List all open PRs that this PR logically depends on. For example, --> <!-- - sagemath#12345: short description why this is a dependency --> <!-- - sagemath#34567: ... --> URL: sagemath#37662 Reported by: David Coudert Reviewer(s):
…t_digraph` Fixes sagemath#37642. We add parameter `sort_edges` to method `init_short_digraph` of the `short_digraph` data structure defined in `src/sage/graphs/base/static_sparse_graph.pxd|pyx`. - When set to `True` (default), for each vertex, the list of neighbors is sorted by increasing vertex labels. This enables to search for a vertex in this list using binary search, and so in time `O(log(n))`, but the overall running time of method `init_short_digraph` is in `O(m + n*log(m))`. - When set to `False`, neighbors are not sorted. The running time of method `init_short_digraph` is reduced to `O(n + m)`, but the running time for searching in the list of neighbors increases to `O(m)`. So this new parameter is particularly useful for linear time algorithms such as `lex_BFS`. ### 📝 Checklist <!-- Put an `x` in all the boxes that apply. --> - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [ ] I have created tests covering the changes. - [x] I have updated the documentation accordingly. ### ⌛ Dependencies <!-- List all open PRs that this PR logically depends on. For example, --> <!-- - sagemath#12345: short description why this is a dependency --> <!-- - sagemath#34567: ... --> URL: sagemath#37662 Reported by: David Coudert Reviewer(s):
…neighbors in linear time <!-- ^ Please provide a concise and informative title. --> <!-- ^ Don't put issue numbers in the title, do this in the PR description below. --> <!-- ^ For example, instead of "Fixes sagemath#12345" use "Introduce new method to calculate 1 + 2". --> <!-- v Describe your changes below in detail. --> <!-- v Why is this change required? What problem does it solve? --> <!-- v If this PR resolves an open issue, please link to it here. For example, "Fixes sagemath#12345". --> As noted in issue sagemath#37642, enumerating the neighbors of a vertex for a graph that use the sparse backend (the default in Sage) is **not** linear on the number of neighbors (there is an extra log factor). The goal of this PR is to fix this. The problem is that to enumerate the neighbors of a vertex, the current code calls iteratively _next_neighbor_unsafe (or one of its out or in variant). But this method return an int corresponding to a neighbor, so to go to the next neighbor, it was first necessary to retrieve this vertex* in the structure storing all the neighbors. As it is stored in a sorted binary tree, the cost of retrieving is O(log(#neighbors)). *(or a close one if it was deleted in the meantime) To obtain a linear complexity, I wrote a new method that do a simple tree traversal which is linear in the number of nodes in the tree (i.e., the number of neighbors). As this new method does not have a similar interface as the previous _next_neighbor_unsafe, some more rewriting was necessary to use it in other parts of the code. 1. I wrote new methods for the SparseGraph class: out_neighbors_unsafe and in_neighbors_unsafe. They overwrite the ones from the base class and rely on the new methods _neighbors_unsafe and _neighbors_BTNode_unsafe. The _neighbors_BTNode_unsafe is a low-level method enumerating the neighbors in linear time. 2. I rewrote the two methods out_neighbors_BTNode_unsafe and in_neighbors_BTNode_unsafe to use the new _neighbors_BTNode_unsafe. 3. I wrote a new method _iterator_edges method for SparseGraphBackend to overwrite the one from the base class CGraphBackend, in order to expose the low-level code to the Graph class. Question: During the writing of this PR, I notice that the methods out_neighbors_BTNode_unsafe and in_neighbors_BTNode_unsafe are only defined for the sparse backend and are not used anywhere in the code. Can I remove them ? Or should they be kept for compatibility ? ### 📝 Checklist <!-- Put an `x` in all the boxes that apply. --> - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [] I have created tests covering the changes. - [] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies <!-- List all open PRs that this PR logically depends on. For example, --> <!-- - sagemath#12345: short description why this is a dependency --> <!-- - sagemath#34567: ... --> URL: sagemath#38167 Reported by: cyrilbouvier Reviewer(s): cyrilbouvier, David Coudert
<!-- ^ Please provide a concise and informative title. --> <!-- ^ Don't put issue numbers in the title, do this in the PR description below. --> <!-- ^ For example, instead of "Fixes sagemath#12345" use "Introduce new method to calculate 1 + 2". --> <!-- v Describe your changes below in detail. --> <!-- v Why is this change required? What problem does it solve? --> <!-- v If this PR resolves an open issue, please link to it here. For example, "Fixes sagemath#12345". --> As noted in sagemath#37642, there is a difference in the complexity of enumerating neighbors or edges between sparse graphs and dense graphs. This PR aims at making the comments about the time complexity of algorithms clearer in regard of this difference. Most of the changes I made in this PR are due to the fact that the function `init_short_digraph` have the following time complexity: - `O(n+m)` when the input graph has a sparse representation and sort_neighbors is False - `O(n+m log(m))` when the input graph has a sparse representation and sort_neighbors is True - `O(n^2)` when the input graph has a dense representation and sort_neighbors is False - `O(n^2 log(m))` when the input graph has a dense representation and sort_neighbors is True with `m` the number of edges and `n` the number of vertices of the graph. I spotted 5 additional complexity claims in the comments that I was not able to verify: 1. for strong_orientations_iterator in orientations.py 2. for yen_k_shortest_simple_paths in path_enumeration.pyx 3. for feng_k_shortest_simple_paths in path_enumeration.pyx 4. for edge_disjoint_spanning_trees in spanning_tree.pyx (there is loop over the sorted edges of the graphs, which should imply a complexity of at least `n+m log(m)` for sparse graphs and `O(n^2 log(m))` for dense graphs 5. for is_partial_cube in partial_cube.py (there is at least one enumeration of neighbors of a vertex (via calling `contracted[root]`, line 318) so `m` should appear in the complexity) ### 📝 Checklist <!-- Put an `x` in all the boxes that apply. --> - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [ ] I have created tests covering the changes. - [ ] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies <!-- List all open PRs that this PR logically depends on. For example, --> <!-- - sagemath#12345: short description why this is a dependency --> <!-- - sagemath#34567: ... --> URL: sagemath#38198 Reported by: cyrilbouvier Reviewer(s): cyrilbouvier, David Coudert
…neighbors in linear time <!-- ^ Please provide a concise and informative title. --> <!-- ^ Don't put issue numbers in the title, do this in the PR description below. --> <!-- ^ For example, instead of "Fixes sagemath#12345" use "Introduce new method to calculate 1 + 2". --> <!-- v Describe your changes below in detail. --> <!-- v Why is this change required? What problem does it solve? --> <!-- v If this PR resolves an open issue, please link to it here. For example, "Fixes sagemath#12345". --> As noted in issue sagemath#37642, enumerating the neighbors of a vertex for a graph that use the sparse backend (the default in Sage) is **not** linear on the number of neighbors (there is an extra log factor). The goal of this PR is to fix this. The problem is that to enumerate the neighbors of a vertex, the current code calls iteratively _next_neighbor_unsafe (or one of its out or in variant). But this method return an int corresponding to a neighbor, so to go to the next neighbor, it was first necessary to retrieve this vertex* in the structure storing all the neighbors. As it is stored in a sorted binary tree, the cost of retrieving is O(log(#neighbors)). *(or a close one if it was deleted in the meantime) To obtain a linear complexity, I wrote a new method that do a simple tree traversal which is linear in the number of nodes in the tree (i.e., the number of neighbors). As this new method does not have a similar interface as the previous _next_neighbor_unsafe, some more rewriting was necessary to use it in other parts of the code. 1. I wrote new methods for the SparseGraph class: out_neighbors_unsafe and in_neighbors_unsafe. They overwrite the ones from the base class and rely on the new methods _neighbors_unsafe and _neighbors_BTNode_unsafe. The _neighbors_BTNode_unsafe is a low-level method enumerating the neighbors in linear time. 2. I rewrote the two methods out_neighbors_BTNode_unsafe and in_neighbors_BTNode_unsafe to use the new _neighbors_BTNode_unsafe. 3. I wrote a new method _iterator_edges method for SparseGraphBackend to overwrite the one from the base class CGraphBackend, in order to expose the low-level code to the Graph class. Question: During the writing of this PR, I notice that the methods out_neighbors_BTNode_unsafe and in_neighbors_BTNode_unsafe are only defined for the sparse backend and are not used anywhere in the code. Can I remove them ? Or should they be kept for compatibility ? ### 📝 Checklist <!-- Put an `x` in all the boxes that apply. --> - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [] I have created tests covering the changes. - [] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies <!-- List all open PRs that this PR logically depends on. For example, --> <!-- - sagemath#12345: short description why this is a dependency --> <!-- - sagemath#34567: ... --> URL: sagemath#38167 Reported by: cyrilbouvier Reviewer(s): cyrilbouvier, David Coudert
<!-- ^ Please provide a concise and informative title. --> <!-- ^ Don't put issue numbers in the title, do this in the PR description below. --> <!-- ^ For example, instead of "Fixes sagemath#12345" use "Introduce new method to calculate 1 + 2". --> <!-- v Describe your changes below in detail. --> <!-- v Why is this change required? What problem does it solve? --> <!-- v If this PR resolves an open issue, please link to it here. For example, "Fixes sagemath#12345". --> As noted in sagemath#37642, there is a difference in the complexity of enumerating neighbors or edges between sparse graphs and dense graphs. This PR aims at making the comments about the time complexity of algorithms clearer in regard of this difference. Most of the changes I made in this PR are due to the fact that the function `init_short_digraph` have the following time complexity: - `O(n+m)` when the input graph has a sparse representation and sort_neighbors is False - `O(n+m log(m))` when the input graph has a sparse representation and sort_neighbors is True - `O(n^2)` when the input graph has a dense representation and sort_neighbors is False - `O(n^2 log(m))` when the input graph has a dense representation and sort_neighbors is True with `m` the number of edges and `n` the number of vertices of the graph. I spotted 5 additional complexity claims in the comments that I was not able to verify: 1. for strong_orientations_iterator in orientations.py 2. for yen_k_shortest_simple_paths in path_enumeration.pyx 3. for feng_k_shortest_simple_paths in path_enumeration.pyx 4. for edge_disjoint_spanning_trees in spanning_tree.pyx (there is loop over the sorted edges of the graphs, which should imply a complexity of at least `n+m log(m)` for sparse graphs and `O(n^2 log(m))` for dense graphs 5. for is_partial_cube in partial_cube.py (there is at least one enumeration of neighbors of a vertex (via calling `contracted[root]`, line 318) so `m` should appear in the complexity) ### 📝 Checklist <!-- Put an `x` in all the boxes that apply. --> - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [ ] I have created tests covering the changes. - [ ] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies <!-- List all open PRs that this PR logically depends on. For example, --> <!-- - sagemath#12345: short description why this is a dependency --> <!-- - sagemath#34567: ... --> URL: sagemath#38198 Reported by: cyrilbouvier Reviewer(s): cyrilbouvier, David Coudert
…neighbors in linear time <!-- ^ Please provide a concise and informative title. --> <!-- ^ Don't put issue numbers in the title, do this in the PR description below. --> <!-- ^ For example, instead of "Fixes sagemath#12345" use "Introduce new method to calculate 1 + 2". --> <!-- v Describe your changes below in detail. --> <!-- v Why is this change required? What problem does it solve? --> <!-- v If this PR resolves an open issue, please link to it here. For example, "Fixes sagemath#12345". --> As noted in issue sagemath#37642, enumerating the neighbors of a vertex for a graph that use the sparse backend (the default in Sage) is **not** linear on the number of neighbors (there is an extra log factor). The goal of this PR is to fix this. The problem is that to enumerate the neighbors of a vertex, the current code calls iteratively _next_neighbor_unsafe (or one of its out or in variant). But this method return an int corresponding to a neighbor, so to go to the next neighbor, it was first necessary to retrieve this vertex* in the structure storing all the neighbors. As it is stored in a sorted binary tree, the cost of retrieving is O(log(#neighbors)). *(or a close one if it was deleted in the meantime) To obtain a linear complexity, I wrote a new method that do a simple tree traversal which is linear in the number of nodes in the tree (i.e., the number of neighbors). As this new method does not have a similar interface as the previous _next_neighbor_unsafe, some more rewriting was necessary to use it in other parts of the code. 1. I wrote new methods for the SparseGraph class: out_neighbors_unsafe and in_neighbors_unsafe. They overwrite the ones from the base class and rely on the new methods _neighbors_unsafe and _neighbors_BTNode_unsafe. The _neighbors_BTNode_unsafe is a low-level method enumerating the neighbors in linear time. 2. I rewrote the two methods out_neighbors_BTNode_unsafe and in_neighbors_BTNode_unsafe to use the new _neighbors_BTNode_unsafe. 3. I wrote a new method _iterator_edges method for SparseGraphBackend to overwrite the one from the base class CGraphBackend, in order to expose the low-level code to the Graph class. Question: During the writing of this PR, I notice that the methods out_neighbors_BTNode_unsafe and in_neighbors_BTNode_unsafe are only defined for the sparse backend and are not used anywhere in the code. Can I remove them ? Or should they be kept for compatibility ? ### 📝 Checklist <!-- Put an `x` in all the boxes that apply. --> - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [] I have created tests covering the changes. - [] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies <!-- List all open PRs that this PR logically depends on. For example, --> <!-- - sagemath#12345: short description why this is a dependency --> <!-- - sagemath#34567: ... --> URL: sagemath#38167 Reported by: cyrilbouvier Reviewer(s): cyrilbouvier, David Coudert
<!-- ^ Please provide a concise and informative title. --> <!-- ^ Don't put issue numbers in the title, do this in the PR description below. --> <!-- ^ For example, instead of "Fixes sagemath#12345" use "Introduce new method to calculate 1 + 2". --> <!-- v Describe your changes below in detail. --> <!-- v Why is this change required? What problem does it solve? --> <!-- v If this PR resolves an open issue, please link to it here. For example, "Fixes sagemath#12345". --> As noted in sagemath#37642, there is a difference in the complexity of enumerating neighbors or edges between sparse graphs and dense graphs. This PR aims at making the comments about the time complexity of algorithms clearer in regard of this difference. Most of the changes I made in this PR are due to the fact that the function `init_short_digraph` have the following time complexity: - `O(n+m)` when the input graph has a sparse representation and sort_neighbors is False - `O(n+m log(m))` when the input graph has a sparse representation and sort_neighbors is True - `O(n^2)` when the input graph has a dense representation and sort_neighbors is False - `O(n^2 log(m))` when the input graph has a dense representation and sort_neighbors is True with `m` the number of edges and `n` the number of vertices of the graph. I spotted 5 additional complexity claims in the comments that I was not able to verify: 1. for strong_orientations_iterator in orientations.py 2. for yen_k_shortest_simple_paths in path_enumeration.pyx 3. for feng_k_shortest_simple_paths in path_enumeration.pyx 4. for edge_disjoint_spanning_trees in spanning_tree.pyx (there is loop over the sorted edges of the graphs, which should imply a complexity of at least `n+m log(m)` for sparse graphs and `O(n^2 log(m))` for dense graphs 5. for is_partial_cube in partial_cube.py (there is at least one enumeration of neighbors of a vertex (via calling `contracted[root]`, line 318) so `m` should appear in the complexity) ### 📝 Checklist <!-- Put an `x` in all the boxes that apply. --> - [x] The title is concise and informative. - [x] The description explains in detail what this PR is about. - [x] I have linked a relevant issue or discussion. - [ ] I have created tests covering the changes. - [ ] I have updated the documentation and checked the documentation preview. ### ⌛ Dependencies <!-- List all open PRs that this PR logically depends on. For example, --> <!-- - sagemath#12345: short description why this is a dependency --> <!-- - sagemath#34567: ... --> URL: sagemath#38198 Reported by: cyrilbouvier Reviewer(s): cyrilbouvier, David Coudert
Steps To Reproduce
Expected Behavior
A linear time algorithm as written in the documentation: https://doc.sagemath.org/html/en/reference/graphs/sage/graphs/generic_graph.html#sage.graphs.generic_graph.GenericGraph.lex_BFS
Actual Behavior
From what I understand (see below) the complexity is at least O(n+m*log(m))
Additional Information
From what I was able to understand, the problem comes from the
lex_BFS
function. In this function the input graph is converted to a short_digraph by callinginit_short_digraph
(see here). My claim is that the functioninit_short_digraph
is not linear.Note that the function
lex_BFS_fast_short_digraph
doing the actual computation is linear, the problem comes from before, when the graph is converted to the data structure expected by this function.Looking at the code of
init_short_digraph
, I found two places where the list of edges is sorted, which imply a complexity of at least m*log(m) [ I only considered the case where the function is called withedge_labelled=False
andvertex_list
is provided, as inlex_BFS
, otherwise more sorting can probably be found ]:G.edge_iterator(labels=False)
is called with the default value forsort_vertices
, which is True; this one is easy to fix, usingG.edge_iterator(labels=False,sort_vertices=False)
should be enough.has_edge
andtriangle_count
assumed that the sort is done during initialization.In the case of
lex_BFS
, these two functions are not used, so the sort could be remove.But even with these two changes:
the function does not seem to be linear. I am not able to track where the non linear part is coming from (I suspect from the edge_iterator but I was not able to track it down).
An easy fix would be to add a
sort
parameter toinit_short_digraph
to disable all sorting inside this function and use it inlex_BFS
. Does this sound like a good idea ? If yes, I could propose a PR.Environment
Checklist
The text was updated successfully, but these errors were encountered: