-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug fix: properly partition networks containing one-sided gap-junction connections #1774
Conversation
bors try |
1 similar comment
bors try |
tryAlready running a review |
tryBuild failed: |
bors try |
tryBuild failed: |
bors try |
tryBuild succeeded: |
Hi @noraabiakar, |
Let's say we have a system of 2 cells and cell 0 has an incoming gap junction connection from cell 1 (but no outgoing connection). This is not supported at the moment but it won't raise any exceptions. Our connected components algorithm assumes that all gap junction connections are bidirectional and will come up with the following cells groups: {0, 1} and {1}, which is incorrect. What this PR does is change all unidirectional gap-junctions to bidirectional gap-junctions prior to calling the connected components algorithm which will then do the correct thing again, generating only one cell groups {0,1}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just a few suggestions to improve performance
arbor/communication/mpi.hpp
Outdated
std::vector<int> counts_internal, displs_internal; | ||
|
||
// Vector of individual vector sizes | ||
std::vector<int> internal_sizes(values.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int
is a narrowing cast from size_t
, so I would suggest using unsigned long
.
arbor/partition_load_balance.cpp
Outdated
|
||
// Generate a local gj_connection table. | ||
// The table is indexed by the index of the target gid in the gid_part of that domain. | ||
// If gid_part[domain_id] = [a, b[; local_gj_connection of gid `x` is at index `x-a`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The formatting of the braces is wonky: should it be [a, b]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's intended to show that b
in not in the range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A "half open range" [a, b) == [a, a+1, a+2, ..., b-1]
?
arbor/partition_load_balance.cpp
Outdated
local_gj_connection_table[gid-dom_range.first].push_back(c.peer.gid); | ||
} | ||
} | ||
// Sort the inner vectors of gj_connections. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change comment to give more context, e.g. sort the gj connections of each local cell
arbor/partition_load_balance.cpp
Outdated
// If gid is not in the peer connection table insert it such that | ||
// the vector remains sorted. | ||
auto it = std::lower_bound(peer_conns.begin(), peer_conns.end(), gid); | ||
if (it == peer_conns.end() || *it != gid) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has the potential to be really expensive both in terms of time, and in memory if there are many insertions.
How about looping over once to count how many insertions have to be made (if any), allocate a new vector if need be, then do the insertions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand what you mean, but I changed the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have misunderstood what was being done here, but it looks like:
- search through a sorted vector for a value
- if the value is not found, insert it such that the vector is still sorted
There are two performance issues here:
- inserting in a vector will require
n/2
copies to move the values after the inserted value out of the way - insertion may also require reallocation of more memory, and a copy of the whole vector contents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new implementation is better. The vector doesn't need to be sorted in the end.
- Check that a
gid
is available in theglobal_gj_connection_table[peer]
vector, if not, insert it into anunordered_set
. - Append the values of the
unordered_set
onto theglobal_gj_connection_table[gid]
vectors.
arbor/partition_load_balance.cpp
Outdated
auto conns = global_gj_connection_table[element]; | ||
for (const auto& peer: conns) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid making a copy of global_gj_connection_table[element]
, which is a std::vector
, and looking up peer
twice in the visited
set, consider the following:
for (const auto& peer: global_gj_connection_table[element]) {
if (visited.insert(peer).second) {
q.push(peer);
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that was supposed to be const auto& conns = global_gj_connection_table[element]
. I will implement your suggestion anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually meant what I proposed: there is no need to create the const auto& cons
, though whether you do or don't is a matter of personal preference.
bors try |
tryBuild failed: |
|
||
// Append the missing peers into the global_gj_connections table | ||
for (unsigned i = 0; i < global_gj_connection_table.size(); ++i) { | ||
std::move(missing_peers[i].begin(), missing_peers[i].end(), std::back_inserter(global_gj_connection_table[i])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are the entries in global_gj_connection_table[i]
supposed to be sorted?
The loop above assumes that they are, which enables std::binary_search/std::lower_bound
, however back_inserter
to add missing values is not guaranteed to maintain sorted values.
The current
partition_load_balance
function assumes that gap-junction connections are always double-sided (i.e. if gidx
has a gap-junction connection from peer gidy
, then gidy
must also have a gap-junction connection from peer gidx
). This used to be a requirement that was checked prior to #1682. Since then, single-sided gap-junctions are in principle allowed, butpartition_load_balance
still operates under the bidirectional gap-junction connection assumption resulting in some gids being present in multiple cell-groups.This PR modifies the
partition_load_balance
function to do the following:Fixes #1767.