trying to set ghostRank of non-locally owned index #663
Comments
Possibly related to #525. Can you visualize what the MPI partitions look like prior to the failure? Are they read from PAMELA? This error is supposed to come up only when you have very thin layers of cells in partitions, but maybe there is another pathological case we haven't thought of. The problem arises when a rank is acting as a sender for a ghosted node, because the rank that needs the node is not directly connected to the rank that owns it.
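For the visualization question, here is a minimal sketch of how the per-element partition id of a tet mesh could be dumped to a legacy VTK file and inspected in ParaView. The mesh arrays, function name, and file name are hypothetical; this is not GEOSX or PAMELA code.

```cpp
// Sketch only: write the MPI partition id of each tetrahedron as cell data
// in a legacy VTK file, so the decomposition can be inspected in ParaView.
#include <array>
#include <fstream>
#include <string>
#include <vector>

void writePartitionVtk( std::vector< std::array< double, 3 > > const & nodes,
                        std::vector< std::array< int, 4 > > const & tets,
                        std::vector< int > const & elemToRank,
                        std::string const & fileName )
{
  std::ofstream out( fileName );
  out << "# vtk DataFile Version 3.0\nMPI partitions\nASCII\n";
  out << "DATASET UNSTRUCTURED_GRID\n";

  out << "POINTS " << nodes.size() << " double\n";
  for( auto const & p : nodes )
    out << p[0] << " " << p[1] << " " << p[2] << "\n";

  // Each tet entry is "4 n0 n1 n2 n3", so the list size is 5 * number of cells.
  out << "CELLS " << tets.size() << " " << 5 * tets.size() << "\n";
  for( auto const & t : tets )
    out << "4 " << t[0] << " " << t[1] << " " << t[2] << " " << t[3] << "\n";

  out << "CELL_TYPES " << tets.size() << "\n";
  for( std::size_t i = 0; i < tets.size(); ++i )
    out << "10\n";                       // 10 = VTK_TETRA

  out << "CELL_DATA " << tets.size() << "\n";
  out << "SCALARS partition int 1\nLOOKUP_TABLE default\n";
  for( int const rank : elemToRank )
    out << rank << "\n";
}
```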
That error message needs to indicate what rank it is.
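A minimal sketch of how the offending rank could be included in such a message, using plain MPI. The function name, error text, and `localIndex` argument are illustrative assumptions, not GEOSX's actual logging code.

```cpp
// Sketch only: report which MPI rank hit the ghostRank error before aborting.
#include <mpi.h>
#include <cstdio>
#include <cstdlib>

void reportGhostRankError( long const localIndex )
{
  int rank = -1;
  MPI_Comm_rank( MPI_COMM_WORLD, &rank );
  std::fprintf( stderr,
                "Rank %d: trying to set ghostRank of non-locally owned index %ld\n",
                rank, localIndex );
  MPI_Abort( MPI_COMM_WORLD, EXIT_FAILURE );
}
```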
In my experience, the case where two partitions are connected to the same element can produce unexpected results if you are not careful. In this picture it would be the blue element with the green and purple partitions attached to it across two of the faces. Also, your earlier figure with discontinuous partitions is totally normal for Metis. I am not sure if there is a way to enforce continuity, but the code should normally not care about it.
@AntoineMazuyer None of these partitions should cause a problem. Are you able to locate the problem and determine what the objects are that are causing the error? i.e. where in the mesh are they, and which is the offending object?
@AntoineMazuyer I cannot find
Oops, I got the name wrong -_- https://github.com/GEOSX/GEOSX/tree/feature/mazuyer/integratedPAMELAtest
Following the discussion this morning with @joshua-white: everything needed to reproduce it is in the original post, including the mesh files and the xml file. I have merged with develop; the problem is still here.
@AntoineMazuyer I put in some code to add the missing neighbors. However, there is another problem with the decomposition.
This is the problem we have been dodging for a while: two ranks are separated from each other by a single layer of elements belonging to a third rank. It will be a little more intrusive to fix this one, since it pretty much breaks the assumptions used to decompose the mesh; there are no longer any common local nodes between the two ranks. I think we aren't going to be able to avoid this sort of thing with Metis...can we?
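To make the pathological case concrete, here is a sketch of how rank pairs that share at least one node (and therefore should be neighbors under a node-based definition) could be detected from a global element-to-rank assignment. The data layout and function name are hypothetical, not the GEOSX decomposition code.

```cpp
// Sketch: given an element->rank assignment and element->node connectivity,
// find all rank pairs that touch a common node. Pairs that appear here but
// not in the face-based (Metis dual graph) neighbor lists are exactly the
// cases that break the current ghosting assumptions.
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

std::set< std::pair< int, int > >
nodeSharingRankPairs( std::vector< std::vector< int > > const & elemToNodes,
                      std::vector< int > const & elemToRank )
{
  // For each node, collect the set of ranks whose elements touch it.
  std::map< int, std::set< int > > nodeToRanks;
  for( std::size_t e = 0; e < elemToNodes.size(); ++e )
    for( int const node : elemToNodes[ e ] )
      nodeToRanks[ node ].insert( elemToRank[ e ] );

  // Any two ranks appearing at the same node share that node and thus
  // would have to treat each other as neighbors.
  std::set< std::pair< int, int > > pairs;
  for( auto const & [ node, ranks ] : nodeToRanks )
    for( int const a : ranks )
      for( int const b : ranks )
        if( a < b )
          pairs.insert( { a, b } );
  return pairs;
}
```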
@rrsettgast thanks for taking the time to debug this! To be sure I am understanding problem number 1 correctly: if I sum up, we are in a situation where, by construction, rank 2 never shares an edge with rank 0, so the neighbor list provided by Metis doesn't include this relation? And in GEOSX we want stricter neighbor relations, defined by "node connections" and not "edge connections"?
@AntoineMazuyer The current problem is shown in the last image I posted. Node237 is owned by the "green" rank. It is not at all part of the "light blue" rank...except that the "dark blue" rank sends it to the "light blue" rank as a ghost. So "light blue" and "green" share nothing (not even the node in question) in their original discretization, and the ghosting algorithm fails. This is because the ghosting algorithm works based on shared nodes...if there are no shared nodes, then no ghosting occurs. To fix this we would have to add a step after the current ghosting algorithm to alter the send/receive lists such that the ownership of something like Node237 is set correctly. As it stands (with the hack mentioned above), if we disabled the check, the problem would run, but any synchronization of something like Node237 would require 2 calls to sync: one to get the correct value from "green" to "dark blue", then one from "dark blue" to "light blue". Does this make sense?
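A sketch of the two-hop relay described above, written with plain MPI and hypothetical rank numbers (0 = "green", 1 = "dark blue", 2 = "light blue"). It only illustrates why two synchronization steps are needed when the owner and the ghosting rank are not direct neighbors; it is not the GEOSX communication code.

```cpp
// Sketch: without a direct green <-> light-blue neighbor relation, an updated
// nodal value needs two synchronization steps to reach the rank that ghosts it.
#include <mpi.h>

void twoHopNodeSync( double & nodeValue, MPI_Comm comm )
{
  int rank;
  MPI_Comm_rank( comm, &rank );

  // Step 1: the owner ("green", rank 0) sends the up-to-date value to its
  // direct neighbor ("dark blue", rank 1).
  if( rank == 0 )
    MPI_Send( &nodeValue, 1, MPI_DOUBLE, 1, 0, comm );
  else if( rank == 1 )
    MPI_Recv( &nodeValue, 1, MPI_DOUBLE, 0, 0, comm, MPI_STATUS_IGNORE );

  // Step 2: "dark blue" forwards the value to "light blue" (rank 2), which
  // ghosts the node without being a neighbor of the owner.
  if( rank == 1 )
    MPI_Send( &nodeValue, 1, MPI_DOUBLE, 2, 0, comm );
  else if( rank == 2 )
    MPI_Recv( &nodeValue, 1, MPI_DOUBLE, 1, 0, comm, MPI_STATUS_IGNORE );
}
```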
If we decide rank 0 and rank 2 are neighbors because our simulation routines work like that, then yes. I don't know if it is possible to enforce that in METIS, but I can have a look
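For reference, METIS's mesh-partitioning routine `METIS_PartMeshDual` exposes an `ncommon` parameter that controls how many shared nodes make two elements adjacent in the dual graph; setting it to 1 treats any node-sharing elements as neighbors. The sketch below assumes one could call METIS directly this way; it is not how PAMELA/GEOSX currently drives the partitioner.

```cpp
// Sketch: partition a mesh with METIS so that elements sharing even a single
// node are treated as adjacent in the dual graph (ncommon = 1).
#include <metis.h>
#include <vector>

void partitionWithNodeAdjacency( std::vector< idx_t > & eptr,   // CSR offsets into eind
                                 std::vector< idx_t > & eind,   // element->node connectivity
                                 idx_t numNodes,
                                 idx_t numParts,
                                 std::vector< idx_t > & elemPart,
                                 std::vector< idx_t > & nodePart )
{
  idx_t numElems = static_cast< idx_t >( eptr.size() ) - 1;
  idx_t ncommon  = 1;   // one shared node is enough to make two elements adjacent
  idx_t objval   = 0;

  elemPart.resize( numElems );
  nodePart.resize( numNodes );

  METIS_PartMeshDual( &numElems, &numNodes,
                      eptr.data(), eind.data(),
                      nullptr, nullptr,          // no element weights/sizes
                      &ncommon, &numParts,
                      nullptr, nullptr,          // default target weights and options
                      &objval, elemPart.data(), nodePart.data() );
}
```

Whether a node-based adjacency at the partitioner level is the right fix, versus patching the neighbor/ghosting lists afterwards as discussed above, is exactly the design question in this thread.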
Yes, I understand... Is it a big problem to do 2 calls to sync?
I would pose the question like this: if two partitions share an object (i.e. a node), are they neighbors?
It is not good to do such a thing. However, there are no variables kept at the nodes in flow calculations, so I think you wouldn't have to do anything...only when you need up-to-date variables at the nodes.
Describe the bug
I reproduced the staircase_3d test with a full tet mesh. I am trying to add it to the integrated tests. It runs with 1, 2, and 3 MPI cores without issue, but when using 4 MPI cores I get this error message:
To Reproduce
Steps to reproduce the behavior:
1. Check out the branch `feature/mazuyer/integratedPAMELAtest`
2. Go to `src/coreComponents/physicsSolvers/fluidFlow/integratedTests/singlePhaseFlow`
3. Run `mpirun -np 4 geosx -i staircase_3d_tet.xml`
Expected behavior
It should run!
Screenshots
[Screenshot: expected result]