
TypeError and Points missing in VertexOnlyMesh #2178

Closed
lrtfm opened this issue Aug 14, 2021 · 37 comments · Fixed by #2773

@lrtfm
Contributor

lrtfm commented Aug 14, 2021

When I use VertexOnlyMesh, there are some errors and sometimes points are missing. See the example below.

from firedrake import *

# logging.set_level(logging.DEBUG)
def test_vom(n1=8, n2=16, quadrilateral=False):

    m = UnitSquareMesh(n1, n1, quadrilateral=quadrilateral)
    m_ref = UnitSquareMesh(n2, n2, quadrilateral=quadrilateral)
    coords_ref = m_ref.coordinates.dat.data
    vm = VertexOnlyMesh(m, coords_ref)
    W = FunctionSpace(vm, "DG", 0)

    print('\tThere are %d points'%len(coords_ref))
    print('\tThere are %d dofs in space W'%W.dof_count)

print('Test with quadrilateral=False')
test_vom(n1=8, n2=16, quadrilateral=False)

print('Test with quadrilateral=True')
test_vom(n1=8, n2=16, quadrilateral=True)

print('Test with larger n1 and quadrilateral=False')
test_vom(n1=10, n2=16, quadrilateral=False)

The errors are listed below:

(firedrake) firedrake@c0a3031a6d9f:~$ python test.py
firedrake:WARNING OMP_NUM_THREADS is not set or is set to a value greater than 1, we suggest setting OMP_NUM_THREADS=1 to improve performance
Test with quadrilateral=False
/home/firedrake/firedrake/src/firedrake/firedrake/parloops.py:130: LoopyWarning: 'lang_version' was not passed to make_function(). To avoid this warning, pass lang_version=(2018, 2) in this invocation. (Or say 'from loopy.version import LOOPY_USE_LANGUAGE_VERSION_2018_2' in the global scope of the calling frame.)
  knl = loopy.make_function(kernel_domains, instructions, kargs, seq_dependencies=True,
        There are 289 points
        There are 289 dofs in space W
Test with quadrilateral=True
/home/firedrake/firedrake/src/firedrake/firedrake/parloops.py:130: LoopyWarning: 'lang_version' was not passed to make_function(). To avoid this warning, pass lang_version=(2018, 2) in this invocation. (Or say 'from loopy.version import LOOPY_USE_LANGUAGE_VERSION_2018_2' in the global scope of the calling frame.)
  knl = loopy.make_function(kernel_domains, instructions, kargs, seq_dependencies=True,
        There are 289 points
        There are 256 dofs in space W
Test with larger n1 and quadrilateral=False
Traceback (most recent call last):
  File "test.py", line 23, in <module>
    test_vom(n1=10, n2=16, quadrilateral=False)
  File "test.py", line 10, in test_vom
    vm = VertexOnlyMesh(m, coords_ref)
  File "PETSc/Log.pyx", line 115, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
  File "PETSc/Log.pyx", line 116, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
  File "/home/firedrake/firedrake/src/firedrake/firedrake/mesh.py", line 2002, in VertexOnlyMesh
    dmcommon.label_pic_parent_cell_info(swarm, mesh)
  File "firedrake/cython/dmcommon.pyx", line 2710, in firedrake.cython.dmcommon.label_pic_parent_cell_info
TypeError: an integer is required

After changing the default tolerance value of locate_cell_and_reference_coordinate to 1e-6 by

import firedrake
firedrake.mesh.MeshGeometry.locate_cell_and_reference_coordinate.__defaults__ = (1e-6,)

the error is gone, but some points are still missing:

Test with quadrilateral=False
	There are 289 points
	There are 289 dofs in space W
Test with quadrilateral=True
	There are 289 points
	There are 256 dofs in space W
Test with larger n1 and quadrilateral=False
	There are 289 points
	There are 289 dofs in space W

By the way, another question: VertexOnlyMesh seems to only find points locally when used in parallel. This seems related to the parameters of DMSwarmSetPointCoordinates in PETSc.

@ReubenHill
Contributor

ReubenHill commented Aug 17, 2021

I'm not able to recreate your error but I see the missing points you mention.

What's happening is that the VertexOnlyMesh is dropping the points which it believes are "outside" the mesh m within which it is embedded.
You can see this by printing max(vm.coordinates.dat.data_ro.flatten()) and comparing it to max(coords_ref.flatten()): in the case where you lose points, the missing coordinates are those at the edge of the parent mesh m.

The trouble is that this is happening silently (see #2035).

Mathematically it's of course hard to decide what should happen when a point is on a cell boundary - you have to pick one cell or another. And when that point also lies on the boundary of the parent mesh... well, that's tricky. I'll need to have a think about what the best way of dealing with this is. Certainly dealing with #2035 is the first step so that this is less unexpected.
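For reference, here is a minimal serial sketch of the diagnostic described above (reusing the names from the original example; this is illustrative only, not Firedrake internals):

from firedrake import *
import numpy as np

m = UnitSquareMesh(8, 8, quadrilateral=True)
m_ref = UnitSquareMesh(16, 16, quadrilateral=True)
coords_ref = m_ref.coordinates.dat.data_ro
vm = VertexOnlyMesh(m, coords_ref)

# Compare the extent of the points retained in vm with the requested points.
print(max(vm.coordinates.dat.data_ro.flatten()), max(coords_ref.flatten()))

# List the requested coordinates that never made it into vm; in this case they
# all lie on the boundary of the parent mesh m.
kept = {tuple(np.round(x, 12)) for x in vm.coordinates.dat.data_ro}
dropped = [x for x in coords_ref if tuple(np.round(x, 12)) not in kept]
print(len(dropped))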

By the way, another question: VertexOnlyMesh seems to only find points locally when used in parallel. This seems related to the parameters of DMSwarmSetPointCoordinates in PETSc.

This is expected behaviour - the points that make it into a parallel distributed VertexOnlyMesh are those which are locally within the cells of the parent mesh.
This is clearly not what you were expecting: could you let me know what you were expecting to happen?

@Ig-dolci
Contributor

Ig-dolci commented Aug 17, 2021

On my laptop, I don't have any problem with VertexOnlyMesh. Take a look at the test:

from firedrake import *
import numpy as np
mesh = RectangleMesh(100, 200, 1.0, 2.0)
num_rec = 10
δs = np.linspace(0.4, 0.9, num_rec)
X, Y = np.meshgrid(δs*0.999,0.5*0.999)
print(X,Y)
xs = np.vstack((X.flatten(), Y.flatten())).T

point_cloud = VertexOnlyMesh(mesh, xs)

However, I had a problem running the same test with the same Firedrake version on another computer. The error was:

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    point_cloud = VertexOnlyMesh(mesh, xs)
  File "PETSc/Log.pyx", line 115, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
  File "PETSc/Log.pyx", line 116, in petsc4py.PETSc.Log.EventDecorator.decorator.wrapped_func
  File "/home/dolci/run_firedrake/firedrake/src/firedrake/firedrake/mesh.py", line 2006, in VertexOnlyMesh
    dmcommon.label_pic_parent_cell_info(swarm, mesh)
  File "firedrake/cython/dmcommon.pyx", line 2711, in firedrake.cython.dmcommon.label_pic_parent_cell_info
TypeError: an integer is required

I added a print inside the function locate_cell_and_reference_coordinate in mesh.py: print(X.ctypes.data_as(ctypes.POINTER(ctypes.c_double))._arr). In the case where the error occurs, the printed value is [0, 0] and then the run breaks due to the error. On my laptop, the print is

[0.04 0.91]
[0.49 0.46]
[0.94 0.01]
[0.39 0.56]
[0.84 0.11]
[0.29 0.66]
[0.74 0.21]
[0.19 0.76]
[0.64 0.31]
[0.09 0.86]

and everything works. So my question is: why does this error not appear on my laptop? Maybe it is related to a library installation?

@ReubenHill
Contributor

@lrtfm note I may be wrong - I am investigating nonetheless!

@lrtfm
Contributor Author

lrtfm commented Aug 17, 2021

This is clearly not what you were expecting: could you let me know what you were expecting to happen?

@ReubenHill I think the points which are not located locally should be searched for on other processes.

If I want to compute the norm of the difference between two functions defined on two different meshes, I need to interpolate one function into a space on the other mesh. In parallel, the vertices of the first mesh may not be found locally in the second mesh.

@ReubenHill
Contributor

ReubenHill commented Aug 17, 2021

What's happening is that the VertexOnlyMesh is dropping the points which it believes are "outside" the mesh m within which it is embedded.
You can see this by printing max(vm.coordinates.dat.data_ro.flatten()) and comparing it to max(coords_ref.flatten()): in the case where you lose points, the missing coordinates are those at the edge of the parent mesh m.

The trouble is that this is happening silently (see #2035).

Mathematically it's of course hard to decide what should happen when a point is on a cell boundary - you have to pick one cell or another. And when that point also lies on the boundary of the parent mesh... well, that's tricky. I'll need to have a think about what the best way of dealing with this is. Certainly dealing with #2035 is the first step so that this is less unexpected.

To follow up on this, it seems that PETSc is dropping the out-of-mesh points when firedrake calls

swarm.setPointCoordinates(coords, redundant=False, mode=PETSc.InsertMode.INSERT_VALUES)

which is a wrapper for DMSwarmSetPointCoordinates.
For one reason or another, it's only deciding that the on-the-edge points are outside the parent mesh topology for a parent mesh made of quads.
Ideally there would be a way of passing a tolerance to this function but it doesn't currently appear to be in the API.

I suppose we could manually add such points in cases like this (maybe via an optional kwarg to VertexOnlyMesh), but I maintain that these are on-mesh-boundary points, so it's not unreasonable that they might be considered as being outside of the mesh. Like I said, this needs some thought.

EDIT: maybe we could also figure out a way to warn a user when a point coordinate is on the parent mesh boundary too

@ReubenHill
Contributor

This is clearly not what you were expecting: could you let me know what you were expecting to happen?

@ReubenHill I think the points which are not located locally should be searched for on other processes.

If I want to compute the norm of the difference between two functions defined on two different meshes, I need to interpolate one function into a space on the other mesh. In parallel, the vertices of the first mesh may not be found locally in the second mesh.

I think I understand what you're trying to get at, but to make sure I don't go barking up the wrong tree, do you have an MFE that demonstrates what you think is wrong?

@ReubenHill
Contributor

Okay, rereading what you've said, it sounds like you're trying to use VertexOnlyMesh as a go-between for moving between two meshes. This is difficult because a VertexOnlyMesh respects the parallel distribution of its single parent mesh. If you have a second mesh which has a different parallel distribution (but where some of the points are the same), then you get a VertexOnlyMesh with a different parallel distribution.

Sound about right?

@lrtfm
Contributor Author

lrtfm commented Aug 17, 2021

Yes, please consider the following example:

from firedrake import *

def make_fun(n):
    m = UnitSquareMesh(n, n)
    V = FunctionSpace(m, 'CG', 1)
    f = Function(V)
    x, y = SpatialCoordinate(m)
    f.interpolate(x**2 + y**2)
    
    return f

f1 = make_fun(n=8)
f2 = make_fun(n=16)

m1 = f1.ufl_domain()
m2 = f2.ufl_domain()
coords2 = m2.coordinates.dat.data
vm = VertexOnlyMesh(m1, coords2)
V = FunctionSpace(vm, "DG", 0)
f1_on_vm = Function(V)
f1_on_vm.interpolate(f1)

f1_on_m2 = Function(f2)
f1_on_m2.dat.data[:] = f1_on_vm.dat.data

PETSc.Sys.Print('L2 norm of |f1 - f2|:', errornorm(f1_on_m2, f2))

This will not work in parallel.

(firedrake) firedrake@8acd7b759496:~/workdir/coding-notes$ mpiexec -n 2 python  test-vm.py
Traceback (most recent call last):
  File "test-vm.py", line 24, in <module>
    f1_on_m2.dat.data[:] = f1_on_vm.dat.data
ValueError: could not broadcast input array from shape (117,) into shape (153,)
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
Traceback (most recent call last):
  File "test-vm.py", line 24, in <module>
    f1_on_m2.dat.data[:] = f1_on_vm.dat.data
ValueError: could not broadcast input array from shape (108,) into shape (136,)
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0

@ReubenHill
Contributor

@Ig-dolci I've moved your issue to #2186

@ReubenHill
Contributor

Note to self: email PETSc users asking for the ability to specify a tolerance for mesh edges and to ask if DMSwarmSetPointCoordinates is guaranteed to pick a cell when the point is on cell boundaries.

@ReubenHill
Contributor

@lrtfm Fundamentally what you want to do is not possible with the current VertexOnlyMesh implementation because, for now at least, these meshes respect the parallel distribution of their parent meshes (and, I think, have good reason to do so to allow interpolation from their parent mesh).

It seems the real goal here is interpolation of a function from one mesh to another. This can already be done to some extent using multigrid and supermeshing and is on my radar for the future (e.g. going from a parent mesh to, say, a mesh of disconnected lines or vertices). The general case is non-trivial, particularly in parallel, as you have discovered.

Very open to thought and ideas on how we might deal with this going forwards and please do correct me if I have misunderstood.

Regarding the issue about points missing: see previous messages about mathematical difficulty (made extra hard thanks to floating point arithmetic as @wence- pointed out to me earlier). I will, in the meantime, contact the PETSc mailing list to find out about adding a tolerance argument to DMSwarmSetPointCoordinates.

@jwallwork23
Contributor

@ReubenHill Interpolation between meshes in parallel is something I am interested in, too. Would be good to discuss what would be required at some point.

@lrtfm
Contributor Author

lrtfm commented Aug 19, 2021

@lrtfm Fundamentally what you want to do is not possible with the current VertexOnlyMesh implementation because, for now at least, these meshes respect the parallel distribution of their parent meshes (and, I think, have good reason to do so to allow interpolation from their parent mesh).

I don't get what you mean by "have good reason to do so". Could you explain this? Thanks.

For the interpolation in parallel, could we insert the missing points (the ones not located locally) into the corresponding rank's DMSwarm, and then keep track of them in some way?

@wence-
Contributor

wence- commented Aug 19, 2021

@ReubenHill Interpolation between meshes in parallel is something I am interested in, too. Would be good to discuss what would be required at some point.

I assume that you have two meshes that cover the same domain (potentially with different parallel distributions). Let's call them source and target. You'd like to transfer a function from source to target, which you'll do by consistent interpolation (I'll pick point evaluation spaces, but the same idea works for any dual I think) [rather than L2 projection].

So on the target mesh, your target function defines a set of points at which you'd like to evaluate the source function. So we do something like:

source_mesh = Mesh(...)
target_mesh = Mesh(...)

source_function = Function(..., source_mesh)
target_function = Function(..., target_mesh)

point_locations = target_function.node_locations

vmesh = VertexOnlyMesh(source_mesh, point_locations)

source_values = interpolate(source_function, FunctionSpace(vmesh, "DG", 0))

So now we have all the correct values corresponding to the target nodes, but they potentially live on the wrong process.

So updating the target_function dofs can't be done by target_function.dat.data[:] = source_values[:] because we need to perform some parallel communication to figure out which values go where.

In fact, my line vmesh = VertexOnlyMesh(source_mesh, point_locations) doesn't work correctly right now because the DMSwarm creation doesn't migrate points to the owning process.

So in pseudo-code you need to do something like:

points_on_target = some_function_of(target_mesh)

distributor = locate_owning_process(points, source_mesh)

points_on_source = distributor.distribute(points, mode="forward")
vmesh = VertexOnlyMesh(source_mesh, points_on_source)
values_on_source = interpolate(source_function, FunctionSpace(vmesh, "DG", 0))

values_on_target = distributor.distribute(values_on_source, mode="backward")

target_function.dat.data[:] = values_on_target

The locate_owning_process functionality was something that #1389 was trying to do. So that might be somewhere to start. The idea I had was:

  1. build a kd-tree for the bounding boxes of each subdomain (this way you can identify candidate processes for each point)
  2. Try local point location (using a kd-tree for your local part of the mesh)
  3. For any points you don't find locally, send to candidates, they locate and someone votes to own the point
  4. Build an SF for the migration of the points
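A rough sketch of steps 1 and 3 above (candidate-rank lookup via bounding boxes) might look like the following; this is not the Firedrake/PETSc implementation, it assumes mpi4py, and it uses a plain linear scan over the boxes where a kd-tree would give the O(log N) lookup:

import numpy as np
from mpi4py import MPI

def candidate_ranks(points, local_coords, comm=MPI.COMM_WORLD, pad=1e-12):
    # Step 1: every rank publishes the bounding box of its local mesh partition.
    lo = local_coords.min(axis=0) - pad
    hi = local_coords.max(axis=0) + pad
    boxes = comm.allgather((lo, hi))
    # Step 3 (simplified): intersect each point with every box to find the
    # candidate owning ranks; a spatial tree over the boxes would make this
    # O(log nranks) per point instead of O(nranks).
    candidates = []
    for p in np.atleast_2d(points):
        candidates.append([r for r, (blo, bhi) in enumerate(boxes)
                           if np.all(p >= blo) and np.all(p <= bhi)])
    return candidates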

@ReubenHill
Contributor

ReubenHill commented Aug 19, 2021

In fact, my line vmesh = VertexOnlyMesh(source_mesh, point_locations) doesn't work correctly right now because the DMSwarm creation doesn't migrate points to the owning process.

To be clear this line, in parallel, will place each of point_locations on the owning processes of the cells of source_mesh within which point_locations are found.
This is what

swarm.setPointCoordinates(coords, redundant=False, mode=PETSc.InsertMode.INSERT_VALUES)

does by looking at the rank-local coordinates field of source_mesh's DMPlex and filtering as necessary.

If we wrote vmesh = VertexOnlyMesh(target_mesh, point_locations) then we would place each of point_locations on the owning processes of the cells of target_mesh within which point_locations are found.

It therefore does work correctly given what you have asked it to do.

To get locate_owning_process functionality for point_locations in source_mesh you just need to make VertexOnlyMesh(source_mesh, point_locations), see what cells (vertices) the new mesh has, and see what rank you are on.
Similarly to get locate_owning_process functionality for point_locations in target_mesh you just need to make VertexOnlyMesh(target_mesh, point_locations), see what cells (vertices) the new mesh has, and see what rank you are on.
That should be enough information to build whatever SF you need for the migration of the points.
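As a sketch of that idea (assuming, as the discussion below establishes, that point_locations is the same array on every rank; run under mpiexec):

from firedrake import *
from mpi4py import MPI

def locate_owning_process(parent_mesh, point_locations):
    # Each vertex of the VertexOnlyMesh ends up on the rank that owns the
    # parent cell it fell into, so the local coordinates are the points this
    # rank owns.
    vm = VertexOnlyMesh(parent_mesh, point_locations)
    return MPI.COMM_WORLD.rank, vm.coordinates.dat.data_ro

Comparing each rank's returned coordinates with the requested point_locations then tells you which rank owns which point.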

@wence-
Contributor

wence- commented Aug 19, 2021

In fact, my line vmesh = VertexOnlyMesh(source_mesh, point_locations) doesn't work correctly right now because the DMSwarm creation doesn't migrate points to the owning process.

To be clear this line, in parallel, will place each of point_locations on the owning processes of the cells of source_mesh within which point_locations are found.
This is what

swarm.setPointCoordinates(coords, redundant=False, mode=PETSc.InsertMode.INSERT_VALUES)

does by looking at the rank-local coordinates field of source_mesh's DMPlex and filtering as necessary.
If we wrote vmesh = VertexOnlyMesh(target_mesh, point_locations) then we would place each of point_locations on the owning processes of the cells of target_mesh within which point_locations are found.

It therefore does work correctly given what you have asked it to do.

I guess I meant that you might hope that points not located locally are migrated to the correct rank.

To get locate_owning_process functionality for point_locations in source_mesh you just need to make VertexOnlyMesh(source_mesh, point_locations), see what cells (vertices) the new mesh has, and see what rank you are on.

Are you assuming that point_locations is the same array on all ranks? I thought that, since points not found are dropped, if I have (say)

rank1_points = [a, b, c]
rank2_points = [d, e, f]

and rank1 only locates a and b, then I have no information about where c should actually go, or do I have it wrong?

Similarly to get locate_owning_process functionality for point_locations in target_mesh you just need to make VertexOnlyMesh(target_mesh, point_locations), see what cells (vertices) the new mesh has, and see what rank you are on.
That should be enough information to build whatever SF you need for the migration of the points.

@ReubenHill
Contributor

Are you assuming that point_locations is the same array on all ranks?

Yes I am - clearly that's the source of confusion here.
I don't currently enforce this but all my tests use the same list of points on all ranks.
I don't think I envisioned a case where you would have different points on different ranks but I can now see that this might be useful!

I guess that target_function.node_locations is, of course, rank local.
Does it include halos?

I thought that, since points not found are dropped, if I have (say)

rank1_points = [a, b, c]
rank2_points = [d, e, f]

and rank1 only locates a and b, then I have no information about where c should actually go, or do I have it wrong?

I'll need to check

@ReubenHill
Contributor

ReubenHill commented Aug 19, 2021

Even if they are dropped you could get around this in a hacky way by either

  • (a) doing the necessary MPI communication to get rank1_points == rank2_points or
  • (b) doing the necessary MPI communication to get all the points onto rank 0 (rank0_points == [rank1_points, rank2_points]) and setting redundant = PETSC_TRUE in DMSwarmSetPointCoordinates
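A hedged sketch of option (a): allgather every rank's points so that all ranks end up passing the identical array to VertexOnlyMesh (the helper name here is illustrative, not existing API):

import numpy as np
from mpi4py import MPI

def gather_points_everywhere(local_points, comm=MPI.COMM_WORLD):
    # Allgather the rank-local point arrays and stack them so that every rank
    # holds the same global list of points.
    gathered = comm.allgather(np.atleast_2d(np.asarray(local_points, dtype=float)))
    return np.concatenate(gathered, axis=0)

# e.g. vm = VertexOnlyMesh(mesh, gather_points_everywhere(points))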

@wence-
Contributor

wence- commented Aug 19, 2021

Are you assuming that point_locations is the same array on all ranks?

Yes I am - clearly that's the source of confusion here.
I don't currently enforce this but all my tests use the same list of points on all ranks.
I don't think I envisioned a case where you would have different points on different ranks but I can now see that this might be useful!

I guess that target_function.node_locations is, of course, rank local.
Does it include halos?

It can do, but I think that's a minor detail. We can instead just think, "each rank has cooked up some set of points it wishes to evaluate at, but doesn't actually know which rank 'owns' the point location"

@wence-
Contributor

wence- commented Aug 19, 2021

Even if they are dropped you could get around this in a hacky way by either

  • (a) doing the necessary MPI communication to get rank1_points == rank2_points or
  • (b) doing the necessary MPI communication to get all the points onto rank 0 (rank0_points == [rank1_points, rank2_points]) and setting redundant = PETSC_TRUE in DMSwarmSetPointCoordinates

These are reasonable interim approaches, I think, though they require that we gather all points on all processes, which is not scalable in memory cost if I have a constant number of points per process.

@jwallwork23
Contributor

Really useful discussion, thanks!

@ReubenHill
Contributor

Are you assuming that point_locations is the same array on all ranks?

Yes I am - clearly that's the source of confusion here.
I don't currently enforce this but all my tests use the same list of points on all ranks.
I don't think I envisioned a case where you would have different points on different ranks but I can now see that this might be useful!

I guess that target_function.node_locations is, of course, rank local.
Does it include halos?

I thought that, since points not found are dropped, if I have (say)

rank1_points = [a, b, c]
rank2_points = [d, e, f]

and rank1 only locates a and b, then I have no information about where c should actually go, or do I have it wrong?

I'll need to check

Can confirm, c is indeed lost.
Try running the below with mpiexec -n 2 and you'll see:

from firedrake import *
from mpi4py import MPI

mesh = UnitSquareMesh(1,1)

comm = MPI.COMM_WORLD

a = [0.1, 0.1] # on rank 0
b = [0.9, 0.9] # on rank 1
c = [0.2, 0.2] # on rank 0
d = [0.8, 0.8] # on rank 1

if comm.rank == 0:
    points = [a, b] # b should go to rank 1
elif comm.rank == 1:
    points = [c, d] # c should go to rank 0

vm = VertexOnlyMesh(mesh, points)

print(comm.rank, vm.coordinates.dat.data_ro) # should get 0 [a, c] and 1 [b, d]

@ReubenHill
Contributor

  1. build a kd-tree for the bounding boxes of each subdomain (this way you can identify candidate processes for each point)
  2. Try local point location (using a kd-tree for your local part of the mesh)
  3. For any points you don't find locally, send to candidates, they locate and someone votes to own the point
  4. Build an SF for the migration of the points

This could be done without needing to build an SF at swarm/VertexOnlyMesh creation time methinks.
We could perhaps make use of DMSwarmMigrate.
Definitely worth asking the PETSc mailing list about - this functionality might already be in there.

@ReubenHill
Contributor

Another thought: at the moment when I call

swarm.setPointCoordinates(coords, redundant=False, mode=PETSc.InsertMode.INSERT_VALUES)

it places points everywhere in the local DMPlex, including in the ghost cells.
At the moment I just remove such points from the ghost cells

firedrake/firedrake/mesh.py, lines 2121 to 2122 at 9e7c358:

# Remove PICs which have been placed into ghost cells of a distributed DMPlex
dmcommon.remove_ghosts_pic(swarm, plex)

however, presumably I could look at which ranks the ghost cells correspond to and, before removing them, send the points in those ghost cells to the relevant ranks to be added to the local DMPlex.
I presume there's some fancy PETSc SF way of doing this (everything about SFs left my brain some time ago).
Thoughts?

@wence-
Contributor

wence- commented Aug 19, 2021

Another thought, the vertexonly mesh places points everywhere in the local DMPlex, including in the ghost cells.
At the moment I just remove such points from the ghost cells
however, presumably I could look at which ranks the ghost cells correspond to and, before removing them, send the points in those ghost cells to the relevant ranks to be added to the local DMPlex.

This would work if the provided points are in the halo region of the local process (if not, they'll still disappear).

@wence-
Contributor

wence- commented Aug 19, 2021

  1. build a kd-tree for the bounding boxes of each subdomain (this way you can identify candidate processes for each point)
  2. Try local point location (using a kd-tree for your local part of the mesh)
  3. For any points you don't find locally, send to candidates, they locate and someone votes to own the point
  4. Build an SF for the migration of the points

This could be done without needing to build an SF at swarm/VertexOnlyMesh creation time methinks.
We could perhaps make use of DMSwarmMigrate.
Definitely worth asking the PETSc mailing list about - this functionality might already be in there.

My reading of DMSwarmMigrate is that it will move points on the local process that have been updated (e.g. you're doing some kind of Lagrangian particle method and you've advected your particles, which now need to live on a new process). However, that still leaves the problem of how you put your particles into the DMSwarm to start with.

@ReubenHill
Contributor

Another thought, the vertexonly mesh places points everywhere in the local DMPlex, including in the ghost cells.
At the moment I just remove such points from the ghost cells
however, presumably I could look at which ranks the ghost cells correspond to and, before removing them, send the points in those ghost cells to the relevant ranks to be added to the local DMPlex.

This would work if the provided points are in the halo region of the local process (if not, they'll still disappear).

Well, if you first expunged any out-of-mesh points, then you could do it iteratively until no points remain in ghost cells (i.e. keep passing them around as necessary). Probably slower than would be ideal.

@ReubenHill
Contributor

  1. build a kd-tree for the bounding boxes of each subdomain (this way you can identify candidate processes for each point)
  2. Try local point location (using a kd-tree for your local part of the mesh)
  3. For any points you don't find locally, send to candidates, they locate and someone votes to own the point
  4. Build an SF for the migration of the points

This could be done without needing to build an SF at swarm/VertexOnlyMesh creation time methinks.
We could perhaps make use of DMSwarmMigrate.
Definitely worth asking the PETSc mailing list about - this functionality might already be in there.

My reading of DMSwarmMigrate is that it will move points on the local process that have been updated (e.g. you're doing some kind of Lagrangian particle method and you've advected your particles, which now need to live on a new process). However, that still leaves the problem of how you put your particles into the DMSwarm to start with.

I guess you could bypass DMSwarmSetPointCoordinates altogether and manually:

  1. Get rid of the out-of-parent-mesh points,
  2. add points on all ranks manually such that you avoid duplicate points (or remove duplicates later),
  3. make sure the points are MPI rank labelled as appropriate
  4. call DMSwarmMigrate to move all the points around

This sounds like more faff than it's worth, and we'd be better off supplementing DMSwarmSetPointCoordinates to deal with this case.

@wence-
Contributor

wence- commented Aug 19, 2021

Oh, you mean just try and put all your points in the swarm, and the ones that didn't make it, you send to someone else until everything is accounted for? I guess that works. The kd-tree approach is O(log N) in the number of processes and the local number of cells, which is asymptotically optimal for this setup I think.

@ReubenHill
Contributor

Oh, you mean just try and put all your points in the swarm, and the ones that didn't make it, you send to someone else until everything is accounted for? I guess that works. The kd-tree approach is O(log N) in the number of processes and the local number of cells, which is asymptotically optimal for this setup I think.

Yep. Is this the same as the kd-tree approach?

I think your original suggestion

  1. build a kd-tree for the bounding boxes of each subdomain (this way you can identify candidate processes for each point)
  2. Try local point location (using a kd-tree for your local part of the mesh)
  3. For any points you don't find locally, send to candidates, they locate and someone votes to own the point
  4. Build an SF for the migration of the points

along with first asking the PETSc mailing list for their collective thoughts is probably the way to go though.
It might turn out @knepley has already thought about this and worked something out.

@wence-
Contributor

wence- commented Aug 19, 2021

Oh, you mean just try and put all your points in the swarm, and the ones that didn't make it, you send to someone else until everything is accounted for? I guess that works. The kd-tree approach is O(log N) in the number of processes and the local number of cells, which is asymptotically optimal for this setup I think.

Yep. Is this the same as the kd-tree approach?

Yes, effectively you're using a tree to speed up finding candidate processes. The number of bounding boxes for the processes is N, so you can intersect a point with the matching bounding boxes in O(log N) time if you use a spatial tree lookup.

I think your original suggestion

  1. build a kd-tree for the bounding boxes of each subdomain (this way you can identify candidate processes for each point)
  2. Try local point location (using a kd-tree for your local part of the mesh)
  3. For any points you don't find locally, send to candidates, they locate and someone votes to own the point
  4. Build an SF for the migration of the points

along with first asking the PETSc mailing list for their collective thoughts is probably the way to go though.
It might turn out @knepley has already thought about this and worked something out.

I don't think anything is there in the petsc code right now.

@knepley

knepley commented Aug 19, 2021 via email

@wence-
Contributor

wence- commented Aug 19, 2021

I have an implementation of this right now, minus the tree among processes. Since each process only has a bounding box, rather than a more sophisticated boundary, the O(N) box lookup is still pretty trivial for 1M processes.

Where do we look for the implementation? There's some grid hashing in various places but seemingly only coded for 2D?

I think the rendezvous to determine point owners is probably similar to some of the mesh distribution, but I don't quite see it.

@ReubenHill
Contributor

@wence- note I have sketched out ideas for dealing with the issues raised here in #2223 and #2224

@knepley

knepley commented Sep 21, 2021 via email

@knepley

knepley commented Sep 23, 2021

This is the preliminary implementation. It passes simple tests I wrote: https://gitlab.com/petsc/petsc/-/merge_requests/4346

ReubenHill added a commit that referenced this issue Mar 17, 2023
This introduces a global index numbering and rank labelling for DMSwarm
points. Note that we can't use the built-in DMSwarm numbering in the
field named 'DMSwarmField_pid' because there's a bug in petsc4py which
makes it inaccessible (it uses PETSC_INT64, which is not in petsc4py).

At the moment we leave halos empty to match existing behaviour and fix
the case where we were getting point duplication.

This also adds point redistribution which fixes #2178.

See code for how the algorithm works

Also includes tests
ReubenHill added a commit that referenced this issue Mar 20, 2023

ReubenHill added a commit that referenced this issue Mar 21, 2023

ReubenHill added a commit that referenced this issue Mar 21, 2023
@ReubenHill
Contributor

This has now been fixed - see #2773
