
rework c function pointers #693

Merged: 15 commits into JuliaParallel:master on Dec 16, 2022

Conversation

@t-bltg (Collaborator) commented Dec 16, 2022

Fix #688.

Not yet ready for review ;)

t-bltg marked this pull request as draft on December 16, 2022 12:24
(two resolved review threads on gen/src/MPIgenerator.jl)
t-bltg force-pushed the char branch 2 times, most recently from 048598c to a6216a4 on December 16, 2022 13:15
@Gnimuc (Contributor) commented Dec 16, 2022

Is it still a problem after merging JuliaInterop/Clang.jl#406?

@t-bltg (Collaborator, Author) commented Dec 16, 2022

JuliaInterop/Clang.jl#406 only fixes debugging and printing output.

The question I want to ask here is: is it possible to modify node function arguments?

Or do we have to modify the underlying generated node exprs? That seems unfeasible, since information (like name(argument)) is lost along the way.

@Gnimuc (Contributor) commented Dec 16, 2022

(That seems unfeasible, since information is lost along the way ...)

The only "elegant" way is to overload _get_func_arg.

t-bltg marked this pull request as ready for review on December 16, 2022 14:19
@t-bltg (Collaborator, Author) commented Dec 16, 2022

This was fun, actually.
Monkey patching _get_func_arg is horrible, but it works ...
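
For context, "monkey patching" here means redefining one of Clang.jl's internal methods from the generator script. A minimal, generic sketch of the pattern, using an illustrative stand-in module rather than the actual Clang.jl internals:

# Hypothetical upstream package exposing an internal helper.
module FakeUpstream
    struct FuncArg
        name::String
        ctype::String
    end
    # Original behavior: render the argument as "type name".
    _get_func_arg(arg::FuncArg) = string(arg.ctype, " ", arg.name)
end

# The "monkey patch": redefining the method with the same signature from our
# script silently replaces FakeUpstream's behavior everywhere, hence fragile.
function FakeUpstream._get_func_arg(arg::FakeUpstream.FuncArg)
    # e.g. rewrite function-pointer arguments before the wrapper is emitted
    return string("Ptr{Cvoid} ", arg.name)
end

println(FakeUpstream._get_func_arg(FakeUpstream.FuncArg("copy_fn", "MPI_Type_copy_attr_function *")))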

Don't merge this ;)

@Gnimuc (Contributor) commented Dec 16, 2022

I'm trying to understand the root cause of #688. To me, it seems like Clang.jl doesn't handle typedef'd function prototypes well for MPI's special usage.

@t-bltg (Collaborator, Author) commented Dec 16, 2022

By the way, is the output_ignorelist option from generator.toml broken?

It doesn't filter the function definitions anymore ...

@Gnimuc (Contributor) commented Dec 16, 2022

Only on the master branch? This could be related: JuliaInterop/Clang.jl#403

@giordano (Member)

Worked for me in #692

@t-bltg (Collaborator, Author) commented Dec 16, 2022

Only on the master branch?

Ah, OK, we have to adjust the generator.toml regexes.

@t-bltg (Collaborator, Author) commented Dec 16, 2022

I'm trying to understand the root cause of #688. To me, it seems like Clang.jl doesn't handle typedef'd function prototypes well for MPI's special usage.

Right, I think we'd better fix this (I'm going to try):
The prototype in mpi.h is:

typedef int (MPI_Type_copy_attr_function)(MPI_Datatype, int, void *, void *, 
					  void *, int *);
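
Note that this typedef declares a plain function type rather than a pointer-to-function type, so the C API passes MPI_Type_copy_attr_function * parameters and the generated Julia binding needs a pointer there. As a rough sketch only, a matching callback could be built on the Julia side like this (assuming the generated low-level handle type is reachable as MPI.API.MPI_Datatype):

using MPI

# Sketch: a Julia callback matching the C prototype above.
# Argument types mirror MPI_Type_copy_attr_function; MPI.API.MPI_Datatype is
# assumed to be the low-level handle type from the generated bindings.
copy_attr(oldtype, keyval, extra, attr_in, attr_out, flag) = Cint(0)  # MPI_SUCCESS

copy_cb = @cfunction(copy_attr, Cint,
                     (MPI.API.MPI_Datatype, Cint, Ptr{Cvoid}, Ptr{Cvoid},
                      Ptr{Cvoid}, Ptr{Cint}))
# copy_cb is a Ptr{Cvoid} that C code can call through an
# MPI_Type_copy_attr_function * parameter (e.g. in MPI_Type_create_keyval).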

(resolved review thread on gen/src/MPIgenerator.jl)
@t-bltg (Collaborator, Author) commented Dec 16, 2022

I also need to add a test for #688.

Hmm, I still get:

julia> MPI.get_name(MPI.Datatype(Char))
""

@giordano, what is in your test.jl from #688?

@giordano (Member)

I also need to add a test for #688.

Revert #689 😛

@giordano (Member)

@giordano, what is in your test.jl from #688?

#684 (comment), which is simply extracted from

MPI.Init()
comm = MPI.COMM_WORLD
size = MPI.Comm_size(comm)
rank = MPI.Comm_rank(comm)
for T in Base.uniontypes(MPI.MPIDatatype)
    # test vector input (ArrayType and synchronize() are defined by the test harness)
    A = ArrayType{T}([rank + 1])
    synchronize()
    C = MPI.Allgather(A, comm)
    @show (rank, C)
end

and before #689 that was running on Char as well. Also, MPIDatatype should be moved to the tests.

(resolved review thread on test/test_issues.jl)
@giordano (Member)

I can confirm the test script works with this branch of MPI.jl on Ookami with MVAPICH:

[mosgiordano@fj004 mvapich]$ srun -n 2 julia --project test.jl 
(rank, C) = (0, ['\x01', '\x02'])
(rank, C) = (1, ['\x01', '\x02'])
[mosgiordano@fj004 mvapich]$ srun -n 4 julia --project test.jl 
(rank, C) = (1, ['\x01', '\x02', '\x03', '\x04'])
(rank, C) = (2, ['\x01', '\x02', '\x03', '\x04'])
(rank, C) = (3, ['\x01', '\x02', '\x03', '\x04'])
(rank, C) = (0, ['\x01', '\x02', '\x03', '\x04'])
[mosgiordano@fj004 mvapich]$ srun -n 8 julia --project test.jl 
(rank, C) = (3, ['\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\a', '\b'])
(rank, C) = (2, ['\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\a', '\b'])
(rank, C) = (5, ['\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\a', '\b'])
(rank, C) = (6, ['\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\a', '\b'])
(rank, C) = (7, ['\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\a', '\b'])
(rank, C) = (1, ['\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\a', '\b'])
(rank, C) = (0, ['\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\a', '\b'])
(rank, C) = (4, ['\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\a', '\b'])

🚀

@t-bltg (Collaborator, Author) commented Dec 16, 2022

I can confirm the test script works with this branch of MPI.jl on Ookami with MVAPICH:

So

julia> MPI.get_name(MPI.Datatype(Char))
""

remains expected?

(resolved review thread on test/common.jl)
@giordano (Member)

So

julia> MPI.get_name(MPI.Datatype(Char))
""

remains expected?

Sadly yes:

julia> using MPI

julia> MPI.Init(;threadlevel=:single)
MPI.ThreadLevel(0)

julia> MPI.get_name(MPI.Datatype(Char))
""

I get the same with MPICH_jll (and MVAPICH has the MPICH ABI, so this shouldn't be surprising).

I don't know if there is a way to query the "MPI equivalent type" for Char.

@t-bltg (Collaborator, Author) commented Dec 16, 2022

Let's reserve that fix (if there is any) for another PR.

@giordano (Member)

I spoke too soon; I hadn't closed the Julia session:

julia> using MPI

julia> MPI.Init(;threadlevel=:single)
MPI.ThreadLevel(0)

julia> MPI.Datatype(Char)
MPI.Datatype(Char): 

julia> MPI.Finalize()

[813476] signal (11.1): Segmentation fault
in expression starting at REPL[4]:1
MPIR_Call_attr_delete at /lustre/software/mvapich2/gcc11/2.3.6/lib/libmpi.so (unknown line)
MPIR_Attr_delete_list at /lustre/software/mvapich2/gcc11/2.3.6/lib/libmpi.so (unknown line)
Allocations: 644473 (Pool: 643990; Big: 483); GC: 1
Segmentation fault (core dumped)
[mosgiordano@fj-debug2 mvapich]$

As I mentioned in #688 (comment), lots of segfaults happen at MPI_Finalize-time.

Also, this PR does not solve #684: there are plenty of other segmentation faults I get there, with both MVAPICH2 and Open MPI; this is just the first one I got with MVAPICH.

@t-bltg (Collaborator, Author) commented Dec 16, 2022

Also, this PR does not solve #684, there's a load of other segmentation faults I get there

OK, I will remove the fix ...

EDIT: @giordano, you're too fast 😆

@t-bltg (Collaborator, Author) commented Dec 16, 2022

By the way, Buildkite should have some timeouts; there are jobs that have been running for a week now:
https://buildkite.com/julialang/mpi-dot-jl/builds/999
https://buildkite.com/julialang/mpi-dot-jl/builds/1001

@simonbyrne (Member)

By the way, Buildkite should have some timeouts; there are jobs that have been running for a week now: https://buildkite.com/julialang/mpi-dot-jl/builds/999 https://buildkite.com/julialang/mpi-dot-jl/builds/1001

Not currently possible, unfortunately:
https://forum.buildkite.community/t/timeout-waiting-for-agent/188/4

simonbyrne merged commit 4b9b05f into JuliaParallel:master on Dec 16, 2022
t-bltg deleted the char branch on December 16, 2022 22:41
@simonbyrne (Member)

Ah, sorry, I merged too early.

@giordano, doesn't this include the change from #688 (comment)?

@simonbyrne (Member)

Great!

Successfully merging this pull request may close these issues: Char datatype isn't handled correctly.