Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OMP critical for make_archive #708

Merged
merged 2 commits into from
Jun 17, 2022
Merged

Conversation

zoziha
Copy link
Contributor

@zoziha zoziha commented Jun 12, 2022

Fixes #707 .

Description

When building an archive, it involves generating and deleting the .rsp file. When compiling in parallel, this will cause an error to be reported. It can be solved by adding !$OMP critical.

This PR ensures that when generating the archive file, only a single thread is used, and it is set as a critical process, ensuring that this part of the code is not affected by other threads.

@zoziha
Copy link
Contributor Author

zoziha commented Jun 12, 2022

See .rsp file operation in make_archive.

It would be great if this PR could also be included in 0.6.0. (It is a tiny modification)

@zoziha zoziha added the regression Something isn't working that previously did work label Jun 12, 2022
@LKedward
Copy link
Member

Thanks for fixing this @zoziha - I agree it would be good to get this in for v0.6.0! I encounter this error sporadically but I don't entirely understand the cause. I wonder if the critical section should go in fpm_filesystem::delete_file (around the entire contents) to narrow it down - can you try this to see if it has the same affect as your fix here?

@zoziha
Copy link
Contributor Author

zoziha commented Jun 12, 2022

I wonder if the critical section should go in fpm_filesystem::delete_file (around the entire contents) to narrow it down.

Move the critical section to fpm_filesystem::delete_file, and the problem will still occur in the actual measurement.
From a process point of view, generating .resp files and deleting .resp files is a closed loop. It seems that moving the critical section from fpm_backend::build_target(first commit, eb3cd47) to fpm_compiler::make_archive(second commit, e0f1674) can narrow it down a little bit.

@awvwgk awvwgk added this to the v0.6.0 milestone Jun 12, 2022
@LKedward
Copy link
Member

Thanks for testing that out, okay yeah that's interesting. The reason I'm perplexed is that there should only be one archive object and hence only one thread that ever executes that code. So surely a critical section shouldn't have any effect? I'm now wondering if the critical section is indirectly fixing the issue by introducing a delay or synchronisation.

@zoziha
Copy link
Contributor Author

zoziha commented Jun 13, 2022

I can't really understand this issue. It stands to reason that only one archive file needs to be created in the build_target loop, and only one thread should be executing.
Tested in a limited number of examples, it seems that #707 is only occurred generally on Windows; and tested on WSL2, no critical section is added and no error is occurred. (Perhaps this could be a bug only in Windows-GFortran-Openmp?)

However, in any case, this PR ensures that when generating the archive file, only a single thread is used, and it is set as a critical process, ensuring that this part of the code is not affected by other threads.

I encounter this error sporadically.

@LKedward , do you also encounter this problem on Windows?

@LKedward
Copy link
Member

@LKedward , do you also encounter this problem on Windows?

Sorry yes, I only encounter it on Windows (msys2). So I think you're right that it might be a bug with the Windows filesystem or with mingw64.

@awvwgk - how important is it that we delete the response file, could we simply leave it in place to avoid this error?

@awvwgk
Copy link
Member

awvwgk commented Jun 13, 2022

how important is it that we delete the response file, could we simply leave it in place to avoid this error?

Preferably we can remove temporary files we create, especially if they are not part of the overall build state (like object files or the digests). But if it is easier, we can just leave it around.

@zoziha
Copy link
Contributor Author

zoziha commented Jun 15, 2022

During the use of these days, this PR has indeed avoided the occurrence of #707, and similar errors have not occurred again on my computer, but indeed this may be a bug related to Msys2-gfortran-openmp, the specific reason is unknown.

There is often only one archive file that needs to be created at present, and it is acceptable to set a critical section for it, which makes opening the .resp file and deleting the .resp file form a whole. I don't really understand the actual effect of this PR :\

Since it's a tiny modification, we can improve it in the future when the real reason is found, also because it's small enough to let it pass now if we can.

Copy link
Member

@LKedward LKedward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks both. Okay good point, in that case I'm happy to move forward with this. Can you add a comment to the function to explain the need for the critical section and perhaps reference the Github issue number?

Copy link
Member

@awvwgk awvwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
regression Something isn't working that previously did work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fpm fails to compile projects in parallel: Fortran runtime error: File cannot be deleted
3 participants