Skip to content

Lock package directories with lock-files (fixes #957) #1116

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

emmabastas
Copy link

@emmabastas emmabastas commented Mar 25, 2025

Synopsis

This pull request fixes #957 using a very simple lock-file approach where the presence of a special .fpm-package-lock indicates that an fpm process is currently working on the package directory and that subsequent fpm processes should wait for the lock-file to be removed.

Checklist

  • There are unit-tests for the new module
  • There is documentation for the new module
  • All tests are passing on Linux+gfort and Windows+Mysys2+gfort, have not tested other setups.

Issues

  • I think it would be nice with some sort of integration-testing for this feature, but I haven't added that because I was unsure if it would be appropriate or not (as I understand there's only a unittesting setup right for fpm right now)
  • I wrote C functions that pass strings to Fortran and that was very painful for me to implement, and I cannot claim to understand what the precise differences between character(:), allocatable, character(len=1), pointer and type(c_ptr) are..

Implementation notes

The mere presence of .fpm-package-lock is enough for indicate a lock. This means that there are scenarios where a lock-file exists but there is no way of telling if the process that created it is still going or if it crashed. For this reason, if an fpm process sees that a lock-file exists already it prints an informative message to the user instructing them how to manually remove the lock if need be. (this is how git does it)

I went down the path of implementing a more "sophisticated" locking mechanism (see the https://github.com/emmabastas/fpm/tree/concurrent-invocations for my struggles x)) but I ended up concluding that it wouldn't be worth implementing. here's what I wrote in a comment in src/fpm_lock.f90 about that:

A common attempt at improving this situation might be to write the PID +
process start time into the lock-file, that way other processes can verify
that the lock-file wasn't left behind on accident. However this adds quite a
bit of complexity to the code, and it is very difficult / maybe even
impossible to do without race-conditions: There is no atomic way to tell the
OS to:

Open this file for reading with a shared lock in case it exists and
if it doesn't exists then create the file with for writing with an
exclusive lock.

Even if we we're able to do it there is a conceptual flaw: Distributed file
systems. If the package lives on another machine that you're accessing through
something like NFS then you'll end up writing your PID to a file living on
another machine that's not running the process, and so processes on that
machine can't know whether the lock is valid or not.

Now we might turn to actual OS file-locking primitives such as fcntl on UNIX
and LockFile on Windows. This is again a step-up in complexity, and I don't
know about LockFile but fcntl is fraught with problems:
https://chris.improbable.org/2010/12/16/everything-you-never-wanted-to-know-about-file-locking/

My conclusion is that anything more advanced than the current implementation
might just not be worth it, but I'm happy to be proven wrong! :-)

fixes fortran-lang#957

Squashed commit of the following:

commit 998ebba0c55cdd681e8575c1cc8d951e699c1907
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Tue Mar 25 13:17:38 2025 +0100

    Remove nix flake

commit b35852a94fb3cb2444ed973b7f9c28b46964e93d
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Tue Mar 25 13:09:34 2025 +0100

    Use fpm_lock in fpm commands

commit 0f72adccb1c4bdf4c9b49b6336b550fb705fb2f0
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Tue Mar 25 11:40:27 2025 +0100

    Better unitest fixtures

commit 49dbf5e2d4ace90b98dda959f903af41b42f94ed
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Mon Mar 24 18:48:55 2025 +0100

    Fix memory issues

commit 45d8b1454d99277fad78fb26def34773d41d01b0
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Mon Mar 24 16:39:59 2025 +0100

    Fix a typo

commit 0d86b10c76b6594d7363e6b2b697b08c9eccb92d
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Mon Mar 24 16:39:31 2025 +0100

    Add implementation notes

commit b90dad7fdbd0c9c66700732c5af0f5919bb0ea64
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Mon Mar 24 16:16:14 2025 +0100

    Remove lock-files with atexit

commit 4b6fea47f3771e280f89538f5435cb6e2b510b19
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Sun Mar 23 16:20:13 2025 +0100

    Simplify logic

commit 66ff71676e3b8f5f744b146d9f6515e16c1b83fe
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Fri Mar 21 15:49:02 2025 +0100

    All tests pass on Linux and Windows!!

commit 019ffc96ee4c9b0d388271543ff295a77eebd192
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Fri Mar 21 15:42:57 2025 +0100

    Almost everthing works

commit b7e2c1af4589a427e925fb98068cecf598cb2366
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Fri Mar 21 11:54:29 2025 +0100

    Implement for Windows.

commit 588fcf7f5674dfd92b9bfd485222f71ca1e49db0
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Fri Mar 21 11:54:00 2025 +0100

    Kill processes without resorting to C.

commit 688a38d5e7076fb1f71cbd33fe828691058d3443
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Wed Mar 19 20:08:17 2025 +0100

    More documentation and comments

commit f39ef3ef5ca498920212c1470808cef6ee385032
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Wed Mar 19 18:03:10 2025 +0100

    Handle more errors properly

commit b831f652b26166e7f2c124ec9634214f0fb3b527
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Wed Mar 19 17:13:05 2025 +0100

    Add a unit test

commit fd7ffb8f5e8d76bb60545c79ac3c0a14c8dd4abd
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Wed Mar 19 17:00:27 2025 +0100

    Reap child process after kill

commit 6f12e87f7d9a92e10efc51922b3d6ba27646c85a
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Wed Mar 19 17:00:11 2025 +0100

    Change some comments

commit 757300003dacc095724885ccbedb2a5f98006b9e
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Wed Mar 19 16:59:54 2025 +0100

    Implement a process_alive function

commit 26eb598ab124c338e38751a61bf5a8c21a412a21
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Wed Mar 19 15:02:36 2025 +0100

    Fail test early if package-locking fails

commit 107dcec984d3a34b7dc340e82f189b30acdf27f2
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Wed Mar 19 14:56:32 2025 +0100

    Make it better

commit 82e9224c458ad899660ed94866c87161cde8111c
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Sat Mar 15 17:59:40 2025 +0100

    Add tests

commit e1169ca897df690309941d83872447515656234d
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Sat Mar 15 17:58:04 2025 +0100

    WIP: Pid in lock-file + error handling

commit 903ec6fc758d50944a95acdf10a7bc7237ed5834
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Sat Mar 1 18:59:32 2025 +0100

    Start work on package locking

commit 65f0905a4ed6f043eaabdb2d08bfd4f9654e161e
Author: Emma Bastås <emma.bastas@protonmail.com>
Date:   Sat Mar 1 13:42:46 2025 +0100

    TODO: Remove. nix dev enviroment
Copy link
Member

@perazz perazz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your PR @emmabastas.
It's a nice implementation. I've put some comments, please let me know if you have questions and let's move on from here.

src/fpm_lock.f90 Outdated
private
public :: fpm_lock_acquire, fpm_lock_acquire_noblock, fpm_lock_release

logical :: has_lock = .false.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR @emmabastas. I have some questions regarding the implementation.
Regarding has_lock and has_atexit_handler, I believe there is no need to store them as static variables. They won't be thread-safe, so I am not sure if there are cases where this might be a problem.
Perhaps a better solution would be to always get those values by querying the lock file each time.

Copy link
Author

@emmabastas emmabastas Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah, I didn't consider thread-safety in this context, nice catch.

My rationale for having these statics in the first place:

  • has_lock: This is to catch programmer errors; I don't think it makes sense to for, instance, call fpm_lock_acquire twice, or fpm_lock_release twice. It wouldn't cause problems per-se, but I think it's indicative that the programmer has done something wrong.
  • has_atexit_handler: I only want the atexit handler to be called once (and so it should only be registered once), and so this variable exists to keep track of that.

I don't think any of these variables are essential, and now that you've pointed them out, maybe they just add unnecessary complexity? has_lock as stated is simply for catching programmer errors, but I think it's pretty trivial to catch missuses by just looking at the source code.. As for has_atexit_handler, since the fpm_lock_acquire shouldn't be called more than once there isn't a need for it.

Thinking this through I think:

  • has_atexit_handler should definitely be removed without issue.
  • has_lock can also be removed without causing issues.

EDIT: Removed usage of has_lock and has_atexit_handler (1d3301d and f1f14e4)


interface
! This function is defined in `fpm_lock.c`.
subroutine c_create(path, iostat, iomsg, exists) bind(c, name='c_create')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding c sources, I would advise to not use them whenever possible. If C is only used to create or delete a file, I think we should do it from Fortran. (see zoziha/MINGW-packages#1 for a recent issue)

The reason is that fpm is distributed as a single-file .F90 source, that is used to "bootstrap" fpm (fpm builds fpm). In such version, the C functions are not available (in your case, they should be safeguarded with #ifdef FPM_BOOTSTRAP). That is OK for simple builds, but could lead to unintended behavior during bootstrapping.

So I would advise that we replace these C functions with Fortran equivalent. Please look at the fpm_filesystem and fpm_os modules, and if there is missing functionality, you can look into the Fortran stdlib implementation, or other open-source packages i.e. M_io (@urbanjost might be willing to help)

Copy link
Author

@emmabastas emmabastas Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing this out, I wasn't aware of this complication.

Have I understood you correctly that binding to C isn't problematic in this regard, rather, it's writing custom C, so the problematic code is c_create and c_remove, but not free and atexit?

I think c_remove can be replaced by calls to frotran's open and close (it will introduce a race condition that doesn't exist in c_remove, but that race condition is nothing more but slightly annoying to the programmer, it won't cause bugs)

c_create is more difficult though, the reason I feel that I "need" c_create is to avoid a problematic race-condition: We want to create .fpm-package-lock only if it doesn't exist. A naïve approach that can be done in Fortran is something like this:

if (.not. exists('.fpm-package-lock')) then
    open(file='.fpm-pakage-lock', ...) // has lock
else
   ... // does not have lock
end if

However, another Fortran process might create the lock-file after the not_exists check but before the open, and so now two processes think that they have a lock at the same time.

Another approach is to call open without first having a not_exists check, and then try to discern whether or not the file was created or simply opened, but I don't think this is possible (See this note)

Ways forward

  • c_remove. Remove usage of this function, pretty straightforward
  • c_create. Here I can see some alternatives
    1. Implement this in Fortran and accept the race condition.
    2. Make it so that bootstrap:ed builds don't have the lock-file feature at all.
    3. Find some clever way to implement this in Fortran without a race condition.
    4. Call the C function open directly from Fortran instead of via c_create.
      I think this is doable. Maybe we'll lose the ability to get the error code/message if something goes wrong on the C side, but that's not the end of the world. It can also be finicky to find the right Fortran integers corresponding to O_RDONLY, O_CREAT, etc.

EDIT: Removed c_remove in 2abe27d

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

open(file='.fpm-package-lock', ..., action='write', status='NEW', iostat=ios, iomsg=msg)
if (ios/=0) stop 'file exists or is not readable: '//trim(msg)

Should be enough for a single-call evaluation imho. So yes, I believe all the C calls in the current proposal should be replaceable with Fortran

Copy link
Author

@emmabastas emmabastas Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are totally right that will work.. Idk why I got it in my head that that that wasn't possible, anyways I will see to it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a potential problem: If open(status='new') fails then we can't know if it failed because the file already existed (expected behavior, simply means we didn't get the lock) or it failed for some other reason (unexpected behavior, something is wrong). The consequence of this is that fpm might end up printing the warning message Warning: file ".fpm-lock" exists. to the user, but then the user looks and sees that .fpm-lock doesn't exist, now fpm is lying to the user!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it would definitely be nice to differentiate between the two failures. Both the error code (iostat) and message (iomsg) will be compiler dependent though. For example XLF has iostat==107. Mapping all possible compiler/error code combinations does not seem feasible. Perhaps one option would be to call inquire(filename,exist=is_old), to check if the file existed, only after we know that file creation failed?

! iomsg=iomsg)
!inquire(unit=lock_unit, exist=exists)

call c_create('.fpm-package-lock'//c_null_char, iostat, c_iomsg_ptr, exists)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like .fpm-package-lock is created in $cwd. Sometimes, fpm is executed from places that are not necessarily the project's main folder. If two projects are being built from the same path, .fpm-package-lock will crash one of them. So I suggest it be placed in the project's build folder instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you thinking about a situation like running these two commands at the same time?

  • cd ~ && fpm build --directory ./pkg-a
  • cd ~ && fpm build --directory ./pkg-b

And that the two processes will try to create .fpm-package-lock inside ~?

If so I'm under the impression that fpm changes the working directory to pkg-a resp. pkg-b before calling cmd_build and friends. So assuming that $cwd == package root should be ok? But maybe it's better to explicit about that anyways.

Here's the change-cwd logic that I'm refering to:

fpm/app/main.f90

Lines 36 to 51 in bb76d6a

call get_working_dir(cmd_settings, working_dir)
if (allocated(working_dir)) then
! Change working directory if requested
if (len_trim(working_dir) > 0) then
call change_directory(working_dir, error)
call handle_error(error)
call get_current_directory(pwd_working, error)
call handle_error(error)
write(output_unit, '(*(a))') "fpm: Entering directory '"//pwd_working//"'"
else
pwd_working = pwd_start
end if
else
pwd_working = pwd_start
end if

call setup()

! Some other process acquires a lock to work on the package briefly.
call run('touch .fpm-package-lock && sleep 1 && rm .fpm-package-lock')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run is a synchronous command afaict.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah yes, I forgot the & at the end, fixed in 810c98c

Also it wasn't even used in the first place, although it should have been.
Also change some unittests to account for this removal
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Safely handle parallel/concurrent invocations
2 participants