
Parallel execution on self-hosted runners #191

Closed
reichhartd opened this issue Jun 15, 2021 · 12 comments · Fixed by #477

@reichhartd

We run this action on self-hosted Ubuntu and macOS machines. When the action runs alone, everything works smoothly. However, we have several runners installed on the same machine, so the ruby/setup-ruby action can run in parallel. This leads to errors because all runners access the same Ruby version in /opt/hostedtoolcache/.

Among others, we got the following errors:

Run ruby/setup-ruby@v1
Error: Command failed: rm -rf "/Users/runner/hostedtoolcache/Ruby/3.0.1/x64"
rm: fts_read: No such file or directory
Installing Bundler
  Using Bundler 2.2.20 from Gemfile.lock BUNDLED WITH 2.2.20
  /opt/hostedtoolcache/Ruby/3.0.1/x64/bin/gem install bundler -v 2.2.20 --no-document
  `/opt/hostedtoolcache/Ruby/3.0.1/x64/lib/ruby/gems/3.0.0/gems/bundler-2.2.20/exe/bundle` does not exist, maybe `gem pristine bundler` will fix it?
  `/opt/hostedtoolcache/Ruby/3.0.1/x64/lib/ruby/gems/3.0.0/gems/bundler-2.2.20/exe/bundler` does not exist, maybe `gem pristine bundler` will fix it?
  ERROR:  While executing gem ... (Errno::ENOENT)
      No such file or directory @ rb_sysopen - /opt/hostedtoolcache/Ruby/3.0.1/x64/lib/ruby/gems/3.0.0/specifications/bundler-2.2.20.gemspec
  Took   0.75 seconds
Error: The process '/opt/hostedtoolcache/Ruby/3.0.1/x64/bin/gem' failed with exit code 1
Run ruby/setup-ruby@v1
Error: Command failed: rm -rf "/opt/hostedtoolcache/Ruby/3.0.1/x64"
rm: cannot remove '/opt/hostedtoolcache/Ruby/3.0.1/x64/lib/ruby/gems/3.0.0/gems/bundler-2.2.20/lib/bundler/man': Directory not empty

Other setup actions, like the Node.js one, cache the version in the sub-folder _work/_tool of the respective runner. What I gathered from the source code so far is that the path cannot be changed because of the pre-built Ruby binaries. Since you are much more familiar with Ruby and this repository, I would be happy if you have a solution for this problem.

@eregon
Member

eregon commented Jun 15, 2021

Does the GHA runner generally support being run in parallel on the same filesystem?
GH-hosted runners don't do that at least.

For Ruby releases, the toolcache is used when possible:

setup-ruby/common.js

Lines 116 to 118 in 517a3b1

export function shouldUseToolCache(engine, version) {
  return engine === 'ruby' && !isHeadVersion(version)
}

if (common.shouldUseToolCache(engine, version)) {
  inToolCache = tc.find('Ruby', version)
  if (inToolCache) {
    rubyPrefix = inToolCache
  } else {
    rubyPrefix = common.getToolCacheRubyPrefix(platform, version)
  }
}

So one thing that might work would be to run it once (not in parallel); then the next execution should find it in the tool cache.
For tc.find('Ruby', version) to return a path, we need to create a marker file to say the entry is complete: #98 (comment)
That's already done in

common.createToolCacheCompleteFile(rubyPrefix)

So I think it should work fine if you execute the first job not in parallel (e.g., you could make it part of the initial runner setup), or if you manually install the Ruby version you use into the toolcache and create the x64.complete file.
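To illustrate the second option, here is a minimal sketch of a one-time setup script. The tool cache location, the Ruby version, and the extraction step are illustrative assumptions; only the x64.complete marker convention comes from the discussion above.

```shell
#!/bin/sh
# Hypothetical one-time setup for a self-hosted runner: run ONCE per machine,
# not in parallel. Paths and version are illustrative assumptions; only the
# x64.complete marker convention comes from setup-ruby itself.
set -eu

TOOL_CACHE="${RUNNER_TOOL_CACHE:-$HOME/hostedtoolcache}"
RUBY_VERSION="3.0.1"
PREFIX="$TOOL_CACHE/Ruby/$RUBY_VERSION/x64"

mkdir -p "$PREFIX"
# ... extract or build the desired Ruby into "$PREFIX" here ...

# Mark the entry complete so tc.find('Ruby', version) will return it:
touch "$TOOL_CACHE/Ruby/$RUBY_VERSION/x64.complete"
echo "Ruby $RUBY_VERSION marked complete in $TOOL_CACHE"
```

After this runs once, subsequent parallel jobs should find the entry in the tool cache instead of extracting it again.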

We indeed cannot choose where to install the Ruby, so there is no flexibility there. And the current setup is already useful for self-hosted runners, as it downloads each Ruby only once rather than multiple times.

File locking or other workarounds seem inherently fragile and complicated, so I'd rather this be solved at runner installation time than in this action.

@eregon
Member

eregon commented Sep 25, 2021

Closing: file locking seems too tricky, and changing the path of the installed Ruby is not possible.
The workaround is to run the action once per machine, not in parallel, for the versions you use before running anything in parallel, or to implement your own locking on top.
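As a sketch of the "implement your own locking on top" option, one could wrap the setup step with flock(1) so that only one runner on the machine extracts at a time. The lock file path is an arbitrary choice here, and flock is Linux-only (util-linux).

```shell
#!/bin/sh
# Sketch of "implement your own locking on top": serialize the setup step
# across runners on one machine with flock(1) (util-linux, Linux-only).
# The lock file path is an arbitrary choice for this example.
LOCK=/tmp/setup-ruby.lock

# Everything run under flock holds an exclusive lock on $LOCK, so two
# parallel jobs cannot extract into the same toolcache entry at once.
flock "$LOCK" sh -c 'echo "exclusive section: install or extract Ruby here"'
```

A workflow could run its real setup command in place of the echo; any job that starts while another holds the lock simply waits.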

@eregon eregon closed this as completed Sep 25, 2021
@grossadamm

ruby/setup-ruby@v1 is fabulous and I use it successfully in many workflows. However, I too wanted parallel execution on self-hosted runners and was willing to trade off the excellent support here in a very niche case.

Caveat, this is completely unsupported. If you don't understand what is happening here and cannot self-maintain, don't use it. Don't bother the maintainers of this action with this either.

If someone wants to take this and run with it, be my guest. I'm aware there are optimizations left on the table, but I am comfortable with this workaround for my own use case.

@eregon if this causes problems for you, I'm happy to remove this comment. I have no desire to undermine your amazing work and I agree with your comments and opinions stated above.

      - name: Set up Homebrew
        uses: Homebrew/actions/setup-homebrew@master
      - name: Setup local ruby
        run: |
          DIRECTORY="${{ runner.tool_cache }}/ruby/3.1.3"
          if [ -d "$DIRECTORY" ]; then
            echo "$DIRECTORY does exist, skipping ruby install"
            echo "$DIRECTORY/bin" >> $GITHUB_PATH
            exit 0
          fi
          brew install ruby-install
          ruby-install --install-dir "$DIRECTORY" ruby 3.1.3
          echo "$DIRECTORY/bin" >> $GITHUB_PATH
          which gem

@eregon
Member

eregon commented Dec 22, 2022

Right, actually we should support people building their own Rubies in the tool cache, since that's necessary anyway for architectures and runner images not available on GitHub-hosted runners. I think it already works but needs better documentation.

So yes, that's one way to solve this issue: make sure the Ruby is installed before using it. One could also run setup-ruby as a single job to set up the Ruby version before any parallel jobs.

@eregon
Member

eregon commented Mar 3, 2023

With #473 this action now gives instructions for how to make it work for self-hosted runners not matching a GitHub-hosted runner image.

But there is still the issue of parallel extraction for self-hosted runners which use an image matching a GitHub-hosted runner image.
The error in the description is likely caused by the await io.rmRF(rubyPrefix).
We can probably replace that by a check that the directory does not exist or is empty.
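In shell form, the proposed check might look roughly like this. This is only a sketch of the idea; the action itself is JavaScript, and the path here is made up for illustration.

```shell
#!/bin/sh
# Sketch of the proposed change: instead of rm -rf followed by extraction
# (which races with a parallel job sharing the toolcache), only extract when
# the target directory is absent or empty. The path is illustrative.
PREFIX="${1:-/tmp/toolcache/Ruby/3.0.1/x64}"

if [ ! -d "$PREFIX" ] || [ -z "$(ls -A "$PREFIX" 2>/dev/null)" ]; then
  echo "extracting Ruby into $PREFIX"
else
  echo "reusing existing $PREFIX"
fi
```

This avoids deleting files out from under a concurrent job, though it does not serialize two jobs that both see an empty directory at the same moment.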

@eregon eregon reopened this Mar 3, 2023
@eregon
Member

eregon commented Mar 3, 2023

We can probably replace that by a check that the directory does not exist or is empty.

That doesn't fully solve the problem, but at least it would avoid removing files, which can fail other workflows that would otherwise work fine.

To really solve it I'd guess we'd need something like https://www.npmjs.com/package/proper-lockfile

@Leee-xx

Leee-xx commented Mar 3, 2023

I've been using v1 with parallel workflows on self-hosted runners and never had an issue until today.

@eregon
Member

eregon commented Mar 3, 2023

@Leee-xx I don't think recent changes affect this, so probably you were just lucky with the ordering before.

@Leee-xx

Leee-xx commented Mar 4, 2023 via email

@eregon
Member

eregon commented Mar 5, 2023

Giving it more thought, I think this is enough: #477
File locking doesn't seem necessary if we can assume extraction is idempotent and that people don't put random things into their toolcache/*Ruby that would cause problems.

FWIW here was a try at file locking, I suspect it could cause tricky issues in practice: master...eregon:setup-ruby:file-locking

eregon added a commit that referenced this issue Mar 5, 2023
@eregon
Member

eregon commented Mar 5, 2023

@grossadamm

🎉
