
partitioning: fix race condition on loop device allocation #6729

Merged
1 commit merged on Jun 22, 2024

Conversation

alex3d
Contributor

@alex3d alex3d commented Jun 11, 2024

Description

Find and take loop device in one shot to prevent race condition

How Has This Been Tested?

  • Build image

Checklist:

  • I have performed a self-review of my own code

@alex3d alex3d requested a review from a team as a code owner June 11, 2024 20:59
@github-actions github-actions bot added the size/small PR with less then 50 lines label Jun 11, 2024
@igorpecovnik
Member

I know the initial hack was not the best, but will this change survive building, let's say, 50 images in parallel on one build host?

@alex3d
Contributor Author

alex3d commented Jun 12, 2024

From the losetup man page:

The loop device setup is not an atomic operation when used with --find, and losetup does not protect this operation by any lock. The number of attempts is internally restricted to a maximum of 16. It is recommended to use for example flock(1) to avoid a collision in heavily parallel use cases.

So I think this is more or less equivalent to the current solution (I was wrong that it is truly atomic).
I presume parallel builds are executed in Docker containers started from different git working directories, so there is no common file to flock.
I could add a few attempts in a loop to increase the chances, if you think 16 retries is not enough.
The good thing about the proposed solution is that it allocates a new loop device on demand when all are in use (and it is a little bit shorter, of course).
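The flock(1) approach the man page recommends is worth illustrating. The sketch below is hypothetical, not code from this PR: a shared counter stands in for the non-atomic losetup step so the demo runs without root, but the pattern is the same, every contender takes one host-wide lock around the critical section. In the build framework the locked command would be the losetup call itself, e.g. `flock /var/lock/armbian-losetup.lock losetup --find --show image.raw` (lock path illustrative).

```shell
#!/usr/bin/env bash
# Hypothetical demo of the flock(1) serialization pattern from the
# losetup man page. A read-modify-write on a counter file stands in
# for the non-atomic "find a free loop device, then attach" step.
LOCKFILE=$(mktemp)
COUNTER=$(mktemp)
echo 0 > "$COUNTER"

for i in $(seq 1 50); do
    # read-modify-write is racy on its own; flock serializes it
    flock "$LOCKFILE" bash -c "n=\$(cat '$COUNTER'); echo \$((n + 1)) > '$COUNTER'" &
done
wait
cat "$COUNTER"   # 50: no increments are lost
```

Note the caveat from the discussion above: if parallel builds run in separate containers, the lock file must live on a path shared by all of them, or the lock serializes nothing.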

@alex3d
Contributor Author

alex3d commented Jun 12, 2024

It looks like #6568 will also be fixed by this PR.

@AaronNGray

@igorpecovnik This patch works and should really have been in the last release :|

Contributor

@SteeManMI SteeManMI left a comment


Since this has been preventing me from building on any Ubuntu platform, I've tested and have been using this fix for a while. It works well.

@SteeManMI SteeManMI merged commit bbd5699 into armbian:main Jun 22, 2024
@SteeManMI
Contributor

I realize this fix may not be perfect, but the number of people reporting that they can't build at all, because the existing code breaks on Ubuntu hosts where snaps take up all the pre-allocated loop devices, is a big problem. This fixes the basic build case. Will it have issues in the parallel build environment? Maybe, but the trade-off is many users of the build framework being able to build versus the one case of the Armbian infrastructure failing when building in parallel. I personally vote for the users' builds working, which is why I approved and merged this.

@AaronNGray

Nah, it's a great fix; it worked the first time and removes a rather dodgy loop structure.

@szdayisi

All my 11 loop devices are occupied by snapd, so I made some changes to lib/functions/image/partitioning.sh:

    FIND_LOOP_CYCLES=1
    losetup -f      # ensure a free loop device exists; creates a new one if all existing devices are busy
    while : ; do
            LOOP=$(find /dev/loop* -type b | sort -R | head -1)
            # match the whole line so /dev/loop1 does not also match /dev/loop10
            LOOP_COMPARE=$(losetup -l --noheadings --raw --output=NAME | grep -x "$LOOP" || true)
            [[ -z $LOOP_COMPARE ]] && break
            [[ $FIND_LOOP_CYCLES -gt 100 ]] && exit_with_error "Unable to find free loop device"
            FIND_LOOP_CYCLES=$(( FIND_LOOP_CYCLES + 1 ))
            sleep 0.01
    done
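Note that any pick-a-device-then-attach sequence like the one above still leaves a window between the check and the attach. The change this PR makes, finding and attaching in the same losetup call, closes that window, and the extra retries alex3d mentioned can be layered on top. A hypothetical sketch, not the PR's actual code (the function name is illustrative, and the real losetup call needs root):

```shell
#!/usr/bin/env bash
# Hypothetical retry wrapper: run an atomic find-and-attach command a few
# times, since losetup's 16 internal attempts can still lose a heavily
# parallel race. Real use would be (root required, image name illustrative):
#   LOOP=$(allocate_loop losetup --find --show image.raw)
allocate_loop() {
    local attempt out
    for attempt in 1 2 3 4 5; do
        if out=$("$@" 2>/dev/null); then
            echo "$out"
            return 0
        fi
        sleep 0.1
    done
    return 1    # caller should treat this as "no free loop device"
}
```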

@igorpecovnik
Member

On the host machine, try adding max_loop=xx to the kernel command line. This is mine:

GRUB_CMDLINE_LINUX_DEFAULT=" console=tty1 splash=verbose max_loop=96 i915.force_probe=* resume=UUID=bef2d749-9136-481e-9934-d9038c6bd0b3 resume_offset=12032000"
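For reference, that setting lives in /etc/default/grub and takes effect only after regenerating the GRUB config and rebooting; a minimal fragment (the max_loop value is only an example, size it to how many images you build in parallel):

```shell
# /etc/default/grub -- append max_loop=NN to the existing default command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash max_loop=96"
# then regenerate the GRUB config and reboot:
#   sudo update-grub
```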

@AaronNGray

AaronNGray commented Jul 26, 2024 via email
