tox fails when run in parallel due to distdir #412

Closed
ssbarnea opened this issue Nov 22, 2016 · 6 comments
Labels
area:testenv-creation bug:normal affects many people or has quite an impact

Comments

@ssbarnea
Member

I tried to run tox in parallel from Jenkins, with a separate job for each environment, but it seems to fail at the setup.py sdist step because all runs use the same temp directory.

Initially I thought this was caused by pbr, but now I suspect it is purely a tox bug: tox does not use a random --dist-dir folder, so multiple executions overlap and end up breaking each other.

Have a look at https://gist.github.com/ssbarnea/7829537a60592ba67567a67b946df1ea with a full log here

Before you suggest using detox instead, please note that with this CI that is not an option. I want to use real pipelines in Jenkins so that the runs appear correctly and people can check the (eventually live) logs of each execution thread. If I used detox, it would not be able to do either of these.

If we could use expansions when configuring distdir or toxworkdir, we could work around this concurrency limitation, but as far as I know expansions are either not supported or not documented.

@ssbarnea ssbarnea changed the title tox fails when run in parallel tox fails when run in parallel due to distdir Nov 22, 2016
@obestwalter
Member

obestwalter commented Sep 4, 2017

Is that not a problem with the CI you are using? Shouldn't the CI do the job of isolating the different runs from each other? Unless I misunderstand the problem, this works perfectly on all the CI systems I know (Bamboo, Travis, Appveyor, TeamCity). So could you clarify why this is going wrong?

@obestwalter obestwalter added needs:discussion It's not quite clear if and how this should be done and removed enhancement labels Sep 4, 2017
@ssbarnea
Member Author

ssbarnea commented Sep 5, 2017

This has nothing to do with CI; it is a generic issue that you can reproduce by running multiple tox commands in parallel. We need to find a way to avoid these errors, even if that means tox uses a semaphore (a file?) to prevent parallel execution problems in the few areas that are sensitive. Command execution is assumed to be safe to run in parallel, or at least it is up to the user to ensure that.

This was a long time ago, but if I remember correctly both parallelization approaches are currently flawed:

  • multiple tox runs in parallel on the same codebase (fast, no extra cloning needed): fails because the sdist step (distdir) does not happen inside the isolated tox target environment, so the runs clash.
  • detox: fails when you try to reuse the same target environment, which is a very good practice for avoiding resource-intensive environment creation.

Please understand that I have about 5-10 test commands that run in parallel inside the same codebase and virtualenv, and there is no problem with that. The problem is what tox does before it starts running the commands; that is the part that can fail under parallelization.

If we can find a way to address this, we could speed up builds considerably.

@ssbarnea
Member Author

@obestwalter I observed that many people are reimplementing detox in bash and running something like (simplified):

for env in job1 job2 job3 job4; do
    tox -e "$env" &
done
wait $(jobs -p)

This is similar to detox and works just as well, with one big exception: it breaks if you ever try to optimize testing by sharing the same virtualenv across multiple targets.

If you do this, you lose the ability to run tox in parallel (with either detox or bash) because tox will attempt to (re)create the same venv in parallel. If the venv already exists and needs no changes, you may get lucky and the run succeeds.
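For comparison, the bash pattern above can be sketched in Python. This is only an illustration: `run_parallel` and `tox_envs_in_parallel` are hypothetical helper names, not part of tox or detox, and the sketch assumes tox is installed for the current interpreter. Unlike a single `wait` over several job IDs, which reports only the last job's status, it keeps every exit code. It does not remove the shared-virtualenv race described here.

```python
import subprocess
import sys


def run_parallel(commands):
    """Start every command at once, wait for all of them,
    and return their exit codes in the same order."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


def tox_envs_in_parallel(envs):
    """Hypothetical helper: one `python -m tox -e ENV` process per environment."""
    return run_parallel([[sys.executable, "-m", "tox", "-e", env] for env in envs])
```

A caller could run `tox_envs_in_parallel(["job1", "job2", "job3", "job4"])` and fail the build if any returned code is non-zero.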

We really need to find a way to do both, because both approaches are essential for speeding up the build process.

The irony is that each of them (parallel and shared) reduces execution time considerably, by roughly 40-50%, but we currently cannot use both together.

Does this explain it? How can we find a workaround for this issue? Mainly, I would say that tox needs to take a lock on the virtualenv while it is changing it. If the lock is present, tox will wait for it to be released. This should probably fix both detox and bash parallelization.
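A minimal sketch of that locking idea, assuming a POSIX system where `fcntl.flock` is available. The `venv_lock` helper and the lock-file location are hypothetical; this is not how tox is implemented, only an illustration of "wait until the lock is released".

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def venv_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block.

    A second process (or a second open of the same file) entering this
    context blocks until the first holder leaves it.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks while another holder has the lock
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

Only the sensitive phase (creating or updating the virtualenv) would sit inside `with venv_lock(...)`; the test commands themselves would still run in parallel outside it.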

@obestwalter
Member

Hi @ssbarnea, thanks for explaining this. Now I understand the problem :)

Let's mark this as a bug that can hang around until #641 is implemented.

@obestwalter obestwalter added area:testenv-creation bug:normal affects many people or has quite an impact detox and removed needs:discussion It's not quite clear if and how this should be done labels Sep 21, 2017
@awiddersheim

awiddersheim commented Oct 5, 2017

I've actually run into a similar issue, but when running any job, not just those with --dist-dir. It's a race condition, so it doesn't manifest every time for me, but it happened often enough to become problematic.

Basically, what I was hoping to do is set up the tox virtual environment with tox --notest. Then, in my tox configuration, I have several jobs that all reuse that virtual environment, so they should be able to run without any kind of setup.

Here is an example Jenkinsfile snippet running each job in parallel:

parallel {
    stage('Bandit') {
        steps {
            sh 'tox -e bandit'
        }
    }
    stage('Style') {
        steps {
            sh 'tox -e style'
        }
    }
    stage('Unit Tests') {
        steps {
            sh 'tox -- --addopts="--color yes"'
        }
    }
}

Going back to what someone said earlier: yes, it is possible to run each of these in its own separate world. That said, it would be nice to run them all on the same worker.

The problem I ran into is that even though I pre-created the virtual environment, tox wants to delete and recreate a shared log directory here. This ends up running this py code to create the directory. That gives you a race condition: while one job is creating the directory, another job may be deleting it, and if the timing is right the first job sees the directory disappear and raises an exception.

One simple solution I can see: instead of deleting the entire {toxworkdir}/log directory, just delete its contents or something similar.
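A rough sketch of that suggestion, not tox's actual code: `reset_log_dir` is a hypothetical helper that keeps the directory itself in place and tolerates another job removing entries at the same time. Note that it narrows the race on the directory but could still delete log files that a parallel job has just written.

```python
import os
import shutil


def reset_log_dir(log_dir):
    """Empty log_dir without ever removing the directory itself."""
    # exist_ok=True: another job may have created the directory first.
    os.makedirs(log_dir, exist_ok=True)
    with os.scandir(log_dir) as entries:
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=False):
                    shutil.rmtree(entry.path, ignore_errors=True)
                else:
                    os.unlink(entry.path)
            except FileNotFoundError:
                # A parallel job already removed this entry; nothing to do.
                pass
```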

@gaborbernat
Member

This will be fixed by #849

@tox-dev tox-dev locked and limited conversation to collaborators Jan 14, 2021

4 participants