
OpenFoam error when using more than one thread in Docker #497

Open
santiagoMonedero opened this issue Jul 24, 2023 · 29 comments

@santiagoMonedero
Contributor

Hi, I am trying to run the example file cli_momentumSolver_diurnal.cfg using the Dockerfile in the repository (v3.9), but it fails when using more than one thread. All I have done is 1) clone the repository, 2) build the Dockerfile, and 3) run the Docker container interactively to access WindNinja_cli.

Surprisingly, it gives different errors when building and using the image on Ubuntu 20.04 through WSL2 on Windows 11 versus directly on Ubuntu 18.04 without WSL:

  1. Ubuntu 18.04
root@15a139c48048:/home/wind/example# WindNinja_cli cli_momentumSolver_diurnal.cfg
Run 0: Reading elevation file...
Run 0: Simulation time is 2011-Sep-23 13:30:00 MDT
Run 0: Run number 0 started with 2 threads.
Run 0: Writing OpenFOAM files...
Run 0: Converting DEM to STL format...
Run 0: Transforming surface points to output wind height...
Run 0: Generating mesh...
Run 0: Running blockMesh...
Run 0: Decomposing domain for parallel mesh calculations...
Run 0: Running moveDynamicMesh...
Run 0: Reconstructing domain...
Exception caught: Error during reconstructPar().
Exception caught: Error during reconstructPar().
  2. Ubuntu 20.04 with WSL2 on Windows 11
root@463e5936b024:/home/wind/example# WindNinja_cli cli_momentumSolver_diurnal.cfg
Run 0: Reading elevation file...
Run 0: Simulation time is 2011-Sep-23 13:30:00 MDT
Run 0: Run number 0 started with 3 threads.
Run 0: Writing OpenFOAM files...
Run 0: Converting DEM to STL format...
Run 0: Transforming surface points to output wind height...
Run 0: Generating mesh...
Run 0: Running blockMesh...
ERROR 1: posix_spawnp() failed
Exception caught: Error during blockMesh().
Exception caught: Error during blockMesh().

PS: I know this was a known issue in a previous WindNinja version, but I wanted to give it a try on the new release and check whether there is a workaround yet. Thanks!!

@santiagoMonedero
Contributor Author

Hi, after some digging into this I think I have a working solution. I have done it directly in the container but I guess you can add it to the Dockerfile or the source code.

  1. mpirun refuses to run as root by default, and inside the Docker container everything runs as root. Apparently this can be overridden by either of the following methods:
    1.A adding --allow-run-as-root to the mpirun command. Probably here:
    const char *const papszArgv[] = { "mpiexec",

    1.B exporting OMPI_ALLOW_RUN_AS_ROOT=1 and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1, which is what I did. Before doing so, however, run the Docker container interactively and execute mpirun to see the following scary message which, to be honest, I have absolutely no clue how relevant it is inside a Docker container (but it is definitely very scary):
root@1d33a1a7e569:/data# mpirun
--------------------------------------------------------------------------
mpirun has detected an attempt to run as root.

Running as root is *strongly* discouraged as any mistake (e.g., in
defining TMPDIR) or bug can result in catastrophic damage to the OS
file system, leaving your system in an unusable state.

We strongly suggest that you run mpirun as a non-root user.

You can override this protection by adding the --allow-run-as-root option
to the cmd line or by setting two environment variables in the following way:
the variable OMPI_ALLOW_RUN_AS_ROOT=1 to indicate the desire to override this
protection, and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 to confirm the choice and
add one more layer of certainty that you want to do so.
We reiterate our advice against doing so - please proceed at your own risk.
--------------------------------------------------------------------------
  2. Once you have root privileges you can run OpenFOAM with MPI, but you will get a convergence problem when trying their test cases like $FOAM_TUTORIALS/incompressible/simpleFoam/motorBike, or you will get stuck at (moveDynamicMesh) 100% complete... in WindNinja without ever reaching the domain reconstruction part. Apparently this is a known MPI issue (see "Vader in a Docker Container", open-mpi/ompi#4948) which can be fixed by setting the environment variable export OMPI_MCA_btl_vader_single_copy_mechanism=none
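Taken together, the overrides described above amount to three environment variables; a minimal sketch, assuming OpenMPI inside the container:

```shell
# Allow OpenMPI's mpirun/mpiexec to run as root inside the container;
# both variables are required (the second confirms the first).
export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
# Work around the vader shared-memory transport issue seen in containers
# (open-mpi/ompi#4948).
export OMPI_MCA_btl_vader_single_copy_mechanism=none
```

These could equally live as ENV lines in the Dockerfile or in a wrapper script sourced before WindNinja_cli.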

And that's it! We have the Docker container running with momentum conservation and 25 threads:

root@d0a02ed7692b:/home/wind/example# WindNinja_cli cli_momentumSolver_diurnal.cfg
Run 0: Reading elevation file...
Run 0: Simulation time is 2011-Sep-23 13:30:00 MDT
Run 0: Run number 0 started with 25 threads.
Run 0: Writing OpenFOAM files...
Run 0: Converting DEM to STL format...
Run 0: Transforming surface points to output wind height...
Run 0: Generating mesh...
Run 0: Running blockMesh...
Run 0: Decomposing domain for parallel mesh calculations...
Run 0: Running moveDynamicMesh...
Run 0: (moveDynamicMesh) 4% complete...
Run 0: (moveDynamicMesh) 12% complete...
Run 0: (moveDynamicMesh) 20% complete...
Run 0: (moveDynamicMesh) 26% complete...
Run 0: (moveDynamicMesh) 34% complete...
Run 0: (moveDynamicMesh) 42% complete...
Run 0: (moveDynamicMesh) 48% complete...
Run 0: (moveDynamicMesh) 56% complete...
Run 0: (moveDynamicMesh) 62% complete...
Run 0: (moveDynamicMesh) 70% complete...
Run 0: (moveDynamicMesh) 78% complete...
Run 0: (moveDynamicMesh) 84% complete...
Run 0: (moveDynamicMesh) 92% complete...
Run 0: (moveDynamicMesh) 98% complete...
Run 0: Reconstructing domain...
Run 0: Refining surface cells in mesh...
Run 0: (refineMesh) 10% complete...
Run 0: (refineMesh) 99% complete...
Run 0: Renumbering mesh...
Run 0: Applying initial conditions...
Run 0: Decomposing domain for parallel flow calculations...
Run 0: Solving for the flow field...
Run 0 (solver): 2% complete
Run 0 (solver): 2% complete
Run 0 (solver): 2% complete

Note that I am not an expert on this; it is simply a working solution I arrived at by googling and experimenting a bit with the Docker container.

PS: The Docker approach is important for us to run Azure HPC batching with momentum conservation. For our local HPC I will have to do some testing because we use Singularity, which is more restrictive with privileges... in case it fails, maybe using an OpenFOAM image as the base instead of Ubuntu 20.04 could be a solution (well..... just thinking out loud)

@nwagenbrenner
Member

@santiagoMonedero Glad you got something working for now. I'm also not a Docker or MPI expert so am not sure off the top of my head if there is a better way to handle this. I'll leave this ticket open and will try to take a closer look soon. Thanks for reporting your fix here.

@bnordgren

Meeting notes 12/5/2024:

  • Sathwik is creating a run.sh script to place inside the container which will set the above-mentioned environment variables, and then start windninja, forwarding all the command line arguments it receives.
  • Sathwik's python script will call the script he writes in the above bullet instead of directly calling windninja.
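A minimal sketch of such a wrapper; the run.sh name and the stub WindNinja_cli are assumptions used so the example can be exercised without WindNinja installed:

```shell
bin="$(mktemp -d)"
# Hypothetical wrapper: set the OpenMPI overrides, then forward every
# command line argument unchanged to WindNinja_cli.
cat > "$bin/run.sh" <<'EOF'
#!/bin/sh
export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
export OMPI_MCA_btl_vader_single_copy_mechanism=none
exec WindNinja_cli "$@"
EOF
# Stub WindNinja_cli so the wrapper can be tested anywhere.
printf '#!/bin/sh\necho "cli got: $*"\n' > "$bin/WindNinja_cli"
chmod +x "$bin/run.sh" "$bin/WindNinja_cli"
PATH="$bin:$PATH" "$bin/run.sh" cli_momentumSolver_diurnal.cfg
# → cli got: cli_momentumSolver_diurnal.cfg
```

The `exec` plus `"$@"` is the key part: the wrapper replaces itself with WindNinja_cli and passes all arguments through untouched.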

@bnordgren

Sathwik is also going to check in the scripts he's including in his local docker builds so that others attempting to replicate the results will have the same container image he is working with.

@latwood
Collaborator

latwood commented Dec 5, 2024

I would recommend keeping the example as small as possible: just one or two total WindNinja simulations on one node, or one or two WindNinja simulations per node on just two nodes. Also, it would be good to go back to keeping the NINJAFOAM directory for such cases, as it helps with debugging.

And heck, maybe two WindNinja simulations per node might be overkill as well. But sometimes it helps when testing the ability to run multiple simulations at once to have more than one, hence I thought maybe two.

Much appreciated :).

@bnordgren

bnordgren commented Dec 6, 2024

Testing required before further progress is recorded:

  • Testing specified on ticket "OpenFOAM MPI in the Docker/Singularity container" (#542) is complete and successful. (If anything fails there, you stand exactly zero chance of making progress on this ticket.)
  • Using only your ninja.sh script and NOT the python script, can you manually start a windninja run that specifies OpenFOAM must run in multi-threaded mode?
  • Using only your ninja.sh script and NOT the python script, can you submit a job to slurm that starts a windninja run which specifies OpenFOAM must run in multi-threaded mode?

After all three of these succeed, you may use what you learned above to adapt the python script to correctly launch windninja_cli in the slurm environment.

@bnordgren

Meeting notes from 12/6/2024:

  • Sathwik tested the multicore/multithreaded functionality by changing his run.sh file to define the above two environment variables.
  • His test run successfully completed.
  • Sathwik has to construct the ninja.sh file and install it in the container's /usr/local/bin, making sure it has execute permissions.
  • Sathwik has to modify the python script to run ninja.sh instead of directly calling windninja_cli.
  • We re-run the same multi-threaded/multi-core test case.

@bnordgren

Meeting notes 12/9/2024:

On Ultron:

  • The decomposePar executable segfaults when a multi-core run is initiated. The telling difference is that decomposePar runs when there is more than one thread but is not invoked at all when the simulation is single-threaded.
  • In the logs, the decomposePar executable cannot load libWindNinja.so.
  • In the container, libWindNinja.so is under /root/..., indicating that it did not get installed to a system directory with the rest of the executables and libraries (e.g. /usr/local/lib/...)
  • To test, we will run a multi-core simulation as the root user (sudo singularity ....) and compare with running the same simulation as a non-privileged user (singularity...).
    • Stage a simulation directory locally on the headnode in /data (because this allows both the non-privileged and privileged users to access it).
  • When /opt/openfoam8/etc/bashrc is run, many paths are set relative to the user's home directory, which is mounted from the host. This includes LD_LIBRARY_PATH. It appears that decomposePar succeeds when root runs it by accident, because that is the directory in which WindNinja was built. The problem we now need to solve is: why did libWindNinja.so not get installed to the container with the rest of WindNinja?

Sathwik will be solving the problem of why libWindNinja.so did not install to the container with the rest of the WindNinja executables. We will have another meeting after this. Next meeting will be Thursday.

@bnordgren

Just spitballing, but shouldn't there be a wmake install somewhere around:

windninja/Dockerfile

Lines 41 to 43 in a1340ee

wmake libso && \
cd utility/applyInit && \
wmake &&\

Or since it looks like there's all of two files, you could maybe add a couple of install or cp commands yourself?
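The cp-based fix can be sketched like this, staged against a scratch tree so it is safe to run anywhere; in the container the destination would be the stock OpenFOAM tree under /opt/openfoam8, and the exact source paths ($FOAM_USER_APPBIN / $FOAM_USER_LIBBIN) are assumptions:

```shell
ROOT="$(mktemp -d)"
# Stand-ins for the user-build output and the system-wide OpenFOAM tree
# (/opt/openfoam8/platforms/linux64GccDPInt32Opt/{bin,lib} in the image).
mkdir -p "$ROOT/user/bin" "$ROOT/user/lib" "$ROOT/sys/bin" "$ROOT/sys/lib"
: > "$ROOT/user/bin/applyInit"
: > "$ROOT/user/lib/libWindNinja.so"
# The actual fix: install the custom-built pieces system-wide so every
# user in the container can resolve them at runtime.
install -m 755 "$ROOT/user/bin/applyInit"       "$ROOT/sys/bin/"
install -m 644 "$ROOT/user/lib/libWindNinja.so" "$ROOT/sys/lib/"
ls "$ROOT/sys/lib"
# → libWindNinja.so
```

In the Dockerfile this would be the same two copies as a RUN step after the wmake lines.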

@latwood
Collaborator

latwood commented Dec 10, 2024 via email

@nwagenbrenner
Member

@sathwikreddy56 What is the status on this? @bnordgren's suggestion is correct:

"Or since it looks like there's all of two files, you could maybe add a couple of install or cp commands yourself?"

You need to copy the custom built binaries (applyInit and libWindNinja.so) to the location of all of the other OpenFOAM binaries. This should ensure that OpenFOAM can find them at runtime. Please add this to the Dockerfile and test and report back here what you find.

Also I see there is a problem in the Dockerfile -- our custom OpenFOAM applications are being built twice, once with build_libs.sh (https://github.com/firelab/windninja/blob/master/Dockerfile#L48) and again with lines 50-58 below that.

Can you please make these changes to the Dockerfile and commit them once things are working?

@latwood
Collaborator

latwood commented Dec 12, 2024

So I just tried out, on my local machine, the fix that we are trying to do on the server, and no dice; I get all the fun missing libWindNinja.so problems that Sathwik is having on the server.

Two things that I tried.

  1. Rename $FOAM_RUN/../platforms to $FOAM_RUN/../platforms_copy; then blockMesh runs but says it can't find libWindNinja.so. From there, I did sudo cp $FOAM_RUN/../platforms_copy/linux64GccDPInt32Opt/lib/libWindNinja.so /opt/openfoam8/platforms/linux64GccDPInt32Opt/lib and sudo cp $FOAM_RUN/../platforms_copy/linux64GccDPInt32Opt/bin/applyInit /opt/openfoam8/platforms/linux64GccDPInt32Opt/bin, and tried blockMesh again; still the missing libWindNinja.so problems. Then I manually looked at the files in the filesystem and saw they had gone weird somehow: applyInit was being treated like just a text file, and libWindNinja.so had different file permissions though it was seen as a library. sudo chmod -x libWindNinja.so, no dice.
  2. Put $FOAM_RUN/../platforms_copy back to $FOAM_RUN/../platforms, but now change $FOAM_RUN/../../atw09001-8 to $FOAM_RUN/../../atw09001-8-copy. Again blockMesh says it can't find libWindNinja.so. I then edited the foam paths to find $FOAM_RUN/../../atw09001-8-copy instead of $FOAM_RUN/../../atw09001-8, by doing sudo vim /opt/openfoam8/etc/bashrc, finding $USER-$WM_PROJECT_VERSION on line 152, and changing it to $USER-$WM_PROJECT_VERSION-copy. Run source ~/.bashrc and blockMesh now finds libWindNinja.so again.

Note that with 2), I had originally posted that it WASN'T working; apparently I had messed up the paths. Using $FOAM_RUN/../../atw09001-8_copy caused problems because $USER-$WM_PROJECT_VERSION_copy was being interpreted as a reference to a new variable named $WM_PROJECT_VERSION_copy, which expanded to an empty string, rather than what was intended: appending the string _copy to the expansion of $WM_PROJECT_VERSION.
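The shell behaviour behind that mix-up can be shown in two lines (the variable value is assumed for illustration):

```shell
WM_PROJECT_VERSION=8
# The shell reads the longest legal variable name, so _copy becomes part
# of the (unset) name WM_PROJECT_VERSION_copy and expands to nothing.
echo "version-$WM_PROJECT_VERSION_copy"
# → version-
# Braces (or a character like '-' that cannot appear in a name) give the
# intended append.
echo "version-${WM_PROJECT_VERSION}_copy"
# → version-8_copy
```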

Also note that my idea for 2) came from looking at /windninja/scripts/build_libs.sh, which runs a command that does similar stuff to 2), to build the BCs into a different directory: sed -i 's/$USER-$WM/$WM/g' /opt/openfoam8/etc/bashrc. It edits the exact same line, but for a different reason and with a different result, $USER-$WM_PROJECT_VERSION becomes $WM_PROJECT_VERSION.
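That sed edit can be reproduced on a throwaway copy of the line; the line content here is reconstructed from the comment above, not copied from the real bashrc:

```shell
f="$(mktemp)"
echo 'export WM_PROJECT_USER_DIR=$HOME/OpenFOAM/$USER-$WM_PROJECT_VERSION' > "$f"
# Same command build_libs.sh runs against /opt/openfoam8/etc/bashrc:
# drop the "$USER-" prefix so the path no longer depends on the user name.
sed -i 's/$USER-$WM/$WM/g' "$f"
cat "$f"
# → export WM_PROJECT_USER_DIR=$HOME/OpenFOAM/$WM_PROJECT_VERSION
```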

Why 1) doesn't work, I'm not sure; maybe OpenFOAM got more strict with changes from foam 2.2.0 to foam 8? But usually just putting .dll/.so files into the appropriate place is good enough. I'll look more into it, because that fix would be much easier than editing paths each time after checking out the Docker image.

I also wish that OpenFOAM provided an additional variable $WM_USER or something, rather than using $USER, then fix 2) would work just by running something like export WM_USER=<new_user_name> rather than editing the /opt/openfoam8/etc/bashrc file. Messing with $USER seems a bit too risky. I guess maybe that's why /windninja/scripts/build_libs.sh drops the $USER variable from the paths? But why didn't that work for us already? Not sure.

@nwagenbrenner
Member

nwagenbrenner commented Dec 12, 2024

I just tested on my machine and copying applyInit and libWindNinja.so to

/opt/openfoam8/platforms/linux64GccDPInt32Opt/bin

and

/opt/openfoam8/platforms/linux64GccDPInt32Opt/lib

works for me, as expected.

@sathwikreddy56
Contributor

sathwikreddy56 commented Dec 12, 2024

Quick update on the issue.

I have made the necessary changes to the Docker setup, which are:

  1. copy the required files to system-wide available locations, as mentioned in the above comment
  2. remove the redundant OpenFOAM installation from the Dockerfile, building only through build_libs.sh
    I am building the new image to test the changes; once it is done and tested, I will commit the updated Dockerfile to the master branch.

@latwood
Collaborator

latwood commented Dec 12, 2024

Did some more testing on my local machine, and it looks like 1) from the above comment actually works after all? I got confused because it still throws warnings about not being able to find libNinjaSampling.so when running. But it DOES run; I got blockMesh, applyInit, decomposePar, and simpleFoam all to run with those warnings still appearing, though geeze, decomposePar throws the warning for each and every processor. Removing the /opt/openfoam8/platforms/linux64GccDPInt32Opt/lib/libWindNinja.so and /opt/openfoam8/platforms/linux64GccDPInt32Opt/bin/applyInit files while in this state brings back the warning about not finding libNinjaSampling.so, but it also breaks without running.

@masonwillman
Contributor

I am a little confused by @sathwikreddy56's statement. From my understanding, when we containerize something, the host's environment should not affect whether it runs or not. The container itself should be separate, so when @sathwikreddy56 says "copy the required file to systemwide available locations" I feel this goes against the principles of why we containerize projects. I could be misunderstanding, so feel free to correct me, but I wanted to add this as more of an outsider looking in.

@sathwikreddy56
Contributor

@masonwillman, it seems that the container binds the host's /home to its own /home so that it can access the files in the home directory. What I meant by system-wide available locations is that root locations in the container such as /opt, /usr, and /bin are self-contained, i.e. they are not bound from the host machine; only /home is bound, for some reason. When we copy the shared object files from /home to /opt, any user can access them rather than one particular user, which is the standard approach when packaging containers.

@latwood
Collaborator

latwood commented Dec 12, 2024

If I'm understanding correctly, part of the issue is that folder names change when moving to the containers. The variable $WM_PROJECT_USER_DIR, defined in /opt/openfoam8/etc/bashrc (run via source /opt/openfoam8/etc/bashrc, which on a local installation is usually placed in the user's ~/.bashrc), is used to define $FOAM_USER_LIBBIN and $FOAM_USER_APPBIN, where libWindNinja.so and applyInit are normally compiled to, and it gets messed up. The attempted fix is to manually place libWindNinja.so into $FOAM_LIBBIN and applyInit into $FOAM_APPBIN, as those folder names remain consistent regardless of user. This fix SHOULD behave like placing additional .dll/.so files alongside a final executable, letting the container find libWindNinja.so and applyInit regardless of changes to the $USER part of $WM_PROJECT_USER_DIR, without needing to edit $WM_PROJECT_USER_DIR in /opt/openfoam8/etc/bashrc for each and every checked-out container.

Though I agree with Mason, I'm confused why the container changes $USER type directories and environment stuff. I'm still not understanding why it isn't just a copy of whatever was built, everything built and ready to run as is.

@latwood
Collaborator

latwood commented Dec 12, 2024

So just talked to Bryce, and in the process of understanding stuff, he had me rerun the fix attempt on my local machine one more time. And I found that I was misreading the warnings I was getting, I had accidentally been running a case that was defined such that it required an additional libNinjaSampling.so, the name was so similar that I mixed up the warning messages.

Dropping that requirement from my case dropped all the warnings that I was getting. So yes fix 1) from the above comment should work great. Man, next time I need to squint harder, and be more careful not to mix up work projects when debugging/testing.

@latwood
Collaborator

latwood commented Dec 12, 2024

If I'm understanding correctly, the reason $USER changes, is that the docker image is checked out by a given user, with the WindNinja openfoam BCs prebuilt into a root $USER directory, but the user of the docker image would naturally not be running things in the root $USER directory but in their own user $USER directory. This results in the paths being mixed up unless the user manually copies the WindNinja openfoam BCs from the root $USER directory to their own user $USER directory, and/or potentially edits the defined OpenFOAM paths for the change in $USER variable. Putting the WindNinja openfoam BCs directly into the OpenFOAM installation paths should avoid dealing with a user $USER directory.

@nwagenbrenner
Member

@sathwikreddy56 Where are we at on this? Have you been able to modify the Dockerfile to properly install applyInit and libWindNinja.so? Have you tested it on the HPC? Please update this ticket with details.

@sathwikreddy56
Contributor

@nwagenbrenner I have tested the code and checked the logs. WindNinja is now able to find the required files in the container, but the logs show no details about why reconstructPar() is failing. I am still trying to debug why the multi-core runs are failing.

@nwagenbrenner
Member

Bryce and I met with @sathwikreddy56 last Friday and did some more troubleshooting. Running the case by hand, we found out that the moveDynamicMesh executable was failing because mpiexec was being run as root, which is not allowed by default. Apparently the spawned process is being spawned as root. For an immediate fix we added the mpiexec option --allow-run-as-root to the CPLSpawn call in https://github.com/firelab/windninja/blob/master/src/ninja/ninjafoam.cpp#L1277. It will also need to be added to the simpleFoam call here: https://github.com/firelab/windninja/blob/master/src/ninja/ninjafoam.cpp#L1664. The container should have been rebuilt with these additions and then tested on the HPC. @masonwillman @sathwikreddy56 Please update this issue once you're able to meet and this has been tested.

@masonwillman
Contributor

Meeting Notes 12/18/2024

  • Sathwik tested what was described in the above comment; it works on his local machine, but does not work outside of that.
  • moveDynamicMesh was causing an issue; we tried the command mpiexec -np 8 --allow-run-as-root moveDynamicMesh -parallel. This reported that the number of available slots was not 8.
  • mpiexec -np 8 --allow-run-as-root --oversubscribe moveDynamicMesh -parallel allowed moveDynamicMesh to run without error. Oversubscribing means requesting more processes than you have cores, which seems to be bad practice within an HPC cluster. Additionally, this may have run moveDynamicMesh with only 1 processor and not 8, which is not what we want. We got the command to run with 4 processors; it will only run with 8 when oversubscribed.
  • The headnode may be causing issues with the processors. SSHing into an individual node and running WindNinja seems to allocate the processors properly. This is with the command without --oversubscribe.
  • Sathwik will add comments for users wanting to use the Dockerfile and containerize WindNinja.
  • @sathwikreddy56 I would recommend adding a comment with an action plan for what to do next now that we have addressed the issues: testing, running WindNinja with slurm, etc.

@latwood feel free to add to these items.

@latwood
Collaborator

latwood commented Dec 18, 2024

If I'm understanding correctly:

  • Note that $(nproc) returned 12, which was more than the 8 processors that mpiexec -np 8 --allow-run-as-root moveDynamicMesh -parallel was called with.
  • To clarify, mpiexec -np 4 --allow-run-as-root moveDynamicMesh -parallel ran fine; we couldn't get 8 processors to run without the additional --oversubscribe argument. So it appears the apptainer thought it had 12 processors but actually had somewhere between 4 and 8 processors allocated to it.
  • It sounds like the processor-count mismatch came from running outside of slurm, where the user needs to be careful that the resources being asked for are actually available at the time of checking out an apptainer.

We've moved on to where Sathwik is rebuilding the docker image with the latest WindNinja code, the vtk fix as well as the slope_aspect_grid and flow_separation_grid utility scripts. Once we confirm that the new docker image works as we saw in our meeting, we will then see if the WindNinja runs go all the way to completion, then we should be ready for the slurm script tests.

@latwood
Collaborator

latwood commented Dec 19, 2024

Sathwik got a new docker image to run to completion on the head node, with 8 threads, though it's still having problems running with slurm. The new docker image has all the latest WindNinja master branch code merged into it, as well as his latest changes to run on the server.

We're planning on meeting again tomorrow to see if we can get more headway. In the meantime, Sathwik is going to do some cleanup and testing to see if he can get it to work on slurm before our meeting. Also, seems like a good time to make a copy of the most up to date files and commands needed to get stuff to run up to this point.

@sathwikreddy56
Contributor

I have updated the Singularity container with the updated WindNinja code from master.

Also, coming to the MPI runs, the Singularity container runs properly with 8 threads when run using singularity exec.

The container works fine when executed manually with the singularity exec command, but when I use slurm to initialize the container for a scheduled run, it fails; I am trying to find the reason for that. The main issue I see is the reconstructPar error "No times selected". Here is a copy of the files I used to run the Singularity container.


sims3.sbatch:

```bash
#!/bin/bash
#SBATCH --job-name=windninja_simulations
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G
#SBATCH --time=96:00:00
#SBATCH --output=windninja_output.log
#SBATCH --error=windninja_output.err

# Print job info for debugging
echo "Job started on $(date)" >> windninja_output.log
echo "Job ID: $SLURM_JOB_ID" >> windninja_output.log
echo "Allocated Nodes: $SLURM_NODELIST" >> windninja_output.log
echo "Total Tasks: $SLURM_NTASKS" >> windninja_output.log
echo "CPUs per Task: $SLURM_CPUS_PER_TASK" >> windninja_output.log

# Proceed with the main script if checks pass
echo "Starting Windninja runs on each clip..." >> windninja_output.log

# Debug print: Check if directory is accessible
echo "Checking if /mnt/ohpc/WN_sims is accessible..." >> windninja_output.log
ls -ld /mnt/ohpc/WN_sims >> windninja_output.log 2>&1

# Debug print: Check if the Singularity image exists
echo "Checking if Singularity image exists at /mnt/fsim/windninja/src/wn_latest6.sif..." >> windninja_output.log
if [ ! -f /mnt/fsim/windninja/src/wn_latest6.sif ]; then
    echo "Singularity image not found!" >> windninja_output.log
    exit 1
fi

# Debug print: Start the parallel container launches
echo "Launching 200 tasks in parallel..." >> windninja_output.log

# Launch 200 containers in parallel using srun
#srun --ntasks=1 --cpus-per-task=4 --exclusive bash -c 
echo "Starting parallel container launches..." >> windninja_output.log
srun --ntasks=1 singularity exec -B /mnt/ohpc/WN_sims/59:/output  -B /mnt/fsim/windninja/src:/data /mnt/fsim/windninja/src/wn_latest6.sif /mnt/fsim/windninja/src/scripts/run.sh 
#for simulation in {0..200}; done
#    echo "Starting simulation $simulation..." >> windninja_output.log
#    srun --ntasks=1 singularity exec -B /mnt/ohpc/WN_sims/$simulation:/output /mnt/ohpc/WN_src/WN_updated.sif python3 /mnt/ohpc/WN_src/scripts/run.sh &
#    srun --ntasks=1 singularity exec -B /mnt/fsim/windninja/sims/$simulation:/output  -B /mnt/fsim/windninja/src:/data /mnt/fsim/windninja/src/wn_latest6.sif /mnt/fsim/windninja/src/scripts/run.sh &
#                   singularity exec -B /mnt/ohpc/WN_sims/200:/output -B /home:/home -B /mnt:/mnt /mnt/ohpc/WN_src/wn_latest1.sif /mnt/ohpc/WN_src/scripts/run.sh
#done
#wait

# Debug print: End of script
echo "All simulations completed. Job finished at $(date)" >> windninja_output.log

# Final wait to ensure all background tasks are completed
wait
```

run.sh:

```bash
#!/bin/bash
export CPL_DEBUG=NINJAFOAM
source /opt/openfoam8/etc/bashrc
export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
export FOAM_USER_LIBBIN=/usr/local/lib/
# /usr/local/bin/WindNinja_cli $*
#WindNinja_cli /output/dems_folder/dem0/momentum/grass/0o0deg/cli.cfg

OUTPUT_FOLDER="/output"
LOG_FILE="${OUTPUT_FOLDER}/simulation.log"
python3 /data/scripts/run_varyWnCfg3.py > "${LOG_FILE}" 2>&1

```

@nwagenbrenner
Member

nwagenbrenner commented Dec 19, 2024 via email

@sathwikreddy56
Contributor

Hey Natalie,

Yes, that's what is happening: slurm is giving the required processors, but there is some kind of communication problem with the containers. I am still working on finding the root cause of the issue.



6 participants