{bio}[fosscuda/2019b] RoseTTAFold v1.0.0 w/ Python 3.7.4 #13795

zemu-unile
Contributor

(created using eb --new-pr)

…orch-1.8.1.eb and patches: RoseTTAFold-1.0.0_cpu_mem_from_env.patch, RoseTTAFold-1.0.0_db_paths_from_env.patch, RoseTTAFold-1.0.0_fix_cache_directory.patch, RoseTTAFold-1.0.0_lddt_path.patch, RoseTTAFold-1.0.0_no_conda.patch, RoseTTAFold-1.0.0_use_eb_paths.patch
@zemu-unile
Contributor Author

zemu-unile commented Aug 23, 2021

Depends on #13794 #13793 #13792 #13798

@branfosj branfosj added this to the 4.x milestone Aug 23, 2021
@branfosj branfosj added the new label Aug 23, 2021
@zemu-unile
Contributor Author

I'm not sure whether there is anything we could add as a sanity check. Maybe something like python -c 'import ...
This is the Slurm job script I used for testing:

#!/bin/bash
#SBATCH --job-name=RoseTTAFold
#SBATCH --output=rosettafold-job-out.%J
#SBATCH --time=48:00:00
#SBATCH --nodes=1
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:rtx2080ti:1

module load RoseTTAFold/1.0.0-fosscuda-2019b-Python-3.7.4-PyTorch-1.8.1

export BFDPATH=/scratch/pdbs/bfd
export PDB100PATH=/scratch/pdbs/pdb100_2021Mar03
export UNIREF30PATH=/scratch/pdbs/UniRef30_2020_06
export MEM=$SLURM_MEM_PER_NODE
export CPU=$SLURM_CPUS_PER_TASK

run_pyrosetta_ver.sh input.fa $PWD
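
A pre-flight check along the following lines could catch a misconfigured job before the pipeline starts. This is only a sketch: the variable names (BFDPATH, PDB100PATH, UNIREF30PATH, MEM, CPU) come from the job script above, while the function check_rosettafold_env is hypothetical and not part of RoseTTAFold or of this easyconfig. It is motivated by the assumption that Slurm may leave SLURM_MEM_PER_NODE empty when a job does not request memory explicitly, depending on site configuration.

```shell
#!/bin/bash
# Hypothetical pre-flight helper; not part of RoseTTAFold or the easyconfig.
# Variable names are taken from the job script above.

check_rosettafold_env() {
    local status=0 var
    # The database locations must be set to something non-empty.
    for var in BFDPATH PDB100PATH UNIREF30PATH; do
        if [ -z "${!var:-}" ]; then
            echo "missing: $var" >&2
            status=1
        fi
    done
    # MEM and CPU must be non-empty and consist of digits only.
    for var in MEM CPU; do
        case "${!var:-}" in
            ''|*[!0-9]*)
                echo "not a number: $var='${!var:-}'" >&2
                status=1
                ;;
        esac
    done
    return "$status"
}
```

In the job script it would be called just before the pipeline, for example: check_rosettafold_env && run_pyrosetta_ver.sh input.fa $PWD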

@easybuilders easybuilders deleted a comment from boegelbot Sep 4, 2021
@easybuilders easybuilders deleted a comment from boegelbot Sep 4, 2021
@easybuilders easybuilders deleted a comment from boegelbot Sep 4, 2021
@easybuilders easybuilders deleted a comment from boegelbot Sep 8, 2021
@easybuilders easybuilders deleted a comment from boegelbot Sep 8, 2021
@zemu-unile
Contributor Author

Test report by @zemu-unile
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
login01.sc.uni-leipzig.de - Linux centos linux 7.9.2009, x86_64, AMD EPYC 7551P 32-Core Processor, Python 3.6.8
See https://gist.github.com/791d88cbfeb3c8e7cfbe0a73cc8a62c4 for a full test report.

@SebastianAchilles
Member

@boegelbot please test @ generoso

@boegelbot
Collaborator

@SebastianAchilles: Request for testing this PR well received on login1

PR test command 'EB_PR=13795 EB_ARGS= /opt/software/slurm/bin/sbatch --job-name test_PR_13795 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 6934

Test results coming soon (I hope)...

- notification for comment with ID 922869452 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 9 out of 14 (1 easyconfigs in total)
cnx1 - Linux rocky linux 8.4, x86_64, Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz (haswell), Python 3.6.8
See https://gist.github.com/820decc93a153891a7a3ceae18c2477c for a full test report.

@SebastianAchilles
Member

@boegelbot please test @ generoso

Failed with SIGKILL in PyTorch-1.8.1-fosscuda-2019b-Python-3.7.4.eb.
https://gist.github.com/boegelbot/92e258ef2737604701192ebb01cdd6b3#file-pytorch-1-8-1-fosscuda-2019b-python-3-7-4_partial-log-L491

@SebastianAchilles
Member

@boegelbot please test @ generoso
CORE_CNT=16

@boegelbot
Collaborator

@SebastianAchilles: Request for testing this PR well received on login1

PR test command 'EB_PR=13795 EB_ARGS= /opt/software/slurm/bin/sbatch --job-name test_PR_13795 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 6937

Test results coming soon (I hope)...

- notification for comment with ID 923717491 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 3 out of 5 (1 easyconfigs in total)
cnx2 - Linux rocky linux 8.4, x86_64, Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz (broadwell), Python 3.6.8
See https://gist.github.com/5f3580ce243719e888d996883a7225ea for a full test report.

@SebastianAchilles
Member

Failed because generoso has no sources for PyRosetta (no license).

@seb45tian
Contributor

@zemu-unile Could you change the HH-Suite dependency from 3.2.0 to 3.3.0? HH-Suite 3.2.0 falsely detects AVX2 support on our Ivy Bridge nodes (which only support AVX), because its CMake check (https://github.com/soedinglab/hh-suite/blob/v3.2.0/cmake/CheckSSEFeatures.cmake) does not fail with an "Illegal instruction" error when both -mavx2 and -O2 are enabled, and EasyBuild sets -O2 by default. This was fixed in 3.3.0, and RoseTTAFold does not pin a specific HH-Suite version.
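
To confirm independently of the CMake check whether a node advertises AVX2, one option on Linux is to look at the CPU flags reported in /proc/cpuinfo. The sketch below assumes an x86 Linux node; the helper name has_cpu_flag is hypothetical and unrelated to HH-Suite's build system. The optional second argument lets a flags line be passed in directly, which is convenient for inspecting output copied from another node.

```shell
#!/bin/bash
# Hypothetical helper; independent of HH-Suite's CMake logic.
# Usage: has_cpu_flag FLAG [FLAGS_LINE]
# Returns 0 if FLAG appears as a whole word in the CPU flags, 1 otherwise.
has_cpu_flag() {
    local flag=$1
    # Without a second argument, read the first "flags" line from /proc/cpuinfo.
    local flags=${2-$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null)}
    # Drop the "flags :" label if present.
    flags=${flags#*:}
    # Pad with spaces so that "avx" does not match "avx2".
    case " $flags " in
        *" $flag "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, has_cpu_flag avx2 || echo "no AVX2 on $(hostname)" could be run on each node type before deciding which HH-Suite build is safe there.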

@zemu-unile
Contributor Author

One problem here is that HH-Suite 3.3.0 is in the 2020 toolchains, but RoseTTAFold needs 2019b because it requires the old TensorFlow 1.x. I am not sure whether we should port HH-Suite 3.3.0 to 2019b. @Micket, any thoughts on this?

@seb45tian
Contributor

Good point. However, I don't see any issue with having easyconfigs for both 3.2.0 and 3.3.0 in 2019b. But you are right, the proper way would be to patch 3.2.0 to make it work.

@SebastianAchilles
Member

Good point, however I don't see any issue having an EC for 3.2.0 and 3.3.0 in 2019b.

We would need to add an exception to the CI to allow multiple versions in the same toolchain, but yes, that is possible.
Alternatively, we could add TensorFlow 1.x to a newer toolchain with an exception. This depends on which toolchain you think will be more useful for RoseTTAFold.

@zemu-unile
Contributor Author

The problem with TensorFlow 1.x is that it needs an older CUDA version. I'm still hoping that RoseTTAFold will switch to TensorFlow 2.x.

@SebastianAchilles
Member

The Problem with TensorFlow 1.x is that it needs an older Cuda version. I'm still hoping that RoseTTAFold will switch to TensorFlow 2.x

Good point! Okay, then I suggest adding HH-Suite 3.3.0 to 2019b with an exception. Do you agree?

@zemu-unile
Contributor Author

Opened a PR for HH-Suite: #14191

@easybuilders easybuilders deleted a comment from boegelbot Jan 29, 2022
@easybuilders easybuilders deleted a comment from boegelbot Jan 29, 2022
Member

@jfgrimm jfgrimm left a comment

As decided in issue #16330, we have deprecated the use of True to signify a system-toolchain dependency (#16384), in favour of the more intuitive SYSTEM template constant. Due to the change in the test suite, please run eb --sync-pr-with-develop 13795 and update the PR to use SYSTEM instead.

@boegel
Member

boegel commented Jan 13, 2024

Closing this since fosscuda/2019b is no longer supported; see https://docs.easybuild.io/policies/toolchains

@boegel boegel closed this Jan 13, 2024